Python web scraping, data analysis, and website development: case-study tutorial videos, free to watch online
優采云 Published: 2021-06-20 19:43
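Preface
The text and images in this article come from the internet and are intended for study and discussion only, not for any commercial use. If there is any problem, please contact us promptly so we can deal with it.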
The tutorial videos are available at:
https://space.bilibili.com/523606542
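Basic development environment: the scripts below are written in Python and use the requests and parsel libraries. Two kinds of official-account articles are scraped:
1. The articles published by the 藍光編程 official account
2. All official-account articles about Python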
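Scraping the articles published by the 藍光編程 official account
1. Log in to the official account platform and click the image-and-text message option
2. Open the browser's developer tools
3. Click the hyperlink option
When the related data loads, one of the network requests returns a data packet containing each article's title, link, summary, publication time, and so on. You can also pick a different official account to scrape, but doing so requires a WeChat official account of your own to log in with.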
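As a rough illustration of what that data packet looks like once decoded, the sketch below pulls the useful fields out of a hand-written sample. The keys `app_msg_list`, `title`, `link`, and `update_time` are the ones the script in this article reads; the `digest` key for the summary, the sample values, and the helper name `extract_rows` are assumptions made for the example. It uses `time.gmtime` so the result does not depend on the local timezone, whereas the article's script uses `time.localtime`.

```python
import time

# Made-up sample shaped like the JSON the script reads; real responses carry more fields.
sample = {
    'app_msg_list': [
        {
            'title': 'Example article',
            'link': 'https://mp.weixin.qq.com/s/EXAMPLE',
            'digest': 'Short summary',      # assumed key name for the summary
            'update_time': 1610779538,      # Unix timestamp in seconds
        },
    ]
}


def extract_rows(payload):
    """Return one dict per article with a human-readable UTC timestamp."""
    rows = []
    for item in payload.get('app_msg_list', []):
        # Convert the Unix timestamp to a formatted UTC string.
        stamp = time.gmtime(int(item['update_time']))
        rows.append({
            'title': item['title'],
            'link': item['link'],
            'summary': item.get('digest', ''),
            'published': time.strftime('%Y-%m-%d %H:%M:%S', stamp),
        })
    return rows


rows = extract_rows(sample)
print(rows[0])
```

Using `payload.get('app_msg_list', [])` means a response without the list (for example when the cookie has expired) yields no rows instead of raising a `KeyError`.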
Add your cookie
import csv
import pprint
import time

import requests

# Open the CSV in append mode; the with-block closes the file when scraping ends.
with open('青燈公眾號文章.csv', mode='a', encoding='utf-8', newline='') as f:
    # Column names: title, article publication time, article URL.
    csv_writer = csv.DictWriter(f, fieldnames=['標題', '文章發布時間', '文章地址'])
    csv_writer.writeheader()
    # begin is the offset of the first article on each page (5 articles per page).
    for page in range(0, 40, 5):
        # fakeid is blank in the source and token is session-specific; supply your own values.
        url = f'https://mp.weixin.qq.com/cgi-bin/appmsg?action=list_ex&begin={page}&count=5&fakeid=&type=9&query=&token=1252678642&lang=zh_CN&f=json&ajax=1'
        headers = {
            'cookie': '加cookie',  # paste your own cookie here
            'referer': 'https://mp.weixin.qq.com/cgi-bin/appmsg?t=media/appmsg_edit_v2&action=edit&isNew=1&type=10&createType=0&token=1252678642&lang=zh_CN',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
        }
        response = requests.get(url=url, headers=headers)
        html_data = response.json()
        pprint.pprint(html_data)
        lis = html_data['app_msg_list']
        for li in lis:
            title = li['title']
            link_url = li['link']
            update_time = li['update_time']
            # Convert the Unix timestamp to a readable local time.
            timeArray = time.localtime(int(update_time))
            otherStyleTime = time.strftime("%Y-%m-%d %H:%M:%S", timeArray)
            dit = {
                '標題': title,
                '文章發布時間': otherStyleTime,
                '文章地址': link_url,
            }
            csv_writer.writerow(dit)
            print(dit)
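The paging in the script above comes down to stepping the `begin` parameter in increments of `count`. A minimal sketch of that idea follows, building the same query string with `urllib.parse.urlencode` instead of a hand-written f-string. The parameter names are copied from the script; the helper names `build_list_url` and `page_offsets` and the placeholder token `YOUR_TOKEN` are hypothetical and only serve the illustration.

```python
from urllib.parse import urlencode

BASE_URL = 'https://mp.weixin.qq.com/cgi-bin/appmsg'


def build_list_url(begin, token, fakeid='', count=5):
    """Build one article-list URL; parameter names are copied from the script above."""
    params = {
        'action': 'list_ex',
        'begin': begin,      # offset of the first article on this page
        'count': count,      # number of articles per page
        'fakeid': fakeid,    # left blank here, as in the original script
        'type': 9,
        'query': '',
        'token': token,      # session-specific; placeholder in this sketch
        'lang': 'zh_CN',
        'f': 'json',
        'ajax': 1,
    }
    return f'{BASE_URL}?{urlencode(params)}'


def page_offsets(total, count=5):
    """Return the begin offsets needed to cover `total` articles."""
    return list(range(0, total, count))


offsets = page_offsets(40)
urls = [build_list_url(begin, token='YOUR_TOKEN') for begin in offsets]
print(offsets)
print(urls[1])
```

Keeping the parameters in a dict makes it easier to change the token or page size in one place; with requests, the same dict could instead be passed through the `params` argument of `requests.get`.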
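Scraping all official-account articles about Python
1. Search for python on Sogou and select the WeChat tab
Note: without logging in, only the first ten pages of results can be scraped. After logging in, more than 20,000 articles can be scraped.
2. Scrape the title, official account name, article URL, and publication time directly from the static page.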
import csv
import time

import parsel
import requests

# Open the CSV in append mode; the with-block closes the file when scraping ends.
with open('公眾號文章.csv', mode='a', encoding='utf-8', newline='') as f:
    # Column names: title, official account, article publication time, article URL.
    csv_writer = csv.DictWriter(f, fieldnames=['標題', '公眾號', '文章發布時間', '文章地址'])
    csv_writer.writeheader()
    for page in range(1, 2447):
        url = f'https://weixin.sogou.com/weixin?query=python&_sug_type_=&s_from=input&_sug_=n&type=2&page={page}&ie=utf8'
        headers = {
            'Cookie': '自己的cookie',  # paste your own cookie here
            'Host': 'weixin.sogou.com',
            'Referer': 'https://www.sogou.com/web?query=python&_asf=www.sogou.com&_ast=&w=01019900&p=40040100&ie=utf8&from=index-nologin&s_from=index&sut=1396&sst0=1610779538290&lkt=0%2C0%2C0&sugsuv=1590216228113568&sugtime=1610779538290',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
        }
        response = requests.get(url=url, headers=headers)
        selector = parsel.Selector(response.text)
        lis = selector.css('.news-list li')
        for li in lis:
            # a::text skips the highlighted keyword, so the title arrives in fragments.
            title_list = li.css('.txt-box h3 a::text').getall()
            num = len(title_list)
            if num == 1:
                # Assumes the missing keyword sat at the start of the title.
                title_str = 'python' + title_list[0]
            else:
                title_str = 'python'.join(title_list)
            href = li.css('.txt-box h3 a::attr(href)').get()
            name = li.css('.s-p a::text').get()
            date = li.css('.s-p::attr(t)').get()
            # Skip entries that lack a link or timestamp instead of crashing.
            if not href or not date:
                continue
            article_url = 'https://weixin.sogou.com' + href
            # Convert the Unix timestamp to a readable local time.
            timeArray = time.localtime(int(date))
            otherStyleTime = time.strftime("%Y-%m-%d %H:%M:%S", timeArray)
            dit = {
                '標題': title_str,
                '公眾號': name,
                '文章發布時間': otherStyleTime,
                '文章地址': article_url,
            }
            csv_writer.writerow(dit)
            print(title_str, name, otherStyleTime, article_url)
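The title handling in the script above may look odd at first, so here is a small standalone sketch of it. It assumes the search page wraps the matched keyword in a child element of the link, which is why `a::text` returns only the text around it; that markup detail is an inference from the script's logic, not something the article states. The function name `rebuild_title` is hypothetical.

```python
def rebuild_title(fragments, keyword='python'):
    """Rejoin title fragments that were split around the highlighted keyword."""
    if not fragments:
        return ''
    if len(fragments) == 1:
        # Same guess as the script: the keyword stood at the start of the title.
        return keyword + fragments[0]
    # The keyword sat between each pair of neighbouring fragments.
    return keyword.join(fragments)


print(rebuild_title([' tutorial for beginners']))
print(rebuild_title(['Learn ', ' in 30 days']))
```

Note that the single-fragment branch is only a heuristic: a title ending in the keyword would be rebuilt with the keyword at the wrong end, and the original capitalization of the keyword is not recovered.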
This article is also published on the blog "松鼠愛餅干" (CSDN).