Python Crawlers, Data Analysis, Web Development and Other Case Tutorial Videos, Free to Watch Online
優(yōu)采云 發(fā)布時(shí)間: 2021-01-25 10:46Python爬蟲(chóng)、數據分析、網(wǎng)站開(kāi)發(fā)等案例教程視頻免費在線(xiàn)觀(guān)看
Preface
The text and images in this article are sourced from the internet and are intended for learning and exchange only, not for any commercial use. If you have any questions, please contact us.
免費在線(xiàn)觀(guān)看有關(guān)Python采集器,數據分析,網(wǎng)站開(kāi)發(fā)等的案例教程視頻
https://space.bilibili.com/523606542
基本開(kāi)發(fā)環(huán)境抓取了兩個(gè)官方帳戶(hù)的文章:
1。爬網(wǎng)Cyan Light編程官方帳戶(hù)的所有文章
2,抓取有關(guān)python 文章的所有官方帳戶(hù)
Crawling all articles of the 青燈編程 official account
1. Log in to the official account platform and click 圖文 (create an image-and-text message)
2. Open the browser's developer tools
3. Click 超鏈接 (hyperlink)
加載相關(guān)數據后,會(huì )出現一個(gè)數據包,包括文章標題,鏈接,摘要,發(fā)布時(shí)間等。您還可以選擇其他官方帳戶(hù)進(jìn)行抓取,但這需要您擁有一個(gè)微信公眾號帳戶(hù)。
Add your own cookie, copied from the request headers shown in the developer tools, to the headers in the script below.
import pprint
import time
import requests
import csv

# Open the output CSV and write the header row
f = open('青燈公眾號文章.csv', mode='a', encoding='utf-8', newline='')
csv_writer = csv.DictWriter(f, fieldnames=['標題', '文章發布時間', '文章地址'])
csv_writer.writeheader()

# The interface returns 5 articles per request; begin is the offset into the list
for page in range(0, 40, 5):
    url = f'https://mp.weixin.qq.com/cgi-bin/appmsg?action=list_ex&begin={page}&count=5&fakeid=&type=9&query=&token=1252678642&lang=zh_CN&f=json&ajax=1'
    headers = {
        'cookie': '加cookie',  # paste your own cookie here
        'referer': 'https://mp.weixin.qq.com/cgi-bin/appmsg?t=media/appmsg_edit_v2&action=edit&isNew=1&type=10&createType=0&token=1252678642&lang=zh_CN',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
    }
    response = requests.get(url=url, headers=headers)
    html_data = response.json()
    pprint.pprint(html_data)
    # Each entry in app_msg_list describes one article
    lis = html_data['app_msg_list']
    for li in lis:
        title = li['title']
        link_url = li['link']
        update_time = li['update_time']
        # Convert the Unix timestamp into a readable date string
        timeArray = time.localtime(int(update_time))
        otherStyleTime = time.strftime("%Y-%m-%d %H:%M:%S", timeArray)
        dit = {
            '標題': title,
            '文章發布時間': otherStyleTime,
            '文章地址': link_url,
        }
        csv_writer.writerow(dit)
        print(dit)
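Two small optional additions, neither of which is in the original script: pausing between page requests to avoid hammering the interface, and closing the CSV file so the last buffered rows are flushed to disk. A minimal sketch of where they would go:

import csv
import time

f = open('青燈公眾號文章.csv', mode='a', encoding='utf-8', newline='')
csv_writer = csv.DictWriter(f, fieldnames=['標題', '文章發布時間', '文章地址'])
csv_writer.writeheader()

for page in range(0, 40, 5):
    # ... request and write out the page exactly as in the script above ...
    time.sleep(2)  # arbitrary 2-second delay between pages; adjust as needed

f.close()  # flush buffered rows and close the file when the loop finishes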
檢索有關(guān)python 文章的所有正式帳戶(hù)
1. Search for "python" on Sogou and switch to the 微信 (WeChat) tab.
Note: without logging in, only the first ten pages of results can be crawled; after logging in, more than 20,000 articles can be retrieved.
2. The title, official account name, article URL, and publish time can be extracted directly from the static page.
import time
import requests
import parsel
import csv

# Open the output CSV and write the header row
f = open('公眾號文章.csv', mode='a', encoding='utf-8', newline='')
csv_writer = csv.DictWriter(f, fieldnames=['標題', '公眾號', '文章發布時間', '文章地址'])
csv_writer.writeheader()

for page in range(1, 2447):
    url = f'https://weixin.sogou.com/weixin?query=python&_sug_type_=&s_from=input&_sug_=n&type=2&page={page}&ie=utf8'
    headers = {
        'Cookie': '自己的cookie',  # paste your own cookie here
        'Host': 'weixin.sogou.com',
        'Referer': 'https://www.sogou.com/web?query=python&_asf=www.sogou.com&_ast=&w=01019900&p=40040100&ie=utf8&from=index-nologin&s_from=index&sut=1396&sst0=1610779538290&lkt=0%2C0%2C0&sugsuv=1590216228113568&sugtime=1610779538290',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
    }
    response = requests.get(url=url, headers=headers)
    selector = parsel.Selector(response.text)
    # Each <li> in the result list is one article
    lis = selector.css('.news-list li')
    for li in lis:
        # The search keyword is wrapped in highlight tags, so ::text returns the title
        # split into pieces; rebuild the full title by re-inserting "python"
        title_list = li.css('.txt-box h3 a::text').getall()
        num = len(title_list)
        if num == 1:
            title_str = 'python' + title_list[0]
        else:
            title_str = 'python'.join(title_list)
        href = li.css('.txt-box h3 a::attr(href)').get()
        article_url = 'https://weixin.sogou.com' + href
        name = li.css('.s-p a::text').get()   # official account name
        date = li.css('.s-p::attr(t)').get()  # publish time as a Unix timestamp
        timeArray = time.localtime(int(date))
        otherStyleTime = time.strftime("%Y-%m-%d %H:%M:%S", timeArray)
        dit = {
            '標題': title_str,
            '公眾號': name,
            '文章發布時間': otherStyleTime,
            '文章地址': article_url,
        }
        csv_writer.writerow(dit)
        print(title_str, name, otherStyleTime, article_url)
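Once the crawl finishes, the CSV can be loaded for a quick look at the results, tying back to the data-analysis theme of the tutorial. A minimal sketch using pandas, which is not used in the original article and is an assumption here:

import pandas as pd

# Load the scraped articles; the column names match the DictWriter fieldnames above
df = pd.read_csv('公眾號文章.csv', encoding='utf-8')

# Which official accounts published the most matching articles?
print(df['公眾號'].value_counts().head(10))

# Parse the publish time column for time-based grouping
df['文章發布時間'] = pd.to_datetime(df['文章發布時間'])
print(df['文章發布時間'].dt.year.value_counts())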