
querylist: Collecting WeChat Official Account Articles (free online video tutorials on Python crawlers, data analysis, web development, and more)

優(yōu)采云 發(fā)布時(shí)間: 2022-02-21 08:11


Preface

The text and images in this article are sourced from the internet and are for learning and exchange only; they have no commercial use. If you have any questions, please contact us promptly.

Free online video tutorials on Python crawlers, data analysis, web development, and more:

https://space.bilibili.com/523606542

  基礎開(kāi)發(fā)環(huán)境爬取兩個(gè)公眾號的文章:

  1、爬取所有文章

  青燈編程公眾號

  2、爬取所有關(guān)于python的公眾號文章

  爬取所有文章

  青燈編程公眾號

1. Log in to the official-account platform and click 圖文 (article)

2. Open the browser's developer tools

3. Click 超鏈接 (hyperlink)

As the page loads, a data packet appears containing the article title, link, digest, publish time, and so on. You can also crawl other official accounts this way, but you need a WeChat official account of your own to access this interface.
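The packet is JSON. A minimal sketch of its assumed shape, with invented sample values (the field names `title`, `link`, `digest`, and `update_time` are the ones the script below reads):

```python
# Assumed shape of the appmsg response packet; all values here are
# invented samples for illustration, not real data.
html_data = {
    'app_msg_list': [
        {
            'title': '示例文章',
            'link': 'https://mp.weixin.qq.com/s/xxxxxx',
            'digest': '文章摘要',
            'update_time': 1610779538,  # Unix timestamp
        },
    ],
}

# Pull the fields out of each entry, as the full script does.
for li in html_data['app_msg_list']:
    print(li['title'], li['link'], li['update_time'])
```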

Add your cookie to the request headers:

```python
import pprint
import time
import requests
import csv

f = open('青燈公眾號文章.csv', mode='a', encoding='utf-8', newline='')
csv_writer = csv.DictWriter(f, fieldnames=['標題', '文章發布時間', '文章地址'])
csv_writer.writeheader()
for page in range(0, 40, 5):  # "begin" advances by the page size of 5
    url = f'https://mp.weixin.qq.com/cgi-bin/appmsg?action=list_ex&begin={page}&count=5&fakeid=&type=9&query=&token=1252678642&lang=zh_CN&f=json&ajax=1'
    headers = {
        'cookie': 'your cookie here',  # paste your own cookie
        'referer': 'https://mp.weixin.qq.com/cgi-bin/appmsg?t=media/appmsg_edit_v2&action=edit&isNew=1&type=10&createType=0&token=1252678642&lang=zh_CN',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
    }
    response = requests.get(url=url, headers=headers)
    html_data = response.json()
    pprint.pprint(html_data)
    lis = html_data['app_msg_list']
    for li in lis:
        title = li['title']
        link_url = li['link']
        update_time = li['update_time']
        # update_time is a Unix timestamp; format it as a readable date
        timeArray = time.localtime(int(update_time))
        otherStyleTime = time.strftime("%Y-%m-%d %H:%M:%S", timeArray)
        dit = {
            '標題': title,
            '文章發布時間': otherStyleTime,
            '文章地址': link_url,
        }
        csv_writer.writerow(dit)
        print(dit)
```
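The `update_time` field is a Unix timestamp, and `time.localtime` plus `time.strftime` turn it into a readable date. A quick standalone check, using `time.gmtime` so the result is machine-independent (the script above uses `localtime`, which follows your timezone):

```python
import time

ts = 1610779538  # a sample Unix timestamp
# gmtime interprets the timestamp as UTC, so the output is deterministic
formatted = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(ts))
print(formatted)  # 2021-01-16 06:45:38
```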

Crawling all official-account articles about Python

1. Search for "python" on Sogou and select the 微信 (WeChat) tab

Note: without logging in, you can only crawl the first ten pages of results; after logging in, you can crawl more than 20,000 articles.

2. Crawl the title, official-account name, article URL, and publish time from the static result pages

```python
import time
import requests
import parsel
import csv

f = open('公眾號文章.csv', mode='a', encoding='utf-8', newline='')
csv_writer = csv.DictWriter(f, fieldnames=['標題', '公眾號', '文章發布時間', '文章地址'])
csv_writer.writeheader()
for page in range(1, 2447):
    url = f'https://weixin.sogou.com/weixin?query=python&_sug_type_=&s_from=input&_sug_=n&type=2&page={page}&ie=utf8'
    headers = {
        'Cookie': 'your cookie here',  # paste your own cookie
        'Host': 'weixin.sogou.com',
        'Referer': 'https://www.sogou.com/web?query=python&_asf=www.sogou.com&_ast=&w=01019900&p=40040100&ie=utf8&from=index-nologin&s_from=index&sut=1396&sst0=1610779538290&lkt=0%2C0%2C0&sugsuv=1590216228113568&sugtime=1610779538290',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
    }
    response = requests.get(url=url, headers=headers)
    selector = parsel.Selector(response.text)
    lis = selector.css('.news-list li')
    for li in lis:
        # Sogou highlights the keyword, so ::text returns the title
        # split into fragments around each "python"
        title_list = li.css('.txt-box h3 a::text').getall()
        num = len(title_list)
        if num == 1:
            title_str = 'python' + title_list[0]
        else:
            title_str = 'python'.join(title_list)
        href = li.css('.txt-box h3 a::attr(href)').get()
        article_url = 'https://weixin.sogou.com' + href
        name = li.css('.s-p a::text').get()
        date = li.css('.s-p::attr(t)').get()
        timeArray = time.localtime(int(date))
        otherStyleTime = time.strftime("%Y-%m-%d %H:%M:%S", timeArray)
        dit = {
            '標題': title_str,
            '公眾號': name,
            '文章發布時間': otherStyleTime,
            '文章地址': article_url,
        }
        csv_writer.writerow(dit)
        print(title_str, name, otherStyleTime, article_url)
```
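Why the `'python'.join(...)` step: Sogou wraps the search keyword in a highlight tag, so the `::text` selector returns the title broken into fragments around every occurrence of "python". Joining the fragments with the keyword restores the full title. A sketch with invented fragments:

```python
# Fragments as they might come back from
# li.css('.txt-box h3 a::text').getall() for a highlighted title
title_list = ['Learn ', ' in 10 Days with ', ' Projects']

# Re-insert the keyword between the fragments to rebuild the title
title_str = 'python'.join(title_list)
print(title_str)  # Learn python in 10 Days with python Projects
```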

  
