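The data comes from the 唐诗三百首 (Three Hundred Tang Poems, 306 poems) category on 诗词名句网 (shicimingju.com). The first page is
https://www.shicimingju.com/shicimark/tangshisanbaishou_1_0__1.html
and the second page is
https://www.shicimingju.com/shicimark/tangshisanbaishou_2_0__1.html
There are 16 pages in total and only the page number in the URL changes, so the URLs can be generated like this: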
for i in range(1, 17):
    url = 'http://www.shicimingju.com/shicimark/tangshisanbaishou_' + str(i) + '_0__0.html'

Fetch the first page to look at its HTML:
import requests
from bs4 import BeautifulSoup
url = 'http://www.shicimingju.com/shicimark/tangshisanbaishou_1_0__0.html'
r = requests.get(url)
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
print(soup)  # print the parsed HTML of the first page
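The page loop and the single-page fetch can be combined. The following is only a rough sketch: the function names `page_urls` and `fetch_pages`, the one-second delay, and the 10-second timeout are my own assumptions, not something the site requires.

```python
import time

# URL pattern taken from the loop above; {} is the page number.
BASE = 'http://www.shicimingju.com/shicimark/tangshisanbaishou_{}_0__0.html'


def page_urls(first=1, last=16):
    """Return the listing URLs for pages first..last (inclusive)."""
    return [BASE.format(i) for i in range(first, last + 1)]


def fetch_pages(urls, delay=1.0):
    """Download each URL and return the HTML texts in order.

    `delay` (an assumed value) pauses between requests to go easy on the site.
    """
    import requests  # imported here so page_urls() works without requests installed

    pages = []
    for url in urls:
        r = requests.get(url, timeout=10)
        r.raise_for_status()  # stop early on an HTTP error status
        pages.append(r.text)
        time.sleep(delay)
    return pages


urls = page_urls()
print(len(urls))  # 16
```

Calling `fetch_pages(page_urls())` would then download all 16 pages; it is not called here so the snippet can run offline.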
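Press F12 to open the browser's developer tools and inspect the page:

The poet information is in <div class="list_num_info">
The poem text is in <div class="shici_list_main">
Use soup.find_all() to extract these parts.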
# -*- coding:utf-8 -*-
import requests
from bs4 import BeautifulSoup
url = 'https://www.shicimingju.com/shicimark/tangshisanbaishou_1_0__1.html'
r = requests.get(url)
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
# collect every element whose class is "shici_list_main"
html1 = soup.find_all(class_='shici_list_main')
for text in html1:
    # strip newlines and spaces from the extracted text
    text = text.get_text().replace('\n', '').replace(' ', '')
    print(text)
This prints the poem text found on the page.

The other block, list_num_info, is extracted in the same way.
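To show that both blocks can go through the same code, here is a minimal sketch run on a made-up HTML snippet instead of the live page. The two class names come from the text above; the helper name `extract_text` and the markup in `SAMPLE_HTML` are assumptions for illustration, and the real page's markup inside these divs may differ.

```python
from bs4 import BeautifulSoup

# Made-up snippet that only imitates the two class names seen with F12.
SAMPLE_HTML = """
<div class="list_num_info">[唐] 李白</div>
<div class="shici_list_main">
  <h3>静夜思</h3>
  <div>床前明月光,疑是地上霜。</div>
</div>
"""


def extract_text(html, class_name):
    """Return the cleaned text of every element with the given class."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        tag.get_text().replace('\n', '').replace(' ', '')
        for tag in soup.find_all(class_=class_name)
    ]


infos = extract_text(SAMPLE_HTML, 'list_num_info')
poems = extract_text(SAMPLE_HTML, 'shici_list_main')
print(infos)
print(poems)
```

On a real page, `r.text` from the request above would be passed in place of `SAMPLE_HTML`.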
For storage, use the pandas library: put the results in a pandas.DataFrame.
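A small sketch of the storage step, assuming the poet info and poem text have already been collected into two lists of equal length. The column names `info` and `poem`, the helper `to_frame`, and the file name in the comment are all hypothetical.

```python
import pandas as pd


def to_frame(infos, poems):
    """Pair each poet-info string with its poem text, one row per poem."""
    # pandas raises ValueError here if the two lists differ in length.
    return pd.DataFrame({'info': infos, 'poem': poems})


df = to_frame(['[唐]李白'], ['静夜思床前明月光,疑是地上霜。'])

# With no path given, to_csv returns the CSV as a string.
csv_text = df.to_csv(index=False)
print(csv_text)

# To write a file instead (hypothetical file name; utf-8-sig helps Excel
# recognise the Chinese text):
# df.to_csv('tangshi.csv', index=False, encoding='utf-8-sig')
```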
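Copyright notice: this is an original article by qq_56256779, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.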