Scraping Tang Poems with Python

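We scrape from this site, the "Three Hundred Tang Poems" (306 poems) category page on 诗词名句网 (shicimingju.com): https://www.shicimingju.com/shicimark/tangshisanbaishou_1_0__1.html

The second page is at:

https://www.shicimingju.com/shicimark/tangshisanbaishou_2_0__1.html

There are 16 pages in total, and only the page number in the URL changes, so the URLs can be generated like this: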

for i in range(1, 17):
    # Build the URL of page i (pages are numbered 1 through 16)
    url = 'http://www.shicimingju.com/shicimark/tangshisanbaishou_' + str(i) + '_0__0.html'

import requests
from bs4 import BeautifulSoup


url = 'http://www.shicimingju.com/shicimark/tangshisanbaishou_1_0__0.html'
r = requests.get(url)  # download the page
demo = r.text  # HTML source as a string
soup = BeautifulSoup(demo, "html.parser")  # parse the HTML
print(soup)  # print the parsed document

This code fetches the HTML source of the first page.
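To extend the single-page fetch to all 16 pages, something like the following sketch could work. The names `URL_TEMPLATE`, `build_page_urls`, and `fetch_all_pages` are made up for illustration, the URL pattern is assumed to match the page URLs quoted earlier, and the one-second delay and ten-second timeout are arbitrary choices rather than anything the site documents.

```python
import time

import requests

# Assumed URL pattern, copied from the page URLs quoted earlier;
# only the page number changes between pages.
URL_TEMPLATE = 'https://www.shicimingju.com/shicimark/tangshisanbaishou_{page}_0__1.html'


def build_page_urls(first=1, last=16):
    """Return the listing-page URLs for pages first..last (inclusive)."""
    return [URL_TEMPLATE.format(page=i) for i in range(first, last + 1)]


def fetch_all_pages(delay=1.0, timeout=10):
    """Download every listing page and return a list of HTML strings."""
    pages = []
    for url in build_page_urls():
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()  # stop early on HTTP errors such as 404
        pages.append(r.text)
        time.sleep(delay)  # pause between requests to go easy on the site
    return pages
```

Calling `fetch_all_pages()` would then return one HTML string per page, each of which can be passed to `BeautifulSoup` in the same way as the single page.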

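Pressing F12 to open the browser's developer tools shows the following:

The author information for each poem is in <div class="list_num_info">.

The poem content is in <div class="shici_list_main">.

Use soup.find_all() to extract these parts.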

# -*- coding:utf-8 -*-
import requests
from bs4 import BeautifulSoup


url = 'https://www.shicimingju.com/shicimark/tangshisanbaishou_1_0__1.html'
r = requests.get(url)
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
# Collect every element whose class is shici_list_main (the poem content)
html1 = soup.find_all(class_='shici_list_main')
for text in html1:
    # Strip the tags, then remove newlines and spaces
    text = text.get_text().replace('\n', '').replace(' ', '')
    print(text)

This prints the text of each poem on the page.

The other element, list_num_info, is handled the same way.
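As a rough sketch of handling both elements together, the two `find_all()` results can be paired up with `zip`. The `SAMPLE_HTML` below is a hypothetical, heavily simplified stand-in for the real page (whose markup has more nesting), and `clean` and `extract_poems` are illustrative names. The pairing also assumes each `list_num_info` block lines up with exactly one `shici_list_main` block, which should be checked against the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page is more deeply nested.
SAMPLE_HTML = """
<div class="list_num_info">[唐] 李白</div>
<div class="shici_list_main"><h3>静夜思</h3><div>床前明月光,疑是地上霜。</div></div>
<div class="list_num_info">[唐] 王之涣</div>
<div class="shici_list_main"><h3>登鹳雀楼</h3><div>白日依山尽,黄河入海流。</div></div>
"""


def clean(text):
    """Remove newlines and spaces, as in the loop above."""
    return text.replace('\n', '').replace(' ', '')


def extract_poems(html):
    """Return a list of (author_info, poem_text) tuples found in html."""
    soup = BeautifulSoup(html, "html.parser")
    infos = soup.find_all(class_='list_num_info')
    mains = soup.find_all(class_='shici_list_main')
    # Assumes the two lists are the same length and in the same order.
    return [(clean(i.get_text()), clean(m.get_text()))
            for i, m in zip(infos, mains)]


for info, poem in extract_poems(SAMPLE_HTML):
    print(info, poem)
```

Note that `zip` silently stops at the shorter list, so if the counts ever differ, some entries would be dropped or misaligned without an error.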

To store the results, use the pandas library, specifically pandas.DataFrame.
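A minimal sketch of the storage step might look like this. The two rows in `ROWS` are hand-written placeholders rather than scraped output, the column names `info` and `poem` and the helper names are invented for illustration, and writing to CSV is one possible choice that the original text does not specify.

```python
import pandas as pd

# Placeholder rows in the (author_info, poem_text) shape; not real scraped output.
ROWS = [
    ('[唐]李白', '静夜思床前明月光,疑是地上霜。'),
    ('[唐]王之涣', '登鹳雀楼白日依山尽,黄河入海流。'),
]


def poems_to_dataframe(rows):
    """Build a two-column DataFrame from (author_info, poem_text) tuples."""
    return pd.DataFrame(rows, columns=['info', 'poem'])


def save_poems(df, path):
    """Write the DataFrame to a CSV file without the index column."""
    # utf-8-sig adds a BOM so that spreadsheet programs detect the encoding.
    df.to_csv(path, index=False, encoding='utf-8-sig')


df = poems_to_dataframe(ROWS)
print(df)
```

Calling `save_poems(df, 'tang_poems.csv')` (a hypothetical file name) would then write the table to disk.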


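Copyright notice: This is an original article by qq_56256779, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.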