Basic usage of requests and re: scraping Baidu Images automatically from a keyword

# Hands-on: scraping Baidu Images
# 1. Fetch the source of the search results page   2. Extract the image download URLs from that source with a regex   3. Download the images
import requests
import re
# Named "keyword" rather than "str" so the built-in str type is not shadowed
keyword=input("Enter the keyword for the images to download\n")
uil=f'https://image.baidu.com/search/flip?tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&ie=utf-8&word={keyword}'
#uil='https://image.baidu.com/search/index?tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&ie=utf-8&word=猫'
# Browser-like request headers.
# 'br' is left out of Accept-Encoding: requests can only decode Brotli responses
# when the optional brotli package is installed.
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'sec-ch-ua': '"Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36',
}

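# Fetch the results page; the timeout keeps the script from hanging on a stalled connection
response=requests.get(uil,headers=headers,timeout=10).text
#print(response)
# Capture the value of every field in the page source whose name ends in URL"
image_urls=re.findall(r'URL":"(.*?)",',response)
#print(image_urls)
# Drop duplicate URLs
image_urls=list(set(image_urls))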
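for image_url in image_urls:
    # Keep only .jpg files whose URL starts with https://ss
    if image_url.endswith('.jpg') and image_url.startswith('https://ss'):
        image_name=image_url.split('/')[-1]
        print(image_url)

        # Download the image bytes
        image=requests.get(image_url,timeout=10).content

        # Save the image (the img3/ directory must already exist);
        # the with block closes the file even if the write fails
        with open('img3/' + image_name, 'wb') as f:
            f.write(image)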
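
The URL-building and extraction steps of the script above can also be written as small functions that are testable without touching the network. This is only a sketch under stated assumptions: the helper names (build_search_url, extract_image_urls, file_name_from_url) are hypothetical and not part of the original script, and the regex and the 'https://ss' filter are simply carried over from it. The page layout may change, so treat the pattern as an assumption rather than a stable interface.

```python
import posixpath
import re
from urllib.parse import urlencode, urlsplit

# Endpoint and fixed query parameters copied from the script above
SEARCH_ENDPOINT = "https://image.baidu.com/search/flip"
BASE_PARAMS = {
    "tn": "baiduimage", "ps": "1", "ct": "201326592",
    "lm": "-1", "cl": "2", "nc": "1", "ie": "utf-8",
}


def build_search_url(keyword):
    """Hypothetical helper: build the search URL with the keyword percent-encoded."""
    params = dict(BASE_PARAMS, word=keyword)
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def extract_image_urls(page_source):
    """Hypothetical helper: return unique .jpg URLs in first-seen order."""
    candidates = re.findall(r'URL":"(.*?)",', page_source)
    unique = dict.fromkeys(candidates)  # drops repeats, keeps order
    return [u for u in unique
            if u.endswith(".jpg") and u.startswith("https://ss")]


def file_name_from_url(url):
    """Hypothetical helper: last path segment of the URL, ignoring any query string."""
    return posixpath.basename(urlsplit(url).path)
```

Encoding the keyword through urlencode means characters such as spaces or & in the input cannot change the meaning of the query string, and dict.fromkeys keeps the download order stable, which set() does not guarantee.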

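
Saving the file is the step most likely to fail on a fresh machine, because open() raises FileNotFoundError when the target directory is missing. A minimal sketch of a more forgiving save step follows; save_image is a hypothetical helper name, and the directory name is whatever the caller passes in (the script above uses img3).

```python
import os


def save_image(data, directory, file_name):
    """Hypothetical helper: write image bytes to directory/file_name.

    Creates the directory if it does not exist and strips any path
    components from file_name so the file always lands inside directory.
    """
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, os.path.basename(file_name))
    with open(path, "wb") as f:  # closed automatically, even on error
        f.write(data)
    return path
```

In the download loop this could be called as save_image(image, 'img3', image_name), with image holding the bytes returned by requests.get(image_url).content.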

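Copyright notice: This is an original article by qq_54119714, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.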