Python Crawler Study Notes, Part 2: The Requests Library

Requests is an elegant and simple HTTP library for Python. It offers a higher-level, more convenient interface than Python's built-in urllib library.

0X01 Basic Usage

To install Requests, run this command in your terminal:

pip install requests

Basic HTTP request types:

import requests

r = requests.get('http://httpbin.org/get')
r = requests.post('http://httpbin.org/post')
r = requests.put('http://httpbin.org/put')
r = requests.delete('http://httpbin.org/delete')
r = requests.head('http://httpbin.org/get')
r = requests.options('http://httpbin.org/get')

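A simple request:

import requests

r = requests.get('http://192.168.125.129/config/sql.php?id=1')
print(r.headers)      # response headers
print(r.status_code)  # HTTP status code
print(r.url)          # URL that was requested
print(r.text)         # response body decoded to text
print(r.content)      # response body as raw bytes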

GET request:

import requests

# params is URL-encoded and appended to the URL as the query string
payload = {'id': 1}
r = requests.get('http://192.168.125.129/config/sql.php', params=payload)
print(r.url)
print(r.content)
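To see how params ends up in the URL without sending anything over the network, the request can be prepared locally. This is a minimal sketch; the extra page parameter is an illustrative assumption and is not part of the original example.

```python
import requests

# Build the request object without sending it.
req = requests.Request(
    'GET',
    'http://192.168.125.129/config/sql.php',
    params={'id': 1, 'page': 2},  # 'page' is a made-up parameter
)
prepared = req.prepare()

# Requests URL-encodes the dict and appends it as the query string.
print(prepared.url)
```

Inspecting the prepared URL this way is handy when a crawler builds queries from user input and you want to check the encoding first.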

POST request:

import requests

# data is sent form-encoded in the request body
payload = {'id': 1}
r = requests.post('http://192.168.125.129/config/sql.php', data=payload)
print(r.content)
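The data argument sends a form-encoded body. Requests also accepts a json argument, which serializes the dict as JSON instead. The sketch below prepares both variants locally (nothing is sent) so the difference is visible. Whether the target script accepts JSON is an assumption that depends on the server.

```python
import requests

url = 'http://192.168.125.129/config/sql.php'

# Same payload, two encodings; prepare() builds the request without sending it.
form = requests.Request('POST', url, data={'id': 1}).prepare()
as_json = requests.Request('POST', url, json={'id': 1}).prepare()

print(form.headers['Content-Type'], form.body)
print(as_json.headers['Content-Type'], as_json.body)
```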

0X02 Advanced Usage

1. Setting headers

import requests

url = 'http://192.168.125.129/config/sql.php?id=1'
# Send a browser-like User-Agent instead of the default python-requests one
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0'}
r = requests.get(url, headers=headers)
print(r.text)
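When many requests need the same headers, they can be set once on a Session and merged with any per-request headers. A minimal sketch, prepared locally without sending; the Referer value is an illustrative assumption.

```python
import requests

UA = 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0'

session = requests.Session()
# Headers set here are applied to every request made through the session.
session.headers.update({'User-Agent': UA})

req = requests.Request(
    'GET',
    'http://192.168.125.129/config/sql.php?id=1',
    headers={'Referer': 'http://192.168.125.129/'},  # per-request header (made up)
)
# prepare_request merges session-level and request-level headers.
prepared = session.prepare_request(req)
print(prepared.headers)
```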

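2. A simple example of simulating a login and then fetching data

import requests

# A session keeps cookies between requests, so the login carries over
s = requests.session()
# Keys must match the field names of the target login form
data = {'user': 'username', 'passdw': 'password'}
# Replace with the real login URL
res = s.post('http://www.xxx.com/login.php', data=data)
# Replace with the URL of the page to scrape
s.get('http://www.xxx.com/admin/config.php')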

3. Logging in with a known cookie

import requests

raw_cookies = "PHPSESSID=0c1e5a748e064e93e91cca1714708339; security=impossible"
cookies = {}
for line in raw_cookies.split(';'):
    # strip() removes the space left after each ';'
    key, value = line.strip().split('=', 1)
    cookies[key] = value

testurl = 'http://192.168.125.129/vulnerabilities/upload/'
s = requests.get(testurl, cookies=cookies)
print(s.text)
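As an alternative to splitting the string by hand, the standard library's http.cookies module can parse a raw Cookie header. This is a sketch of that swapped-in approach, not the method used above; it reuses the same cookie string.

```python
from http.cookies import SimpleCookie

raw_cookies = "PHPSESSID=0c1e5a748e064e93e91cca1714708339; security=impossible"

# SimpleCookie handles the ';' separators and surrounding whitespace.
jar = SimpleCookie()
jar.load(raw_cookies)

# Flatten to a plain dict of name -> value.
cookies = {name: morsel.value for name, morsel in jar.items()}
print(cookies)
```

The resulting dict can be passed to requests.get(testurl, cookies=cookies) exactly as in the example above.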

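4. SSL certificate verification

result = requests.get('https://www.v2ex.com', verify=False)

Passing verify=False skips SSL certificate verification. Without it, Requests raises an SSLError whenever the server's certificate cannot be validated, for example when it is self-signed or expired.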
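With verification disabled, each request emits an InsecureRequestWarning from urllib3. The sketch below shows one way to turn verification off for a whole session and silence that warning. It assumes you accept the risk, since skipping verification leaves the connection open to man-in-the-middle attacks.

```python
import warnings

import requests
from urllib3.exceptions import InsecureRequestWarning

session = requests.Session()
# Applies to every request made through this session.
session.verify = False

# Hide the warning urllib3 emits for each unverified HTTPS request.
warnings.filterwarnings('ignore', category=InsecureRequestWarning)
```

If the site uses a private CA, pointing verify at a CA bundle file is safer than disabling verification.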

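5. 302 redirects

By default Requests follows redirects. Passing allow_redirects=False returns the 302 response itself, so its status code and Location header can be inspected. Here s is a session, and loginUrl, postdata, and header are assumed to be defined already:

result = s.post(loginUrl, data=postdata, headers=header, verify=False, allow_redirects=False)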
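To make the effect of allow_redirects concrete, here is a self-contained sketch that runs a throwaway local server whose /login path answers with a 302. The server, its paths, and the form data are all hypothetical stand-ins for a real login endpoint.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical server: POST /login redirects to /home."""

    def do_POST(self):
        # Read and discard the form body, then redirect.
        length = int(self.headers.get('Content-Length', 0))
        self.rfile.read(length)
        self.send_response(302)
        self.send_header('Location', '/home')
        self.send_header('Content-Length', '0')
        self.end_headers()

    def do_GET(self):
        body = b'welcome'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(('127.0.0.1', 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_port

s = requests.Session()
s.trust_env = False  # ignore proxy environment variables for this local demo

# Redirect not followed: we get the 302 itself.
raw = s.post(base + '/login', data={'user': 'test'}, allow_redirects=False)
# Redirect followed (the default): the 302 is recorded in .history.
followed = s.post(base + '/login', data={'user': 'test'})

print(raw.status_code, raw.headers['Location'])
print(followed.status_code, [h.status_code for h in followed.history])
server.shutdown()
```

Stopping at the 302 is useful when the login response sets cookies or a Location value that you want to read before moving on.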

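6. Uploading form data and files with Python Requests

import requests

url = "http://www.xxx.cn/upload.php"
# (None, value) sends a plain form field; (filename, file object, content type) sends a file
with open('1.jpg', 'rb') as f:
    files = {
        "username": (None, "test"),
        'filename': ('1.jpg', f, 'image/jpeg'),
        "password": (None, "test123!"),
    }
    res = requests.post(url, files=files)

print(res.request.body)
print(res.request.headers)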

The printed request body and request headers look like this:

--5e800fd12507423aa2e4a024db7b1fa1
Content-Disposition: form-data; name="username"

test
--5e800fd12507423aa2e4a024db7b1fa1
Content-Disposition: form-data; name="password"

test123!
--5e800fd12507423aa2e4a024db7b1fa1
Content-Disposition: form-data; name="filename"; filename="1.jpg"
Content-Type: image/jpeg

11111111111111111
1111111111111
11111111111111111
--5e800fd12507423aa2e4a024db7b1fa1--

{'Content-Length': '667', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'User-Agent': 'python-requests/2.12.4', 'Connection': 'keep-alive', 'Content-Type': 'multipart/form-data; boundary=5e800fd12507423aa2e4a024db7b1fa1'}
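Plain form fields can also go in data while files carries only the upload; Requests merges both into one multipart body. The sketch below prepares such a request locally without sending it. The in-memory bytes stand in for a real 1.jpg and are an assumption made so the example runs without a file on disk.

```python
import io

import requests

# Placeholder bytes standing in for the contents of a real image file.
fake_image = io.BytesIO(b'\xff\xd8\xff\xe0 placeholder bytes')

req = requests.Request(
    'POST',
    'http://www.xxx.cn/upload.php',
    data={'username': 'test', 'password': 'test123!'},        # plain fields
    files={'filename': ('1.jpg', fake_image, 'image/jpeg')},  # file part
)
prepared = req.prepare()

# The boundary is generated per request, so it differs on every run.
print(prepared.headers['Content-Type'])
print(prepared.body[:200])
```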

References:

http://cn.python-requests.org/zh_CN/latest/user/quickstart.html