Web crawlers: how to use proxies

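The proxy IPs used here were extracted through the HTTP API of 速度代理 (a proxy provider).
Using proxy IPs
1. Using a proxy with requests
To use a proxy with requests, build a dictionary and pass it through the proxies parameter.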

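import requests

# Proxy address as given in this post; most proxies also need a port, written as 'ip:port'
proxy = '60.186.9.233'
proxies = {
    'http': 'http://' + proxy,
    # A plain HTTP proxy is normally reached over http:// even when the target URL is HTTPS
    'https': 'http://' + proxy
}
try:
    res = requests.get('http://httpbin.org/get', proxies=proxies)
    print(res.text)
except requests.exceptions.ConnectionError as e:
    print('error', e.args)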
Output:

{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.18.4"
  },
  "origin": "60.186.9.233",
  "url": "https://httpbin.org/get"
}
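To check the result in code rather than by eye, the response body can be parsed and its origin field compared with the proxy's IP. The following is only a minimal sketch: the function name origin_matches_proxy is made up for illustration, it assumes an IPv4 proxy address and a JSON body shaped like httpbin's, and it allows for origin holding several comma-separated addresses.

```python
import json


def origin_matches_proxy(body: str, proxy: str) -> bool:
    """Return True if an httpbin-style JSON body reports the proxy's IP as origin."""
    # Drop credentials ("user:pass@") and a port (":8080") if present, keeping the host
    host = proxy.rsplit("@", 1)[-1].split(":", 1)[0]
    origin = json.loads(body).get("origin", "")
    # origin may contain more than one address, separated by commas
    return host in [part.strip() for part in origin.split(",")]


# Shortened copy of the response shown in this post
sample = '{"args": {}, "origin": "60.186.9.233", "url": "https://httpbin.org/get"}'
print(origin_matches_proxy(sample, "60.186.9.233"))   # True
print(origin_matches_proxy(sample, "10.0.0.1"))       # False
```

In a real run, res.text from requests.get would take the place of sample.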
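The origin field in the output is the proxy's IP, which shows that the proxy was applied. If the proxy requires authentication, put the username and password in front of the proxy address: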

proxy = 'username:password@60.186.9.233'
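If the username or password contains characters such as "@" or ":", plain string concatenation produces an ambiguous URL. One possible way around this is to percent-encode the credentials first. The sketch below is an illustration only: build_proxies is a hypothetical helper, not part of requests, and it assumes the proxy itself is a plain HTTP proxy, so both entries use an http:// proxy URL.

```python
from typing import Optional
from urllib.parse import quote


def build_proxies(host: str, username: Optional[str] = None,
                  password: Optional[str] = None) -> dict:
    """Build a requests-style proxies dict for an HTTP proxy at `host`."""
    auth = ""
    if username is not None and password is not None:
        # Percent-encode so "@" or ":" inside the credentials cannot break the URL
        auth = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    proxy_url = f"http://{auth}{host}"
    # The same proxy URL is used for both HTTP and HTTPS targets
    return {"http": proxy_url, "https": proxy_url}


print(build_proxies("60.186.9.233"))
print(build_proxies("60.186.9.233", "username", "p@ss:word"))
```

The returned dictionary can be passed directly as the proxies argument of requests.get.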
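2. Using a proxy with Selenium
Selenium can also be configured to use a proxy. There are two cases: a browser with a visible window, with Chrome as the example, and a headless browser, with PhantomJS as the example.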

Chrome settings

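Set the proxy on a ChromeOptions object (named chrome_options here) and pass that object in when creating the Chrome object. Running the code opens a Chrome window, and once the page has loaded it shows the following result.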


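# Chrome proxy settings

from selenium import webdriver

proxy = '60.186.9.233'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=http://' + proxy)
# Recent Selenium releases take options=; older ones used chrome_options=
browser = webdriver.Chrome(options=chrome_options)
# get() returns None, so its result is not assigned; the page appears in the browser window
browser.get('http://httpbin.org/get')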
Result:
{
  "args": {},
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Host": "httpbin.org",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"
  },
  "origin": "60.186.9.233",
  "url": "https://httpbin.org/get"
}
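In a script, the same result can be read from browser.page_source instead of from the browser window. A browser usually wraps a plain JSON response in a small HTML page, with the text inside a pre element. The sketch below assumes that layout (other browser versions may render it differently), and the helper name json_from_page_source is hypothetical.

```python
import html
import json
import re


def json_from_page_source(page_source: str) -> dict:
    """Extract the JSON document that a browser rendered inside a <pre> element."""
    match = re.search(r"<pre\b[^>]*>(.*?)</pre>", page_source,
                      re.DOTALL | re.IGNORECASE)
    # Fall back to the whole string when there is no <pre> wrapper
    text = match.group(1) if match else page_source
    # Undo HTML escaping such as &amp; before parsing
    return json.loads(html.unescape(text))


# Stand-in for browser.page_source after browser.get('http://httpbin.org/get')
page = ('<html><head></head><body><pre style="word-wrap: break-word;">'
        '{"args": {}, "origin": "60.186.9.233"}</pre></body></html>')
print(json_from_page_source(page)["origin"])
```

With a live browser, passing browser.page_source in place of page gives the origin value to compare against the proxy's IP.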

PhantomJS settings

Define the command-line arguments as a list and pass it to PhantomJS at initialization through the service_args parameter.


# PhantomJS proxy settings

from selenium import webdriver

service_args = [
    '--proxy=60.186.9.233',
    '--proxy-type=http'
]
# webdriver.PhantomJS is available in Selenium 3.x and earlier; Selenium 4 removed it
browser = webdriver.PhantomJS(service_args=service_args)
browser.get('http://httpbin.org/get')
print(browser.page_source)
Output:

{
  "args": {},
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Host": "httpbin.org",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"
  },
  "origin": "60.186.9.233",
  "url": "https://httpbin.org/get"
}
If authentication is required, add the --proxy-auth option to service_args:

service_args = [
    '--proxy=60.186.9.233',
    '--proxy-type=http',
    '--proxy-auth=username:password'
]
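When the proxy changes often, as it does with IPs pulled from an API, it can be convenient to generate this list instead of editing it by hand. A minimal sketch follows; build_service_args is a hypothetical helper name, and it only assembles the three PhantomJS options used in this post.

```python
from typing import List, Optional


def build_service_args(proxy: str, proxy_type: str = "http",
                       auth: Optional[str] = None) -> List[str]:
    """Assemble PhantomJS command-line arguments for a proxy."""
    args = [f"--proxy={proxy}", f"--proxy-type={proxy_type}"]
    if auth:
        # auth is expected in "username:password" form
        args.append(f"--proxy-auth={auth}")
    return args


print(build_service_args("60.186.9.233"))
print(build_service_args("60.186.9.233", auth="username:password"))
```

The returned list can be passed as service_args when creating the PhantomJS browser object.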


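Copyright notice: This is an original article by pt530743618, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.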