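1. Ajax
Ajax is a technique for updating part of a page without reloading the whole page.
You can observe it as follows:
In Chrome, press F12 to open the developer tools, click Network -> XHR (this filter shows Ajax requests), then click one of the entries to see that it uses the POST method.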

If you type slowly, one letter at a time, you will capture more Ajax request packets.
For example, typing hello triggers 5 requests.
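
As a rough illustration of why typing hello produces 5 requests, the sketch below lists the text the input box holds after each keystroke. It assumes the page sends one request per keystroke with no debouncing, which matches the capture described above; the helper name `prefixes` is made up for this example.

```python
def prefixes(text):
    """Return the text present in the input box after each keystroke."""
    return [text[:i] for i in range(1, len(text) + 1)]


typed = prefixes("hello")
print(typed)       # one entry per keystroke
print(len(typed))  # number of requests under the one-per-keystroke assumption
```

Each prefix would be sent as its own query, so a slower typist simply makes the individual requests easier to see in the Network panel.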





The entries marked in the figure are the Ajax interfaces:
2. Simulating the browser with POST
~~~~~ 2.1 The sug interface
import urllib.request
import urllib.parse

post_url = 'https://fanyi.baidu.com/sug'
word = input('Enter the English word to look up: ')
# Build the POST form data
form_data = {
    'kw': word,
}
# Request headers that imitate a browser
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36',
}
# Build the request object
request = urllib.request.Request(url=post_url, headers=headers)
# URL-encode the form data and convert it to bytes
form_data = urllib.parse.urlencode(form_data).encode()
# Send the request; passing data makes it a POST
response = urllib.request.urlopen(request, data=form_data)
print(response.read().decode())


{"errno":0,"data":[{"k":"hello","v":"int.\/n. (\u7528\u4e8e\u95ee\u5019\u3001\u63a5\u7535\u8bdd\u6216\u5f15\u8d77\u6ce8\u610f)\u54c8\u7f57\uff0c\u5582\uff0c\u4f60\u597d; (\u8868\u793a\u60ca\u8bb6)\u563f; (\u8ba4\u4e3a\u522b\u4eba\u8bf4\u4e86\u8822\u8bdd"},{"k":"hello everyone","v":" \u5927\u5bb6\u597d; \u54c8\u7f57\u5927\u5bb6\u597d; \u5404\u4f4d\u597d;"},{"k":"hello kitty","v":"n. \u5361\u901a\u4e16\u754c\u4e2d; \u6709\u8fd9\u6837\u4e00\u53ea\u5c0f\u732b;"},{"k":"hellow","v":" \uff08\u901a\u5e38\u7684\u62db\u547c\u8bed\uff09\u55e8\uff0c \uff08\u6253\u7535\u8bdd\u7528\uff09\u5582\uff01\uff0c \uff08\u82f1\uff09\uff08\u8868\u793a\u60ca\u8bb6\uff09\u54ce\u54df;"},{"k":"hello girl","v":"\u7f51\u7edc \u5973\u63a5\u7ebf\u5458;"}]}
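
The form-encoding step in the script above can be checked on its own, without touching the network. The following is a minimal sketch that reuses the `kw` field and the sug URL from this post; the sample values `hello` and `hello world` are only illustrations.

```python
import urllib.parse
import urllib.request

# urlencode turns a dict into an application/x-www-form-urlencoded string.
encoded = urllib.parse.urlencode({"kw": "hello"})
body = encoded.encode()  # urlopen expects bytes, not str, for POST data
print(encoded)
print(body)

# Spaces and other special characters are escaped for you.
print(urllib.parse.urlencode({"kw": "hello world"}))

# A Request that carries data reports POST as its method; nothing is sent here.
request = urllib.request.Request(url="https://fanyi.baidu.com/sug", data=body)
print(request.get_method())
```

Whether the bytes are attached to the `Request` or passed to `urlopen` through `data=`, as the script above does, the request goes out as a POST.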
If the response is hard to read, you can parse it at www.json.cn:
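
The same decoding can also be done locally with the standard `json` module. This is a small sketch that uses two entries copied from the response above as sample input; in a real script you would pass `response.read().decode()` instead of the hard-coded string.

```python
import json

# Two entries copied from the sug response shown above (raw string keeps the \u escapes).
raw = r'{"errno":0,"data":[{"k":"hello everyone","v":" \u5927\u5bb6\u597d; \u54c8\u7f57\u5927\u5bb6\u597d; \u5404\u4f4d\u597d;"},{"k":"hello girl","v":"\u7f51\u7edc \u5973\u63a5\u7ebf\u5458;"}]}'

result = json.loads(raw)  # \uXXXX escapes become real characters
for entry in result["data"]:
    print(entry["k"], "->", entry["v"].strip())

# Pretty-print without re-escaping the non-ASCII characters.
print(json.dumps(result, ensure_ascii=False, indent=2))
```

Here `k` holds the suggested word and `v` its explanation, as seen in the captured response.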

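The sug request captured with Fiddler matches the parsed result:
~~~~~ 2.2 The v2transapi interface
Translate "wolf" on Baidu Translate and capture the traffic with Fiddler.
Click the v2transapi entry -> Raw to see the request URL.
Then check the WebForms view for the form fields.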
import urllib.request
import urllib.parse

# Note: the URL must not contain a trailing space
post_url = 'https://fanyi.baidu.com/v2transapi?from=en&to=zh'
word = 'wolf'
# Build the POST form
form_data = {
    'from': 'en',
    'query': word,
    'sign': '275695.55262',  # sign changes with the content of word; this value only works for 'wolf'
    'simple_means_flag': '3',
    'to': 'zh',
    'token': '82b6ce71cfa54c45a4e70fc877e8f31f',
    'transtype': 'realtime'
}
# Disguise the request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36',
}
# Build the request
request = urllib.request.Request(url=post_url, headers=headers)
# URL-encode the form data and convert it to bytes
form_data = urllib.parse.urlencode(form_data).encode()
# Send the request
response = urllib.request.urlopen(url=request, data=form_data)
print(response.read().decode())
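At this point the response contains an error, most likely because the request does not imitate the browser closely enough.
Modify the request to match the Raw view:
Note: you can add the headers one at a time to work out which are required, which are unnecessary, and which are optional.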
import urllib.request
import urllib.parse

post_url = 'https://fanyi.baidu.com/v2transapi?from=en&to=zh'
word = 'wolf'
# Form fields copied from the captured request
formdata = {
    'from': 'en',
    'query': word,
    'sign': '275695.55262',
    'simple_means_flag': '3',
    'to': 'zh',
    'token': '82b6ce71cfa54c45a4e70fc877e8f31f',
    'transtype': 'realtime'
}
# Headers copied from the Raw view of the captured request
headers = {
    'Host': 'fanyi.baidu.com',
    'Connection': 'keep-alive',
    # 'Content-Length': '120',  # computed automatically from the body
    'Accept': '*/*',
    'Origin': 'https://fanyi.baidu.com',
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Referer': 'http://fanyi.baidu.com/?aldtype=16047',
    # 'Accept-Encoding': 'gzip, deflate, br',  # requests a compressed response; left out because urllib does not decompress it automatically
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Cookie': 'BAIDUID=0C87C2094BD02FD9A7401AF355DE4B7E:FG=1; BIDUPSID=0C87C2094BD02FD9A7401AF355DE4B7E; PSTM=1562114062; H_WISE_SIDS=135670_125704_136263_100808_136648_136552_135928_128066_134982_128142_136435_120180_136365_132911_136455_136620_131247_136682_122158_136721_132378_131518_118882_118870_118839_118819_118789_107315_132781_136799_136093_135906_133352_129655_136193_132250_124639_128968_135308_133847_132551_135433_134526_135874_134046_129644_131423_136017_110085_136145_134154_127969_133994_131951_135671_135457_127417_136076_135036_134936_136635_135005_131545_136319_134351_136322_136415; to_lang_often=%5B%7B%22value%22%3A%22en%22%2C%22text%22%3A%22%u82F1%u8BED%22%7D%2C%7B%22value%22%3A%22zh%22%2C%22text%22%3A%22%u4E2D%u6587%22%7D%5D; REALTIME_TRANS_SWITCH=1; FANYI_WORD_SWITCH=1; HISTORY_SWITCH=1; SOUND_SPD_SWITCH=1; SOUND_PREFER_SWITCH=1; APPGUIDE_8_2_2=1; from_lang_often=%5B%7B%22value%22%3A%22zh%22%2C%22text%22%3A%22%u4E2D%u6587%22%7D%2C%7B%22value%22%3A%22en%22%2C%22text%22%3A%22%u82F1%u8BED%22%7D%5D; delPer=0; H_PS_PSSID=1455_21086_30211_30071_30240; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; Hm_lvt_64ecd82404c51e03dc91cb9e8c025574=1575635892,1575636241,1575683454; Hm_lpvt_64ecd82404c51e03dc91cb9e8c025574=1575683454; __yjsv5_shitong=1.0_7_027a9009a966dc40683d96e9ecb3897ad249_300_1575683454239_202.117.137.142_2cbb13f5; yjs_js_security_passport=42e862b5469361e79d2631ad8e3fb08cbf82fc69_1575683455_js; PSINO=1'
}
# Build the request, encode the form, and send it as a POST
request = urllib.request.Request(url=post_url, headers=headers)
formdata = urllib.parse.urlencode(formdata).encode()
response = urllib.request.urlopen(request, formdata)
print(response.read().decode())

{"trans_result":{"data":[{"dst":"\u72fc","prefixWrap":0,"result":[[0,"\u72fc",["0|4"],[],["0|4"],["0|3"]]],"src":"wolf"}],"from":"en","status":0,"to":"zh","type":
......
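
The trial-and-error over headers mentioned in the note can be semi-automated. The sketch below works in the reverse direction from the note: it starts from the full header set and drops one header at a time. The names `drop_one_header`, `find_required` and `fake_check` are hypothetical helpers invented for this example, and `fake_check` is only a stand-in that pretends the server needs a Cookie; it says nothing about what the real server requires.

```python
def drop_one_header(headers):
    """Yield (removed_name, remaining_headers) for each header in turn."""
    for name in headers:
        remaining = {k: v for k, v in headers.items() if k != name}
        yield name, remaining


def find_required(headers, request_succeeds):
    """Return the headers whose removal makes request_succeeds() return False."""
    return [name for name, remaining in drop_one_header(headers)
            if not request_succeeds(remaining)]


def fake_check(headers):
    """Stand-in for a real request: assumes, for illustration, that Cookie is needed."""
    return "Cookie" in headers


sample = {"User-Agent": "Mozilla/5.0", "Accept": "*/*", "Cookie": "placeholder"}
print(find_required(sample, fake_check))
```

In practice `request_succeeds` would wrap the `urlopen` call from the script above and decide from the response whether the translation came back; how to recognise an error response is left open here because the error format is not shown in this post.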
Parsed result:
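
The translation can also be pulled out locally with the `json` module. The response above is truncated, so this sketch only decodes the one complete inner object that is visible in it; in the full response that object sits under `trans_result` -> `data` -> first element, as the visible part of the output shows.

```python
import json

# The complete inner object copied from the visible part of the output above.
fragment = r'{"dst":"\u72fc","prefixWrap":0,"result":[[0,"\u72fc",["0|4"],[],["0|4"],["0|3"]]],"src":"wolf"}'

item = json.loads(fragment)
print(item["src"], "->", item["dst"])  # source word and its translation
```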
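Copyright notice: this is an original article by weixin_44321116, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.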