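A friend studies on a certain website and has to open the web page every time. They wanted to save the videos for local playback, but the site offers no download option, so they asked me to see whether the videos could be downloaded.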
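1. Analyze the page: open it, press F12, and collect the relevant information.
2. Video information is usually stored in an m3u8 file, so search for that directly.
3. Inspecting the m3u8 file shows that the ts files are encrypted with AES-128, and that the key is fetched directly from a URL.
4. After extracting the URL from the file, fetch the key, convert it to hexadecimal, and try decrypting. This worked, so the approach is viable: collect the ts links from the m3u8 and decrypt the segments one by one with the key.
5. Once all ts files have been downloaded, merge them from a DOS (Command Prompt) window with a single command: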
copy /b *.ts videos.mp4
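One thing to watch with the wildcard merge: `copy /b *.ts` takes files in the order the file system lists them, which on NTFS is by name, so with segments named 0.ts, 1.ts, ..., 10.ts (as the script in step 6 names them) 10.ts would be appended before 2.ts. The sketch below is one possible alternative that merges in numeric order. It assumes the decrypted segments sit in a single directory and have purely numeric names; `merge_segments` is a hypothetical helper, not part of the original script.

```python
import shutil
from pathlib import Path


def merge_segments(segment_dir, output_path):
    """Concatenate numbered .ts files (0.ts, 1.ts, ...) in numeric order.

    Hypothetical helper: assumes every segment is named <index>.ts.
    Returns the segment file names in the order they were merged.
    """
    # Keep only files whose stem is a plain number, then sort by that number
    # so that 2.ts comes before 10.ts.
    segments = sorted(
        (p for p in Path(segment_dir).glob("*.ts") if p.stem.isdecimal()),
        key=lambda p: int(p.stem),
    )
    with open(output_path, "wb") as merged:
        for segment in segments:
            with open(segment, "rb") as part:
                shutil.copyfileobj(part, merged)  # byte-for-byte append
    return [p.name for p in segments]
```

For example, `merge_segments('videos', 'videos.mp4')` would merge the decrypted files that the script writes to the `videos` directory. As with `copy /b`, the result is simply the segments concatenated byte for byte.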
6. Complete code
# -*- coding: utf-8 -*-
"""
Time : 2021-04-02 13:53
Name : 茅十八
File : spider_shipin.py
Topic : Scrape videos from 中国会计网 (chinaacc.com)
"""
import base64
import json
import os
import re

import requests
from Crypto.Cipher import AES  # provided by the pycryptodome package


class Kuai_ji(object):
    def __init__(self):
        # Request headers copied from a logged-in browser session
        self.header = {
            'Host': 'elearning.chinaacc.com',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:87.0) Gecko/20100101 Firefox/87.0',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
            'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
            'Accept-Encoding': 'gzip, deflate, br',
            'Referer': 'https://member.chinaacc.com/',
            'Connection': 'keep-alive',
            'Cookie': 'hd_uid=CjsAJWBj98UjEPREAzzTAg==; clct_nuID=16171642301299115; bdp_uuid=25a70cc998-1ceae87102-e8be58078c; zg_did=%7B%22did%22%3A%20%22178867fdea4119-0154b88d38575c-4c3f237d-144000-178867fdea523d%22%7D; zg_9b4551cf447148b0845f31f91e8a524d=%7B%22sid%22%3A%201618188966620%2C%22updated%22%3A%201618189325497%2C%22info%22%3A%201617860882274%2C%22superProperty%22%3A%20%22%7B%7D%22%2C%22platform%22%3A%20%22%7B%7D%22%2C%22utm%22%3A%20%22%7B%7D%22%2C%22referrerDomain%22%3A%20%22%22%2C%22cuid%22%3A%20%2282819233%22%7D; Hm_lvt_f1ca44b62370e4b7dc11d5937e51c2d6=1617929078,1617929186,1617955478,1618188971; _pk_id.member.chinaacc.com.e1fb=1cf2a9176d4ec1af.1617348690.1.1617348690.1617348690.; lastloginuser=m7677_41968; SelCourse=a|; _pk_id.www.chinaacc.com.e1fb=0be54a9159621948.1617929192.3.1618189311.1618189311.; _pk_ref.www.chinaacc.com.e1fb=%5B%22%22%2C%22%22%2C1618189311%2C%22https%3A%2F%2Fwww.chinaacc.com%2F%22%5D; _pk_id.www.chinaacc.com.eab1=2f935fa43caa4d22.1617929210.3.1618189325.1618189317.; _pk_ref.www.chinaacc.com.eab1=%5B%22%22%2C%22%22%2C1618189317%2C%22https%3A%2F%2Fwww.chinaacc.com%2F%22%5D; zg_ffaecff2118841b9866c8c549ea3c5a9=%7B%22sid%22%3A%201617958599807%2C%22updated%22%3A%201617960522067%2C%22info%22%3A%201617929338497%2C%22superProperty%22%3A%20%22%7B%7D%22%2C%22platform%22%3A%20%22%7B%7D%22%2C%22utm%22%3A%20%22%7B%7D%22%2C%22referrerDomain%22%3A%20%22elearning.chinaacc.com%22%7D; trackerSdkVisitor_isNew=true; trackerSdkData={%22uid%22:%2281263452%22%2C%22platform_source%22:%22web%22%2C%22time%22:1618189325622%2C%22bdp_uuid%22:%2225a70cc998-1ceae87102-e8be58078c%22}; BIGipServermember.chinaacc.com=654392074.20480.0000; clientID=qJQvIBoumlK4DmXSbdByMvqZukMzCNGRuQHxZQDNXpBN8dzrqLDzUr99QJXCu3mladZpXkW9DwNR%0D%0A0gFt_OGTGbltFnm7o8c3MtWmo7jdFDY%0D%0A; client_ucToken=9F7D029DA3ADB8A8F60EBDE0D85B0312-6bab3e37de618bbc456b1a315b7ddfb1-01; Hm_lpvt_f1ca44b62370e4b7dc11d5937e51c2d6=1618189326; sid=e27f6440-c9bd-4a64-af08-1bcb7733c338; cdeluid=81263452; username=m7677_41968; JSESSIONID=5701254F6527C1F5E24B334186D48BE2; _pk_ses.www.chinaacc.com.e1fb=*; _pk_ses.www.chinaacc.com.eab1=*',
            'Upgrade-Insecure-Requests': '1',
            'Cache-Control': 'max-age=0',
        }
        self.url = "https://www.chinaacc.com/demo/h5/2/198/cware-39252/video-901.html"
        self.m3u8_url = self.getm3u8()

    def getm3u8(self):
        # The page embeds its player config as JSON.parse('...'); pull out videoPath
        rous = requests.get(self.url, headers=self.header)
        data = re.findall(r"JSON\.parse\('(.*)'\)", rous.text)[0].replace('\\', '')
        json_data = json.loads(data).get('videoPath')
        m3u8_url = 'https:' + json_data
        return m3u8_url

    def get_data(self):
        print(self.m3u8_url)
        data = requests.get(self.m3u8_url).text
        # URL of the AES-128 key, taken from the EXT-X-KEY tag
        aes_url = re.findall(r'#EXT-X-KEY:METHOD=AES-128,URI="(.*?)"', data)[0]
        # The response is treated as Base64 text and decoded to the raw key bytes
        keys = requests.get(aes_url).text
        key = base64.b64decode(keys)
        # All-zero 16-byte IV
        iv = bytes.fromhex('00000000000000000000000000000000')
        # Capture each segment path starting at '/ssec.chinaacc.com/';
        # 'http:/' is prepended below to form the full URL
        ts_datas = re.findall(r'(/ssec\.chinaacc\.com/.*)\n', data)
        # The output directory must exist before decrypted segments are written
        os.makedirs('videos', exist_ok=True)
        for i, ts_data in enumerate(ts_datas):
            ts_url = 'http:/' + ts_data
            print(ts_url)
            ts_rous = requests.get(ts_url)
            file_path = str(i) + '.ts'  # encrypted download
            to_file_path = os.path.join('videos', str(i) + '.ts')  # decrypted output
            with open(file_path, 'wb') as f:
                f.write(ts_rous.content)
            with open(file_path, 'rb') as f:
                cryptor = AES.new(key, AES.MODE_CBC, iv)  # create a cipher instance
                plain_data = cryptor.decrypt(f.read())  # decrypt the segment
            with open(to_file_path, 'wb') as w:
                w.write(plain_data)


def main():
    kuai_ji = Kuai_ji()
    kuai_ji.get_data()


if __name__ == '__main__':
    main()
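Steps 3 and 4 (reading the key URL and the ts links out of the m3u8) can also be sketched without site-specific patterns. The block below is a minimal illustration, not the method used in the script above: `parse_playlist` and `segment_iv` are hypothetical helpers, and the regex assumes the `EXT-X-KEY` attributes appear in the order METHOD, URI, optional IV. `urljoin` resolves relative and protocol-relative (`//host/...`) links against the playlist URL. When no IV attribute is present, the HLS specification (RFC 8216) uses the segment's media sequence number as a 16-byte big-endian IV; the script above uses an all-zero IV instead, which matches that default only for sequence number 0.

```python
import re
from urllib.parse import urljoin

# Assumed attribute order: METHOD, URI, then an optional IV=0x... value.
KEY_RE = re.compile(
    r'#EXT-X-KEY:METHOD=AES-128,URI="([^"]+)"(?:,IV=0[xX]([0-9A-Fa-f]{32}))?'
)
SEQ_RE = re.compile(r'#EXT-X-MEDIA-SEQUENCE:(\d+)')


def parse_playlist(playlist_url, text):
    """Return (key_url, explicit_iv_or_None, first_sequence, segment_urls)."""
    key_match = KEY_RE.search(text)
    if key_match is None:
        raise ValueError("no AES-128 EXT-X-KEY tag found")
    key_url = urljoin(playlist_url, key_match.group(1))
    explicit_iv = bytes.fromhex(key_match.group(2)) if key_match.group(2) else None

    seq_match = SEQ_RE.search(text)
    first_seq = int(seq_match.group(1)) if seq_match else 0

    # Every non-empty line that is not a tag or comment is a segment URI.
    segments = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            segments.append(urljoin(playlist_url, line))
    return key_url, explicit_iv, first_seq, segments


def segment_iv(explicit_iv, first_seq, index):
    """IV for the index-th segment: the explicit IV, else its sequence number."""
    if explicit_iv is not None:
        return explicit_iv
    return (first_seq + index).to_bytes(16, "big")
```

The returned key URL and segment URLs could then be fetched with `requests.get` and each segment decrypted with `AES.new(key, AES.MODE_CBC, iv)` as in the script, passing the value from `segment_iv` as the IV.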
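Copyright notice: This is an original article by qq_36016668, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.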