Saving HTML in Python: how do I save response.body as an HTML file?

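My spider works, but I can't save the body of the site I'm crawling to an .html file. If I write self.html_file.write('test'), it works fine. I don't know how to convert the body to a string.

I'm using Python 3.6.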

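Spider:

import scrapy

# html_path and header_path are defined elsewhere (not shown in the question)

class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ['google.com']
    start_urls = ['http://google.com/']

    def __init__(self):
        self.path_to_html = html_path + 'index.html'
        self.path_to_header = header_path + 'index.html'
        # The file is opened in text mode, so write() only accepts str
        self.html_file = open(self.path_to_html, 'w')

    def parse(self, response):
        url = response.url
        # This is the line that fails: response.body is bytes, not str
        self.html_file.write(response.body)
        self.html_file.close()
        yield {
            'url': url
        }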

Traceback:

Traceback (most recent call last):
  File "c:\python\python36-32\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "c:\Users\kv\AtomProjects\example_project\example_bot\example_bot\spiders\example.py", line 35, in parse
    self.html_file.write(response.body)
TypeError: write() argument must be str, not bytes
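
The error message points at a mismatch between the file mode and the data type: the file is opened in text mode ('w'), while response.body is a bytes object. A minimal sketch of two possible ways around this is shown below. It does not use Scrapy, so that it runs on its own: sample_body is a made-up stand-in for response.body, the helper names save_body_as_bytes and save_body_as_text are hypothetical, and UTF-8 is an assumed encoding.

```python
import os
import tempfile


def save_body_as_bytes(body: bytes, path: str) -> None:
    """Write the raw bytes unchanged by opening the file in binary mode."""
    with open(path, "wb") as f:
        f.write(body)


def save_body_as_text(body: bytes, path: str, encoding: str = "utf-8") -> None:
    """Decode the bytes to str first, then write in text mode."""
    # Assumption: the page is encoded as UTF-8; a real page may differ.
    with open(path, "w", encoding=encoding) as f:
        f.write(body.decode(encoding))


# Hypothetical sample data standing in for response.body
sample_body = b"<html><body>test</body></html>"

out_dir = tempfile.mkdtemp()
bytes_path = os.path.join(out_dir, "index_bytes.html")
text_path = os.path.join(out_dir, "index_text.html")

save_body_as_bytes(sample_body, bytes_path)
save_body_as_text(sample_body, text_path)
```

In the spider above, the first option would correspond to opening the file with 'wb' instead of 'w' and leaving the write call as it is. The second would correspond to keeping 'w' and writing a decoded string; for HTML responses Scrapy also exposes the decoded body as response.text, which may be the simpler choice when the page encoding is not known in advance.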


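Copyright notice: This is an original article by weixin_42518930, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.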