python requests returns an incomplete response | python-requests: fetching the head of the response content without consuming all of it...

Using python-requests and python-magic, I would like to test the MIME type of a web resource without fetching all of its content (especially if the resource happens to be, e.g., an Ogg file or a PDF file). Based on the result, I might decide to fetch it all. However, accessing the text property after testing the MIME type only returns what hasn't been consumed yet. How can I test the MIME type without consuming the response content?

Below is my current code.

import requests
import magic

r = requests.get("http://www.december.com/html/demo/hello.html", prefetch=False)
mime = magic.from_buffer(r.iter_content(256).next(), mime=True)

if mime == "text/html":
    print(r.text)  # I'd like r.text to give me the entire response content

Thanks!

Solution

Note: at the time this question was asked, the correct way to fetch only the headers and stream the body was to pass prefetch=False. That option has since been renamed to stream and its boolean value inverted, so today you want stream=True.

The original answer follows.

Once you use iter_content(), you have to continue using it; .text indirectly uses the same interface under the hood (via .content).
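Why the peeked bytes go missing can be shown without a network: iter_content() hands out a generator over a single underlying stream, so any chunk pulled with next() is no longer there for a later reader. In the minimal sketch below, fake_iter_content is a hypothetical stand-in for Response.iter_content(), and peek_first_chunk is a hypothetical helper (not part of requests) that keeps the peeked chunk and chains it back in front of the rest.

```python
import itertools
from typing import Iterator, Tuple


def fake_iter_content(body: bytes, chunk_size: int) -> Iterator[bytes]:
    """Hypothetical stand-in for Response.iter_content(): yields the body once, in chunks."""
    for start in range(0, len(body), chunk_size):
        yield body[start:start + chunk_size]


def peek_first_chunk(chunks: Iterator[bytes]) -> Tuple[bytes, Iterator[bytes]]:
    """Return the first chunk plus an iterator that still yields the whole body."""
    first = next(chunks, b"")  # b"" if the body is empty
    # chain() re-emits the consumed chunk before the remaining ones.
    return first, itertools.chain([first], chunks)


body = b"<html><body>" + b"x" * 1000 + b"</body></html>"

# Pitfall: the chunk taken by next() is gone from the generator,
# so joining what is left yields only the tail of the body.
stream = fake_iter_content(body, 256)
first = next(stream)
tail = b"".join(stream)
print(len(body), len(tail))  # 1026 770

# Fix: keep the peeked chunk and chain it back in front.
peek, full_stream = peek_first_chunk(fake_iter_content(body, 256))
print(b"".join(full_stream) == body)  # True
```

With a real response, the same helper should work on r.iter_content(256) after requests.get(url, stream=True); you would then sniff peek and, if wanted, join full_stream yourself instead of touching .text.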

In other words, by using iter_content() at all, you have to do the work .text does by hand:

import requests
import magic
from requests.compat import chardet

r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
peek = next(r.iter_content(256), b'')  # b'' if the body is empty
mime = magic.from_buffer(peek, mime=True)

if mime == "text/html":
    # Put the peeked chunk back in front of the rest of the body
    contents = peek + b''.join(r.iter_content(10 * 1024))
    encoding = r.encoding
    if encoding is None:
        # No charset in the headers: detect the encoding
        encoding = chardet.detect(contents)['encoding']
    try:
        textcontent = str(contents, encoding, errors='replace')
    except (LookupError, TypeError):
        # Unknown codec name, or detection returned None
        textcontent = str(contents, errors='replace')
    print(textcontent)

This version assumes Python 3.
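The decoding fallback above appears to mirror what requests' own Response.text does: use the declared encoding, otherwise a detected one, and if that fails, decode with replacement characters. Pulled out into a hypothetical helper, decode_body, it can be exercised without a network or chardet. The detect parameter is an assumption, standing in for something like lambda b: chardet.detect(b)['encoding'].

```python
from typing import Callable, Optional


def decode_body(
    contents: bytes,
    encoding: Optional[str],
    detect: Optional[Callable[[bytes], Optional[str]]] = None,
) -> str:
    """Hypothetical helper: decode raw body bytes like the snippet above.

    encoding: the declared charset (e.g. r.encoding), or None.
    detect:   optional charset guesser (assumption; e.g. a chardet wrapper).
    """
    if encoding is None and detect is not None:
        encoding = detect(contents)
    try:
        # LookupError: unknown codec name; TypeError: encoding is still None.
        return str(contents, encoding, errors="replace")
    except (LookupError, TypeError):
        # Fall back to the default encoding (UTF-8 on Python 3).
        return str(contents, errors="replace")


print(decode_body(b"caf\xe9", "latin-1"))  # café
print(decode_body(b"caf\xe9", None) == "caf\ufffd")  # True
```

Passing the detector in as a parameter keeps the fallback order testable on its own; in the answer's code it corresponds to the chardet.detect(contents)['encoding'] call.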

The alternative is to make two requests:

r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
mime = magic.from_buffer(next(r.iter_content(256), b''), mime=True)
r.close()  # release the connection; the rest of this body is not needed

if mime == "text/html":
    print(requests.get("http://www.december.com/html/demo/hello.html").text)

Python 2 version:

import requests
import magic
from requests.compat import chardet

r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
peek = next(r.iter_content(256), '')  # '' if the body is empty
mime = magic.from_buffer(peek, mime=True)

if mime == "text/html":
    # Put the peeked chunk back in front of the rest of the body
    contents = peek + ''.join(r.iter_content(10 * 1024))
    encoding = r.encoding
    if encoding is None:
        # No charset in the headers: detect the encoding
        encoding = chardet.detect(contents)['encoding']
    try:
        textcontent = unicode(contents, encoding, errors='replace')
    except (LookupError, TypeError):
        # Unknown codec name, or detection returned None
        textcontent = unicode(contents, errors='replace')
    print(textcontent)