Unhandled error in Deferred
The error reported is:
Exception ignored in: <generator object iter_errback at 0x0000028B0762A620>
RuntimeError: generator ignored GeneratorExit
Unhandled error in Deferred
2017-08-23 22:22:10 [scrapy.core.scraper] ERROR: Spider error processing <GET http://dy.163.com/v2/article/detail/CSHP96ET0512D03F.html> (referer: http://news.163.com/special/0001386F/rank_whole.html)
Traceback (most recent call last):
File "D:\Users\ophsy\Anaconda3\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
yield next(it)
GeneratorExit
2017-08-23 22:22:10 [twisted] CRITICAL: Unhandled error in Deferred:
2017-08-23 22:22:10 [twisted] CRITICAL:
Traceback (most recent call last):
File "D:\Users\ophsy\Anaconda3\lib\site-packages\twisted\internet\task.py", line 517, in _oneWorkUnit
result = next(self._iterator)
File "D:\Users\ophsy\Anaconda3\lib\site-packages\scrapy\utils\defer.py", line 63, in <genexpr>
work = (callable(elem, *args, **named) for elem in iterable)
File "D:\Users\ophsy\Anaconda3\lib\site-packages\scrapy\core\scraper.py", line 183, in _process_spidermw_output
self.crawler.engine.crawl(request=output, spider=spider)
File "D:\Users\ophsy\Anaconda3\lib\site-packages\scrapy\core\engine.py", line 210, in crawl
self.schedule(request, spider)
File "D:\Users\ophsy\Anaconda3\lib\site-packages\scrapy\core\engine.py", line 216, in schedule
if not self.slot.scheduler.enqueue_request(request):
File "D:\Users\ophsy\Anaconda3\lib\site-packages\scrapy\core\scheduler.py", line 57, in enqueue_request
dqok = self._dqpush(request)
File "D:\Users\ophsy\Anaconda3\lib\site-packages\scrapy\core\scheduler.py", line 86, in _dqpush
self.dqs.push(reqd, -request.priority)
File "D:\Users\ophsy\Anaconda3\lib\site-packages\queuelib\pqueue.py", line 33, in push
self.queues[priority] = self.qfactory(priority)
File "D:\Users\ophsy\Anaconda3\lib\site-packages\scrapy\core\scheduler.py", line 114, in _newdq
return self.dqclass(join(self.dqdir, 'p%s' % priority))
File "D:\Users\ophsy\Anaconda3\lib\site-packages\queuelib\queue.py", line 142, in __init__
self.size, = struct.unpack(self.SIZE_FORMAT, qsize)
struct.error: unpack requires a bytes object of length 4
2017-08-23 22:22:15 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://dy.163.com/v2/article/detail/CSFV6G2C0527ATOM.html> (referer: http://news.163.com/special/0001386F/rank_whole.html)
2017-08-23 22:22:15 [normal_comment_spider] DEBUG: in parse_new
2017-08-23 22:22:15 [scrapy.core.scraper] ERROR: Spider error processing <GET http://dy.163.com/v2/article/detail/CSFV6G2C0527ATOM.html> (referer: http://news.163.com/special/0001386F/rank_whole.html)
(The identical GeneratorExit and struct.error traceback shown above repeats here, and again for every subsequent request.)
This error is raised inside the twisted module; in general it means the Request queue could not be ordered correctly.
- According to the traceback, arranging the Request queue failed
- Some reports attribute it to a missing or version-mismatched pywin32 package (a quick check follows this list)
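A minimal sketch for testing the pywin32 theory on Windows; win32api is a real module shipped with pywin32, but whether pywin32 is actually the culprit here is only a hypothesis:

# Check whether pywin32 is importable under the current interpreter.
try:
    import win32api  # shipped with pywin32; fails if absent or built for another Python
    print("pywin32 is importable")
except ImportError as exc:
    print("pywin32 missing or mismatched:", exc)  # e.g. fix with: pip install pypiwin32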
Cause
If you turn on settings meant for distributed or large-scale crawling but leave the configuration incomplete, this error appears because the Request queue cannot be persisted and ordered on disk. In this example, settings.py contained only JOBDIR = r'D:\Project\files\crawlstatus' without the other corresponding settings, so the queue ordering broke. Concretely, the traceback ends with queuelib failing to read the 4-byte size header of an existing queue file under JOBDIR; a stale or truncated file left behind by an earlier, interrupted run produces exactly this struct.error.
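A minimal sketch that reproduces the failing unpack in isolation, assuming the 4-byte ">L" size header that the error message implies queuelib uses; the truncated bytes are made up for illustration:

import struct

SIZE_FORMAT = ">L"              # a 4-byte big-endian unsigned int, as queuelib's disk queue expects
truncated_header = b"\x00\x01"  # hypothetical: a header cut short by an interrupted crawl

# Raises struct.error because the buffer is shorter than the 4 bytes the format
# requires, which is exactly how a corrupted queue file under JOBDIR fails at startup.
size, = struct.unpack(SIZE_FORMAT, truncated_header)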
Solution
Disable the distributed/large-scale crawl settings: either delete the corresponding entries in settings.py or complete them properly. In this example, removing the JOBDIR = r'D:\Project\files\crawlstatus' line (see the sketch below) fixed the error.
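A minimal sketch of the relevant part of settings.py after the fix; the path comes from the original post, the comments are illustrative:

# settings.py
# With the half-configured persistent job directory removed, Scrapy keeps its
# request queue in memory and queuelib never opens the corrupted files.

# JOBDIR = r'D:\Project\files\crawlstatus'   # removed: its stale queue files caused the struct.error

If you do want pause/resume support instead, delete the stale D:\Project\files\crawlstatus directory first and give each crawl its own job directory, e.g. scrapy crawl normal_comment_spider -s JOBDIR=crawls/run-1 (the spider name is taken from the log above; the directory name is arbitrary).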