2012-09-12 18:23:49 8 Comments
Is there a way to trigger a method in a Spider class just before it terminates?
I can terminate the spider myself, like this:
```python
from scrapy.exceptions import CloseSpider
from scrapy.spiders import CrawlSpider

class MySpider(CrawlSpider):
    # Config stuff goes here...

    def quit(self):
        # Do some stuff...
        raise CloseSpider('MySpider is quitting now.')

    def my_parser(self, response):
        if termination_condition:
            self.quit()

        # Parsing stuff goes here...
```
But I can't find any information on how to determine when the spider is about to quit naturally.
5 Answers
@Chris 2013-09-19 22:17:43
For me the accepted answer did not work / is outdated, at least for Scrapy 0.19. I got it to work with the following, though:
@slavugan 2017-04-05 16:04:36
If you have many spiders and want to do something before each of them closes, it may be convenient to add a stats collector to your project.
In settings:
And the collector:
@Levon 2016-10-12 09:45:11
For Scrapy version 1.0.0+ (it may also work for older versions).
One good use is adding a tqdm progress bar to a Scrapy spider.
@An Se 2018-01-31 10:12:28
This should be the selected answer, thanks Levon.
@not2qubit 2018-10-02 13:12:48
This is the new method! Although it looks less transparent, its advantage is removing the extra clutter of a def __init__(self): override and the PyDispatcher import from scrapy.xlib.pydispatch import dispatcher.
@THIS USER NEEDS HELP 2015-10-23 22:29:51
Just to update, you can just call the closed function, like this:
@Aminah Nuraini 2015-11-02 20:48:59
In my Scrapy it's def close(self, reason), not closed
@El Ruso 2016-01-29 23:14:56
@AminahNuraini Scrapy 1.0.4: def closed(reason)
@dm03514 2012-09-12 18:40:11
It looks like you can register a signal listener through dispatcher.
I would try something like:
@Abe 2012-09-12 18:52:20
Works perfectly. But I'd suggest naming the method MySpider.quit() or something similar, to avoid confusion with the signal name. Thanks!
@Daniel Werner 2012-09-13 19:23:36
Excellent solution. And yes, the example should work exactly the same with a CrawlSpider.
@not2qubit 2014-01-04 20:06:28
This solution also works fine on Scrapy 0.20.0, contrary to what @Chris said below.
@shellbye 2014-12-25 02:44:57
This solution also works fine on Scrapy 0.24.4, contrary to what @Chris said below.
@chishaku 2015-03-09 09:26:25
I'm confused by why the second parameter of spider_closed is necessary. Isn't the spider to be closed self?
@Desprit 2016-09-16 12:14:56
Doesn't work with v. 1.1 because xlib.pydispatch was deprecated. Instead, they recommend using PyDispatcher. Though I couldn't make it work yet...
@wj127 2017-03-15 14:51:00
Fabulous! This is exactly what I was looking for! And it works perfectly fine! Great input, mate! And thanks :3
@not2qubit 2018-10-02 13:06:59
This still works in Python 3.6.4, with Scrapy 1.5.1 and PyDispatcher 2.0.5, and even if you also have a def spider_closed(..) in some pipeline class in your pipelines.py. However, it is also deprecated as shown here, so use the new method as explained by @Levon.
@mthecreator 2018-11-07 18:55:19
This still works on Python 2.7 with Scrapy 1.5.1.