2011-01-17 06:17:01 8 Comments
How do you utilize proxy support with the Python web-scraping framework Scrapy?
7 comments
@Amom 2013-12-16 10:25:22
Single Proxy
Enable HttpProxyMiddleware in your settings.py, then pass the proxy to the request via request.meta.
Multiple Proxies
You can also choose a proxy address randomly if you have an address pool.
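A minimal sketch of both variants, assuming a hypothetical pool of proxy URLs. The example.com addresses and the proxy_meta helper are placeholders, not part of the original answer. The helper only builds the meta dict; Scrapy's HttpProxyMiddleware is what reads the "proxy" key from request.meta.

```python
import random

# Hypothetical proxy pool; replace these placeholders with real proxy URLs.
PROXY_POOL = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
]


def proxy_meta(pool=None):
    """Return a meta dict that selects one proxy from the pool for a request."""
    pool = PROXY_POOL if pool is None else pool
    return {"proxy": random.choice(pool)}


# Single proxy: a pool of one always yields the same address.
single = proxy_meta(["http://proxy1.example.com:8080"])

# Multiple proxies: each call picks an address at random.
rotated = proxy_meta()
```

In a spider this could be used as scrapy.Request(url, meta=proxy_meta()), so that each request picks its own proxy.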
@Rafael T 2014-12-22 20:16:41
The documentation says that HttpProxyMiddleware sets the proxy inside every Request's meta attribute, so enabling ProxyMiddleware AND setting it manually would make no sense.
@Thamme Gowda 2017-07-21 03:48:12
I should have copied this code. I glanced at it and then coded it myself, but the proxy functionality was not working. Now I see the proxy value was set to request.headers instead of request.meta. Stupid me (face palm)! I went to look at the HttpProxyMiddleware code; it skips if someone has already set request.meta['proxy'], so there is no need to list it in the settings: github.com/scrapy/scrapy/blob/master/scrapy/…
@Shahryar Saljoughi 2015-04-18 10:46:02
1 - Create a new file called “middlewares.py”, save it in your Scrapy project, and add the following code to it.
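A sketch of what such a middlewares.py might contain, assuming a hypothetical proxy address and credentials. The class name ProxyMiddleware and the placeholder values are illustrative. process_request is the hook Scrapy calls on downloader middlewares for each outgoing request; how the Proxy-Authorization header is treated can differ between Scrapy versions.

```python
import base64


class ProxyMiddleware:
    """Route every request through one proxy (all values are placeholders)."""

    proxy_url = "http://proxy.example.com:8080"  # hypothetical proxy address
    proxy_user_pass = "USERNAME:PASSWORD"        # hypothetical credentials

    def process_request(self, request, spider):
        # The proxy belongs in request.meta, not in request.headers.
        request.meta["proxy"] = self.proxy_url

        # b64encode adds no trailing newline, unlike the legacy
        # base64.encodestring mentioned in the comments on this answer.
        encoded = base64.b64encode(self.proxy_user_pass.encode("utf-8"))
        request.headers["Proxy-Authorization"] = "Basic " + encoded.decode("ascii")

        # Returning None lets Scrapy continue processing the request.
        return None
```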
2 - Open your project’s configuration file (./project_name/settings.py) and add the following code:
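A sketch of the settings.py entry, assuming the middleware class from step 1 is named ProxyMiddleware and lives in project_name/middlewares.py (both names are assumptions). The number is the middleware's order; 100 is an illustrative value that runs it before the built-in HttpProxyMiddleware, which Scrapy registers at 750 by default.

```python
# settings.py (fragment)
# The dotted path is hypothetical: adjust it to your own project and class name.
DOWNLOADER_MIDDLEWARES = {
    "project_name.middlewares.ProxyMiddleware": 100,
}
```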
Now your requests should pass through this proxy. Simple, isn’t it?
@ccdpowell 2015-05-07 01:09:38
I implemented your solution, which looks correct, but I keep getting a Twisted error: twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure <class 'twisted.internet.error.ConnectionDone'>>] Any advice?
@Greg Sadetsky 2016-02-28 03:03:22
Take care to use base64.b64encode instead of base64.encodestring, as the latter adds a newline character to the encoded base64 result! See stackoverflow.com/a/32243566/426790
@Ekrem Gurdal 2018-07-06 07:37:48
How can we change the proxy after every 20 requests to avoid being banned?
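One possible approach to this question, sketched as a downloader middleware that moves to the next proxy in a pool after a fixed number of requests. The pool addresses, the class name RotatingProxyMiddleware and the requests_per_proxy attribute are all hypothetical; treat it as an illustration only.

```python
import itertools


class RotatingProxyMiddleware:
    """Switch to the next proxy after every `requests_per_proxy` requests."""

    # Hypothetical pool; replace with real proxy URLs.
    proxies = [
        "http://proxy1.example.com:8080",
        "http://proxy2.example.com:8080",
    ]
    requests_per_proxy = 20

    def __init__(self):
        self._cycle = itertools.cycle(self.proxies)
        self._current = next(self._cycle)
        self._count = 0  # requests sent through the current proxy

    def process_request(self, request, spider):
        # Once the current proxy has served its quota, advance to the next one.
        if self._count >= self.requests_per_proxy:
            self._current = next(self._cycle)
            self._count = 0
        request.meta["proxy"] = self._current
        self._count += 1
        return None
```

Note that the counter includes every request that passes through the middleware, so retried requests count toward the quota as well.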
@ephemient 2011-01-17 06:29:08
From the Scrapy FAQ:
The easiest way to use a proxy is to set the environment variable http_proxy. How this is done depends on your shell.
If you want to use an HTTPS proxy and visit HTTPS sites, set the environment variable https_proxy in the same way.
@no1 2011-01-17 11:59:19
Thanks ... So I need to set this variable before running the Scrapy crawler; it's not possible to set it or change it from the crawler code?
@Pablo Hoffman 2011-01-25 19:35:58
You can even set the proxy on a per-request basis with: request.meta['proxy'] = 'your.proxy.address'
@Lionel 2011-11-20 16:59:40
How do you authenticate the proxy?
@ocean800 2017-06-19 22:58:14
@ephemient How can we tell if scrapy is using the proxy?
@Shannon Cole 2018-06-24 12:53:50
@ocean800 I use Scrapy to scrape a website that shows your current IP to see if it's using the proxy. That way I can load the page in Chrome, see my actual IP, and compare it to what Scrapy sees on the same page.
@Niranjan Sagar 2015-12-01 01:58:10
There is a nice middleware written by someone: Scrapy proxy middleware (https://github.com/aivarsk/scrapy-proxies)
@pinkvoid 2015-11-18 07:58:32
As I had trouble setting the environment in /etc/environment, here is what I put in my spider (Python):
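A sketch of setting the proxy from Python rather than from the shell, assuming a placeholder proxy address. It has to run before Scrapy's HttpProxyMiddleware is initialized, because the middleware reads the environment (through urllib's getproxies) at that point.

```python
import os
from urllib.request import getproxies

# Hypothetical proxy address; replace with your own.
PROXY = "http://proxy.example.com:8080"

# Set the variables for this process only, before the crawl starts.
os.environ["http_proxy"] = PROXY
os.environ["https_proxy"] = PROXY

# getproxies() reports which proxies the environment now advertises.
detected = getproxies()
```

Checking detected is a quick way to confirm that the variables are visible to the process before starting the crawl.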
@Andrea Ianni ௫ 2015-10-27 13:20:01
In Windows I put together a couple of previous answers and it worked. I simply did:
and then I launched my program:
where "dmoz" is the program name (I mention it because it's the one you find in the tutorial on the internet, and if you're here you have probably started from that tutorial).
@laurent alsina 2013-01-18 14:58:29
that would be:
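A sketch of a proxy URL with embedded credentials, in the user:password@host:port form quoted in the comment below. The helper name build_proxy_url and all values are hypothetical; percent-encoding guards against special characters in the username or password.

```python
import os
from urllib.parse import quote


def build_proxy_url(user, password, host, port, scheme="http"):
    """Build a proxy URL with percent-encoded credentials."""
    return (
        f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}"
    )


# Placeholder credentials and host, for illustration only.
proxy_url = build_proxy_url("user", "p@ssword", "proxy.example.com", 8080)

# Expose it through the environment, or assign it to request.meta["proxy"].
os.environ["http_proxy"] = proxy_url
```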
@Allan Ruin 2014-03-30 15:41:59
I used this, yet I just received [<twisted.python.failure.Failure <class 'twisted.web._newclient.ParseError'>>]
@Andrea Ianni ௫ 2015-10-27 15:26:04
In Windows: "set http_proxy=user:[email protected]:port"