Have you ever seen the nasty, obnoxious “too many connection resets” error when using the Mechanize gem to write a screen scraper in Ruby?
This has plagued Mechanize users for years, and it’s never been properly fixed. There are a lot of voodoo suggestions and incantations rumored to address this, but none of them seem to really work. You can read all about it on Mechanize Issue #123.
I believe the root cause is how the underlying Net::HTTP handles reusing persistent connections after a POST, and there is some evidence on the aforementioned GitHub issue that supports this theory. Based on that assumption, I crafted a workaround that has been working 100% of the time for me in production for a few months now.
The Workaround
This is not really a fix for Mechanize or Net::HTTP::Persistent, and there are sure to be some corner cases where you legitimately want this error to bubble up. In practice, though, a simple approach has been enough: catch the “too many connection resets” error, force the persistent connection to be shut down and recreated, and try again. This has worked 100% of the time in high-volume production for scrapers that suffered from this problem intermittently.
This is done by creating a wrapper for Mechanize::HTTP::Agent#fetch, the low-level HTTP request method that is used to do GETs, PUTs, POSTs, HEADs, etc. This wrapper catches this annoying little exception, uses the shutdown method to effectively create a new HTTP connection, and then tries the fetch again.
Loading the following monkey-patch somewhere in your application ought to shut up this annoying error for most use cases:
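A minimal sketch of the wrapper described above, not a definitive implementation. It makes several assumptions: the module name RetryOnConnectionReset and the MAX_RESET_RETRIES limit are made up for illustration, Module#prepend is used so that super reaches the original fetch, and the error is matched by its message text rather than by exception class so the snippet loads even when Mechanize is not installed. It also assumes the agent object responds to shutdown, as the text above describes.

```ruby
# Hypothetical module name; prepend it to the class whose #fetch should retry.
module RetryOnConnectionReset
  # Assumed retry limit; in practice one retry is usually enough.
  MAX_RESET_RETRIES = 3

  def fetch(*args, &block)
    attempts = 0
    begin
      # Bare `super` forwards the same arguments to the original #fetch.
      super
    rescue StandardError => e
      # Pass on any error other than the connection-reset one.
      raise unless e.message.include?("too many connection resets")

      # Give up (re-raising the same error) after too many attempts.
      attempts += 1
      raise if attempts > MAX_RESET_RETRIES

      # Drop the persistent connection so the retry opens a fresh one.
      shutdown
      retry
    end
  end
end

# Apply the patch only if Mechanize has already been loaded.
if defined?(Mechanize::HTTP::Agent)
  Mechanize::HTTP::Agent.prepend(RetryOnConnectionReset)
end
```

Load this after require "mechanize" so the guard at the bottom takes effect. Any error that does not mention “too many connection resets” is re-raised untouched, and the reset error itself is re-raised once the retry limit is exceeded, so genuine failures still surface.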