Wednesday, August 01, 2007

Dealing with Errno::EBADF in Ruby net/http

In August 2006 I posted a question in a forum about an error (Bad File Descriptor) I was getting when using Ruby's net/http library. Today, almost one year later, I received an email from someone with the same problem asking for my advice. A few hours later the same person wrote again, saying he had found a satisfactory solution and wanted to share it with me.

As part of this wonderful online community I feel obliged to share my new knowledge, in the hope that it is as useful to others as it was to me.

The Bad File Descriptor error (Errno::EBADF) occurred sporadically while using the net/http library to connect to a lot of pages (a web spider) in a short time span. The main problem was that I could never catch that error (i.e. rescue it), so the script would stop before finishing and leave a lot of pages unprocessed. To work around the problem at the time, I split my script into several smaller ones and added a small delay between web pages.

My solution works, but that small delay adds up over a thousand pages, and the scripts take hours rather than minutes to finish.
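For reference, the delay workaround can be sketched roughly as follows. This is not my original script: the helper name `fetch_with_delay` and the default delay of half a second are made up for illustration, and the actual fetching is left to the block you pass in.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not from the original script): calls the given block
# once per URL and sleeps for `delay` seconds between consecutive calls.
# Returns a hash mapping each URL to whatever the block returned for it.
def fetch_with_delay(urls, delay = 0.5)
  pages = {}
  urls.each_with_index do |url, index|
    sleep(delay) if index > 0   # no need to wait before the first page
    pages[url] = yield(url)
  end
  pages
end

# Example call (performs real HTTP requests, so it is left commented out):
# pages = fetch_with_delay(['http://example.com/']) { |u| Net::HTTP.get(URI.parse(u)) }
```

With a thousand URLs and the assumed half-second delay, the sleeping alone costs more than eight minutes, on top of the time spent downloading.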

The email I got explained that the error occurs because the operating system (Windows XP) runs out of TCP ports for new connections. Many of the open sockets are left in the TIME_WAIT state, in which a connection that the client has already closed is kept around by the operating system for a while, so its port cannot be reused immediately.
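One way to check whether this is what is happening is to count the TIME_WAIT entries that `netstat -an` reports while the spider is running. The snippet below is only a small sketch: the helper name `count_time_wait` is invented here, and it assumes the state name appears as plain text on each connection line of the netstat output.

```ruby
# Hypothetical helper: counts the lines of netstat output that mention
# the TIME_WAIT state. It only inspects the text it is given.
def count_time_wait(netstat_output)
  netstat_output.split("\n").select { |line| line =~ /\bTIME_WAIT\b/ }.size
end

# Example call (runs the netstat command, so it is left commented out):
# puts count_time_wait(`netstat -an`)
```

If that number climbs into the thousands while the script runs, port exhaustion is a likely suspect.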

The first approach to solving this would be to force Ruby to close the connection, but there is no facility for doing that, at least in Ruby 1.8.x. The second solution, and the one I received by mail, is to raise the upper limit of the range of ports that Windows allocates dynamically to client TCP/IP connections.
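A different technique, which was not part of the email and which I am only sketching here, is to open fewer connections in the first place. When many of the pages live on the same host, the block form of `Net::HTTP.start` lets several requests share one connection, provided the server keeps it alive. The helper names `group_by_host` and `fetch_group` are hypothetical, and the sketch assumes plain http URLs.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: groups URL strings by [host, port] so that each
# group can be fetched over a single connection.
def group_by_host(urls)
  groups = Hash.new { |hash, key| hash[key] = [] }
  urls.each do |url|
    uri = URI.parse(url)
    groups[[uri.host, uri.port]] << uri
  end
  groups
end

# Hypothetical helper: fetches every URI of one group inside a single
# Net::HTTP.start block. Assumes plain http (no SSL setup is done here).
def fetch_group(host, port, uris)
  bodies = {}
  Net::HTTP.start(host, port) do |http|
    uris.each do |uri|
      bodies[uri.to_s] = http.get(uri.request_uri).body
    end
  end
  bodies
end

# Example call (performs real HTTP requests, so it is left commented out):
# group_by_host(urls).each { |(host, port), uris| fetch_group(host, port, uris) }
```

This does not help a spider that visits a thousand different hosts, in which case raising the port limit is still the relevant fix.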

Here are the instructions on how to do it:
http://msdn2.microsoft.com/en-us/library/Aa560610.aspx

If you expect to make a lot of HTTP connections quickly using Ruby on a Windows XP machine, you had better increase that number of ports, or you will end up asking yourself what this random EBADF error is all about.