April 10, 2015 BY futurefirewall
In today’s environment of cloud services and interconnected applications, it is no longer acceptable to build a slow service. Or rather, when you build an application that relies on a service, you need to program your application’s interface with that service to “fail fast.”
Many of SimpleWan’s recent projects have involved building real-time systems for processing millions of records, often obtaining or cross-referencing data on the fly from third parties. We could assume that a SimpleWan customer might tolerate a 3-second delay in the control panel, but this means our back-end systems need responses from third parties within one second, and ideally within 100 milliseconds.
If a third party’s system is down or suffering from network issues, or if our architecture simply isn’t optimized for the speed and size of data being requested, our back-end system may wait for a third-party response for multiple seconds, or even multiple minutes if timeouts aren’t configured correctly! By this point the customer, web browser, and probably even web server have given up on the request, so it’s pointless to keep waiting for a response. It’s much better to kill the request and log an error, thereby freeing up resources and giving us a chance to let the customer know what went wrong.
This means that every request, especially every third-party or off-server request, needs a timeout. Not just a timeout on the response, but also a timeout on the connection and on the whole transaction, in order to prevent lockups during network incidents. And that timeout should be low enough to still appear snappy to the customer, without being so low as to fail on minor hiccups.
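As a rough illustration of the difference between a connect timeout, a per-read timeout, and a deadline on the whole transaction, here is a minimal Python sketch using only the standard socket module. The function name fetch_with_deadline and its default values are made up for this example; they are not taken from any SimpleWan system or third-party library.

```python
import socket
import time


def fetch_with_deadline(host, port, payload,
                        connect_timeout=0.5, read_timeout=1.0,
                        total_timeout=2.0):
    """Send payload and read until the peer closes, failing fast.

    Hypothetical helper for illustration only. Raises socket.timeout
    if connecting, any single read, or the whole exchange takes too long.
    """
    # One deadline for the whole transaction, measured on a monotonic clock.
    deadline = time.monotonic() + total_timeout

    # Connect timeout: give up quickly if the remote host is unreachable.
    sock = socket.create_connection(
        (host, port), timeout=min(connect_timeout, total_timeout))
    with sock:
        sock.settimeout(read_timeout)
        sock.sendall(payload)

        chunks = []
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise socket.timeout("total deadline exceeded")
            # Never wait longer on one read than the time left overall.
            sock.settimeout(min(read_timeout, remaining))
            chunk = sock.recv(4096)
            if not chunk:  # peer closed the connection: response complete
                break
            chunks.append(chunk)
        return b"".join(chunks)
```

A caller would wrap this in a try/except for socket.timeout, log the error, and report the failure to the customer instead of leaving the request hanging. The per-read timeout alone is not enough: a peer that trickles one byte at a time never trips it, which is why the sketch also tracks an overall deadline.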
But isn’t this common sense? Why bother writing a blog post about this? Apparently it isn’t common sense, because most of the open source libraries available for the third-party systems SimpleWan might use do not include timeout options. So, SimpleWan recently had the opportunity to contribute to a few open source projects, improving our customers’ experience and helping the open source community.
To actually fix these libraries, you need to examine the third-party library’s source code, create new parameters for Timeout and ConnectTimeout, and pass those parameters down to the library’s core connection routine (or its equivalent). Ideally, these parameters will be added in a backwards-compatible way (in an options hash, or at the end of function calls as optional arguments). That should be all that’s necessary to get your application back on track. Don’t forget to contribute your improvements back to the community.
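To make the backwards-compatible approach concrete, here is a small Python sketch of what such a patch might look like. ExampleClient, its options dictionary, and the option names connect_timeout and timeout are all hypothetical stand-ins for whatever library you are patching; only the socket calls are real standard-library functions.

```python
import socket


class ExampleClient:
    """Hypothetical third-party client, patched to accept timeouts."""

    def __init__(self, host, port, options=None):
        # 'options' is a new, optional trailing argument, so existing
        # callers that pass only (host, port) keep working unchanged.
        options = options or {}
        self.host = host
        self.port = port
        # None preserves the old behaviour: block with no timeout.
        self.connect_timeout = options.get("connect_timeout")
        self.timeout = options.get("timeout")

    def _connect(self):
        # The core connection routine: the one place where the new
        # parameters actually have to be applied.
        sock = socket.create_connection(
            (self.host, self.port), timeout=self.connect_timeout)
        # Timeout for every later send/recv on this socket.
        sock.settimeout(self.timeout)
        return sock
```

Existing code such as ExampleClient("api.example.com", 443) behaves exactly as before, while new code can opt in with ExampleClient("api.example.com", 443, {"connect_timeout": 0.5, "timeout": 1.0}). Keeping the defaults at None is a deliberate choice in this sketch: it avoids surprising existing users, at the cost of leaving them unprotected until they opt in.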