Aaron N. Tubbs

Dragon chaser.

Let’s play a game. I feel like writing an application in Java. Hell, I’ll write a news aggregator, because there aren’t already six dozen half-assed solutions out there. After I’ve built up some configuration management functions, a way to cache some data out to a file, and some other standard routines, I decide it’s time to fetch data from the web. Java makes this really easy, so I just start hacking away (comments, logging, and the like removed for brevity):

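```java
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.util.Properties;

public class WebRetriever
{
    private Properties config;
    private static WebRetriever instance;

    private WebRetriever()
    {
    }

    public static synchronized WebRetriever getInstance()
    {
        if (null == instance)
            instance = new WebRetriever();
        return instance;
    }

    // An instance method, so callers must go through getInstance()
    // (which guarantees the constructor has run).
    public synchronized String getURL(String inURL)
    {
        try
        {
            int readCharacter;
            String buffer = "";
            URL url = new URL(inURL);
            URLConnection urlC = url.openConnection();
            // Naive on purpose: one byte per read(), each byte cast straight
            // to a char, and a brand-new String built on every iteration.
            try (InputStream is = urlC.getInputStream())
            {
                while (-1 != (readCharacter = is.read()))
                    buffer += (char) readCharacter;
            }
            return buffer;
        }
        catch (MalformedURLException e)
        {
            // …
            return null;
        }
        catch (IOException e)
        {
            // …
            return null;
        }
    }
}
```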

Look, mommy, I built a Singleton! It fetches data from the web! It isn’t efficient, elegant, or anything pleasant to look at, and it should probably just hand back a raw InputStream rather than monkeying around with native strings, but it uses a pattern, so it must be cool. Now we go on to hack up a whole bunch of stuff with XSL, make some pretty transformations for RSS and Atom feeds, and release YAAYA 0.1 (which, seven years in the future, will reach release 0.2.3).
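If you do want a `String` back rather than a raw `InputStream`, the byte-at-a-time loop is still worth replacing: each `+=` copies the entire buffer built so far (quadratic in the page size), and casting each byte to `char` effectively decodes the page as ISO-8859-1, which garbles UTF-8 feeds. A minimal sketch, assuming the caller already knows the charset; `StreamReading` and `readAll` are illustrative names, not part of any library:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

final class StreamReading
{
    private StreamReading()
    {
    }

    // Reads the stream to the end in 8 KiB chunks, collecting raw bytes,
    // then decodes them once with an explicit charset.
    static String readAll(InputStream in, Charset charset) throws IOException
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1)
        {
            out.write(chunk, 0, n);
        }
        return new String(out.toByteArray(), charset);
    }
}
```

The read loop in `getURL` could then become `return StreamReading.readAll(is, StandardCharsets.UTF_8);`, assuming UTF-8; a real aggregator should take the charset from the `Content-Type` header or the feed’s XML declaration instead.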

My point here is that the vast majority of quick, dirty, and cool applications released into the wild put very little thought into the not-so-simple subject of retrieving data from the Internet. Countless times I’ve seen news aggregators, stock tickers, weather bugs, and the like released with no more thought than this. For applications that run inside a web browser or use its settings, this isn’t a particularly big deal; it doesn’t take much effort to bootstrap the already functioning browser infrastructure. For stand-alone applications, however, it makes me sad that this is about all the thought that goes into the topic.

Here’s where the problem steps in. First off, the user could be behind a proxy. Depending on how cool your language or library is, it may be able to pick this up automatically, provided you have set the appropriate environment variables, system settings, or orientation of colored artifacts on your desk. A good example is data retrieval implemented with libwww-perl: you can set enough environment variables in your shell to let your Perl program pick up, use, and authenticate to a proxy behind the scenes. Intuitive? Hardly … but it is functional.

Unfortunately, such easy fixes are not always possible. Unless I’m missing something, there is no way to make the Java code above work behind a proxy automatically, especially an HTTP proxy that also requires authentication. Boo-hoo. Of course, with a few small changes, suddenly the magic works. Again, for brevity, I’m going to pull values out of thin air rather than dragging them out of a configuration object.

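```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

public class WebRetriever
{
    // Stand-ins for values that would normally come from the configuration object.
    private static final boolean CONFIG_PROXY_ENABLED = true;
    private static final String CONFIG_HOST = "proxy.example.com";
    private static final String CONFIG_PORT = "8080";
    private static final String CONFIG_USER = "user";
    private static final String CONFIG_PASS = "secret";

    // …

    private static class WebAuthenticator extends Authenticator
    {
        @Override
        protected PasswordAuthentication getPasswordAuthentication()
        {
            // Answer only the proxy's challenge, so the proxy credentials
            // are not handed to every web server that asks for a password.
            if (CONFIG_PROXY_ENABLED && getRequestorType() == RequestorType.PROXY)
            {
                return new PasswordAuthentication(CONFIG_USER, CONFIG_PASS.toCharArray());
            }
            // No credentials supplied; the proxy will most likely refuse the request.
            return null;
        }
    }

    // …

    private WebRetriever()
    {
        if (CONFIG_PROXY_ENABLED)
        {
            Authenticator.setDefault(new WebAuthenticator());
            // JVM-wide settings: they apply to every URLConnection in the process.
            System.setProperty("http.proxyHost", CONFIG_HOST);
            System.setProperty("http.proxyPort", CONFIG_PORT);
            // http.* covers only http:// URLs; https:// URLs read https.*.
            // Newer JDKs also disable Basic auth for HTTPS tunnelling unless
            // jdk.http.auth.tunneling.disabledSchemes is overridden.
            System.setProperty("https.proxyHost", CONFIG_HOST);
            System.setProperty("https.proxyPort", CONFIG_PORT);
        }
    }

    // …
}
```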

Magic: a few lines of code, and now we can configure the application to use a proxy, even of the authenticated variety. In my experience, most small projects don’t bother to do this, despite the fact that a lot of people sit behind proxies these days. I know, I know … these people release their code into the wild out of the kindness of their hearts, and if we really need that sort of functionality we can build it in ourselves (more of a bitch when they release free but compiled-only versions of their code). That said, I feel it’s somewhat irresponsible to release a supposedly complete application that disregards something this basic.
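As for where the `CONFIG_` values come from: one low-effort option is to honor the same `http_proxy` environment-variable convention that the libwww-perl example relies on, so a user whose shell is already set up for the proxy gets it for free. The sketch below is a hypothetical helper (`ProxySettings` is not a library class) that parses a value such as `http://user:pass@proxy.example.com:3128` with `java.net.URI`; it assumes the variable holds an HTTP URL and falls back to port 80, which is also the JDK’s default for `http.proxyPort`.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical holder for the proxy values the listing pulls "out of thin air".
final class ProxySettings
{
    final String host;
    final int port;
    final String user;      // null when the value carries no credentials
    final String password;  // null when the value carries no credentials

    private ProxySettings(String host, int port, String user, String password)
    {
        this.host = host;
        this.port = port;
        this.user = user;
        this.password = password;
    }

    // Parses e.g. "http://user:pass@proxy.example.com:3128" or "proxy.example.com".
    // Returns null when the value is missing, empty, or has no usable host.
    static ProxySettings parse(String value)
    {
        if (value == null || value.isEmpty())
            return null;
        URI uri;
        try
        {
            uri = new URI(value.contains("://") ? value : "http://" + value);
        }
        catch (URISyntaxException e)
        {
            return null;
        }
        String host = uri.getHost();
        if (host == null)
            return null;
        int port = uri.getPort() == -1 ? 80 : uri.getPort();
        String user = null;
        String password = null;
        String userInfo = uri.getUserInfo(); // already percent-decoded by URI
        if (userInfo != null)
        {
            int colon = userInfo.indexOf(':');
            user = colon < 0 ? userInfo : userInfo.substring(0, colon);
            password = colon < 0 ? "" : userInfo.substring(colon + 1);
        }
        return new ProxySettings(host, port, user, password);
    }

    // Reads the conventional lower-case variable first, then the upper-case one.
    static ProxySettings fromEnvironment()
    {
        String value = System.getenv("http_proxy");
        if (value == null)
            value = System.getenv("HTTP_PROXY");
        return parse(value);
    }
}
```

The fields would then replace the thin-air constants: `host` for `CONFIG_HOST`, `String.valueOf(port)` for `CONFIG_PORT`, and `user`/`password` for `CONFIG_USER`/`CONFIG_PASS`, with a `null` result meaning “no proxy”.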

The way I see it, you should either: a) develop with full consideration of the potential user audience, and with a lot of thought about the environmental differences that could occur; b) develop applications in a language or technology that frees you from worrying about these things; or c) use other technologies to handle the commodity bits. For small-time developers and hackers, the second solution makes a lot of sense to me, especially if you are planning on writing another abandonware stock ticker or something similar. The third solution is a little subtler: in the context of this example, it would mean that instead of implementing web retrieval code at all, you would just use an external tool that is already best of breed for that sort of thing, such as a configuration setting that lets you specify curl or wget.
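For option c), a minimal sketch of delegating the download to an external tool, assuming the user configures a command line with a `{url}` placeholder (both the template syntax and the `ExternalFetcher` class are hypothetical). `ProcessBuilder` runs the tool without a shell and, by default, passes along the application’s environment, so curl or wget can still pick up proxy variables such as `http_proxy` on their own.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ExternalFetcher
{
    // A user setting such as "curl -s -L {url}" or "wget -q -O - {url}".
    private final String commandTemplate;

    ExternalFetcher(String commandTemplate)
    {
        this.commandTemplate = commandTemplate;
    }

    // Splits the template on whitespace and swaps any token that is exactly
    // "{url}" for the address, passed as one argument (no shell involved).
    List<String> buildCommand(String url)
    {
        List<String> command = new ArrayList<>();
        for (String token : commandTemplate.trim().split("\\s+"))
            command.add(token.equals("{url}") ? url : token);
        return command;
    }

    // Runs the configured tool and returns what it wrote to standard output.
    String fetch(String url) throws IOException, InterruptedException
    {
        Process process = new ProcessBuilder(buildCommand(url))
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = process.getInputStream())
        {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1)
                out.write(chunk, 0, n);
        }
        int exit = process.waitFor();
        if (exit != 0)
            throw new IOException("fetch command exited with status " + exit);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

A user behind a proxy could set the command to `curl -s -L {url}` or `wget -q -O - {url}` and let those tools handle it. A template with no `{url}` token never receives the address, so a real version should validate the setting.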