So we have gobs of applications that run as part of a process, and they have gobs of configuration files. These configuration files are written as XML. The problem is, certain segments of these XML documents don’t allow comments. The first time this happened, I said to myself “that’s a mistake, but mistakes happen.” The developer realized the error of their ways, but it was decided that the effort to resolve this issue was not worth it, so for now there’s a stanza of XML that we refer to as the “do not put comments in this stanza” section. Three months later, another developer on the same team released an application that crashed on a block of comments in the XML, because their parser tried to interpret the comment as a configuration section.
The solution? Hell if I know, but here are some ideas:
- Developers should share code. There’s no reason for a development team to have four different home-written/extended XML parsers (no joke). That so many exist indicates that they are not sharing code, and that at least three developers have wasted a huge chunk of time. Further, it almost guarantees that each parser has an incomplete feature set and its own limitations and bugs; had the four developers worked on a single parser, each would have identified and prevented some of those issues. Admittedly, my argument is tempered by the fact that more cooks in the kitchen are likely to introduce problems of another nature. But if they need additional functionality in a parser, why not subclass or modify the existing parser to do what is needed? I don’t get it.
- Developers should learn from mistakes made by their team members. After a blow-up in production (or a broken build, or a failure to pass a QA stage, etc.), there should be a post-mortem meeting. If nothing else, a post-mortem session should be built into whatever weekly/daily meetings are already set up to discuss progress, goals, new technologies, changes, etc. I’m not talking about a huge time waste, but a quick note that “yesterday process X detonated in production because it couldn’t handle comments in XML; make sure you handle comments in XML” would be a good start. This should facilitate discussions on what sorts of mistakes cause problems, how to prevent them, and how to think about building robust solutions in the future. Making a mistake once isn’t something to be ashamed of, but making the same mistake your team member already made is.
- Developers should think about ways to signal failure. In the above case, the application produced no logs because the logging configuration was stored in the same configuration file, which was never parsed. A separate logging configuration could be used, but this could also be screwed up. Other options include falling back to default logging settings, returning an exit code indicative of failure, or some similar mechanism.
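
To make the first and last ideas a bit more concrete, here is a minimal sketch in Python. It is not what the team in question runs; the config layout (`<config>`, `<logging>`, `<section>`, `<option>`), the function names, and the choice of exit code 78 (borrowed from the BSD `sysexits.h` value `EX_CONFIG`) are all assumptions made up for illustration. The point is only that a stock parser such as the standard library's `xml.etree.ElementTree` already copes with comments, and that setting up default logging before touching the config means a parse failure still leaves a trace.

```python
import logging
import sys
import xml.etree.ElementTree as ET

# Hypothetical config layout, invented for this sketch.
SAMPLE_CONFIG = """\
<config>
  <!-- comments are legal anywhere inside the root element -->
  <logging level="DEBUG"/>
  <section name="db">
    <!-- another comment, right inside a stanza -->
    <option key="host" value="localhost"/>
  </section>
</config>
"""

# Assumption: reuse the sysexits.h convention for "configuration error".
EXIT_BAD_CONFIG = 78


def load_config(xml_text):
    """Return (log level name, {section name: {key: value}})."""
    # The stock parser handles comments; ET.ParseError is raised on malformed XML.
    root = ET.fromstring(xml_text)
    log_node = root.find("logging")
    level = log_node.get("level", "INFO") if log_node is not None else "INFO"
    sections = {}
    # Selecting by tag name means anything that is not a <section> or
    # <option> element can never be mistaken for configuration.
    for section in root.iter("section"):
        sections[section.get("name")] = {
            opt.get("key"): opt.get("value") for opt in section.iter("option")
        }
    return level, sections


def main(xml_text):
    """Load the config; return a process exit code."""
    # Default logging comes first, so a broken config still produces a log line.
    logging.basicConfig(level=logging.INFO, stream=sys.stderr)
    log = logging.getLogger("app")
    try:
        level, sections = load_config(xml_text)
        log.setLevel(level)  # raises ValueError on an unknown level name
    except (ET.ParseError, ValueError) as exc:
        log.error("could not load configuration: %s", exc)
        return EXIT_BAD_CONFIG
    log.info("loaded sections: %s", sorted(sections))
    return 0
```

A real entry point would read the file and call something like `sys.exit(main(text))`, so whatever launches the process sees a non-zero exit code in addition to the log line on stderr.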