Aaron N. Tubbs

Dragon chaser.

At least until I flunk out, I’ve decided to start working through the C++ Grandmaster Certification. I’m under no illusion that I’m particularly good with or knowledgeable about C++; at best, my capabilities in the language are barely functional. I’m also far from academically prepared for this. Aside from the top-tier data structures & algorithms class, the languages & compilers class was far and away the most useful and relevant coursework I took. That I only took one semester on the topic is, in retrospect, criminal.

I expect this to be difficult and certainly way out of my league. On the upside, I’ll be able to say “I’ve played with a few domain specific languages and once failed to implement a self-hosting C++ compiler.”

So far I’ve completed the first assignment. To be precise, my code works to the best of my knowledge and passes all the course-provided tests, but it has not yet gone through grading. I’m not particularly happy with the implementation; it has a couple of nasty kludges and feels bloated and inelegant. I ended up rewriting everything up to the preprocessing token stage, because I wasn’t sure how I wanted to do it until I’d screwed it up the first time. I got about 25% of the way through writing the preprocessing token code before I realized I was trying to be too clever, and ended up doing the obvious thing instead.

It’s been fun so far. The majority of my time was spent building and debugging this:

[Figure: DFA]

Probably 50% of the debugging time came before I wrote any code, and the rest as I wrote the implementation behind it. So, that was interesting.

I’ve already learned a few things:

  • Writing a UTF-8 encoder and decoder is pretty easy; it was a joy to do this straight from RFC 3629. This exercise goes a long way toward developing at least a basic mechanical familiarity with Unicode and its different encodings.
  • The C++ standard is a bit impenetrable at first, but it leaves very little room for ambiguity. Each time I was confused, the answer was simply to read the standard more carefully. Several times, I fixed bugs just by making my program more standards-compliant. An illustrative example was a nasty bug in my raw string literal DFA. I was sure fixing it would require a substantial rethink, until I read the standard and found the guarantee that, more or less, anything that looks like the start of a raw string is a raw string (§2.5.3). The lesson here, I think, is to stop trying to be clever. Trust the spec.
  • There’s some joy in working from a frozen spec. I would not enjoy this exercise if the details were still largely being sorted out.
  • Many language aspects that I previously took for granted now make a lot more sense. I expect this effect to increase greatly with further exercises.
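To give a sense of how small an RFC 3629 codec can be, here is a minimal C++17 sketch. It is not the course's interface or my submission; the names `utf8_encode` and `utf8_decode` are hypothetical. The encoder refuses surrogates and values above U+10FFFF, and the decoder rejects bad lead bytes, bad or missing continuation bytes, overlong forms, surrogates, and out-of-range values, which are the cases RFC 3629 calls out.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: append the UTF-8 encoding of one code point to `out`.
// Returns false for surrogates (U+D800..U+DFFF) and values above U+10FFFF,
// which RFC 3629 forbids.
bool utf8_encode(char32_t cp, std::string& out) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx + 3 continuations
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}

// Hypothetical helper: decode a whole UTF-8 string, or return std::nullopt
// on any ill-formed sequence.
std::optional<std::u32string> utf8_decode(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        char32_t min_cp = 0;  // smallest value legal for this length (rejects overlongs)
        if (b0 < 0x80)                { len = 1; cp = b0;        min_cp = 0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
        else return std::nullopt;  // stray continuation byte or 0xF8..0xFF lead
        if (i + len > in.size()) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;  // not 10xxxxxx
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return std::nullopt;
        out += cp;
        i += len;
    }
    return out;
}
```

For example, encoding U+20AC (the euro sign) yields the three bytes E2 82 AC, while the overlong two-byte form C0 80 for U+0000 is rejected by the decoder.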
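The raw string guarantee matters because it lets a tokenizer commit: once it has seen a raw-string prefix followed by `"`, it scans a raw string literal and never backtracks to treat `R` as an identifier. Below is a hypothetical C++17 sketch of that committed scan; `is_d_char` and `scan_raw_string` are invented names, not part of the course. It assumes ASCII input, leaves detection of the prefix (`R`, `LR`, `uR`, `UR`, `u8R`) to the caller, enforces the 16-character delimiter limit, and deliberately ignores the required reversion of phase 1 and 2 transformations (trigraphs, universal-character-names, line splices) inside the literal.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// A d-char is any basic source character except space, '(', ')', '\\',
// and the control characters tab, vertical tab, form feed, and newline.
// That leaves letters, digits, and the graphic characters listed below
// (assumes ASCII, where 'a'..'z' and 'A'..'Z' are contiguous).
bool is_d_char(char c) {
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
        return true;
    return std::string_view("_{}[]#<>%:;.?*+-/^&|~!=,\"'").find(c) != std::string_view::npos;
}

// Hypothetical helper: `quote` indexes the opening '"' after the prefix.
// Returns the index one past the closing '"', or std::nullopt if the
// literal is ill-formed. On failure the caller reports an error; it must
// not re-tokenize R as an identifier.
std::optional<std::size_t> scan_raw_string(std::string_view src, std::size_t quote) {
    if (quote >= src.size() || src[quote] != '"') return std::nullopt;
    std::size_t i = quote + 1;
    const std::size_t delim_start = i;
    // Delimiter: at most 16 d-chars, terminated by '('.
    while (i < src.size() && src[i] != '(') {
        if (!is_d_char(src[i]) || i - delim_start == 16) return std::nullopt;
        ++i;
    }
    if (i >= src.size()) return std::nullopt;  // no '(' before end of input
    std::string_view delim = src.substr(delim_start, i - delim_start);
    // The body ends at the first occurrence of )delim" after the '('.
    std::string closing = ")" + std::string(delim) + "\"";
    std::size_t end = src.find(closing, i + 1);
    if (end == std::string_view::npos) return std::nullopt;  // unterminated
    return end + closing.size();
}
```

Because the closing sequence includes the delimiter, a body containing `)"` does not end the literal early: in `R"xy(a)" b)xy"`, the scan stops only at `)xy"`.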

The downsides so far have been more logistical in nature. I was originally working successfully (including running the reference implementation) on my Debian box, but then a git pull brought in a new reference implementation that depended on a newer glibc. I briefly ran a virtualized Ubuntu 12.10 image, but performance was pretty horrible. So, I’ve recently been playing with an Intel NUC, which has been pretty swell. With a fast (albeit absurdly small) SSD and 16GB of RAM, it’s a pretty capable little package. My only real frustration is that I have to pick between gigabit Ethernet and Thunderbolt; I would have preferred to have both. The i3 doesn’t sound like much on paper, but in practice it’s nearly as fast per core as my Xeon X3430-based Debian box (albeit with half as many cores). A couple of generations of architectural improvements definitely make a difference here. Anyway, since it’s running the OS the course wants, things work extremely smoothly.

I remain curious which company is behind this effort, and I’m interested to see whether they ever announce it.

So, fun so far. Looking forward to the next lesson.