PA5

As of this evening, I’ve received my first cppgm ALL TESTS PASSED of PA1-PA5. There’s a lot of refactoring and cleanup needed, but I’m in the home stretch and ready to start PA6 and PAA soon. I’d say I’m excited for what’s next, but I’m intimidated as well, since “we are getting to the hardest part of the course.”

At least for me, PA5 was a bitch. I put about 15 hours into PA5 before it would pass the most basic tests in the suite. I pushed those changes back to PA1-4, at which point it was clear that I’d severely broken every single previous assignment in the process.

PA5, thus, did a fantastic job of finding faulty assumptions I’d made and pointing out why they were broken. There are many examples, but they fall into three basic categories.

The first batch of assumptions I made was that I could buffer without fear. This happened at many different scales: two characters (newline guarantees at EOF), lines (token sequences), blocks of lines (text sequences), and translation units (preprocessing). In PA5, every single time I buffered, I had to either add additional controlling logic, or abandon the buffering.
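
To give a flavor of the first problem, here’s a rough sketch of the two shapes a pipeline stage can take: one that buffers a whole line of tokens before emitting it, and one that streams tokens through immediately. The types and names are invented for this post; they aren’t from my actual code or from the cppgm reference material.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical token type; the real assignments use much richer tokens.
struct Token { std::string text; };

// The "buffer without fear" version: collect a whole line of tokens,
// then hand the line downstream in one shot.  This breaks as soon as a
// later stage needs to inject tokens or see input before the line is
// complete, which is exactly what PA5 kept demanding.
struct BufferingStage {
    std::vector<Token> line;
    std::function<void(const std::vector<Token>&)> emit_line;

    void push(const Token& t) {
        if (t.text == "\n") { emit_line(line); line.clear(); }
        else                { line.push_back(t); }
    }
};

// The streaming alternative: forward each token immediately and let the
// consumer decide how much context it needs.  More bookkeeping for the
// consumer, but nothing is held hostage inside this stage.
struct StreamingStage {
    std::function<void(const Token&)> emit;
    void push(const Token& t) { emit(t); }
};
```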

The second variety of assumptions revolved around the notion of an EndOfFile token and what I assumed (and abused) it to mean. While there are many facets to this, most of them relate to the distinction between ending a file and ending a translation unit. Numerous pipeline stages assumed they would be run through one “file” and would then be complete. I added a lot of code to ensure each state machine completed and would explode if molested thereafter, but made no provision for reverting to a start state, or, better yet, for the machines to be re-invoked numerous times in sequence.
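
Roughly, the difference between the two behaviors looks like this. These names are hypothetical and only illustrate the shape of the problem, not any actual cppgm interface.

```cpp
#include <stdexcept>

// The "explode if molested" design: one pass through one file, then dead.
struct OneShotStage {
    bool finished = false;
    void consume_end_of_file() { finished = true; }
    void consume(char /*c*/) {
        if (finished)
            throw std::logic_error("stage already finished");
        // ... normal processing ...
    }
};

// What PA5 actually wants: end-of-file is just another event, and the
// stage can be driven through many files within one translation unit.
struct ReusableStage {
    void consume(char /*c*/) { /* ... normal processing ... */ }
    void consume_end_of_file() { reset(); }  // back to the start state
    void reset() { /* clear any per-file state */ }
};
```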

Finally, I totally failed in my design of interfaces and communication. This is difficult to explain succinctly. Pretty much any chunk of the preprocessing/lexing pipeline needs to be available for re-use from anywhere else in the preprocessing/lexing pipeline. This has to be possible with all of the necessary state either easy to create or easy to re-form and propagate, and it has to be possible to seamlessly loop the output back into whatever we were doing at the time. Violent examples of this include pipeline propagation of file/line information and recursive rescanning and retokenization after macro invocation.
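
As a toy illustration of the rescanning case, here’s what the “loop the output back into the same stage” shape looks like for object-like macros. The real thing also has to handle function-like macros, recursion guards, and placemarkers, none of which appear here, so treat this purely as a sketch with made-up names.

```cpp
#include <map>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

// Hypothetical macro table mapping a name to its replacement list.
std::map<std::string, Tokens> macros;

Tokens expand(const Tokens& in) {
    Tokens out;
    for (const auto& tok : in) {
        auto it = macros.find(tok);
        if (it == macros.end()) {
            out.push_back(tok);
        } else {
            // The replacement list is fed back through the same expander:
            // the output of one stage loops back into its own input.
            // (Self-referential macros would recurse forever here; the
            // standard's recursion guard is deliberately omitted.)
            Tokens rescanned = expand(it->second);
            out.insert(out.end(), rescanned.begin(), rescanned.end());
        }
    }
    return out;
}
```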

I wish I could say that in producing a passing preprocessor/lexer my code has become beautiful. I wish I could say that I’m happy with how its architecture and communication are set up. Both statements would be outright lies, unfortunately. That I am embarrassed by the mess I’ve already created is an understatement. Most things were fixed by bolting on more horrible logic or augmenting an already bad design. Each terrible design choice I’ve made is amplified by each subsequent stage: the taxes are far from linear.

On the upside, I have a better sense of what I would think about were I to start again from scratch. On the downside, I have no picture yet in my mind of how to solve all of these problems elegantly. Compilers are nasty, complicated, and difficult. Entertaining the idea that the first compiler I would write would be a standard-compliant self-hosting C++ compiler was beyond foolish.

On the other hand, I’ve not flunked out yet. So I continue.