I’m starting a new series of posts in which I analyze a number of technical interview questions for programmers. The problems will vary quite a bit. I enjoy the puzzles and nature of these sorts of questions. I don’t give very many technical interviews anymore, so I figured I’d share some of my experiences and insights I’ve picked up over the years. As to what makes me qualified to talk about them, I don’t know that I am qualified to talk about them. I guess that’s part of the fun. I’ll explain the problem and walk through some of the solutions, but I’m more interested in the meta-game: What does this problem teach us about a candidate. What are the traps and pitfalls of the problem? What are the strengths of the problem? What are its weaknesses?
The point of this series, if I don’t just forget about it and move on to something else, is not to teach interviewing in any way, shape, or form. Rather, it’s to explain why I think your favorite interview question sucks.
A common “warm-up” interview question is the following:
You are given a singly linked list. Design an algorithm to detect if there is a cycle in the list.
Or, phrased by my nemesis, Evil Math Dude1:
Cycle … list.
What a Good Candidate Should Ask
Something I’ll probably repeat every time I dissect a problem: The first response from a candidate should be more questions; most of these technical interview questions are intentionally under-specified, and intended to lead the candidate down a journey of genius and discovery, where they gradually develop the optimal solution. In practice, it seldom works this way, but we’ll not discuss that yet.
- How many items will there be in the list?
- How much memory do I have available?
- Do we care more about speed or memory utilization?
But let’s assume we have a suboptimal candidate and/or an unhelpful interviewer (“well, let’s see what you come up with, and then I’ll tell you!”). What sort of responses do we get? There’s a lot of variety in responses, but I’ll try to cover some of them.
Finding a Cycle With Linear Storage
The first variety of solutions we’ll see involve using linear storage. These types of things happen if you don’t tell candidates ahead of time (or if they don’t ask) that their memory usage is relevant. Alternatively, if the candidate is told there are maybe a few million elements and they have a modern machine, these sorts of solutions probably should come up, since we’re talking tiny amounts of memory. As a rule of thumb for this series, if you can fit the memory requirements for a problem into memory on a moderate-priced consumer computer, the problem uses a trivial amount of memory. If a candidate starts reminding you of this, you probably have a hire on your hands; there aren’t many candidates that will argue that your problem is trivially small and requires an insubstantial amount of memory on a modern machine — if they’re aware enough to do so, you’re probably in good shape. In any event, here are a couple of linear solutions:
- Add a flag to each node in the list; each time you visit a node, set the flag; if the flag is set upon visiting a node, you have a cycle. The candidate should quickly be able to prove this is a linear algorithm, but that it requires two things: 1) Being able to modify the list (not an issue when you can store your own type inside the list) 2) a linear space increase.
- Each time you visit a node, store its address in a map or hash table; if the item already exists in the hash table, you have a cycle. This solution gets around the issue of needing to modify the type or structure of the list, but requires an auxiliary data structure with linear storage requirements. Depending on implementation, that gives us performance of O(n) or O(n lg n). Or, if you have a particularly daft candidate, they may just store the values in a list or array, and the algorithm becomes quadratic. Do them a kindness and send them home now.
Using Constant Space
The linear solution was easy, we now want to make an optimization to the algorithm to use constant space. If the candidate is dubious of the need for this, we can approach the protest by suggesting that we just have an API that spits out nodes all day long. Or that we’re trying to improve our implementation of Pollard’s rho algorithm. This last bit is a trap: if the candidate knows what you’re talking about, then they’re toying with you and pretending they don’t know the answer, when they do. Alternatively, if they smile and pretend like they know what you’re talking about you might have a different type of a problem. That’s a discussion for a different day. Here are some approaches.
- Keep a count of how deep you’ve traveled down the list; if it exceeds 10,000 (or any other arbitrary limit) without hitting null, you have a cycle! The unique appeal of this solution is that it runs in constant time and uses constant space. Being the optimal solution, I’m not going to discuss it further2.
- Keep a depth limit; traverse from head to the current depth limit, and then record the node at this point. Apply the algorithm again, increasing the depth limit by one; if at any time you see the node seen at the previous depth limit prior to reaching your new penultimate and ultimate node, you have a cycle. Gah. Okay. Yeah, it’s constant space, but this algorithm is quadratic and icky. This answer is only acceptable when accompanied with a disclaimer from the candidate. Valid disclaimers are like “Okay, I know this is a HORRIBLE SOLUTION, but it satisfies the requirement, but I’m going to do better now.” Or “if you want to laugh, here’s an idea…” If they feel satisfied with this approach or are smug, or appear to stop thinking at this point, it’s time to send them home.
- Use Floyd’s Algorithm: Leverage two pointers moving at different speeds (1x and 2x); if they are ever coincident, you have a cycle. Ah, the desired slam-dunk solution. Constant space and linear (2n) performance. Not always easy to come up with on the fly, given no prodding, at least in my experience, but a relatively easy leap to make if the interviewer gives the slight hint of “what if I only give you two pointers?”
Of course, the real jerk interviewer can at this point indicate that, for the average case, this solution is suboptimal, and demand you come up with the teleporting turtle algorithm on the fly. This interviewer can go fuck themselves. A valid question could still be “could you do better?” but I don’t think it’s reasonable to protest a “I don’t think so, but I’d be willing to give it a shot” at this point.
Analysis and Metagaming
How long should this question take? It depends, of course, on how hard you push, what you want to reach, and whether or not you make the candidate write code for their solution. I think the whole discussion of this problem, including code writing, should take no more than 15 minutes. 20 minutes is pushing it. If it takes longer than this, send the candidate home or at least move to a different problem.
The problem’s final solution requires a big leap; I think it may be a little too prone to the A-ha! factor. It can be teased out by suggesting constant space usage, a very small constant at that, and then leading to using only a pair of pointers at most. But, there aren’t really any intermediate opportunities, the candidate either gets it or they don’t; this sort of binary success/failure characteristic of the final solution is not very useful.
I don’t think questions should be asked that can’t be converted to “write code for your solution” in the end. This is a relatively easy solution to code up; some of the less optimal solutions are slightly harder to write (a nice reward for sucking), but the increased time to implement and debug them makes up for the decreased time to get to a useful solution. I’m preaching here about interviewing technique, but the candidate should be required to write code on the interview, and this is a great warm-up exercise before doing something a little more intense.
The problem appears to feature one of my favorite characteristics of a good interview question: There are a number of steps in reaching the optimal solution, and there’s a path or progression one can follow to get there. Unfortunately, this problem is vulnerable to quickly leaping past many of the steps, meaning we can lose a lot of the learnings we would get otherwise along the way. If we jump straight from marking nodes to dual pointers, we didn’t learn too much, honestly. The journey is a lot more interesting on these sorts of problems than solving the puzzle; seeing where the candidate explores, and what sort of ways they apply what they know is interesting. Seeing them spontaneously get it isn’t. I’m looking for things that a candidate reaches for from their toolbox to solve this; it gives me a sense of what’s in their toolbox (a little important) and gives me a sense of how well they can use it to solve a new problem (extremely important).
So, what does this question tell you, the interviewer? Well, it’s going to depend a bit on the journey. If we jump straight to the end, we don’t learn as much, as I indicated previously. There are opportunities to learn about data structure knowledge (lists, trees, hash tables, arrays, stacks), to check pointer comfort (list traversal), to discuss space/time trade-offs, to discuss algorithmic complexity and analysis, and to spot check some very basic code-writing skills. The last item can be improved slightly by having the candidate design both the structure of the list/node and the algorithm, but we’re talking maybe eight lines of very simple code, tops. We’re not going to get into any sophisticated coding, odds on seeing any recursion are pretty slight, and we’re not going to talk about any sort of interesting algorithms here.
- Good starter for a two-question interview
- Easy cut between algorithm design and coding portions of question
- Opportunities to check familiarity with basic data structures
- Opportunities to check familiarity with time/space/complexity
- Gradual approach of improving solutions possible
- Not a rare problem; easy to find on Internet lists of interview questions
- Easy to skip from suboptimal to optimal solutions without doing much in between
- A-ha factor in getting to the final solution; not much middle ground
I’ve thought a bit more about a reason I dislike this problem. It’s very easy to start with a solution that already has the optimal algorithmic performance. Even using some massive data structure that uses linear space, the solution is already linear. There’s no graduated improvement in algorithmic performance. All of the improvements come in the form of storage — there’s not really a space/efficiency tradeoff as much as there’s a space optimization. In practice, I think most problems one encounters in engineering don’t lend themselves to clever space optimizations like this, and instead are of the form “make a hash table.” Anyhow, there’s no intermediate O(n lg n) or O(n^2) solution, which is sort of a bummer. There’s (obviously) no sublinear solution.
1 “Evil Math Dude” was a character I encountered at an interview, many years ago. He had a PhD in mathematics, as he informed me when he entered the room, and I was worth exactly the shit he just deposited into the toilet on the way there. He didn’t actually say that last part, but it was clear he was thinking it. He whispered under his breath in a heavy Eastern European accent whenever he spoke, using only one out of five of the words necessary to complete a sentence. He would wave his hands for where the other words should appear, as if speaking them out loud was beneath him. He had a shadow, as well, who did not speak, but broke out laughing whenever Evil Math Dude smiled, which made Evil Math Dude stop smiling, which made The Laughing Shadow stop laughing. Evil Math Dude’s first problem was to “delete a node from a list given nothing but a pointer to that node.” The problem was phrased as “delete node.” Clarified, it became “Delete node from list.” I’m not the smartest guy around, but I’m pretty sure that the 20 minutes it took me just to figure out what he was asking were indicative of a problem that wasn’t a product of my (lack of) ability. The second question was “factorials.” After a few minutes we got it to be “zeros factorials.” So, the second 25 minutes of the interview was trying to figure out (and answer) this question, which turned out to be “devise an algorithm that tells how many trailing zeroes there are for a given factorial n.” So, whenever I refer to Evil Math Dude, imagine that guy, and know that his icy stare would make all but the strongest child cry.
2 At the risk of being called out on this, yes, this is a joke.