**Prologue**

I have a lot of interview questions that are variations on a theme. Start, for example, with a question like “remove all of the x’s from a string and increment all of the numbers.” This is pretty straightforward, but it should probably prompt some questions about what to do with 9.

Change it a bit. Make it double all of the spaces. Make a `q` remove the character two characters previous in the string. Very quickly we transform a simple string manipulation question into the problem of writing an interpreter. Soon we develop an instruction set, and our little toy to manipulate strings is in fact a language, birthed in the privacy and comfort of the technical interview.

Each such problem is usually unique, because I make them up on the fly. At the end of the day, they are problems that a candidate has not *precisely* seen before, even though they may well have solved something very similar. This is key. It insulates against the risk of “I’ve seen that problem before” but allows for “I’ve seen something like that before.” Seeing a candidate adapt their knowledge to a slightly different problem is a huge win for both the candidate and the interviewer. Preparing a derivative problem like this is not a lot of work, but it’s important to practice and try things out, just the same.

Creating an interview from scratch during an interview is a pretty dangerous endeavor. Creating a variation, if done carefully, is probably a good way to adapt a class of problems to address your particular concerns about a candidate. This stuff takes practice, and it’s more art than science.

**Ignoring That Nugget of Advice**

I’m about the least creative or imaginative person on the planet. I’ve come up with only one interview question from scratch, ever. I discovered the problem while writing some code to pick functions in a fractal flame rendering system. Today we’re going to pick it apart; please find all of the holes and issues with it. It’s not perfect, and I’m convinced it must have problems, but as the originator^{1}, I’m blind to the flaws for obvious reasons. It goes a little something like this.

**Probabilistic Dispatch**

We’re trying to implement the randomization core of a Monte Carlo^{2} system. We are given a table of probabilities and a table of function pointers, like so:

```
probability   function
-----------   --------
0.1           f1()
0.2           f2()
0.1           f3()
0.3           f4()
0.3           f5()
```

This is just an example; in reality we will have `n` probabilities and `n` functions. Design an algorithm and the associated data structures to pick a function from the table given the probability distribution. That is, we want to end up with some function `f()`. If we call `f()`, we expect that 10% of the time `f1()` will be called, 20% of the time `f2()`, 30% of the time `f5()`, and so forth.

If the notion of function pointers makes the candidate wet themselves, we can then rephrase the problem as something like “I have a table of integers; I want to write a function that will pick one of them at random given a matching table of probabilities.” It’s easy to simplify the problem down to the core algorithmic/data structure aspect and throw out the flavor text.

As clarifications, I provide the following:

- All probabilities are provided to you as 32-bit IEEE floats. I make sure to say this part twice, in one form or another.
- The sum of all probabilities is 1. If questions come up about things “really” adding to 1 and actually checking against 1.0 and the other things a smart person might think about with regards to floating point, we can simplify and say that there’s a function `EQ` that is guaranteed to evaluate `(EQ sum_p 1.0)` as `#t`, subject to some epsilon for our particular sums of probabilities. I’ve never actually had anybody ask about this, though.
- You are allowed any reasonable amount of processing before somebody calls `f()`. For example, we can add probability/function pairs to your class/data structures via some function like `addfunc(pn, fn)`, and you can be notified by some function like `alldone()` if it makes it easier. These functions don’t need to be written; the key is that we can do anything we want ahead of time to set things up.
- You are given a magical source of randomness that is very fast and very random. It can return a 32-bit unsigned random integer, or it can give you a floating point value in [0, 1].
- To keep things easy, assume `f()` returns void and all of the function pointers have the signature `void fn(void)`. If we’ve simplified to the “return an integer from an array” version, we just need to return `A[n]`, and we’ll assume the numbers are all of type int or whatever. Easy stuff.
- The goal is to optimize the performance of calls to `f()`; we’re going to call `f()` a lot of times!

Given all that, write `f()`. You can either do it like a functor, or just write `f` and keep the data structures external. Doesn’t really matter to me.

**Initial Reactions**

A disturbing portion of the population that encounters this problem quickly comes to the following solution:

“I’ll just put the probabilities in a hash table, take a random number, and then look up the function and call it.”

This usually calls for the Socratic method: “Sounds easy enough, why don’t you run me through an example!”

Sometimes they write code, sometimes they start diagramming things, sometimes they stare off into space for a few seconds, but the next step usually comes quickly: they don’t have a hash function in their toolkit that performs the transformation from continuous data to discrete data. Some folks take a really long time to come to this conclusion, which scares me a little bit for a number of reasons. Basically, they recognize a sophisticated data structure (a hash table), but don’t recognize a situation where it might not be suitable out of the box. What happens when they take their box of tools and get let loose in hundreds of thousands of lines of source code? It’s just a mild red flag, but something to watch out for. I think the biggest vulnerability here comes from people who use sophisticated container/collection classes but don’t understand the guts of the underlying implementation. Discussions of maps versus hash tables in other parts of the technical interview can be used to clear (or confirm) this fear.

Anyhow, it doesn’t take long to realize the initial problem is pretty simple; we need to take several continuous ranges and convert them to discrete outputs. Here’s a sketch of one solution:

```
// p[] holds p[0] to p[n-1], the probabilities
// f[] holds the corresponding function pointers
void f() {
    float r = rand_uniform_float(); // [0, 1]
    int i(0);
    while (r > 0)
        r -= p[i++];
    (*f[i-1])();
}
```

Neat. Probably close enough for the sake of an interview; there are a couple of boundary issues and goofy things (like probabilities of zero) that probably need some more thought, but I’m not concerned about the finer points on this question; if the candidate brings it up I’m *delighted*, but this is primarily a data structures and algorithm question. What’s the running time? The answer better be “n” or “linear.” A decent number of people toss in something like “and that’s good enough” or “and that seems pretty fast.” This always surprises me, but doesn’t really disappoint me.

A very similar approach that requires slightly more preprocessing is to compute the cumulative probability table such that each row of the table has the sum of the entries to that point. The idea being that we can now write an algorithm like:

```
float r = rand_uniform_float(); // [0, 1]
for (int i(0); i < n; ++i) {
    if (r <= pp[i]) {
        (*f[i])();
        return;
    }
}
```

No real improvement there, though it simplifies things a bit. Maybe it saves us a floating point operation per iteration. Whatever.

**Non-Algorithmic Optimizations**

I usually prefer a candidate to admit the above solution is suboptimal, but I’m on board with the idea of getting a “working and slow” solution out before trying for “elegant and incomplete.” So, again, my hope is the candidate starts searching for ways to make it faster, or is bothered by it being a linear operation. It’s not a huge deal if they aren’t. We just do something like:

“Nice work. We ship this to the client and everything works great for a few weeks of testing. Unfortunately, their model grows from a half dozen or so functions to a hundred thousand different functions. How do we make it faster?”

I like people who spot real-world improvements, because there are a lot of those. There is often a lot of low-hanging fruit for performance optimization, and some of it is not from the camp of computer science. Real world optimizations matter to me too!

It’s simple. All we need to do is sort the probability distribution (and the related function pointers) in order of decreasing probability; the linear scan then tends to terminate sooner, because the higher-probability options are checked first. This is a nice point to realize; some people do it from the start, without even trying the unsorted approach first.

Were I cruel, I’d ask them to write the code to do a dependent sort of two arrays like that, but it’s more a matter of a thought exercise and/or a syntax thing, so I wouldn’t really bother. I’m okay with hand-waving at this point that `f` and `p` (or `pp` as in the above example) are magically sorted.
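If we did want to make the dependent sort concrete rather than hand-wave it, one low-drama approach is to sort a permutation of indices and walk both arrays through it. A sketch (the helper name is mine):

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Return indices 0..n-1 ordered so that p[order[0]] >= p[order[1]] >= ...
// Applying the same permutation to both p[] and f[] keeps each
// probability paired with its function pointer.
std::vector<size_t> decreasing_order(const std::vector<float>& p) {
    std::vector<size_t> order(p.size());
    std::iota(order.begin(), order.end(), 0); // 0, 1, 2, ...
    std::sort(order.begin(), order.end(),
              [&p](size_t a, size_t b) { return p[a] > p[b]; });
    return order;
}
```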

**Cheap Algorithmic Improvements**

The next leap is an algorithmic one, however. It’s more obvious if the candidate took the second approach than the first one, so it may take some prodding. Here’s the new algorithm:

```
float r = rand_uniform_float(); // [0, 1]
(*f[lower_bound(pp, pp + n, r) - pp])();
```

And thus we went from a linear to a logarithmic solution by doing a binary search. One option would just be to make the candidate write a binary search instead of using a library call, if we don’t worry about going any further. Readers of Bentley’s *Programming Pearls* will realize the trap of this: Odds are we’re going to need the entire technical interview to implement a binary search through an array correctly. Go buy the book and read it, if your first reaction is “oh, whatever, binary search is easy.”
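For reference, the hand-rolled version is a lower-bound search over the cumulative table: find the smallest `i` with `r <= pp[i]`. A sketch (the function name is mine; the half-open-interval bookkeeping is exactly where Bentley’s off-by-ones live):

```cpp
#include <vector>

// Smallest index i such that r <= pp[i], where pp[] is the ascending
// cumulative probability table. Mirrors std::lower_bound(pp, pp + n, r).
int pick_index(const std::vector<float>& pp, float r) {
    int lo = 0, hi = static_cast<int>(pp.size()); // invariant: answer in [lo, hi)
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (pp[mid] < r)
            lo = mid + 1; // pp[mid] too small; answer lies to the right
        else
            hi = mid;     // pp[mid] works; it might be the answer
    }
    return lo; // equals pp.size() only if r exceeds every entry
}
```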

Anything more algorithmically complicated than a binary search is probably a dangerous choice for a technical interview, if the goal is to get it 100% right. Remember this, and decide how you want to spend the interview — getting sophisticated code to 100% in an interview is not an effective use of time (but doing some debugging out loud to get to 80% can be really valuable). In the real world we have compilers, unit tests, debuggers, etc.

**Thinking Outside the Box**

Most candidates still have an instinct at this point that logarithmic performance can be beat. Often this is still in the form of “somehow this can be coaxed into a hash table via some clever hash function.” I’m pretty sure finding this clever hash function is tantamount to solving the problem, but I may be missing an obvious solution.

There are some marginal improvements often proposed such as bucketing search regions to accelerate search. These ideas are a distraction, but occasionally they inspire some of the stuff that follows, so I don’t discourage too much.

In any event, this is my favorite part of the problem, because it’s hard. What I like are some of the creative ways to flip/break the problem. Here are a few of them.

We can investigate the idea of pre-computing the distribution. Let’s say we know that our simulation is going to make 10,000 calls to `f()` and that the interaction and ordering of the function calls don’t matter. That is, whether we call f1, f2, f3 or f2, f1, f3, we expect the same net results for all permutations. Our problem becomes rather trivial in this case; we make p1 * 10000 invocations of f1, and so forth. I love this answer, because it’s a creative way to solve a performance problem by beating the problem. I still want to keep pushing, though, so I mention that order does matter, and that we gain confidence in the system precisely because of its randomness and unrepeatable results. Shuffling which probabilities get applied first via a `random_shuffle` sort of addresses this, but it’s easy enough to say call order matters, or that the internal state of the function calls changes between calls, and that things are dependent.
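For what it’s worth, the precomputed-schedule idea sketches out in a few lines. Here it is built over function indices rather than pointers to keep it self-contained; the names, the fixed seed, and the rounding policy are all my assumptions:

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Build a shuffled schedule of function indices: index i appears
// roughly p[i] * total times, so playing the schedule back gives the
// right mix with zero per-call search.
std::vector<int> make_schedule(const std::vector<float>& p, int total) {
    std::vector<int> schedule;
    schedule.reserve(total);
    for (size_t i = 0; i < p.size(); ++i) {
        int calls = static_cast<int>(p[i] * total + 0.5f); // round to nearest
        schedule.insert(schedule.end(), calls, static_cast<int>(i));
    }
    std::mt19937 rng(12345); // fixed seed: the "unrepeatable results" objection applies
    std::shuffle(schedule.begin(), schedule.end(), rng);
    return schedule;
}
```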

Back to square one. What we really want is constant-time performance, dispatched “randomly.” Let’s break the problem a different way. What if we accept that our incoming precision is 32-bit IEEE floating point, but we restrict our modeled precision to some `k`, like 10,000? We pre-compute a dispatch table in which the first p1 * k entries are f1, and so forth. Then we grab and modulus a random integer for dispatch into our table of k function pointers, or scale our uniform distribution by k, and off we go. Constant dispatch. The only problem is we’re restricting our precision. Will the user care? Good question. Good thinking.

One way to tackle this aspect is to normalize for required integer precision. Figure out the smallest precision required to accurately model the provided probability distribution, and pre-compute a table of that size. If we have probabilities like 0.5, 0.25, 0.05, 0.2, we can just make our table size 1/0.05 and we’re done. This could result in some pretty big tables depending on what is picked, as we have to model to the lowest common unit of precision (or, to be on the safe side, just pick the lowest changed digit and work in tenths of that digit).

See why this problem is fun? All sorts of opportunities to talk about floating point precision, math, and random crap in a real-world environment.

And now we randomly realize we already have the answer. “Aha!” says the interview candidate, “the interviewer is really asking me to consider a question of performance/space trade-offs.” Let’s just make a table for every single floating point number in [0, 1]; with that we can precisely model the input conditions for the problem, with the only tricky bit being how much memory we use. How much space do we need? Well, I don’t expect anybody to know how many 32-bit floating point numbers there are between 0 and 1. A good answer is “more than 0” and “fewer than 0xffffffff.” A fun exercise at this point would be “write a program that tells you how many there are!” Here’s a really sloppy one that avoids doing any math or thinking about how IEEE floats work, and probably doesn’t work quite right, but it’s close enough for a first stab:

```
uint32_t count(0);
for (uint32_t i(0); i < 0xffffffff; ++i) {
    float f(*((float*)&i)); // sloppy type-pun; good enough for a throwaway
    if (f >= 0.0f && f <= 1.0f)
        ++count;
}
```

The answer? About 2^30. Of course, that’s counting bit patterns whose values land between 0 and 1, not a careful accounting of distinct values. Again, flawed approach, but we’re already down to a gigabyte of address space instead of four gigabytes!
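For the curious, the 2^30 figure also falls straight out of the IEEE bit layout, no loop required (this counts only the non-negative patterns; the brute-force loop additionally catches -0.0):

```cpp
#include <cstdint>

// Non-negative single-precision values in [0, 1] are exactly the bit
// patterns 0x00000000 (0.0f) through 0x3F800000 (1.0f), inclusive:
// exponent fields 0..126 cover zero, the denormals, and every normal
// value below 1.0, each field with 2^23 fraction patterns; then 1.0f
// itself is one more pattern.
uint32_t floats_in_unit_interval() {
    return 127u * (1u << 23) + 1u; // 0x3F800001 = 1,065,353,217, just under 2^30
}
```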

But let’s say we don’t know that. If we have a big machine with “many” gigabytes of memory, can we load a table like this without worrying about swapping and paging? Well, again, worst case we’re talking about 1<<32. Is that worth constant-time performance? Good question! I don’t know, but having the options is fun, because we now have a range of possibilities and trade-offs that we can adapt to the requirements, which is pretty much how the real world works. Is it *really* constant time performance? Well, the reality of memory is “probably not” but that’s a good discussion too. Win-win!

**Closing Thoughts**

There are some issues with the problem. There’s a little bit of ramp-up in explaining the problem, but I don’t think it’s unreasonable. With practice the problem explains in under two minutes. Faster solutions are probably going to be less interesting in code (a recurring theme in the series). I think it has some interesting opportunities for discussing floating-point numbers, time/space trade-offs, and ways of improving performance without making algorithmic leaps. It doesn’t really let us get into data structures more interesting than arrays and hash tables, but that’s probably okay. Well, it might get us into trees, but we hope that doesn’t happen.

But, then, it’s my question, so I’m not a fair judge. Tear it apart. What’s wrong with it? What are the traps? Is there a good solution I’ve missed?

^{1} I would not be surprised to learn that a similar question has been asked before; while I haven’t encountered it, I don’t claim that my problem is unique or original in a scientific sense. If somebody has seen something similar before, I’d love to see other discussions/expositions of it on the Internet.

^{2} Sometimes people admit they don’t know what a Monte Carlo system is, sometimes it’s clear they don’t know, but they don’t say anything. This doesn’t bother me, because it’s color text. It’s not actually relevant to the problem, but I’m happy to explain the “throw darts at a circle in a square to measure pi” example, if they ask.