Last weekend I participated in my first ever ranked choice vote, for the Democratic Party candidate for NYC mayor. The New York Times has a great explainer, so I won’t retread those waters by trying to explain how it works. Instead, this post will show how to build a statistical model of the election, as a follow-up to my post about what a statistical model even is. Here, we’ll build one together.
Let’s start with the candidates. Right now Eric Adams is in the lead, with Maya Wiley and Kathryn Garcia trailing, and Andrew Yang in fourth place. Since Andrew Yang has conceded, and three candidates is all we need to show how ranked choice voting works, we’ll keep this simulation to just the top three.
Let’s start at the end of the line. We know we need to simulate ranked votes; once we have them, we can run them through the operations that determine the winner. Every voter in our little simulation is going to rank the candidates (Eric Adams, Maya Wiley, and Kathryn Garcia) according to their preferences.
So what do we do now? Almost every kind of data that you can observe has some kind of probability distribution associated with it. Data that represents counting how many times something has happened, like, say, standing on a street corner and counting the number of cars that drive by? That could be represented by a Poisson distribution. Data that represents how long it takes for something to happen, like how long you’ll have to wait for a subway? That could be represented by an exponential distribution. Binary variables? A Bernoulli distribution. And so on: almost every kind of phenomenon has some set of probability distributions associated with it. So the first step in any statistical model is to figure out what kind of data you’re observing, and pick out a probability distribution that represents that kind of data well.
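To make that concrete, here is one random draw from each of those distributions in R (the parameter values are arbitrary, picked just for illustration):

```r
set.seed(42)  # just so the draws are reproducible

cars_per_hour = rpois(1, lambda = 10)        # a count: cars passing a corner
wait_minutes = rexp(1, rate = 1 / 5)         # a waiting time, averaging 5 minutes
voted_early = rbinom(1, size = 1, prob = 0.5) # a binary (Bernoulli) outcome
```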
In this case, we’re modeling ranked choices: we observe the order of the preferences, but not their size. We’ll see, say, that someone places Eric Adams first and Kathryn Garcia second, but we won’t know whether that preference was slight or very large.
This is actually quite a complex outcome; originally I was going to use the ordinal logistic distribution, because I thought “we have ordered data, let’s use an ordered distribution”. But ordinal logistic is good for data along a single scale, like rating customer service from very poor to very good, not for ranking multiple alternatives against each other.
To simulate the rank, we’re going to have to do something a bit more complex, in two modeled steps. The first step will be to model the first choice: choosing one candidate out of three. The second step will be to model the second choice: choosing one out of the remaining two. The third choice will then simply be the remaining candidate.
To model a choice between more than two options, the standard distribution we typically use is called the multinomial logistic distribution. It takes in a utility for each choice, and then applies the softmax function to turn those utilities into the probability of each choice.
choice_1 ~ multinomial_logistic(utilities[everyone])
choice_2 ~ multinomial_logistic(utilities[everyone except choice_1])
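That “fancy function” is the softmax: exponentiate each utility and divide by the sum, so the results are all positive and add up to 1. With some made-up utility values for one voter:

```r
# Made-up utilities for one voter (higher = happier with that candidate)
utilities = c(maya_wiley = 0.5, kathryn_garcia = 0.2, eric_adams = -0.3)

# Softmax: turn utilities into choice probabilities
probs = exp(utilities) / sum(exp(utilities))
sum(probs)  # the probabilities always sum to 1
```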
Okay so now we have a start to that “list of relationships” I was talking about. But I need to explain two things here. The first is the ~ sign. This means “distributed as”, which means that we judge how likely the data on the left is according to the distribution on the right. This can be used either to simulate data (more likely data gets simulated more often) or to see how well the values on the right fit with the data (the data we observe should be likely).
The second thing to explain is the utilities. These are numbers that represent how voters feel about the candidates; think of them as “how happy would you be if this candidate were elected?”. Since it’s possible to be just a little more happy, or just a little less, we will represent this with a continuous value. And since this is an election, after all, we’re going to assume that, in general, people aren’t going to feel either super happy or super disappointed about any candidate. This is NYC; we’re all jaded anyway. A normal distribution (the classic bell curve) fits this well. So we can then say:
utilities[candidate] ~ normal(0, preference_strength)
So when I said we can assume that people won’t be either super happy or super disappointed, I lied a tiny bit. Actually, that is something we would care about modeling. Is this a toss-up between equivalent candidates, or do voters strongly prefer one candidate over another? This preference_strength number puts a value on how spread out we expect the preferences to be, and it has to be greater than zero. If in general we expect that folks don’t have strong preferences, but that sometimes they will, an exponential distribution has the right shape for what we’re looking for:
preference_strength ~ exponential(1)
We have reached the end of the road for this simple simulation — there aren’t any more numbers we have to think about simulating. So now we can go about the business of writing the simulation and seeing what happens.
So here is a simple R function that will run this simulation.
run_simulation = function() {
  n_voters = 1000

  # How spread out preferences are overall
  preference_strength = rexp(1)

  # One column of utilities per voter, one row per candidate
  utilities = replicate(n_voters, {
    c(
      maya_wiley = rnorm(1, 0, preference_strength),
      kathryn_garcia = rnorm(1, 0, preference_strength),
      eric_adams = rnorm(1, 0, preference_strength)
    )
  })

  # First choice: sample one of the three candidates, with probabilities
  # given by the softmax of that voter's utilities
  first_choices = c()
  for (i in 1:n_voters) {
    first_choices[i] = sample(
      rownames(utilities),
      1,
      prob = exp(utilities[, i]) / sum(exp(utilities[, i]))
    )
  }

  # Second choice: same thing, but over the two remaining candidates
  second_choices = c()
  for (i in 1:n_voters) {
    remaining_candidates = setdiff(rownames(utilities), first_choices[i])
    second_choices[i] = sample(
      remaining_candidates,
      1,
      prob = exp(utilities[remaining_candidates, i]) /
        sum(exp(utilities[remaining_candidates, i]))
    )
  }

  return(data.frame(first = first_choices, second = second_choices))
}
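The post doesn’t show the tabulation step that produces the count table below, but cross-tabulating the (first, second) pairs in base R would look something like this (the hand-made ballots here just stand in for run_simulation() output):

```r
# Standing in for run_simulation() output: a few hand-made ballots
results = data.frame(
  first  = c("eric_adams", "eric_adams", "maya_wiley", "kathryn_garcia"),
  second = c("maya_wiley", "kathryn_garcia", "eric_adams", "maya_wiley")
)

# Cross-tabulate (first, second) pairs, dropping the impossible
# first == second combinations
counts = as.data.frame(table(first = results$first, second = results$second))
counts = counts[counts$Freq > 0, ]
```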
The outcome of running this just one time?
first second count
<chr> <chr> <int>
1 eric_adams kathryn_garcia 154
2 eric_adams maya_wiley 188
3 kathryn_garcia eric_adams 168
4 kathryn_garcia maya_wiley 168
5 maya_wiley eric_adams 154
6 maya_wiley kathryn_garcia 168
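I said earlier that once we have ranked votes, we can run them through the operations that determine the winner. The post never shows that step, so here is a hedged sketch of an instant-runoff count for exactly three candidates (the function name and structure are my own, not from the original):

```r
# Instant-runoff tally for exactly three candidates (hypothetical helper).
# ballots is a data frame with columns `first` and `second`, as returned
# by run_simulation().
instant_runoff = function(ballots) {
  first_counts = table(ballots$first)

  # A majority of first-choice votes wins outright
  if (max(first_counts) > nrow(ballots) / 2) {
    return(names(which.max(first_counts)))
  }

  # Otherwise, eliminate the last-place candidate and transfer those
  # ballots to their second choice
  eliminated = names(which.min(first_counts))
  final = ifelse(ballots$first == eliminated, ballots$second, ballots$first)
  names(which.max(table(final)))
}
```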
As we expect, there’s no real difference in outcome. We haven’t done anything interesting yet. What you would want to do from here is to start to add in layers of complexity. Maybe you would introduce some dependence of the utilities on things like the ethnicity or location of the voter.
That would look like changing the utilities line above to:
utilities[candidate] ~
location_averages[candidate, loc] +
normal(0, preference_strength)
location_averages[candidate, loc] ~ normal(0, location_scale)
location_scale ~ exponential(1)
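As a hedged sketch of what simulating that hierarchy could look like (the number of locations and the random assignment of voters to them are made up for illustration):

```r
n_voters = 1000
n_locations = 5  # made-up number of neighborhoods
candidates = c("maya_wiley", "kathryn_garcia", "eric_adams")

location_scale = rexp(1)
preference_strength = rexp(1)

# Each location has its own average preference for each candidate
location_averages = matrix(
  rnorm(length(candidates) * n_locations, 0, location_scale),
  nrow = length(candidates),
  dimnames = list(candidates, NULL)
)

# Assign each voter to a location, then add individual noise around
# that location's averages
voter_location = sample(n_locations, n_voters, replace = TRUE)
utilities = location_averages[, voter_location] +
  matrix(rnorm(length(candidates) * n_voters, 0, preference_strength),
         nrow = length(candidates))
```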
What those lines mean is that each location (say, a neighborhood) has an average preference for each candidate, and the location averages themselves have a spread whose width is determined by location_scale, which here is usually less than 1.
Let’s keep things simpler here and just introduce some correlation between preferences of the candidates, as well as increasing the preference scale by 5x. We’ll say that people who like Maya Wiley are likely to like Kathryn Garcia and vice versa, but not care for Eric Adams. And those who like Eric Adams are less likely to like Maya Wiley and Kathryn Garcia. Here’s our correlation matrix of preferences:
maya_wiley kathryn_garcia eric_adams
maya_wiley 1.0 0.9 -0.9
kathryn_garcia 0.9 1.0 -0.9
eric_adams -0.9 -0.9 1.0
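The post doesn’t show the code for this step, but one way to draw correlated utilities is with mvrnorm from the MASS package, which samples from a multivariate normal; a sketch under those assumptions:

```r
library(MASS)  # for mvrnorm

candidates = c("maya_wiley", "kathryn_garcia", "eric_adams")
correlations = matrix(
  c( 1.0,  0.9, -0.9,
     0.9,  1.0, -0.9,
    -0.9, -0.9,  1.0),
  nrow = 3, dimnames = list(candidates, candidates)
)

n_voters = 1000
preference_strength = 5 * rexp(1)  # the 5x wider preference scale

# Covariance = scale^2 times the correlation matrix; transpose so we
# keep the candidates-by-voters layout used above
Sigma = preference_strength^2 * correlations
utilities = t(mvrnorm(n_voters, mu = rep(0, 3), Sigma = Sigma))
rownames(utilities) = candidates
```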
What happens now?
Well, the average utilities are the same as before, but the correlations between the candidates’ utilities are now quite strong.
Okay, but what happens in the election? The answer may surprise you. I thought that Maya Wiley and Kathryn Garcia would perform much better than Eric Adams, since each would clean up the other’s second votes. While that does happen, what also happens is that Eric Adams ends up getting many more first-place votes.
first second count
<chr> <chr> <int>
1 eric_adams kathryn_garcia 219
2 eric_adams maya_wiley 240
3 kathryn_garcia eric_adams 41
4 kathryn_garcia maya_wiley 248
5 maya_wiley eric_adams 32
6 maya_wiley kathryn_garcia 220
Essentially what’s going on is that the strong correlation between the two candidates collapses their differences, and it becomes a two-horse race, 50/50 between Eric Adams on the one hand and Maya Wiley/Kathryn Garcia on the other. It’s 50/50 that you’ll have a positive utility for Eric Adams; and if you’re positive on Adams, you’re likely negative on both Maya Wiley and Kathryn Garcia.
Now, this last part actually has nothing to do with ranked choice voting. It’s more an example of the weird and non-obvious things that can start to happen when you introduce more structure into your model.
So that’s how you can run an election on your computer: start with the basic structure of the election and add complexity one step at a time. If I had been tasked with predicting the election results, this is how I would have started.