An Explanation of Estimating Vote Counts on reddit

Adam J. O’Neil

July 26, 2018

The first rule of reddit: never acknowledge that you are a redditor to anyone in real life. Ever.

Here's a fun mathematical experiment for the day. On the social network reddit, users decide which stories gain traction by upvoting or downvoting posts. An upvote counts as +1 vote, a downvote as −1 vote, and no action at all is a net change of 0. As recently as a few years ago, reddit displayed the total upvotes and downvotes, as well as a percentage showing what share of all votes were upvotes. It has since removed those totals, but it kept two important numbers that we can use to closely estimate the upvote and downvote totals, as well as the total number of votes overall.

These two numbers are the net votes and the percentage of voters who upvoted, both of which are displayed for every post. The net votes are upvotes − downvotes. The percentage of upvotes (percent liked) is therefore upvotes divided by (upvotes + downvotes). We can use these two relationships to set up a system of equations and solve for each individual variable. In addition, I will add a third equation that ties these two variables to the total vote count: upvotes + downvotes = total votes.
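To make the three relationships concrete, here is a minimal Python sketch of the forward direction: starting from hypothetical true upvote and downvote counts, it computes the numbers a post would show. The function `displayed_stats` is a hypothetical name, and rounding the upvote share to two decimal places (a whole percent) is an assumption about how the percentage is displayed, not a description of reddit's actual code.

```python
def displayed_stats(upvotes: int, downvotes: int) -> tuple[int, float, int]:
    """Return (net votes, percent liked as a probability, total votes).

    Hypothetical helper: assumes the displayed share of upvotes is rounded
    to a whole percent, i.e. two decimal places as a probability.
    """
    net = upvotes - downvotes                  # U - D = net votes
    total = upvotes + downvotes                # U + D = total votes
    percent_liked = round(upvotes / total, 2)  # U / (U + D), rounded (assumption)
    return net, percent_liked, total


print(displayed_stats(90, 10))  # (80, 0.9, 100)
```

Only the first two values are visible on a post; the rest of this article works backwards from them to the third.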

Equations

First, let $V_T$ = total votes, $V_N$ = net votes, $P$ = percent liked (as a probability, not a percent), $U$ = number of users who upvoted, and $D$ = number of users who downvoted. $V_N$ will be an integer, and $P$ will be a probability greater than 0.5 and less than or equal to 1 (for a post with positive net votes). We only know $P$ and $V_N$ at the moment. Here is the foundation:

$$U - D = V_N \tag{1}$$

$$\frac{U}{U + D} = P \tag{2}$$

$$U + D = V_T \tag{3}$$

Practically, we can solve for either $U$ or $D$ easily through substitution; I am not going to bother with adding or subtracting equations. Let's start by rewriting one of our equations to solve for $D$, which we can then substitute into another equation to solve for $U$.

$$U - D = V_N$$

$$U - V_N = D$$

Next, we simplify our second equation algebraically, removing the tricky fraction.

$$\frac{U}{U + D} = P$$

$$U = P(U + D)$$

$$U = PU + PD$$

Now substitute the known expression for D into the equation, then simplify.

$$\begin{aligned}
U &= PU + P(U - V_N)\\
U &= PU + PU - PV_N\\
U - 2PU &= -PV_N\\
U(1 - 2P) &= -PV_N\\
U &= -\frac{PV_N}{1 - 2P}
\end{aligned}$$

We have a pesky negative sign that we can remove by moving it into the denominator, then rewriting the denominator as a positive expression.

$$U = \frac{PV_N}{-(1 - 2P)}$$

This yields our final, simplified equation for $U$:

$$U = \frac{PV_N}{2P - 1}$$
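As a sanity check on this formula, the sketch below evaluates it with Python's `fractions.Fraction` so the arithmetic is exact, then confirms that the result, together with $D = U - V_N$, reproduces $P$ in equation (2). The function name `estimate_upvotes` and the example post (90 upvotes and 10 downvotes, so $P = 0.9$ and $V_N = 80$) are made up for illustration.

```python
from fractions import Fraction


def estimate_upvotes(p: Fraction, net: int) -> Fraction:
    """U = P * V_N / (2P - 1); requires P > 1/2 so the denominator is positive."""
    if p <= Fraction(1, 2):
        raise ValueError("P must be greater than 0.5")
    return p * net / (2 * p - 1)


p, net = Fraction(9, 10), 80
u = estimate_upvotes(p, net)
d = u - net            # equation (1) rearranged: D = U - V_N
print(u, u / (u + d))  # 90 9/10
```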

Okay, we have $U$ entirely in terms of known variables. Good so far. With just this, we can now solve for $D$ by substituting our expression for $U$ into $D = U - V_N$ (equation (1) rearranged). Alternatively, we could run through steps very similar to those above, but begin by rewriting for $U$ instead of $D$. That seems awfully redundant, however.

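$$\begin{aligned}
D &= U - V_N\\
D &= \frac{PV_N}{2P - 1} - V_N\\
D &= \frac{PV_N}{2P - 1} - \frac{V_N(2P - 1)}{2P - 1}\\
D &= \frac{PV_N - 2PV_N + V_N}{2P - 1}\\
D &= \frac{V_N - PV_N}{2P - 1}\\
D &= \frac{V_N(1 - P)}{2P - 1}
\end{aligned}$$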
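The same kind of exact-arithmetic check works for $D$. Using a made-up post with $P = 0.9$ and $V_N = 80$ (90 upvotes, 10 downvotes), the formula should give 10 downvotes, and $U - D$ should come back to $V_N$. The name `estimate_downvotes` is a hypothetical, illustrative choice.

```python
from fractions import Fraction


def estimate_downvotes(p: Fraction, net: int) -> Fraction:
    """D = V_N * (1 - P) / (2P - 1); requires P > 1/2."""
    if p <= Fraction(1, 2):
        raise ValueError("P must be greater than 0.5")
    return net * (1 - p) / (2 * p - 1)


p, net = Fraction(9, 10), 80
d = estimate_downvotes(p, net)
u = p * net / (2 * p - 1)  # U = P V_N / (2P - 1), derived above
print(d, u - d)  # 10 80
```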

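With $U$ and $D$ solved in terms of known variables, we can now solve for $V_T$ using the equation $U + D = V_T$:

$$\begin{aligned}
V_T &= U + D\\
V_T &= \frac{PV_N}{2P - 1} + \frac{V_N(1 - P)}{2P - 1}\\
V_T &= \frac{PV_N + V_N - PV_N}{2P - 1}\\
V_T &= \frac{V_N}{2P - 1}
\end{aligned}$$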

There we have it: $V_T$ solved in terms of known variables. As it turns out, the total vote count is the easiest of these unknowns to solve for.
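A short sketch confirms that the simple formula for $V_T$ agrees with adding the formulas for $U$ and $D$. It uses exact fractions and a made-up post with $P = 0.9$ and $V_N = 80$; the function name `estimate_total` is hypothetical.

```python
from fractions import Fraction


def estimate_total(p: Fraction, net: int) -> Fraction:
    """V_T = V_N / (2P - 1); requires P > 1/2."""
    if p <= Fraction(1, 2):
        raise ValueError("P must be greater than 0.5")
    return net / (2 * p - 1)


p, net = Fraction(9, 10), 80
u = p * net / (2 * p - 1)        # U
d = net * (1 - p) / (2 * p - 1)  # D
print(estimate_total(p, net), u + d)  # 100 100
```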

There is one other thing we must consider: reddit's percent-upvoted number is rounded to a whole percent, which means we should develop a range rather than a single specific number. The simplest way is to evaluate the equations at the endpoints, once with $P - 0.005$ and again with $P + 0.005$, and finally with $P$ itself. Alternatively, you could adjust the denominators directly:

$$2P - 1 \;\longrightarrow\; 2(P - 0.005) - 1 = 2P - 1.01$$

$$2P - 1 \;\longrightarrow\; 2(P + 0.005) - 1 = 2P - 0.99$$

We could rewrite all the equations like this to set up intervals, but rather than memorizing additional details, just subtract 0.005 from (or add 0.005 to) $P$ before plugging it into any equation. Also note that although you can get a middle estimate by averaging the low and high calculations, it will not be the same as a calculation done exactly at $P$. Realistically, it makes more sense to take the average of the low and high calculations than to try for an exact estimate at $P$.
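Here is a minimal sketch of the range approach, assuming the displayed percentage is rounded to the nearest whole percent so the true $P$ lies within 0.005 of the shown value. It evaluates $V_T = \frac{V_N}{2P - 1}$ at both endpoints and at $P$ itself, which also shows that the midpoint of the two endpoints differs from the value at $P$. The function `total_vote_estimates` and its `half_width` parameter are hypothetical names; the inputs are the ones from the worked example later in this article.

```python
def total_vote_estimates(p: float, net: int, half_width: float = 0.005) -> dict[str, float]:
    """Evaluate V_T = V_N / (2P - 1) at P + half_width, P - half_width, and P.

    A larger P gives a smaller total, so P + half_width yields the low
    estimate and P - half_width yields the high estimate.
    """
    if p - half_width <= 0.5:
        raise ValueError("P - half_width must exceed 0.5")
    low = net / (2 * (p + half_width) - 1)
    high = net / (2 * (p - half_width) - 1)
    return {
        "low": low,
        "high": high,
        "midpoint": (low + high) / 2,
        "at_p": net / (2 * p - 1),
    }


est = total_vote_estimates(0.96, 12206)
print({k: round(v, 2) for k, v in est.items()})
# {'low': 13124.73, 'high': 13413.19, 'midpoint': 13268.96, 'at_p': 13267.39}
```

The midpoint and the value at $P$ differ by about 1.6 votes here, which is the gap the next section explains algebraically.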

Anyhow, we now have a suite of equations solved in terms of known variables, ready for future use. Here they are:

$$U = \frac{PV_N}{2P - 1} \tag{4}$$

$$D = \frac{V_N(1 - P)}{2P - 1} \tag{5}$$

$$V_T = \frac{V_N}{2P - 1} \tag{6}$$

We could easily manipulate these equations into new ones. However, these three are fundamental and flexible enough to be left as they are.

Calculating a Margin of Error

The first thing to recognize when calculating a margin is that the mean of the high and low estimates differs from the estimate made with exactly $P$ (as opposed to $P + 0.005$ or $P - 0.005$). Let $V_{TL} = \frac{V_N}{2P - 0.99}$ be the low estimate (computed with $P + 0.005$), $V_{TH} = \frac{V_N}{2P - 1.01}$ the high estimate (computed with $P - 0.005$), and $V_{TM}$ the mean of the two. Here is the algebra proving that $V_{TM} \neq V_T$:

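$$\begin{aligned}
V_{TM} &= \tfrac{1}{2}\left(V_{TL} + V_{TH}\right)\\
&= \tfrac{1}{2}\left(\frac{V_N}{2P - 0.99} + \frac{V_N}{2P - 1.01}\right)\\
&= \tfrac{1}{2}\left(\frac{V_N(2P - 1.01)}{(2P - 0.99)(2P - 1.01)} + \frac{V_N(2P - 0.99)}{(2P - 1.01)(2P - 0.99)}\right)\\
&= \tfrac{1}{2}\left(\frac{V_N(2P - 1.01) + V_N(2P - 0.99)}{(2P - 1.01)(2P - 0.99)}\right)\\
&= \frac{2PV_N - 1.01V_N + 2PV_N - 0.99V_N}{2(2P - 1.01)(2P - 0.99)}\\
&= \frac{2V_N(2P - 1)}{2(2P - 1.01)(2P - 0.99)}\\
&= \frac{V_N(2P - 1)}{(2P - 1.01)(2P - 0.99)}
\end{aligned}$$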

Okay, here is where we do a little mathematical trickery to manipulate this equation into one that clearly does not equal $V_T$, which is what we are trying to show. We do this by expanding the denominator, $(2P - 1.01)(2P - 0.99) = 4P^2 - 4P + 0.9999$, then a little factoring and some interesting dividing.

$$\frac{V_N(2P - 1)}{4P^2 - 4P + 0.9999} = V_{TM}$$

$$\frac{V_N(2P - 1)}{4P^2 - 4P + 1 - 1 + 0.9999} = V_{TM}$$

Note that we added and subtracted 1 so that $4P^2 - 4P + 1$ can be factored into $(2P - 1)(2P - 1)$. Dividing the numerator and denominator by $2P - 1$ then simplifies the fraction.

$$\frac{V_N(2P - 1)}{(2P - 1)(2P - 1) - 0.0001} = V_{TM}$$

$$\frac{V_N}{(2P - 1) - \frac{0.0001}{2P - 1}} = V_{TM}$$

We don't need to be mathematically rigorous, but we should prove beyond a doubt that this is not equal to $V_T$. Assuming $V_N \neq 0$, it is enough to compare the denominators:

$$\frac{V_N}{(2P - 1) - \frac{0.0001}{2P - 1}} \neq \frac{V_N}{2P - 1}$$

$$(2P - 1) - \frac{0.0001}{2P - 1} \neq 2P - 1$$

$$-\frac{0.0001}{2P - 1} \neq 0$$

This is true, since $-\frac{0.0001}{2P - 1}$ has a nonzero numerator and therefore can never equal zero for any value of $P$. Therefore,

$$V_{TM} \neq V_T$$
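To double-check the algebra numerically, this sketch uses exact `Fraction` arithmetic to compare the direct average of the two estimates with the final factored form of $V_{TM}$, and then compares $V_{TM}$ with $V_T$ for every whole-percent $P$ from 51% to 100% with a positive $V_N$. The helper names and the range of test values are illustrative assumptions, not part of the original derivation.

```python
from fractions import Fraction

DEN_SHIFT_PLUS = Fraction(99, 100)    # 2(P + 0.005) - 1 = 2P - 0.99
DEN_SHIFT_MINUS = Fraction(101, 100)  # 2(P - 0.005) - 1 = 2P - 1.01
LEFTOVER = Fraction(1, 10000)         # the 0.0001 left after completing the square


def mean_of_bounds(p: Fraction, net: int) -> Fraction:
    """V_TM computed directly as the average of the low and high estimates."""
    return (net / (2 * p - DEN_SHIFT_PLUS) + net / (2 * p - DEN_SHIFT_MINUS)) / 2


def mean_factored(p: Fraction, net: int) -> Fraction:
    """V_TM in its final form: V_N / ((2P - 1) - 0.0001 / (2P - 1))."""
    a = 2 * p - 1
    return net / (a - LEFTOVER / a)


def total_at_p(p: Fraction, net: int) -> Fraction:
    """V_T = V_N / (2P - 1)."""
    return net / (2 * p - 1)


net = 12206
percents = [Fraction(k, 100) for k in range(51, 101)]  # 51% through 100%
print(all(mean_of_bounds(p, net) == mean_factored(p, net) for p in percents))  # True
print(all(mean_factored(p, net) > total_at_p(p, net) for p in percents))      # True
```

In every case tried, $V_{TM}$ is strictly larger than $V_T$, which fits the factored form: subtracting $\frac{0.0001}{2P - 1}$ shrinks a positive denominator.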

With that out of the way, let us calculate a margin of error, which will be $\frac{1}{2}(V_{TH} - V_{TL})$. Let $\varepsilon$ represent the margin of error.

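$$\begin{aligned}
\varepsilon &= \tfrac{1}{2}\left(V_{TH} - V_{TL}\right)\\
&= \tfrac{1}{2}\left(\frac{V_N}{2P - 1.01} - \frac{V_N}{2P - 0.99}\right)\\
&= \frac{2PV_N - 0.99V_N - 2PV_N + 1.01V_N}{2(2P - 0.99)(2P - 1.01)}\\
&= \frac{0.01V_N}{(2P - 0.99)(2P - 1.01)}
\end{aligned}$$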

We could modify the margin-of-error equation a little, but it would not become significantly easier to work with. What it does let us say is that the total vote count lies in the interval $V_{TM} \pm \varepsilon$.
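The closed forms for $V_{TM}$ and $\varepsilon$ can be checked against the direct definitions with a short sketch, using exact fractions and the numbers from the worked example ($P = 0.96$, $V_N = 12{,}206$). The function `vote_interval` is a hypothetical helper; it returns $V_{TM}$ and $\varepsilon$, and the checks confirm that $V_{TM} - \varepsilon$ and $V_{TM} + \varepsilon$ land exactly on the low and high estimates.

```python
from fractions import Fraction

DEN_SHIFT_PLUS = Fraction(99, 100)    # 2(P + 0.005) - 1 = 2P - 0.99
DEN_SHIFT_MINUS = Fraction(101, 100)  # 2(P - 0.005) - 1 = 2P - 1.01


def vote_interval(p: Fraction, net: int) -> tuple[Fraction, Fraction]:
    """Return (V_TM, epsilon) from their closed forms."""
    den_plus = 2 * p - DEN_SHIFT_PLUS
    den_minus = 2 * p - DEN_SHIFT_MINUS
    center = net * (2 * p - 1) / (den_minus * den_plus)
    eps = Fraction(1, 100) * net / (den_plus * den_minus)
    return center, eps


p, net = Fraction(96, 100), 12206
center, eps = vote_interval(p, net)
low = net / (2 * p - DEN_SHIFT_PLUS)    # estimate with P + 0.005
high = net / (2 * p - DEN_SHIFT_MINUS)  # estimate with P - 0.005
print(eps == (high - low) / 2)                    # True
print(center - eps == low, center + eps == high)  # True True
```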

A Quick Example

A certain post has 12,206 net votes, and 96 percent of voters upvoted it. How many votes were cast in total?

Let's find a lower bound and an upper bound, then average those two numbers to form an estimate. The lower bound comes from $P = 0.965$ and the upper bound from $P = 0.955$. Using $V_T = \frac{V_N}{2P - 1}$, we get (rounding to integers) 13125 to 13413 total votes. Taking the midpoint of these two numbers, our best estimate is that about 13269 votes were cast.
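The whole worked example fits in a few lines. This is a minimal sketch assuming, as discussed earlier, that the displayed whole-number percentage is within half a percentage point of the true value; `estimate_total_votes` is a hypothetical name. It reports the midpoint together with the margin of error, i.e. the interval $V_{TM} \pm \varepsilon$.

```python
def estimate_total_votes(percent_shown: int, net: int) -> tuple[float, float]:
    """Return (midpoint, margin) for the total vote count V_T.

    Assumes percent_shown is rounded to the nearest whole percent, so the
    true P lies within 0.005 of percent_shown / 100.
    """
    p = percent_shown / 100
    if p - 0.005 <= 0.5:
        raise ValueError("percent_shown must be at least 51")
    low = net / (2 * (p + 0.005) - 1)   # P + 0.005 gives the lower bound
    high = net / (2 * (p - 0.005) - 1)  # P - 0.005 gives the upper bound
    return (low + high) / 2, (high - low) / 2


mid, margin = estimate_total_votes(96, 12206)
print(f"{mid:.0f} ± {margin:.0f}")  # 13269 ± 144
```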

Sigma

This experiment is a great example of real-life uses for systems of equations, which are extremely powerful problem-solving tools. Although we are unable to find exact numbers for vote totals (in most cases), we can determine intervals and ultimately achieve a reasonable approximation.

- AJO