An Explanation of Estimating Vote Counts on reddit
Adam J. O’Neil
July 26, 2018
The first rule of reddit: never acknowledge that you are a redditor to anyone in
real life. Ever.
Here’s a fun mathematical experiment for the day: on the social network reddit,
users decide which stories gain traction by upvoting or downvoting posts. An
upvote counts as +1, a downvote as -1, and taking no action leaves the count
unchanged. As recently as a few years ago, reddit showed the total upvotes and
downvotes for each post, along with the percentage of all votes that were upvotes.
The totals were removed a while ago, but reddit kept two important numbers that
we can use to calculate a close estimate of the upvote and downvote totals, as
well as the total number of votes overall.
These two numbers are the net votes and the percentage of voters who upvoted,
both of which are displayed for every post. The net votes are upvotes -
downvotes. The percentage of upvotes (percent liked) is upvotes divided by
(upvotes + downvotes). We can use these two relationships to set up a system of
equations and solve for each individual variable. In addition, I will add a third
equation that ties these two variables to the total vote count, which is
upvotes + downvotes = total votes.
Equations
First, let $V_T$ = total votes, $V_N$ = net votes, $P$ = percent liked (as a probability, not a
percent), $U$ = number of users that upvoted, $D$ = number of users that downvoted.
$V_N$ will be an integer, and $P$ will be a probability greater than 0.5 and less than or equal
to 1. We only know $P$ and $V_N$ at the moment. Here is the foundation:
$$U - D = V_N \tag{1}$$

$$\frac{U}{U + D} = P \tag{2}$$

$$U + D = V_T \tag{3}$$
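To make the three foundation equations concrete, here is a minimal Python sketch of the forward direction: given upvote and downvote counts, it produces the numbers reddit would display. The function name `displayed_stats` is a hypothetical helper, not something from reddit, and the 96/4 split is made-up data.

```python
def displayed_stats(upvotes: int, downvotes: int) -> tuple:
    """Return (net votes V_N, proportion liked P, total votes V_T)."""
    total = upvotes + downvotes            # equation (3)
    if total == 0:
        raise ValueError("a post with no votes has no defined percent liked")
    net = upvotes - downvotes              # equation (1)
    liked = upvotes / total                # equation (2)
    return net, liked, total


# Hypothetical post with 96 upvotes and 4 downvotes.
print(displayed_stats(96, 4))
```

The rest of the derivation runs this in reverse: only the net votes and the proportion liked are known, and the counts have to be recovered.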
So, practically, we can solve for either U or D easily through substitution – I am
not going to bother with adding or subtracting equations. Let’s start by rewriting
one of our equations to solve for D, which we can then substitute into another
equation to solve for U.
$$U - D = V_N$$

$$U - V_N = D$$
Now, by algebra we will simplify our second equation, removing the tricky fraction.
$$\frac{U}{U + D} = P$$

$$U = P(U + D)$$

$$U = PU + PD$$
Now substitute the known expression for D into the equation, then simplify.
$$U = PU + P(U - V_N)$$

$$U = PU + PU - PV_N$$

$$U - 2PU = -PV_N$$

$$U(1 - 2P) = -PV_N$$

$$U = \frac{-PV_N}{1 - 2P}$$
We have a pesky negative sign that we can remove by negating the denominator,
then rewriting as a positive expression.
$$U = \frac{PV_N}{-(1 - 2P)}$$
This yields our final, simplified equation for U:
$$U = \frac{PV_N}{2P - 1}$$
Okay, we have $U$ entirely in terms of known variables. Good so far. With just
this, we can now solve for $D$ by substituting our known equation for $U$ into the
equation $U - D = V_N$. Alternatively, we could solve by running through steps very
similar to those above, but beginning by rewriting for $U$ instead of $D$. That seems
awfully redundant, however.
$$D = U - V_N$$

$$D = \frac{PV_N}{2P - 1} - V_N$$

$$D = \frac{PV_N}{2P - 1} - \frac{V_N(2P - 1)}{2P - 1}$$

$$D = \frac{PV_N - 2PV_N + V_N}{2P - 1}$$

$$D = \frac{V_N - PV_N}{2P - 1}$$

$$D = \frac{V_N(1 - P)}{2P - 1}$$
With $U$ and $D$ solved in terms of known variables, we can now solve for $V_T$
using the equation $U + D = V_T$.
$$V_T = U + D$$

$$V_T = \frac{PV_N}{2P - 1} + \frac{V_N(1 - P)}{2P - 1}$$

$$V_T = \frac{PV_N + V_N - PV_N}{2P - 1}$$

$$V_T = \frac{V_N}{2P - 1}$$
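One way to check the algebra is to run the derived formulas for $U$, $D$, and $V_T$ on a post whose true counts are known and confirm that they come back out. The sketch below does this with exact rational arithmetic so that floating-point rounding cannot hide a mistake. The function name `solve_votes` and the 96/4 example post are assumptions made for illustration.

```python
from fractions import Fraction


def solve_votes(net_votes: int, liked: Fraction):
    """Apply the derived formulas for U, D, and V_T with exact fractions."""
    if not Fraction(1, 2) < liked <= 1:
        raise ValueError("the formulas assume 0.5 < P <= 1")
    denom = 2 * liked - 1
    upvotes = liked * net_votes / denom          # U = P V_N / (2P - 1)
    downvotes = net_votes * (1 - liked) / denom  # D = V_N (1 - P) / (2P - 1)
    total = net_votes / denom                    # V_T = V_N / (2P - 1)
    return upvotes, downvotes, total


# Hypothetical post: 96 upvotes, 4 downvotes, so V_N = 92 and P = 96/100.
u, d, t = solve_votes(96 - 4, Fraction(96, 96 + 4))
print(u, d, t)  # 96 4 100
```

Because the unrounded proportion is used here, the counts are recovered exactly. With reddit's rounded percentage the same formulas only give an estimate.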
There we have it: $V_T$ solved in terms of known variables. As it turns out,
the total vote count is the easiest of these unknowns to calculate.
There is one other thing that we must consider: reddit’s percent upvoted number
is rounded, which means that we should develop a range rather than a single
number. The simplest way is to calculate endpoints by evaluating these equations
three times: once with $P - 0.005$, again with $P + 0.005$, and finally with $P$.
Alternatively, you could do this:
$$V_T = \frac{V_N}{2P - 1}$$

$$V_T = \frac{V_N}{2(P - 0.005) - 1}$$

$$V_T = \frac{V_N}{2P - 1.01}$$
We could rewrite all the equations like this, in order to set up intervals, but
rather than memorizing additional details, just subtract/add to P before plugging
a value into any equation. Additionally, do note that although you can get a middle
calculation by averaging the low/high calculations, this will not be the same as an
exact calculation done exactly at P . Realistically, it would make more sense to take
the average of the low and high calculations instead of trying for an exact estimate.
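The "subtract/add to $P$ before plugging in" approach can be sketched in a few lines of Python. This is only an illustration: the function name `total_votes_range`, the `half_width` parameter, and the example post are hypothetical, and the 0.005 default assumes the displayed percentage is rounded to the nearest whole percent.

```python
def total_votes_range(net_votes: int, liked: float, half_width: float = 0.005):
    """Return (low, exact, high) estimates of V_T = V_N / (2P - 1)."""

    def total(p: float) -> float:
        if p <= 0.5:
            raise ValueError("the formula needs P > 0.5")
        return net_votes / (2 * p - 1)

    p_high = min(liked + half_width, 1.0)  # P can never exceed 1
    p_low = liked - half_width
    # A larger P gives a larger denominator, hence the smaller estimate.
    return total(p_high), total(liked), total(p_low)


# Hypothetical post: 500 net votes, 75 percent upvoted.
low, exact, high = total_votes_range(500, 0.75)
print(round(low), round(exact), round(high))
```

Note that the largest possible $P$ produces the lowest vote total and the smallest possible $P$ produces the highest.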
Anyhow, we now have a suite of equations, each solved in terms of known
variables, that we can keep for future use. Here they are:
$$U = \frac{PV_N}{2P - 1} \tag{4}$$

$$D = \frac{V_N(1 - P)}{2P - 1} \tag{5}$$

$$V_T = \frac{V_N}{2P - 1} \tag{6}$$
We could easily manipulate these equations into new ones. However, these three
equations are fundamental and flexible enough to be left as they are.
Calculating a Margin-of-Error
The first step in calculating a margin is recognizing that the mean of the high
and low estimates is different from the estimate made with exactly $P$, as
opposed to $P + 0.005$ or $P - 0.005$. Let $V_{TL}$ and $V_{TH}$ be the low and
high estimates, and let $V_\mu$ be their mean. Here is the algebra behind
proving that $V_\mu \neq V_T$:
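$$
\begin{aligned}
V_\mu &= \frac{1}{2}\left(V_{TL} + V_{TH}\right) \\
&= \frac{1}{2}\left(\frac{V_N}{2P - 0.99} + \frac{V_N}{2P - 1.01}\right) \\
&= \frac{1}{2}\left(\frac{V_N(2P - 1.01)}{(2P - 0.99)(2P - 1.01)} + \frac{V_N(2P - 0.99)}{(2P - 1.01)(2P - 0.99)}\right) \\
&= \frac{1}{2}\left(\frac{V_N(2P - 1.01) + V_N(2P - 0.99)}{(2P - 1.01)(2P - 0.99)}\right) \\
&= \frac{2PV_N - 1.01V_N + 2PV_N - 0.99V_N}{2(2P - 1.01)(2P - 0.99)} \\
&= \frac{2V_N(2P - 1)}{2(2P - 1.01)(2P - 0.99)} \\
&= \frac{V_N(2P - 1)}{(2P - 1.01)(2P - 0.99)}
\end{aligned}
$$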
Okay, here is where we do a little mathematical trickery to manipulate this
equation into one that clearly does not equal $V_T$, which is what we are trying to
show. We do this by a little factoring and some interesting dividing.
$$\frac{V_N(2P - 1)}{4P^2 - 4P + 0.9999} = V_\mu$$

$$\frac{V_N(2P - 1)}{4P^2 - 4P + 1 - 1 + 0.9999} = V_\mu$$
Note that we added and subtracted 1 so that $4P^2 - 4P + 1$ can be factored into
two terms.
$$\frac{V_N(2P - 1)}{(2P - 1)(2P - 1) - 0.0001} = V_\mu$$

$$\frac{V_N}{(2P - 1) - \frac{0.0001}{2P - 1}} \neq \frac{V_N}{2P - 1}$$
We don’t need to be mathematically rigorous, but we should prove this nonequality
beyond a doubt.
$$\frac{1}{(2P - 1) - \frac{0.0001}{2P - 1}} \neq \frac{1}{2P - 1}$$

$$(2P - 1) - \frac{0.0001}{2P - 1} \neq 2P - 1$$

$$-\frac{0.0001}{2P - 1} \neq 0$$
Which is true, since $\frac{0.0001}{2P - 1}$ can never equal zero for any value of $P$. Therefore,
$V_\mu \neq V_T$.
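A quick numerical check shows how large the gap between the midpoint and the exact-$P$ estimate actually is. The sketch below compares the two directly and also confirms that the closed form for the mean agrees with averaging the bounds. All function names are hypothetical, the sample values are illustrative, and the code assumes $P > 0.505$ so that both denominators stay positive.

```python
def exact_total(net_votes: float, liked: float) -> float:
    """V_T evaluated at exactly P."""
    return net_votes / (2 * liked - 1)


def mean_of_bounds(net_votes: float, liked: float) -> float:
    """V_mu: the average of the low and high estimates."""
    low = net_votes / (2 * liked - 0.99)    # uses P + 0.005
    high = net_votes / (2 * liked - 1.01)   # uses P - 0.005
    return (low + high) / 2


def mean_closed_form(net_votes: float, liked: float) -> float:
    """V_mu = V_N (2P - 1) / ((2P - 1.01)(2P - 0.99))."""
    return net_votes * (2 * liked - 1) / ((2 * liked - 1.01) * (2 * liked - 0.99))


v_n, p = 1000, 0.80  # illustrative values only
print(exact_total(v_n, p), mean_of_bounds(v_n, p), mean_closed_form(v_n, p))
```

For a positive net vote count the midpoint always lands slightly above the exact-$P$ estimate, because the term subtracted in the denominator makes it smaller.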
With that out of the way, let us calculate a margin of error, which will be
$\frac{1}{2}(V_{TH} - V_{TL})$. Let $\mathrm{MOE}$ represent the margin of error.

$$
\begin{aligned}
\mathrm{MOE} &= \frac{1}{2}\left(V_{TH} - V_{TL}\right) \\
&= \frac{1}{2}\left(\frac{V_N}{2P - 1.01} - \frac{V_N}{2P - 0.99}\right) \\
&= \frac{2PV_N - 0.99V_N - 2PV_N + 1.01V_N}{2(2P - 0.99)(2P - 1.01)} \\
&= \frac{0.01V_N}{(2P - 0.99)(2P - 1.01)}
\end{aligned}
$$

We could modify the MOE equation a little bit, but it would not become
significantly easier to work with. But this allows us to say the vote count interval is
$V_\mu \pm \mathrm{MOE}$.
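The midpoint and margin of error can be computed together from the two closed forms. The following is a minimal sketch under the same assumptions as the derivation (percentage rounded to the nearest whole percent, and $P > 0.505$ so the denominators are positive). The function name `vote_interval` and the sample post are hypothetical.

```python
def vote_interval(net_votes: float, liked: float):
    """Return (midpoint, margin of error) for the total vote count."""
    a = 2 * liked - 0.99   # denominator of the low estimate
    b = 2 * liked - 1.01   # denominator of the high estimate
    if b <= 0:
        raise ValueError("need P > 0.505 for a finite upper bound")
    midpoint = net_votes * (2 * liked - 1) / (a * b)
    margin = 0.01 * net_votes / (a * b)
    return midpoint, margin


# Hypothetical post: 1,000 net votes, 80 percent upvoted.
mid, moe = vote_interval(1000, 0.80)
print(f"{mid:.1f} +/- {moe:.1f}")
```

Subtracting the margin from the midpoint gives back the low estimate, and adding it gives the high estimate.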
A Quick Example
– A certain post has 12,206 net votes and 96 percent of people upvoted. How many
votes were cast in total?
Let’s find a lower bound and an upper bound, then average those two numbers to
form an estimate. The lower bound comes from $P = 0.965$ and the upper bound
from $P = 0.955$. Using $V_T = \frac{V_N}{2P - 1}$, we get (rounding to an integer) 13125 to
13413 total votes. Taking the midpoint of these two numbers, our best
estimate is that about 13269 votes were cast.
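The arithmetic in this example is easy to reproduce. The short script below recomputes the bounds and the midpoint, and also applies equations (4) and (5) at $P = 0.96$ to estimate the upvote and downvote split, which the example itself does not ask for. The variable names are illustrative.

```python
NET_VOTES = 12_206
LIKED = 0.96

# Bounds on the total: a higher P gives the lower total, and vice versa.
low = NET_VOTES / (2 * (LIKED + 0.005) - 1)
high = NET_VOTES / (2 * (LIKED - 0.005) - 1)
midpoint = (low + high) / 2

# Upvote/downvote split at exactly P = 0.96, from equations (4) and (5).
upvotes = LIKED * NET_VOTES / (2 * LIKED - 1)
downvotes = NET_VOTES * (1 - LIKED) / (2 * LIKED - 1)

print(round(low), round(high), round(midpoint))  # 13125 13413 13269
print(round(upvotes), round(downvotes))
```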
Sigma
This experiment is a great example of real-life uses for systems of equations, which
are extremely powerful problem-solving tools. Although we are unable to find exact
numbers for vote totals (in most cases), we can determine intervals and ultimately
achieve a reasonable approximation.
- AJO