By ScienceBlogs (Reporter)

The Mathematics of Reddit Rankings, or, How Upvotes Are Time Travel [Built on Facts]

Wednesday, January 16, 2013 8:20

(Before It's News)

Ok, so this isn’t really physics as such, but it’s pretty fascinating. There’s a very large online community called Reddit in which users submit links which interest them. These links come with two little arrows beside them, and the users can vote the link up or down. Here’s a screenshot of how the website looks to me at the time of this writing:

As I visit on different days, or at different times on the same day, the links and their order change. This keeps the site fresh and news-y, at least if you like your news full of cat memes. It’s pretty clear that the ordering of these links is a function both of when they were submitted and of the votes they receive, but how exactly does this work?

The algorithm itself is explained in this very informative post by Amir Salihefendic. In short, every post is assigned a number given by the function:

\displaystyle f(n, t) = \log_{10}(n) + \frac{t}{45000}

Here n is the net number of upvotes. For 10 votes up and 0 votes down, n = 10. For 50 votes up and 40 votes down, n also equals 10. Next, t is the time in seconds after an arbitrary moment that happens to be in 2005. The choice of that arbitrary moment doesn’t matter – what matters are the differences in scores. This function f(n, t) is calculated for each link, and they are sorted in order from the greatest to the least value of f. I’ve slightly simplified the equation by dropping a coefficient that makes no difference for positive n.
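To make the sorting concrete, here is a minimal Python sketch of the simplified function above. It assumes positive n (the dropped coefficient is ignored), and the post titles, vote counts, and timestamps are made up for illustration; this is not Reddit's actual code.

```python
import math

SECONDS_PER_UNIT = 45000  # divisor from the simplified formula above

def score(net_votes, t_seconds):
    """Simplified f(n, t) = log10(n) + t / 45000, valid only for n > 0."""
    return math.log10(net_votes) + t_seconds / SECONDS_PER_UNIT

# Hypothetical posts: (title, net upvotes, seconds after the reference moment)
posts = [
    ("old but popular", 1000, 0),
    ("newer, modest", 10, 45000),
    ("newest, barely voted", 2, 90000),
]

# Sort from the greatest to the least value of f.
ranked = sorted(posts, key=lambda p: score(p[1], p[2]), reverse=True)
for title, n, t in ranked:
    print(f"{score(n, t):.3f}  {title}")
```

With these made-up numbers the old post with 1000 net votes still leads, but the newest post with only 2 net votes already sits ahead of the 10-vote post submitted 12.5 hours before it.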

Ok, great. Now what does this all mean? Amir’s post gives some examples, but I want to dig a little bit into the interpretation of this equation. In physics it’s very often the case that an equation isn’t just some abstract mathematical machine, but rather it’s a natural statement which has an intuitive interpretation we can understand. For instance, \nabla \cdot \mathbf{E} = \rho / \epsilon_0 is an abstract vector calculus statement, but physicists see that equation and understand it intuitively as the idea that electric field lines diverge outward from sources of electric charge. That’s a more useful way of thinking of it than “Ok, now we have to solve some horrible partial differential equation before we can know anything at all.” Intuition gives us a qualitative picture, and from there we can do the hard work to get a numerical answer when required.

Since Reddit’s equation is just used to generate an ordering, an overall multiplicative factor doesn’t matter. If a score of 20 is ranked ahead of 15, then 200 will be ranked ahead of 150. So let’s multiply Reddit’s equation by 45000 seconds.

\displaystyle f(n, t) = 45000 \log_{10}(n) + t

Effectively this just means the posts are sorted in order by t, the time they were posted. Newer posts are higher. But there’s that log(n) term – it moves the posts forward in time. Newer posts are listed first, and a post becomes even newer by getting votes. If n = 10, then log(10) = 1 and the post is moved forward 45000 seconds, or 12.5 hours. If n = 100, then log(100) = 2 and the post is moved forward 90000 seconds, or 25 hours. We can plot this for more and more net upvotes:

Hours added as a function of net upvotes received

The returns are diminishing. Logarithms are slowly increasing functions, so each additional upvote moves the post forward in time by a smaller and smaller amount. Even with thousands of votes, a post has only moved about two days into the future, which is why posts never last more than a day or so on the front page. After that they get overtaken by new posts, even ones with few upvotes.
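The same numbers can be tabulated with a short Python sketch. The function name `hours_added` is just an illustrative choice; it evaluates 45000 log10(n) seconds and converts to hours.

```python
import math

def hours_added(net_votes):
    """Hours a post is moved forward in time by its net upvotes (n > 0)."""
    return 45000 * math.log10(net_votes) / 3600

for n in (1, 10, 100, 1000, 5000):
    print(f"{n:>5} net upvotes -> {hours_added(n):5.1f} hours")
```

Going from 10 to 100 net upvotes buys 12.5 hours, and so does going from 100 to 1000; even 5000 net upvotes stays just under two days.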

In politics we often hear that every vote counts. On Reddit, we can actually figure out how much each vote counts. If I upvote or downvote a post, how far does my individual vote move that post in time? For large n, it’s very accurate to approximate the change in log(n) (for each additional vote) by its derivative:

\displaystyle \log_{10}(n+1) - \log_{10}(n) \approx \frac{0.434}{n}

Well, that 0.434 (which is 1/ln 10) is a little annoying, but hey, I didn’t choose to use base 10 logarithms. (Had they used base e = 2.718… then it would just be 1/n.) What this means is that if a post has 10 votes, your upvote will add about 45000*0.434/10 ≈ 1954 seconds, or about 33 minutes. A downvote would move it backwards by about that same amount. If a post has 50 votes, your upvote (or downvote) will move it forward or backward by about 6.5 minutes. For a 4700 vote post like one of the ones in the screenshot above, each vote makes a mere 4 seconds difference.

Seconds added (or subtracted) for each additional upvote (or downvote), as a function of net upvotes received so far
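As a rough check on the derivative approximation, the Python sketch below compares the exact time shift from one extra upvote with the approximate one. The function names are illustrative, not anything Reddit uses.

```python
import math

SCALE = 45000  # seconds gained per factor of ten in net votes

def exact_shift(n):
    """Seconds gained when net votes go from n to n + 1."""
    return SCALE * (math.log10(n + 1) - math.log10(n))

def approx_shift(n):
    """Derivative approximation: 45000 / (n ln 10), about 45000 * 0.434 / n."""
    return SCALE / (n * math.log(10))

for n in (10, 50, 4700):
    print(f"n={n:>4}: exact {exact_shift(n):7.1f} s, approx {approx_shift(n):7.1f} s")
```

At n = 10 the approximation overshoots the exact shift by a few percent; by n = 4700 the two agree closely, which is the sense in which the approximation is accurate for large n.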

This might suggest an improvement on the “subscribe” and “unsubscribe” system: if there’s a subreddit you’re interested in but not that interested in (/r/aww maybe?), you could give it a handicap by having Reddit subtract (say) a 6-hour penalty from every post in that subreddit. This would require an /r/aww post to get about 3 times as many votes to overtake an unpenalized post which was originally made at the same time. (Homework: given an h-hour penalty, how many times more votes does the penalized post require to overtake a simultaneously-posted unpenalized post?) Correspondingly, you could give a bonus to subreddits you want to see more of. Unfortunately this is probably not a feasible suggestion. Separately sorting huge lists for millions of users would probably melt the servers. But it would be a nice feature.
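The 6-hour figure can be checked numerically with a small Python sketch. The `penalty_hours` parameter is hypothetical (Reddit has no such feature), and the sketch uses the rescaled score 45000 log10(n) + t. It only tests the 6-hour case, so the homework stays open.

```python
import math

def shifted_time(net_votes, t_seconds, penalty_hours=0.0):
    """Rescaled score 45000*log10(n) + t, minus a hypothetical penalty in hours."""
    return 45000 * math.log10(net_votes) + t_seconds - penalty_hours * 3600

# Two posts submitted at the same moment (t = 0), with made-up vote counts.
plain = shifted_time(100, 0)
penalized_same_votes = shifted_time(100, 0, penalty_hours=6)
penalized_triple_votes = shifted_time(300, 0, penalty_hours=6)

print(f"unpenalized, 100 votes:     {plain:9.1f}")
print(f"6 h penalty, 100 votes:     {penalized_same_votes:9.1f}")
print(f"6 h penalty, 300 votes:     {penalized_triple_votes:9.1f}")
```

With three times the votes, the penalized post lands within a couple of minutes of the unpenalized one, consistent with the "about 3 times" estimate.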

All right, better wrap this one up. As far as user-vote-based ranking goes, Reddit’s is unusually interesting from a mathematical standpoint. For what it’s worth, I give it my upvote.


