The Goodreads Choice Awards are in their first round of voting, which prompted some discussion on Twitter concerning the fact that their shortlist for Best Science Fiction has only one woman on it (out of 15 books).
Goodreads states that they “analyze statistics from the millions of books added, rated, and reviewed on Goodreads” to choose the shortlists.
Well, this seems pretty straightforward, right? Their awards are about what books their users think are best. So they pick the top 15 books according to their user database, and bang! Shortlist! If there aren’t that many women on it, it just means more people have to read and give high ratings to books written by women. There’s nothing Goodreads can do without sullying the math of their awards process — right?
Except . . .
What does “best” mean? When they’re doing their statistics, how are they deciding which books count as top-rated?
Believe it or not, this is a nontrivial mathematical problem.
For a good series of articles on the mathematics of ranking algorithms, try these three posts. But here’s a simple example to show why it’s not always so simple. Say we were trying to compare two books in a ranking system, and one had 500 reviews with a 4.6 rating, and the other had 1000 reviews with a 4.4 rating.
Which book is “better”?
More importantly — which book do we want to be better? Because really, we could make a case for ranking either one higher. What ranking math like that in the links above does is find some reasonably consistent way of making the rankings come out the way we think they should.
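To make the two-book example concrete, here’s a sketch (mine, not anything Goodreads actually does) of two equally defensible scoring schemes: a raw mean, which privileges the star rating and ignores review count entirely, and a count-weighted score — here, rating times the log of the review count, one plausible damping choice among many. The same two books come out in opposite orders under the two schemes.

```python
import math

# The two hypothetical books from the example: (title, review_count, mean_rating)
books = [("Book A", 500, 4.6), ("Book B", 1000, 4.4)]

def raw_mean(n_reviews, rating):
    # Privileges star rating: review count is ignored entirely.
    return rating

def count_weighted(n_reviews, rating):
    # Privileges popularity: rating scaled by the log of the review count.
    # The log is one plausible damping choice, not any real site's formula.
    return rating * math.log10(n_reviews)

by_mean = sorted(books, key=lambda b: raw_mean(b[1], b[2]), reverse=True)
by_weight = sorted(books, key=lambda b: count_weighted(b[1], b[2]), reverse=True)

print(by_mean[0][0])    # Book A tops the list on raw mean (4.6 > 4.4)
print(by_weight[0][0])  # Book B tops it once review count is weighted in
```

Neither scheme is wrong math; they just answer the question “which book is better?” differently.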
But people are going to have different opinions on which bits of data they think should be privileged. They’re going to have different opinions on what “how it should be” looks like.
I want to make clear: privileging one stripe of data or another is not necessarily wrong math. But we can judge it against systems that privilege the data in other ways, and argue that one or the other system is giving us the most intuitive, useful, or reflective ranking. It’s not a matter of math anymore at that point; it’s a matter of what we want the ranking system to do.
For instance, let’s say (this is made up; I have no insider knowledge) that GR is looking at star rating and number of reviews as its two most important criteria. If it has to privilege one of those, which should it be? Number of reviews would probably favor more established authors; star rating would probably favor less established authors. What if GR wants to include other criteria, or at least needs some tiebreaking data? Should they privilege the author with the higher overall author rating or engagement? That might make some sense intuitively, but it will again favor more established pros. And if we’re looking for the best books, there’s a good argument that given two books with roughly equivalent buzz, the less established author’s book should be privileged, since it had a larger hump to get over to earn the buzz it did. So should GR privilege the book with the lower-rated author overall?
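As a toy illustration of that tiebreak question (all data invented, including the hypothetical “overall author rating” field), here are two books with identical buzz, where the only thing a sort can break the tie on is the author — and the two policies crown different winners:

```python
# Invented data: two books with identical rating and review count, so any
# ranking has to break the tie on something else — here, a hypothetical
# overall author rating.
books = [
    {"title": "Debut", "rating": 4.5, "reviews": 800, "author_rating": 3.9},
    {"title": "Veteran", "rating": 4.5, "reviews": 800, "author_rating": 4.4},
]

# Policy 1: break ties in favor of the more established author.
favor_established = sorted(
    books, key=lambda b: (b["rating"], b["author_rating"]), reverse=True
)

# Policy 2: break ties in favor of the less established author.
favor_newcomer = sorted(
    books, key=lambda b: (b["rating"], -b["author_rating"]), reverse=True
)

print(favor_established[0]["title"])  # Veteran
print(favor_newcomer[0]["title"])     # Debut
```

Same data, same sort function, one sign flip in the key — and a different “best book.”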
Here’s another way Goodreads might hypothetically analyze their data (and again, all of this is MADE UP; I have no idea how Goodreads chooses their nominees or whether anything like the ranking trends I’m about to pull out of the air actually exist). Goodreads examines books released in the past year from November to November. But it’s logical to think that maybe the rankings and reviews curve will look different over time. For instance, hypothetically, maybe books tend to drop half a star rating from the time they come out (when diehard fans read them) to the time they level out a few months later. During the same time, the number of reviews will keep growing the longer the book has been out — a book that’s just come out will, all else being equal, have fewer reviews than one that’s been out for a few months.
So most people would say it would make some sense for GR to adjust their best-of ranking algorithms to take this into account. Adjust the number of reviews up for books that have been out for less time, adjust it down for books that have been out for more time, adjust the star rating down for books that have just come out and up for books that have been out for a while, and then get their top picks off that data. That seems fair, right?
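A sketch of that date adjustment, with every number invented (the half-star drop, the four-month settling window, the linear review growth): it normalizes each book to its estimated settled state, projecting a brand-new book’s rating down and scaling its review count up to a comparable footing.

```python
def adjust_for_age(rating, n_reviews, months_out, settle_months=4, drop=0.5):
    """Normalize a book's stats to their estimated settled values.

    All constants are invented for illustration: assume ratings fall by
    `drop` stars linearly over the first `settle_months` months, and
    review counts grow roughly linearly over that same window.
    """
    if months_out < settle_months:
        remaining = 1 - months_out / settle_months       # settling still to come
        rating -= drop * remaining                       # project the expected drop
        n_reviews *= settle_months / max(months_out, 1)  # scale the count up
    return rating, n_reviews

print(adjust_for_age(4.8, 200, months_out=1))  # new book: rating down, reviews up
print(adjust_for_age(4.3, 900, months_out=6))  # already settled: unchanged
```

The point isn’t these particular constants — it’s that once you accept adjusting by any variable, the machinery is just a scaling function applied before the sort.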
But you could make the exact same argument for, say, books by men versus books that aren’t by men. Now, I’m not saying I advocate GR doing this, and I don’t think they even collect author gender data — and there are good reasons we wouldn’t necessarily want them to start. But there’s an argument to be made that books by, say, women — as a whole — at least get less exposure, so their number of reviews should maybe be adjusted up, curving the data until the average female author is getting the same number of reviews as the average male author.
In my opinion, adjusting according to gender is logically equivalent to adjusting according to date. Just like we could say a man might deserve the buzz he got above a woman, in our example we could likewise make the argument that a book that just came out might be awesome enough to maintain its star rating even if other books tend to drop. But you’d find a lot more people who’d be okay with adjusting by date than people who’d be okay with adjusting by gender.
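To make that logical equivalence concrete, here’s a sketch (invented data, and emphatically not a recommendation) of curving review counts so each group’s average matches the overall average. Nothing in the function knows or cares whether the group key is release month, author gender, or anything else — choosing the key is the policy decision, not the math.

```python
from statistics import mean

def curve_reviews(books):
    """Scale each book's review count so every group's average count
    matches the overall average. `books` is a list of dicts with
    "title", "group", and "reviews" keys; what the group key means
    is entirely up to the caller.
    """
    overall = mean(b["reviews"] for b in books)
    by_group = {}
    for b in books:
        by_group.setdefault(b["group"], []).append(b["reviews"])
    factor = {g: overall / mean(counts) for g, counts in by_group.items()}
    return {b["title"]: b["reviews"] * factor[b["group"]] for b in books}

# Invented data: group "x" averages 200 reviews, group "y" averages 600,
# overall average 400 — so "x" counts get scaled up and "y" counts down.
books = [
    {"title": "a", "group": "x", "reviews": 100},
    {"title": "b", "group": "x", "reviews": 300},
    {"title": "c", "group": "y", "reviews": 400},
    {"title": "d", "group": "y", "reviews": 800},
]
print(curve_reviews(books))
```

Swap the `"group"` values from release months to any other attribute and the code runs identically — which is exactly why the argument for one adjustment and against the other has to be made on policy grounds, not mathematical ones.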
Incidentally, I’d be one of them! (If for no other reason than the data-gathering issues I mentioned.) But I think we have to be very careful when we use numbers and statistics to justify saying something is “best” according to an audience. Because the math really does have different legitimate sides to it, and two people working from the exact same numerical data set could reasonably come up with two different best-of lists.
I’m still suffering comment-response-guilt over my last two “math in genre” posts, so I’m closing comments for my own sanity. Please feel free to link this elsewhere and to approach me on Twitter (@sl_huang) to discuss it.