Videogames by numbers
Scoring a game is a strange and counter-intuitive thing to do. Yet we do it, and you read scores so often that we've all come to take their meaning for granted, without really examining why they're there or how we arrive at a particular score. In this article, I hope to explain why we score games the way we do, shed some light on the process behind the numbers and untangle the web of conflicting variables that are taken into account when scoring a game. Maybe next time you read a review, you'll look at the score in a different way.
Why we score – the need for numbers
Numbers are all around us. They describe the world we live in and give us a defined way of measuring things. Videogames, for instance.
When you open a review on Thunderbolt, I'd bet that most of the time you scroll straight down to the score, even if you intended to read the entire article in the first place. When you're considering whether or not to buy a game, you want to know how good it is, and a score is a quick and easy indication of quality.
Some websites approach this problem by placing the score on a second page, alongside the second half of the review, in an attempt to force you to read the article. The opposite solution is to satisfy this yearning for numbers and place the score at the top of the review, with the article (the explanation of the score) underneath. We've chosen the accepted middle ground of a score at the bottom of the article, because, after all, that's what the number is: a conclusion. It sums up the author's experience of the game in a neat and digestible package.
Do we need scores at all, though? Getting rid of them would remove any discrepancy between the article text and the score, but then you'd have to read through the entire review to get a feel for a game. If that happened, the shortest reviews would end up being the most popular, and you'd learn little about the game at all.
But why all the fuss? Isn’t a number just a number? A score just a score?
Subjectivity and objectivity – the precarious nature of analysing a game
Every review is made up of observations that are both objective (unbiased and often factual) and subjective (personal opinion). Without either, we would be lost. A purely objective review wouldn't be very entertaining:
Gran Turismo 6 has 100 cars, more than most racing games. Several game modes are available to the player: Race, Career and Multiplayer. When you win a race, money is added to your total, enabling you to buy faster cars.
Sure, all reviews need facts, but the sharp point of a review comes from the author’s opinion, however subjective it may be:
Gran Turismo 6's visuals are astounding. Every detail on every car has been replicated to such a fine degree that you wonder whether the developers got any sleep over the last ten months. The graphics are perfectly complemented by a physics engine which feels just right.
It often turns out that these subjective (or unprovable) sections of a review are the ones people come to read. If you just wanted to know what was in the game, you could read a fact sheet or visit the game's website. What people truly value is an educated opinion, and one they can trust. You don't want to buy and play every racing game to find out which one is the best; that's what we as reviewers are here to do.
But how much objective and subjective matter do we include? How much do we assume that you already know about the game? If we were to write about World of Warcraft, do we need to explain massively multiplayer online games, or do we expect you to know the concepts already? Maintaining a balance between these two poles is what we as writers have to do to ensure that our reviews are suited to players of all experiences and calibres.
The non-linear nature of games yields different experiences
“Writing about music is like dancing about architecture.” – Elvis Costello
Most art critics have it easy. Music, film, photography and writing are all linear experiences. If I watch a film and then you watch it, we see and hear identical things, so only our opinion of the movie would differ. Videogames don’t subscribe to this model though, as you well know.
A lot of games are reasonably linear, in the sense that several levels follow on from each other in the same order, from beginning to end. Here, only the way in which you complete each stage can differ. Yet some games are designed in the completely opposite way, with no discernible limits. Take The Sims, for instance. You could give it to as many people as you like and no two of them would ever do the same things or have the same experiences. As games become more and more complex, the number of alternative and unique experiences increases.
This poses a problem for us reviewers. Should we play a game the way we think most people would, or do we stick to our own style? Do we try to test every conceivable event, or just play through as we normally would? Hopefully you're beginning to see our predicament, and how the nature of videogames often works against stamping a score on a game.
Previous experience matters – how history creates bias
Ideally, when I come to review a game, I would have played every other game. Of course, this would be very impractical and costly, but it would go some way towards negating the bias that my gaming history imposes on me.
If I were to play every football game, then my opinion of FIFA 08 would probably be reasonably valid and accurate. However, if I had only played the Pro Evolution Soccer series, then my previous experience of football games would cause me to expect all other games to be like P.E.S. If I had never played a football game, then my opinion would be even more precarious.
This problem is by no means unique to videogames, and nor is the solution. We try to have games reviewed by the writers who know the most about a particular genre, but there are always holes in a person's knowledge that can affect how they perceive a game. Yet this expert knowledge unwittingly leads to another, completely different problem: when should we compare, and what should we compare against?
Comparisons – how much should we compare games to others?
Developing a sequel must be an emotional rollercoaster. People are always going to judge the game you make against what went before it. Indeed, for any game, its competition is the benchmark by which it is judged. Yet comparisons can be deceiving, and different audiences deserve different opinions.
Let's stick with FIFA. If you've played every other game in the series, the new game might not be as enticing as it would be to someone looking to buy their first football game. A reader's past experiences are just as varied as a reviewer's. So the question is: do we rate a game on its own merits, or against the games that preceded it?
This is another balance we try to get right. While we give only one score, in the case of sequels and series we usually try to mention how your past experiences might affect your enjoyment of the game we're writing about. Here's an example from our Football Manager 2007 review:
“If you’re not a football fan, don’t bother. If you are, you definitely should, but only once you know the investment required and the consequences of it.”
Expectations – the danger of knowledge and previous experience
The most significant threat to a reviewer's psyche is the bias created by expectation. This comes in two forms: reading someone else's review before forming your own opinion, and hype created by PR and past experiences.
In theory, when we begin to play a game for the first time, the score we will give the game is set at 5. Depending on our experiences, that number will go up or down until we’re satisfied that the score is representative of the game. However, we don’t always start at 5 because of expectations.
I don't know about you, but when Halo 3 and Grand Theft Auto IV come out, I expect them to be good. You could predict with some certainty that each game will average review scores of over 85%. How do we know this? Because we assume that they will be as good as, if not better than, their predecessors. As videogame journalists, this is something we cannot shield ourselves from, but we must be aware of it when we come to process our experiences and opinions. Our expectations must be mentally reset before we begin analysing a game; otherwise, terrible inaccuracies can occur.
An expectation catastrophe – Driv3r
A near-perfect example of this came along in the form of Driv3r. We expected it to be good, to build on the games before it and take advantage of new technology. A lot of money was spent and the hype machine rolled on. It was going to get great scores, right?
The reality was that Atari released a substandard and untested game, one that sank way below anyone's expectations. But those expectations caused review scores to be wildly different from what they should have been.
Looking on GameRankings, the review aggregation website, you can see that the highest score for the PS2 version of Driv3r is 90%, but the lowest is 10%. This enormous range which covers almost the entire scale demonstrates how varied opinions were. My view is that the highest scores were based on raised expectations which were not counter-balanced, while the lowest scores were knee-jerk reactions to the huge disappointment.
When the PC version of Driv3r came out, little had changed in the game since its console outing, but expectations had. Instead of the PS2 version's average score of 59.2%, the PC version got just 40.7%. Although the range of scores was still slightly wider than normal, it ran only from 18% to 64%. Reviewers had reset their expectations and didn't expect the game to be good at all, so its rating was less affected by the marketing hype that had preceded the initial release.
Different scales – out of 5, 10, 20, or 100? Maybe even A to E?
So if we're going to score a game, accepting the risks and potential bias, how should we do it? Most publications score out of 5, 10 or 100. Some use a scale up to 20, while others use the A to E grades we commonly see at school. Which makes the most sense, though, and why does Thunderbolt use a scale of 10?
The answer depends on how precise you believe a reviewer can be, given the subjective nature of videogames and all the different experiences you can have. A 5-point scale is often too vague (there could be significant differences between two titles rated '4'), while a 100-point scale implies more precision than a reviewer can honestly offer.
At Thunderbolt, we feel that the 10-point scale is just right, allowing us to differentiate enough between games, but not so finely that the scores lose credibility. After all, what really is the tangible difference between a 6.4 game and a 6.5 game?
Analysing the scale – why 10 isn't perfect and 0 doesn't mean 'not a game at all'
Using numbers comes with its own risks, though, and not everything is quite as it seems. If you score a game 10 out of 10, it doesn't mean that it's perfect, just as an A+ on a school essay doesn't imply that you couldn't have done better. Likewise, a 0 out of 10 doesn't mean that there is no game at all.
Some publications give more extreme scores than others, or at least a wider range of scores, but some, assuming that no game can be perfect, never use a 10. This narrowing of the scale isn't particularly healthy: in effect, many writers subconsciously move the unattainable extremes from 0 and 10 to -1 and 11 respectively. A game can't be perfect in the same way that you can't give 11 out of 10.
The average game – why 7 is average, but 5 isn’t
When you look at all of the reviews from all publications and average them out, something is particularly striking: the average score isn't 5. Looking on GameRankings, you'll see that GameSpot, who have reviewed more games than anyone, average 67.7%. Even Edge Magazine, who are known to be harsher than most publications, give an average score of 6.5 out of 10. Thunderbolt, meanwhile, averages 7.14 out of 10. Is there an inherent bias in reviewing? Take a look at this graph of GameSpot's reviews, compiled by Metafuture:
You could answer this question by saying that most games are better than we expect them to be. Maybe we've subconsciously changed our expectations and 7 has become the average. After all, when you buy a 7-rated game, do you really expect it to be a couple of points better than most games? Maybe we want most games to be good, and scores have crept up the scale as a result.
Another reason for this trend is that the worst games don't get publicised and aren't really worth reviewing. This is probably why Thunderbolt's average is higher than most. At least half of the games reviewed here were bought by us, and naturally we don't want to buy bad games.
Maybe this isn't unique to videogame reviewing. Do other art forms, like music and film, suffer the same fate? It would certainly be interesting to find out.
What reviewers should do – recognising the pitfalls
Part of our duty as reviewers is to be aware of the bias that can cloud our judgement and to prevent it from being passed on to our readers. By understanding the nature of reviewing itself, we can hopefully shield ourselves from the dangers that analysing videogames can present.
What readers should do – making sense of the numbers
As readers, you should be conscious of the fact that rating videogames is not necessarily like rating films or music. Games present new challenges to reviewers, and you should bear these in mind whenever you read a score.
So, what does it all mean?
Maybe the interactivity that videogames offer makes them intrinsically better than older art forms. After all, would you rather watch someone fighting zombies, or be part of the action and determine the outcome yourself? It's possible that we just haven't adapted our rating scale to take this into account. Either way, knowing the reviewing process and the dangers it can present is an integral part of understanding how and why we give the scores we do.