Wednesday 3 September 2014

Spot the Maths/Stats Mistake #4

I saw an article the other day from the Wall Street Journal's Japan Realtime blog.  I typically have nothing but praise for Wall Street Journal (WSJ) blogs, but this time was rather different.  WSJ pushed this on their social media platforms using the illustration below, and wrote about how they had established which countries had the cleanest hotels.


The obvious thing to note straight away is that this is not actually a comparison of countries, but a comparison of cities.  However, that's not my main issue with this analysis.

Although hotel rooms in Tokyo, Moscow and Helsinki are more expensive than in Athens and Kiev (more on this later), it seems striking that, in general, the 'dirty' side of the list looks like a considerably more expensive set of cities than the 'clean' side.  This made me wonder whether this observation was anything more than a curiosity.

I began by looking in a bit more detail at how these 'cleanliness' numbers had been arrived at.  The obvious place to look was the source they kindly mentioned, so a quick bit of googling gave me this press release from hotel.info.  This opened up a whole can of worms.

The press release begins by using UK cities, almost as a case study, to discuss the way that hotel cleanliness varies within a country.  We don't know whether this is because the UK is typical of the situation globally; however, the press release does tell us that the UK is neither the cleanest country (citing Switzerland and Austria as cleaner) nor the dirtiest (citing Denmark).

The comparison of the 10 largest (i.e. most populous) UK cities was presented in the following table:

The 5 best cities
Position  City        Evaluation of cleanliness (best score: 10.0)
1.        Sheffield   8.15
2.        Liverpool   8.12
3.        Bristol     8.10
4.        Leeds       8.10
5.        Edinburgh   7.97

The 5 worst cities
Position  City        Evaluation of cleanliness (best score: 10.0)
1.        Birmingham  7.42
2.        London      7.52
3.        Leicester   7.64
4.        Manchester  7.75
5.        Glasgow     7.90

So there we have one more city, Birmingham (7.42 against London's 7.52), which could have featured as 'dirtier than London' on the list produced by the WSJ which, remember, claims to list "the dirtiest 10 cities".  In fact, Leicester could also feature on that list, absolving Brussels and Kiev of the dubious honour of appearing on it.

Of course, the WSJ don't want to do that, because they'd rather discuss capital city locations than they would Leicester, which most of their readers probably don't know how to pronounce (it's pronounced Lester, incidentally) let alone know anything about*.  That the list was only about capital cities would be a fair argument, were it not for the inclusion of Sao Paulo and Rio de Janeiro in their list.

The most obvious conclusion would be that the cities included by the WSJ (and compiled as part of the hotel.info press release) were chosen to make a point.  They could surely have found 10 cities dirtier than all of those on their list, but chose not to.  They also omitted the whole of North America (and indeed Africa and Oceania).  One might argue that cities in North America were simply neither the cleanest nor the dirtiest, but as we know from the exclusion of Birmingham, they could have made some North American cities feature in the list by simply excluding an arbitrary selection of others.

Having concluded that the article was meaningless due to the arbitrariness of the cities included, I also wanted to find out whether there was anything behind the observation that the cities on the 'dirty' side were often more expensive.

Finding out average hotel prices for each city was not as easy as I had thought, but I eventually came across this Bloomberg report, which did fulfil my needs, more or less.  I am convinced it must omit low-budget hotels, but I will forgive it on two counts:

  1. They actually give their methodology
  2. I suspect that low-budget hotels in Sao Paulo get far fewer reviews on hotel.info anyway, so the cleanliness ratings we are looking at may be more about the mid-to-high end hotels, at least in those poorer cities.

So, using their data for the prices (omitting Bern, whose average price was not available), I plotted the prices against cleanliness ratings to get the following:

It would be a little crude to draw a straight line through these data points, given that I haven't made any argument involving linearity (and I don't intend to, as the purpose of this discussion does not require it).  However, I did calculate** the correlation between the two variables (price and cleanliness) and found, as I suspected, some negative correlation between the two.  For those who are interested, the correlation coefficient was -0.49.
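
For readers who want to see what such a calculation involves, here is a minimal Python sketch.  It assumes the usual Pearson correlation coefficient (the post does not say which coefficient the statistical package reported), the function name `pearson_r` is made up for illustration, and the prices and scores are placeholders rather than the Bloomberg or hotel.info figures.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Placeholder numbers for illustration only; NOT the data used in this post.
prices = [100, 150, 200, 250]
scores = [8.2, 8.0, 8.1, 7.7]
print(round(pearson_r(prices, scores), 2))
```

A value near -1 means the scores fall steadily as prices rise, a value near 0 means no linear relationship, so -0.49 sits somewhere in between.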

What this means is that the more expensive the hotel rooms, the lower the cleanliness scores their customers tend to give.  I propose that this could be for one of two reasons:

  1. The more expensive hotels are dirtier (which seems to be the WSJ's interpretation)
  2. People who stay in more expensive hotels expect a higher standard of cleanliness, and are more likely to give a poor rating when they find something wrong with the room than customers of cheaper hotels.

What I am arguing is that if two rooms were equally clean, but one was more expensive than the other, we might expect the cheaper of the two to receive a more favourable cleanliness rating from its customers.  This is something which the WSJ failed to consider.
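
The second explanation can be illustrated with a toy simulation in Python.  Everything in it is an assumption made up for illustration: every room is given exactly the same true cleanliness, the price range is arbitrary, and guests are assumed to knock a small amount off their rating for each extra unit of price.  None of it is estimated from the hotel.info or Bloomberg data.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate(n_rooms=500, penalty_per_unit=0.004, seed=1):
    """Correlate price with rating when every room is equally clean."""
    rng = random.Random(seed)
    true_cleanliness = 8.0  # hypothetical: identical for every room
    prices, ratings = [], []
    for _ in range(n_rooms):
        price = rng.uniform(50, 300)  # arbitrary price range
        # Hypothetical expectation effect: pricier rooms are judged more harshly.
        expectation_penalty = penalty_per_unit * (price - 50)
        noise = rng.gauss(0, 0.2)  # guest-to-guest variation
        prices.append(price)
        ratings.append(true_cleanliness - expectation_penalty + noise)
    return pearson_r(prices, ratings)


print(f"with expectation effect:    {simulate():.2f}")
print(f"without expectation effect: {simulate(penalty_per_unit=0.0):.2f}")
```

Under these assumptions the ratings correlate negatively with price even though no room is any dirtier than another, which is why a negative correlation alone cannot separate the two explanations.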

--------


*This is less a criticism of WSJ readers, and more simply an acknowledgement of Leicester's low international profile.
**By which I mean, I made the statistical package calculate

1 comment:

  1. People with a Statistics background may wish to use ANOVA to delve a little further into this.
