Saturday, 13 September 2014

How unlikely was that!?

I was very tempted to call this post 'Spot the Maths/Stats Mistake #5'.  However, there is a big difference between this post and the others I have made so far about the use, misuse and abuse of statistics.  The difference is that this time, the mistake was my own.

On Thursday, I was travelling back from a work event by train.  I didn't realise that I had a reserved seat, so just sat in a seat which was available.  I then discovered that I actually did have a reserved seat, and that I was in fact sat in it.  I thought to myself what an amazing coincidence this was - highly unlikely, I thought.

Cross Country Trains 'Voyager' Seating Plan
However, when I hear that something very unlikely has happened, I try to think about how unlikely it really was.  I genuinely believed that it was highly unlikely that this should happen.  There are 296 seats on this particular type of train, so I thought that the chance of this occurring was 1 in 296.  I was wrong.  Here is why.
  • About 70% of the seats were already occupied.
  • I didn't have a first class ticket, so 40 of the seats were unavailable to me.
  • I sat in the same carriage as a colleague, who had booked his ticket separately to me, but who did have a reserved seat.
So even with those first two observations, the 296 seats I had to choose from were reduced to around 77.  All of the reserved seats were in the front 2 carriages of the train, and since I deliberately sat in the same carriage as my colleague (although I couldn't sit next to him as the seat was occupied) this meant that I selected almost exclusively from the unoccupied reserved seats.  These constituted around half of the total number of unoccupied standard class seats, reducing my choice of seats further, to around 38.  I also know that I booked my tickets at around the same time as my colleague.  Assuming that seats are allocated approximately sequentially (e.g. filling up the train from the front) it was very likely that my reserved seat was in the same carriage as his.  Perhaps this means I had something resembling a 1 in 19 chance of sitting in my own seat.

This is a bit of a rough-and-ready approximation, based on some perhaps unreasonable assumptions and estimates, but I hope it illustrates the central point.  I thought that the probability of me sitting in the right seat was close to 1 in 300.  In fact, it was probably closer to 1 in 20.  Still unlikely, but we often tend to overstate just how unlikely things truly are.
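For anyone who wants to retrace the arithmetic, here is a minimal sketch in Python.  The figures are the rough estimates given above.  Two things are assumptions rather than anything stated outright: the 70% occupancy is applied to the standard class seats only (which is what reproduces the figure of 77), and the final step treats the unoccupied reserved seats as split evenly between the two front carriages (which is what turns 38 into 19).  The function and parameter names are purely illustrative.

```python
def estimate_seat_choices(
    total_seats=296,          # seats on the train, as quoted above
    first_class_seats=40,     # unavailable without a first class ticket
    occupied_fraction=0.70,   # rough estimate of seats already taken
    reserved_share=0.5,       # reserved seats as a share of free standard seats
    same_carriage_share=0.5,  # ASSUMPTION: reserved seats split evenly over 2 carriages
):
    """Return the shrinking pool of plausible seats at each step of the argument."""
    standard_seats = total_seats - first_class_seats
    free_standard = standard_seats * (1 - occupied_fraction)
    free_reserved = free_standard * reserved_share
    free_in_carriage = free_reserved * same_carriage_share
    return free_standard, free_reserved, free_in_carriage


if __name__ == "__main__":
    labels = (
        "Unoccupied standard class seats",
        "Unoccupied reserved seats",
        "Unoccupied reserved seats in one carriage",
    )
    for label, seats in zip(labels, estimate_seat_choices()):
        print(f"{label}: about {round(seats)} (a 1 in {round(seats)} chance)")
```

Changing any one of the inputs shows how sensitive the final figure is to these guesses, which is rather the point: the answer is only as good as the assumptions behind it.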

Wednesday, 3 September 2014

Dear Mr Javid

A couple of weeks ago, I spoke to a Conservative Party member who was canvassing in North Devon (a seat they seem highly likely to win from the Liberal Democrat incumbents at next year's UK General Election).  In talking to him, I made a critical error.  Namely, I gave him my email address.

Today, I received my first email from the Conservative Party.  It was trying to win my vote, I'm sure, but more pressingly they were trying to get me to donate money.  This is what it said:
"Whatever you did over the summer, I'm guessing you didn't go on a £21 billion spending spree. But, unbelievably, that's exactly what Labour did.
They made a host of promises - including more spending on benefits - which analysis shows would cost hardworking taxpayers £20.955 billion. In return, they only set out £105 million worth of spending cuts to pay for them.
£20.955 billion of spending minus £105 million of savings = £20.850 billion of unfunded spending commitments, which hardworking families would pay for with higher taxes and more debt.
We've got to stop Labour getting into power and wrecking our economy again. Donate £20 today and let's make sure Ed Miliband never gets into No. 10.
Labour STILL haven't learnt their lesson.
After taking Britain's economy to the brink and opposing every spending reduction we've made, Labour have spent the summer promising billions of pounds of inefficient and ineffective spending.
With just eight months to go until the General Election, it's clear that all Ed Miliband offers is more spending, higher taxes and more debt than our children could ever hope to repay.
We can't let him get his way.
Donate £20 today, and let's carry on working through the long-term economic plan that is building a stronger, healthier economy and securing a better future for Britain:
Donate today
Thanks, 

Sajid Javid MP 
PS Every pound you give will help to keep Ed Miliband out of No. 10 - so please donate today."
Naturally, I was only too happy to reply to Mr Javid, who I regard as probably one of the least-worst Conservative MPs:
"Dear Mr Javid,
Having considered your request for £20, I have been unable to set out any spending cuts to pay for such a donation.
£20 of spending minus £0 of savings = £20 of unfunded spending commitments.
In keeping with the spirit of your party's campaign, I therefore feel that the correct decision would be for me not to donate. 
Yours sincerely,
 Edward Russell"
Perhaps I should also have added:
"PS every pound I don't give will help to give smaller and less well-financed political parties a more level playing field on which to compete against the old behemoths of British politics".
Maybe next time.



Spot the Maths/Stats Mistake #4

I saw an article the other day from the Wall Street Journal's Japan Realtime blog.  I typically have nothing but praise for Wall Street Journal (WSJ) blogs, but this time was rather different.  WSJ pushed this on their social media platforms using the illustration below, and wrote about how they had established which countries had the cleanest hotels.


The obvious thing to note straight away is that this is not actually a comparison of countries, but a comparison of cities.  However, that's not my main issue with this analysis.

Although hotel rooms in Tokyo, Moscow and Helsinki are more expensive than in Athens and Kiev (more on this later), it seems striking that in general the 'dirty' side of the list looks like a considerably more expensive set of cities than those on the 'clean' list.  This made me wonder whether this observation was anything more than a mere curiosity.

I began by looking in a bit more detail at how these 'cleanliness' numbers had been arrived at.  The obvious place to look was the source they kindly mentioned, so a quick bit of googling gave me this press release from hotel.info.  This opened up a whole can of worms.

The press release begins by using UK cities, almost as a case study, to discuss the way that hotel cleanliness varies within a country.  We don't know whether this is because the UK is typical of the situation globally; however, the press release does tell us that the UK is neither the cleanest (citing Switzerland and Austria as cleaner) nor the dirtiest (citing Denmark).

The comparison of the 10 largest (i.e. most populous) UK cities was presented in the following table:

The 5 best cities
Position   City         Evaluation of cleanliness (best score: 10.0)
1.         Sheffield    8.15
2.         Liverpool    8.12
3.         Bristol      8.10
4.         Leeds        8.10
5.         Edinburgh    7.97

The 5 worst cities
Position   City         Evaluation of cleanliness (best score: 10.0)
1.         Birmingham   7.42
2.         London       7.52
3.         Leicester    7.64
4.         Manchester   7.75
5.         Glasgow      7.90

So there we have one more city which could have featured as 'dirtier than London' on the list produced by the WSJ which, remember, claims to list "the dirtiest 10 cities".  In fact, Leicester could also feature on that list, absolving Brussels and Kiev of the dubious honour of appearing on it.

Of course, the WSJ don't want to do that, because they'd rather discuss capital city locations than they would Leicester, which most of their readers probably don't know how to pronounce (it's pronounced Lester, incidentally) let alone know anything about*.  That the list was only about capital cities would be a fair argument, were it not for the inclusion of Sao Paulo and Rio de Janeiro in their list.

The most obvious conclusion would be that the cities included by the WSJ (and compiled as part of the hotel.info press release) were chosen to make a point.  They could surely have found 10 cities dirtier than all of those on their list, but chose not to.  They also omitted the whole of North America (and indeed Africa and Oceania).  One may well have wished to argue that cities in North America were simply neither the cleanest nor the dirtiest, but as we know from the exclusion of Birmingham, they could have made some North American cities feature in the list by simply excluding an arbitrary selection of others.

Having concluded that the article was meaningless due to the arbitrariness of the cities included, I also wanted to find out whether there was anything behind the observation that the cities on the 'dirty' side were often more expensive.

Finding out average hotel prices for each city was not as easy as I had thought, but I eventually came across this Bloomberg report, which did fulfil my needs, more or less.  I am convinced it must omit low-budget hotels, but I will forgive it on two counts:

  1. They actually give their methodology
  2. I suspect that low-budget hotels in Sao Paulo get far fewer reviews on hotel.info anyway, so the cleanliness ratings we are looking at may be more about the mid-to-high end hotels, at least in those poorer cities.
So, using their data for the prices (omitting Bern, whose average price was not available), I plotted the prices against cleanliness ratings to get the following:

It would be a little crude to draw a straight line through these data points given that I haven't made any argument involving linearity (and I don't intend to, for the purpose of this discussion does not require it), however I did calculate** the correlation between the two variables (price and cleanliness) and found, as I suspected, some negative correlation between the two.  For those who are interested, the correlation coefficient was -0.49.
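For readers who would like to see what the statistical package is doing, here is a minimal sketch of the Pearson correlation coefficient in Python.  I am assuming this is the coefficient behind the figure above, since the type of correlation is not specified.  The function name is illustrative, and the numbers in the usage example are invented purely to show the call; they are not the Bloomberg prices or the hotel.info ratings.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # INVENTED toy values, for illustration only (not the data discussed here).
    toy_prices = [100.0, 150.0, 200.0, 250.0]
    toy_scores = [8.4, 8.0, 8.1, 7.6]
    print(round(pearson_r(toy_prices, toy_scores), 2))
```

A value near -1 or +1 indicates a strong linear association and a value near 0 a weak one, so -0.49 sits in moderate territory.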

What this means is that, across these cities, the more expensive the average hotel room, the lower the cleanliness score its customers tend to give.  I propose that this could be for one of two reasons:
  1. The more expensive hotels are dirtier (which seems to be the WSJ's interpretation)
  2. People who stay in more expensive hotels expect a higher standard of cleanliness, and are more likely to give a poor rating when they find something wrong with the room than customers of cheaper hotels.
What I am arguing is that if two rooms were equally clean, but one was more expensive than the other, we may expect the cheaper of the two to receive the more favourable cleanliness rating from its customers.  This is something which the WSJ failed to consider.

--------


*This is less a criticism of WSJ readers, and more simply an acknowledgement of Leicester's low international profile.
**By which I mean, I made the statistical package calculate it.

Tuesday, 2 September 2014

Eats, Shoots & Leaves

Many will be familiar with the title of this post.  It is better known either as a part of a joke, or as part of the title to Lynne Truss' bestseller 'Eats, Shoots & Leaves: The Zero Tolerance Approach to Punctuation' (a book which I am yet to read, and which readers of this blog may opine that I need to read with some urgency).

I saw something today which reminded me of this.  Namely, it was the title of a YouTube video:
"George Galloway on Losing It with Griff Rhys Jones"
 I wonder what you think this was about.  Was it

  1. George Galloway becoming very cross with Griff Rhys Jones, or
  2. George Galloway talking to Griff Rhys Jones on the subject of becoming very cross?