Information is power! So it is, but only if you know how to use it. In modern business practice we often have to work in teams. If we are to make use of information then we have to be able to communicate it to our fellow team members. It's no use being an extremely clever mathematician, capable of working out all sorts of wonderful parameters to describe populations, if your team members are not able to understand what you are saying.
There is no virtue in expressing your findings in complicated jargon that needs a PhD in Mathematics to understand. That sort of knowledge is not power. It does not get used and the views of the person putting it forward can end up being ignored simply because the rest of the team did not understand what was said.
Example
Taken from the introduction to a report supporting a Market Segmentation Strategy for Rail Tickets
Does this information inspire you to believe the writer?
In sharing this thought about the dangers of choosing an incorrect way of deciding how to group the customers, has the statement above convinced you that the writer's business sense can be trusted?
Does the heavy use of jargon convince you that the writer must know what they are talking about because it is hard to understand?
The piece requires reading skills of graduate level. Even then it uses jargon which is only likely to be understood by a specialist statistician. Do you understand it?
What it actually says is:
There is no virtue in jargon for its own sake. Jargon develops between specialists as a type of shorthand to express complicated ideas in a very much shortened form. Every group of specialists develops its own verbal shorthand, often without realising that it is excluding other listeners from its reasoning. You will always meet jargon within business areas. If you are going to survive and prosper you must learn enough jargon to get by and become confident enough to ask when you do not understand what has been said.
If experts cannot explain the basis of their reasoning to an intelligent school child then the problem lies in the expert not the child.
Likewise, the reasoning behind the statistical techniques which are used throughout business is not very complicated. The jargon tends to obscure the meaning but most of the statistical ideas used in business are very simple. The most powerful techniques simply involve sorting into piles and counting the number of items in each pile. We may decide to use a computer to do the sorting but that only speeds up the leg work; it doesn't make the reasoning any more complicated.
You will, however, need to learn some basic statistical jargon. When you venture among statisticians and mathematicians they speak a different language. I've often heard the complaint about my native land that when you go into a shop or pub all the people suddenly start speaking Welsh. It's not true, they were speaking Welsh before you came in and will speak it again after you have gone. Often though the Welsh will switch to the English in a good-mannered effort to accommodate the ignorance of foreigners. Most mathematicians and statisticians are not so innately polite. They are often ill-mannered pedants who will laugh at your stumbling attempts to speak their strange tongue. Even worse you will find that they will patronise you for your inability to speak Math with a polished accent. Don't be put off by this. Ask the basic questions and learn enough of the language to spot when you are being bull-sh...d.
Over the next few sessions we are going to study basic techniques and learn a few rudimentary words of the language of the Statisticians. At most this will involve some simple arithmetic which can easily be done on a calculator and perhaps looking up the odd number in a table.
So far we have been looking at ways of drawing pictures of the facts we have been finding out from questionnaires. We then started to think about simple ways of describing these facts in single numbers. We have looked at ways of finding a representative member of the group which is about average. We used three different ways to work this out.
1. The Mode.
2. The Median.
3. The Mean.
Sometimes these measures give different results but as long as we know which one we are using we can understand what it tells us about our group. However, groups of results can have similar averages yet be made up of very different numbers. Sometimes this can matter.
Let's take a financial example.
I've been offered a choice between investing in either of two different bonds. To help me decide, my stockbroker has supplied the maturity yield for the last six years for each choice.
Les Chadwick's Bond
6.0% 5.7% 5.6% 5.9% 6.1% 5.5%
Nick Leeson's Get Rich Quick Bond
7.2% 7.7% 4.9% 3.1% 3.4% 8.5%
My first thought is to work out the mean value of the yield to see which one is the best.
Les Chadwick's Bond Mean Yield 5.8%
Nick Leeson's Get Rich Quick Bond Mean Yield 5.8%
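The means can be checked in a few lines of Python. This is only a minimal sketch: the variable names `chadwick` and `leeson` are my own labels for the two rows of yields, not anything the broker supplied.

```python
from statistics import mean

# Yearly yields in percent, copied from the two rows of figures above.
chadwick = [6.0, 5.7, 5.6, 5.9, 6.1, 5.5]
leeson = [7.2, 7.7, 4.9, 3.1, 3.4, 8.5]

chadwick_mean = mean(chadwick)
leeson_mean = mean(leeson)

print(f"Chadwick mean yield: {chadwick_mean:.1f}%")
print(f"Leeson mean yield:   {leeson_mean:.1f}%")
```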
So the means are the same, but does this mean both bonds are equally risky? Looking at the results, Nick Leeson seems far more risky than Les Chadwick. In some years Nick Leeson's bond has been as low as 3.1% and as high as 8.5%. That's up to 2.7% higher than the mean and also 2.7% lower than the mean value.
Les Chadwick seems much more steady in his results. His yield has stayed within 0.3% either side of the average over the six years. He seems to be more consistent, or perhaps less variable, than Nick Leeson, even though their averages worked out the same.
What does this Mean?
Over the longer term it might not be a problem. If I'd decided to invest £100 in each bond six years ago then my return for Chadwick would be £140.25 and from Leeson £140.09 so perhaps that means it was OK. It doesn't really matter that one is more variable than the other.
But wait, what if I'd put my £100 in four years ago? Then my return from Chadwick would be £125.18 while from Leeson it would be only £121.33. So perhaps it does matter after all.
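These returns can be worked out by compounding the yields year by year. The sketch below rests on two assumptions of mine: that the yields are listed oldest first and compound once a year, and that "four years ago" means the last four figures in each row. The function name `grow` is purely illustrative.

```python
def grow(amount, yields_percent):
    """Compound an amount through a sequence of yearly percentage yields."""
    for y in yields_percent:
        amount *= 1 + y / 100
    return amount

# Yearly yields in percent, assumed to be listed oldest first.
chadwick = [6.0, 5.7, 5.6, 5.9, 6.1, 5.5]
leeson = [7.2, 7.7, 4.9, 3.1, 3.4, 8.5]

# £100 invested six years ago, and £100 invested four years ago.
six_year = {"Chadwick": grow(100, chadwick), "Leeson": grow(100, leeson)}
four_year = {"Chadwick": grow(100, chadwick[-4:]), "Leeson": grow(100, leeson[-4:])}

for name in ("Chadwick", "Leeson"):
    print(f"{name}: six years £{six_year[name]:.2f}, four years £{four_year[name]:.2f}")
```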
But how can I explain simply how much spread there is in a set of results? I suppose the range might be helpful.
Range for Chadwick 5.5% to 6.1%
Range for Leeson 3.1% to 8.5%
But this is not really using all the information that I have available. My broker very kindly supplied six years' figures for me to use.
What about putting more steps into the range
If I sort the figures into order and then work out not just the extreme points (the range) and the middle point (the Median) but also the points that are one quarter and three quarters of the way up the sorted list, I would have another set of measures called the Quartiles. Each quartile marks a quarter of the data points. So we have the First Quartile, which is one quarter of the way up the list. The Second Quartile is halfway up the list and is also called the Median. The Third Quartile is three quarters of the way up the list. And finally we have the Max. and Min. values, which are at the top and the bottom of the list.
(All these functions can be found in the Excel spreadsheet under the Insert Function command.)
Let's see what results this gives me.
Les Chadwick's Bond
First Quartile 5.625
Median 5.8
Third Quartile 5.975
Nick Leeson's Get Rich Quick Bond
First Quartile 3.775
Median 6.05
Third Quartile 7.575
I can use this to help me understand the differences between the two bonds because the first and third quartile cut off the extreme ends of the data. Half of the results fall between those two points. This measure is called the Interquartile range and in this case means that for half the time the yield has fallen between these two points.
Les Chadwick's Bond Interquartile Range 5.625 - 5.975
Nick Leeson's Get Rich Quick Bond Interquartile Range 3.775 - 7.575
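If you would rather not use a spreadsheet, the quartiles can be sketched in Python. There are several conventions for quartiles on small data sets; the "inclusive" method of the standard library's `statistics.quantiles` is the one that reproduces the figures quoted here. The helper name `quartiles` is my own.

```python
from statistics import quantiles

# Yearly yields in percent (the function sorts them itself).
chadwick = [6.0, 5.7, 5.6, 5.9, 6.1, 5.5]
leeson = [7.2, 7.7, 4.9, 3.1, 3.4, 8.5]

def quartiles(values):
    """Return (first quartile, median, third quartile) of the values."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q2, q3

chadwick_q = quartiles(chadwick)
leeson_q = quartiles(leeson)

for name, (q1, q2, q3) in (("Chadwick", chadwick_q), ("Leeson", leeson_q)):
    print(f"{name}: Q1 {q1:.3f}, median {q2:.3f}, Q3 {q3:.3f}")
```

A different method (for example the library's default "exclusive" one) would give slightly different cut points, so it is worth saying which one you used.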
Perhaps if I work out the difference between the actual yield and the mean for each year I could find the average difference.
Les Chadwick's Bond
0.2% -0.1% -0.2% 0.1% 0.3% -0.3%
Nick Leeson's Get Rich Quick Bond
1.4% 1.9% -0.9% -2.7% -2.4% 2.7%
Now if I take the average of these it will give me a rough idea of the spread of results above and below the mean, won't it?
Sorry, there's a bit of a problem here. Both averages work out as zero, because both bonds seem to have as many good results as bad and they end up cancelling each other out. Try the sums yourself just to make sure I'm not lying to you.
There were as many positive moves as there were negative ones and so when I added them all up they cancelled out and told me nothing new. In fact this always happens: differences from the mean add up to zero for any set of numbers. I'm going to have to resort to a Mathematician's Trick to get round this problem.
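The cancelling-out is easy to confirm in Python. This is a minimal sketch with names of my own choosing; the totals come out as zero apart from tiny floating-point rounding.

```python
from statistics import mean

chadwick = [6.0, 5.7, 5.6, 5.9, 6.1, 5.5]
leeson = [7.2, 7.7, 4.9, 3.1, 3.4, 8.5]

def deviations(values):
    """Difference between each value and the mean of all the values."""
    m = mean(values)
    return [v - m for v in values]

chadwick_avg_dev = mean(deviations(chadwick))
leeson_avg_dev = mean(deviations(leeson))

# Rounded for display, because floating-point sums are rarely exactly zero.
print(f"Chadwick average difference: {abs(chadwick_avg_dev):.6f}")
print(f"Leeson average difference:   {abs(leeson_avg_dev):.6f}")
```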
Statistically he's right!
I'm sure you remember the chunk of heavy jargon this session started off with. If you read it just before going to bed you could wake up screaming about it. But it kept mentioning squares. The word Squared occurred no less than five times in 119 words. That's 4.2% of the words in that statement concerned squares. Why should squares be so important to statisticians that they can't seem to string more than a few words together without resorting to some variation on the word square?
The reason is that the mathematical process of squaring gets round the problem I had when I added together the distances from the mean and found that the numbers cancelled out. Squaring to Mathematicians, Statisticians and other Assorted Anoraks means to multiply a number by itself. It's often written with a little number 2 placed just behind and above the number that is being squared (see there's that word again).
The magic of squaring is that whatever number you square, the answer is never negative. Try it now with a calculator.
Multiply +1 times +1 and the answer is +1
Multiply -1 times -1 and the answer is still +1
Try any number you like -12 times -12 is +144
It always works out that any number multiplied by itself (squared) gives an answer that is a positive number (or zero, if the number itself is zero). Positive numbers are very well behaved and don't cancel each other out without warning. If I take each of the differences from the mean, square them and then add them together, I get a number called the sum of the squares.
Difference    Squared Difference
 0.2          0.04
-0.1          0.01
-0.2          0.04
 0.1          0.01
 0.3          0.09
-0.3          0.09
Total         0.28
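The same table can be built in Python for the steadier of the two bonds. Again this is only a sketch, and the names are illustrative rather than anything official.

```python
# Yearly yields in percent for the steadier bond.
chadwick = [6.0, 5.7, 5.6, 5.9, 6.1, 5.5]

mean_yield = sum(chadwick) / len(chadwick)

# Pair each difference from the mean with its square.
rows = [(y - mean_yield, (y - mean_yield) ** 2) for y in chadwick]
sum_of_squares = sum(squared for _, squared in rows)

for diff, squared in rows:
    print(f"{diff:5.1f}   {squared:.2f}")
print(f"Total   {sum_of_squares:.2f}")
```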
That's much better. Now I can add up the numbers without the problem of them cancelling out. But next I need to take an average of the squared differences from the mean. This is where the Statisticians will try and catch you out. The mean is easy: divide by the number of data observations.
WRONG!! You divide by one less than the number of observations.
This is not just perversity to catch out the unwary and trip up those who don't speak fluent Math. It's actually done to avoid overusing the amount of information that we have. To work out the mean I used all the data set. This used up something that is called a degree of freedom. Because I have worked out the mean I have to correct for the fact I've used the data already and cannot count it twice if I want an accurate answer. So I must calculate the mean of the squared differences from the mean by dividing by one less than the number of points.
The corrected mean of the squared differences from the mean works out to be 0.056 (that is, 0.28 divided by 5). This is called the variance of the data set. Variance is a very important idea in statistics because it tells us in one number just how a data set spreads itself about. Once we know the mean and variance of a data set we can do all sorts of slick calculations and forecasts about our population.
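Here is the variance worked out by hand in Python and then checked against the standard library's `statistics.variance`, which also divides by one less than the number of observations. The variable names are my own.

```python
from statistics import variance

chadwick = [6.0, 5.7, 5.6, 5.9, 6.1, 5.5]

n = len(chadwick)
mean_yield = sum(chadwick) / n
sum_of_squares = sum((y - mean_yield) ** 2 for y in chadwick)

# Divide by n - 1, not n: one degree of freedom went on the mean.
by_hand = sum_of_squares / (n - 1)
from_library = variance(chadwick)

print(f"Variance by hand:      {by_hand:.3f}")
print(f"Variance from library: {from_library:.3f}")
```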
There is just one more thing we need to learn before we leave spread behind us. The variance is a very difficult quantity to visualise just because it is in square units. I personally found the idea of 0.056 square per cents quite a difficult unit to think in. It would be even worse if I'd calculated the variance of your ages because then I'd have to think in square years. To use such units in anger requires the sense of humour of Michael Bentine, who really understood square worlds. For the rest of us simple mortals there is a way forward though. Take the square root of the variance and you have something called the standard deviation. This is in the same units as your data.
Let's just see what I now know about my two bonds.
Les Chadwick's Bond
Mean Yield 5.8% Standard Deviation 0.236643%
Nick Leeson's Get Rich Quick Bond
Mean Yield 5.8% Standard Deviation 2.311709%
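Both summaries can be reproduced with the standard library's `statistics.stdev`, which takes the square root of the variance calculated with the "one less than" divisor. This is a sketch, with my own names for the two lists.

```python
from statistics import mean, stdev

chadwick = [6.0, 5.7, 5.6, 5.9, 6.1, 5.5]
leeson = [7.2, 7.7, 4.9, 3.1, 3.4, 8.5]

chadwick_sd = stdev(chadwick)
leeson_sd = stdev(leeson)

print(f"Chadwick: mean {mean(chadwick):.1f}%, standard deviation {chadwick_sd:.6f}%")
print(f"Leeson:   mean {mean(leeson):.1f}%, standard deviation {leeson_sd:.6f}%")
```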
The difference is now much easier to spot and I can invest with confidence.