Triple Crown stats not the best way to grade a player

By Roger Weber

Sabermetrics has been improving the way fans enjoy baseball for 30 years. Bill James has been publishing Baseball Abstracts since 1977, and books are now being released with findings based on watching and studying every game played since 2003. So I don't think I need to invent new statistics. Even if I tried, my fairly amateur approach would pale in comparison to the work of some of the great "sabermetricians".

But I can use some existing statistics to learn something about other statistics. That is where correlation tables come in: tables of correlation values between pairs of stats. Correlation is easy to see on a graph. If you have, say, batting averages and hit totals for a set of players, you can plot those two sets of data against each other. With those two stats, you will most likely get a set of dots that fall pretty close to a straight line. A graph like that shows a high correlation between the two variables being measured.

Most graphing calculators and computer graphing programs can measure the correlation between two sets of data with a figure called a correlation coefficient. This is a value between -1 and 1 that tells how closely the two sets of data correlate. For this project it makes the most sense to measure a linear correlation, because we want to see whether the points form a straight line: whether, as one variable increases, the other steadily increases or decreases. If the correlation coefficient is positive, then as one variable increases, so does the other. If it is negative, then as one variable increases, the other decreases. The farther the coefficient is from zero, the more closely the data fit a straight line. The coefficient is usually written r; its square, r^2, is a related measure that runs only from 0 to 1. A correlation coefficient close to zero indicates that there is no discernible linear relationship between the two sets of data. That doesn't mean there is no relationship between the two variables; the relationship could be curved rather than straight, for example. It just means that a straight-line relationship doesn't show up in these sets of data.
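For readers curious about what the calculator is doing, here is a minimal Python sketch of the standard (Pearson) linear correlation coefficient. The function name `pearson_r` and the toy numbers are made up for illustration; they are not real player statistics. The last case is a perfect curve that still scores r = 0, the kind of relationship that "doesn't show up" as a straight line.

```python
import math

def pearson_r(xs, ys):
    """Pearson (linear) correlation coefficient r of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Do the two variables move above and below their means together?
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How spread out each variable is around its own mean
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data, not real player statistics
rising = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])    # dots on a rising line
falling = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])   # dots on a falling line
print(round(rising, 3), round(falling, 3))  # 1.0 -1.0

# A perfect curve (y = x squared) still gives r = 0: real, but not linear
curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
print(curved)  # 0.0
```

Squaring the result gives r^2, which drops the sign and so can't tell a rising line from a falling one.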

Correlation doesn't imply a cause-and-effect relationship. It just measures how the increases and decreases of two variables line up with each other. The salary of schoolteachers in North Dakota from 1990 to 2000 may correlate closely and positively with the number of residents of Fresno, California, over the same years, but the two probably aren't causing each other. The first is likely due to inflation and the second to population growth. Still, because we know that population generally increases over time and that inflation rises over time, we can say with some degree of certainty that if population is increasing, inflation is probably rising too. And we'd likely be right.

The same is true of baseball statistics. A player with many home runs will likely also have many RBI. While home runs often produce RBI, a player doesn't have a high RBI count solely because he hits home runs, and he certainly doesn't hit many home runs because he has many RBI. But we can pretty safely say that a player with many home runs has many RBI, and, with slightly less certainty, that a player with few RBI probably has few home runs.

So we can use these correlations to learn something about the value of certain statistics. Sabermetrics has shown fans that Bill James' invented statistic, "runs created", and the more common "on-base percentage plus slugging average" (OPS) are valuable measures of a player's offensive ability to produce runs and thus help his team win. But these statistics are harder to calculate and less familiar than the simpler, more widely used "Triple Crown statistics": batting average, home runs and runs batted in.
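Neither statistic is hard for a computer, though. Below is a minimal sketch using the basic version of runs created, (H + BB) × TB / (AB + BB); James has published several refined versions since, so treat this one as an approximation. OPS is on-base percentage (hits, walks and hit-by-pitches divided by at bats, walks, hit-by-pitches and sacrifice flies) plus slugging average (total bases per at bat). The `BattingLine` class and the season line below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BattingLine:
    """Hypothetical container for one player's season totals."""
    ab: int        # at bats
    h: int         # hits
    doubles: int
    triples: int
    hr: int        # home runs
    bb: int        # walks
    hbp: int       # hit by pitch
    sf: int        # sacrifice flies

    @property
    def total_bases(self) -> int:
        singles = self.h - self.doubles - self.triples - self.hr
        return singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr

    def obp(self) -> float:
        return (self.h + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)

    def slg(self) -> float:
        return self.total_bases / self.ab

    def ops(self) -> float:
        return self.obp() + self.slg()

    def runs_created_basic(self) -> float:
        # Basic runs created; later versions add terms for steals, HBP, etc.
        return (self.h + self.bb) * self.total_bases / (self.ab + self.bb)

# Made-up season, not a real player's line
season = BattingLine(ab=550, h=165, doubles=35, triples=3, hr=30, bb=70, hbp=5, sf=5)
print(f"OPS {season.ops():.3f}")                          # OPS 0.919
print(f"Runs created {season.runs_created_basic():.1f}")  # Runs created 112.2
```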

But are these common statistics, which supposedly tell us how good a hitter a player is, really valuable measures? Obviously they tell us something. They tell us how often a player gets a hit (of any type) in an at bat. And they tell us how many runs a hitter drives in. But they don't tell us nearly as much about a player's ability to produce runs.

The correlation coefficients of these statistics with the more "valuable" statistics follow:

	BA	HR	RBI
On Base + Slugging	.35	.57	.55
Runs Created	.49	.44	.45

Before reading too much into these correlations, I should note that these figures probably aren't perfectly accurate. They cover only the players with the most at bats in 2004 and 2005, and the sample is fairly small: just 150 batters. So treat them as rough estimates.
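One standard way to put a number on "fairly small" is a confidence interval for r using Fisher's z-transformation. This sketch assumes the table values are Pearson r (if they are actually r^2, take the square root first) and that the 150 batters behave roughly like a random sample, which a list of the players with the most at bats is not quite. The function name `r_confidence_interval` is made up for illustration.

```python
import math

def r_confidence_interval(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a Pearson r (Fisher z method)."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r)             # move r onto a roughly normal scale
    se = 1 / math.sqrt(n - 3)     # standard error on that scale
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)  # move back to the r scale

# Batting average vs. runs created, n = 150 batters
low, high = r_confidence_interval(0.49, 150)
print(f"r = .49, n = 150: roughly {low:.2f} to {high:.2f}")
```

For r = .49 this gives roughly .36 to .60, so small gaps between table entries, such as .44 versus .45, shouldn't be read as meaningful.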

But assuming they are fairly close, we can see that none of the Triple Crown stats is a very accurate indicator of a player's run-producing ability. Together they give a somewhat more accurate picture, but with no value above .57, none of the individual correlations seems especially strong. And none of the three seems consistently more telling than the other two.

But here's some food for thought. Among commonly used statistics, these ten correlate most strongly and positively with runs created:

Stat	R^2
OPS	.95
Total Bases	.86
Slugging average	.80
On base pct.	.61
Batting average	.49
Runs scored	.46
RBI	.45
HR	.44
Hits	.33
Walks	.20