Baseball Judgments

Baseball Judgments | SportsLibrary.net
Recalculating a tainted record
 
By Roger Weber
 

Current highest single-season home run totals (player, home runs, year):

 

Bonds, 73, 2001

McGwire, 70, 1998

Sosa, 66, 1998

McGwire, 65, 1999

Sosa, 64, 2001

Sosa, 63, 1999

Maris, 61, 1961

Ruth, 60, 1927

Ruth, 59, 1921

Foxx, 58, 1932

Greenberg, 58, 1938

McGwire, 58, 1997

Gonzalez, 57, 2001

Rodriguez, 57, 2002

Griffey Jr., 56, 1998

Griffey Jr., 56, 1997

Wilson, 56, 1930

Kiner, 54, 1949

Mantle, 54, 1961

 

It has been called the most coveted record in baseball. The single-season home run record stood at 60 for 34 years, then at 61 for the next 37. Since then, that mark has been exceeded six times. This spike in home runs has led many to feel that the home run today is a completely different statistic from what it once was, and many fans who believe steroids have destroyed the home run now see the single-season record as devalued.

 

Steroids, though, aren't the only variable that can add to or subtract from a player's home run total. Total home runs in the major leagues have risen steadily since the beginning of the twentieth century. This is to be expected: as more players join the game, the talent improves. Equipment and style of play improve, and more is done to make the game exciting, which often means more home runs.

 

Four major events have drastically affected home run totals. Before the 1930 season, baseballs were made lighter in a deliberate attempt to increase home run production. Total home runs rose at different rates after the change, depending on the period used for comparison, and these differences average to about a 6% rise in home run production on top of the natural rise. I arrived at this figure by finding the percentage by which totals ran over or under the linear model of natural home run growth (described later) for the ten years before and the ten years after the re-weighting of the ball, giving more emphasis to years closer to 1930. Because earlier hitters used the heavier ball, this makes older home run hitters' totals more impressive. Interestingly, though, National League home runs dipped sharply in 1931, which some suspect was due to secret changes to the ball. That change should be balanced by a statistic described a few paragraphs later.
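The 6% estimate can be sketched in code. This is a minimal sketch, not the author's procedure: the weighting scheme (weights of 10 down to 1, shrinking with distance from 1930) and all function names are hypothetical, and the season totals in the example are synthetic, not real MLB data.

```python
def linear_prediction(year: int, slope: float, intercept: float) -> float:
    """Home runs predicted for a year by a linear model y = slope * year + intercept."""
    return slope * year + intercept


def weighted_mean(pairs: list[tuple[float, float]]) -> float:
    """Mean of (value, weight) pairs."""
    total_weight = sum(weight for _, weight in pairs)
    return sum(value * weight for value, weight in pairs) / total_weight


def estimate_ball_shift(actual: dict[int, float], slope: float, intercept: float,
                        change_year: int = 1930, window: int = 10) -> float:
    """Weighted fractional deviation from the model after change_year minus before it.

    Hypothetical weighting: the seasons adjacent to change_year get weight `window`,
    and the weight drops by 1 for each season farther away.
    """
    before: list[tuple[float, float]] = []
    after: list[tuple[float, float]] = []
    for offset in range(window):
        weight = window - offset
        for year, bucket in ((change_year - 1 - offset, before), (change_year + offset, after)):
            if year in actual:
                deviation = actual[year] / linear_prediction(year, slope, intercept) - 1.0
                bucket.append((deviation, weight))
    return weighted_mean(after) - weighted_mean(before)


# Synthetic example: seasons sit exactly on the 1900-1929 model before 1930
# and 6% above it from 1930 on, so the estimated shift is 0.06.
SLOPE, INTERCEPT = 31.634, -59995
synthetic = {y: linear_prediction(y, SLOPE, INTERCEPT) * (1.06 if y >= 1930 else 1.0)
             for y in range(1920, 1940)}
print(round(estimate_ball_shift(synthetic, SLOPE, INTERCEPT), 4))  # 0.06
```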

 

In 1947, perhaps the most important change in baseball history occurred: Black players began playing in the major leagues. By 1957 they made up 11.5% of major league rosters. Based on comparisons, by the share of minority players, between the non-war years leading up to 1947 and the years shortly after it, I have determined that adding minority players increased the quality of play by about 13%. For example, in 1954, 20% more home runs were hit than in a comparable year before integration. Of course, these calculations measure the number of home runs hit above the number naturally predicted by the progression of baseball itself, and they exclude post-1960 results, which were skewed by other factors. At that point, Black players made up 7% of the majors. This means that pre-1947 play was, in theory, only about 88% as strong as baseball in which minority players make up a substantial share of rosters.
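The step from a 13% gain in quality of play to "about 88%" is a reciprocal: if integrated play is 1.13 times as strong, earlier play is 1/1.13 as strong. A minimal sketch with a hypothetical function name; the table of factors later in the article applies 0.88 to pre-1947 seasons.

```python
def race_adjustment(quality_gain: float) -> float:
    """Multiplier for pre-integration totals, given the estimated rise in quality of play."""
    # If integrated play is (1 + gain) times as strong, earlier play is 1 / (1 + gain) as strong.
    return 1.0 / (1.0 + quality_gain)


print(round(race_adjustment(0.13), 3))  # 0.885
```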

 

Two big events happened in 1961, when the American League expanded from eight to ten teams and lengthened its regular season from 154 to 162 games; the National League made both changes in 1962. The longer season gives modern players a slight advantage in accumulating records such as home run totals. Expansion spread pitching talent more thinly, which caused home run production to rise, as it tends to do each time the majors expand.
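Assuming the season-length factor simply scales a total in proportion to the number of scheduled games, it is the ratio 162/154. This minimal sketch (the function name is hypothetical) reproduces the 1.0519 that the table of factors applies to seasons played under the 154-game schedule.

```python
MODERN_GAMES = 162


def season_adjustment(games_scheduled: int, modern_games: int = MODERN_GAMES) -> float:
    """Scale factor from a shorter schedule up to the modern 162-game season."""
    return modern_games / games_scheduled


print(round(season_adjustment(154), 4))  # 1.0519
print(season_adjustment(162))            # 1.0
```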

 

The latter of these two 1961 changes, along with other confounding variables and later league expansions, can be measured with the linear regression model for total home runs hit per team per year. In a year like 1961, when 8% more home runs were hit than predicted, there were clearly variables that caused totals to rise. In this study, each home run total is multiplied by a factor that offsets these variables.
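Assuming the multiplier is the ratio of predicted to actual home runs, a season 8% above the model is scaled down by a factor of 1/1.08. A minimal sketch with a hypothetical function name; the table of factors lists 0.922 for 1961, close to this value, and the 8% figure in the text is itself rounded.

```python
def year_factor(predicted: float, actual: float) -> float:
    """Multiplier that offsets a season's deviation from the regression model."""
    return predicted / actual


# A season in which 8% more home runs were hit than the model predicted:
print(round(year_factor(100.0, 108.0), 3))  # 0.926
```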

 

Another factor that greatly influences a home run total is a player's home ballpark. Since 81 games are played in that park, a player benefits greatly if fences are short.

 

The study

 

The purpose is to put each of the top home run totals on a balanced scale so they can be compared fairly across years. The study is not exact and does not account for every variable; that is to be expected. No two seasons are ever identical, so it does not make sense to try to account for every possible variable. It is also reasonable to expect the total number of home runs hit in the major leagues to rise at a fairly linear rate, and the r² for that linear fit is very high.

 

The full report includes a graph of the total number of MLB home runs per year. The graph does not account for differences in the number of teams in the league, but it does show the linear relationship. An exponential model has an even higher correlation coefficient, but that appears to be due only to the recent rise in home runs, which most attribute to increased steroid use and smaller ballparks. For pre-1990 totals, a linear model works best.
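Comparing a linear and an exponential model can be sketched as follows. This is a minimal illustration with hypothetical function names: it fits the exponential model as a straight line through the logarithms of the totals (one common approach, not necessarily the author's) and scores both models with r² on the original scale. The example totals are synthetic; the article's season totals are not reproduced here.

```python
import math


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares line y = slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys: list[float], predictions: list[float]) -> float:
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def compare_models(years: list[int], totals: list[float]) -> dict[str, float]:
    """r^2 of a linear model and of an exponential model, both scored on the original scale."""
    slope, intercept = fit_line(years, totals)
    linear = [slope * x + intercept for x in years]
    # Exponential model y = exp(a + b * x), fitted as a straight line through log(y).
    b, a = fit_line(years, [math.log(t) for t in totals])
    exponential = [math.exp(a + b * x) for x in years]
    return {"linear": r_squared(totals, linear), "exponential": r_squared(totals, exponential)}


# Synthetic totals that grow by 5% a year (not real MLB data).
years = list(range(1950, 1960))
growing = [1000 * 1.05 ** (y - 1950) for y in years]
print(compare_models(years, growing))
```

On data that curve upward, the exponential model wins, which is consistent with the article's point that the recent surge alone can make an exponential fit look better.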

 

For the adjustments in the study, however, four separate linear models are fitted, one for each period between the most noticeable shifts caused by the variables described above. Each home run total in this comparison is adjusted by the ratio of home runs hit to home runs predicted, using these individual models and the overall linear model. This should offset confounding variables. Adjustments are also made for the years during the shifts.

 

The full report, including graphs, is available for download.

 

The four linear models for the eras of home run production are listed below, where x is the year and y is the predicted number of home runs:

Years        Equation
1900-1929    y = 31.634x - 59995
1930-1941    y = 17.098x - 31731
1947-1960    y = 53.354x - 102240
1961-2005    y = 64.677x - 124703
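The four era models in the table above can be evaluated with a small lookup. A minimal sketch with hypothetical function names; the years 1942-1946, which the table leaves out, return None here rather than being assigned to a neighboring era. Note that the slope multiplies the calendar year itself, so the first model gives roughly 964 home runs for 1927.

```python
from typing import Optional

# Era models from the table above: (first year, last year, slope, intercept).
ERA_MODELS = [
    (1900, 1929, 31.634, -59995),
    (1930, 1941, 17.098, -31731),
    (1947, 1960, 53.354, -102240),
    (1961, 2005, 64.677, -124703),
]


def era_prediction(year: int) -> Optional[float]:
    """Home runs predicted for `year` by the model of the era containing it."""
    for first, last, slope, intercept in ERA_MODELS:
        if first <= year <= last:
            return slope * year + intercept
    return None  # e.g. 1942-1946, which no era model covers


for year in (1927, 1938, 1955, 2001, 1944):
    prediction = era_prediction(year)
    print(year, "no model" if prediction is None else round(prediction, 1))
```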

 

Each of the variables described earlier in the study is measured.

 

The park factor is measured for home games only. The ratio of predicted to actual home runs per team is used. Other adjustments are made for season length, racial integration and changes to the baseball. Each total is raised or lowered according to whether each factor helped or hurt that player compared with today's norm. In theory, the final totals represent what each player would have hit in a neutral situation in the modern game.
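Read literally, the heading of the table below ("total x value") describes multiplying the total by each factor in turn. A minimal sketch under that assumption, with a hypothetical function name. Multiplying the listed factors gives results close to, but not identical with, the combined "avg. var." multiplier reported later: for Bonds, 1.0485 × 0.864 ≈ 0.906, against 0.9125 in the table, so the author's exact combination may include steps not spelled out here.

```python
from math import prod


def adjusted_total(actual: int, factors: dict[str, float]) -> float:
    """Multiply a season's home run total by each adjustment factor."""
    return actual * prod(factors.values())


# Factors listed for Bonds (2001) in the table of variables below.
bonds_2001 = {"park (BPK/2)": 1.0485, "HR for yr.": 0.864, "season": 1.0, "race": 1.0, "ball": 1.0}
print(round(adjusted_total(73, bonds_2001), 2))  # 66.13
```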

 

A second measurement uses a quadratic model of each player's career, fitted without the year of his big home run total and any other years in question because of injuries, steroid use and so on, to predict the total he would have hit that year based on the rest of his career.
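The career model can be sketched as an ordinary least-squares quadratic. This is a minimal sketch, assuming seasons are indexed 1, 2, 3, ... (the article does not say whether age, season number or calendar year is used), with hypothetical function names and a made-up career; the author's fitting procedure may differ.

```python
def fit_quadratic(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of y = a + b*x + c*x**2 via the 3x3 normal equations."""
    if len(xs) < 3:
        raise ValueError("need at least three seasons to fit a quadratic")
    power_sums = [sum(x ** k for x in xs) for k in range(5)]
    moments = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix [A | t] with A[i][j] = sum(x ** (i + j)).
    m = [[power_sums[i + j] for j in range(3)] + [moments[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                for k in range(col, 4):
                    m[row][k] -= factor * m[col][k]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


def career_prediction(hr_by_season: dict[int, float], target: int,
                      excluded: frozenset[int] = frozenset()) -> float:
    """Predict the target season from the other seasons, skipping the target and `excluded`."""
    seasons = [s for s in sorted(hr_by_season) if s != target and s not in excluded]
    a, b, c = fit_quadratic(seasons, [hr_by_season[s] for s in seasons])
    return a + b * target + c * target ** 2


# Hypothetical career: a smooth arc peaking at 40 in season 8, with a spike
# in season 8 and season 9 excluded as "in question".
career = {s: 40 - 0.5 * (s - 8) ** 2 for s in range(1, 14)}
career[8], career[9] = 73.0, 65.0
print(round(career_prediction(career, 8, frozenset({9})), 1))  # 40.0
```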

 

The table below lists the multiplier by which each factor adjusts a player's home run total to balance it across years. On the right side of the table are each player's predicted home run total for the season in which he made the list, based on the rest of his career, and the highest total that could reasonably be expected of him given the variation among his totals in other years of his career. Several players exceed that total, which is understandable: it simply means the player's total for that year was unusually high, which can be attributed to many variables.
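One way to turn "the variation between his own totals during other years" into a 95th-percentile ceiling is to assume a player's season-to-season deviations from his career model are roughly normal. That assumption, the one-sided 95% level and the function name are all hypothetical; the article does not state exactly how its career maximum was computed.

```python
from statistics import NormalDist, stdev


def career_ceiling(predicted: float, residuals: list[float], level: float = 0.95) -> float:
    """Prediction plus a one-sided normal quantile times the spread of the residuals.

    `residuals` are the player's actual minus model-predicted totals in his other seasons.
    """
    z = NormalDist().inv_cdf(level)  # about 1.645 when level is 0.95
    return predicted + z * stdev(residuals)


# Hypothetical player: career model predicts 40; other seasons missed it by these amounts.
print(round(career_ceiling(40.0, [-6.0, 3.0, 5.0, -4.0, 2.0, 0.0]), 1))
```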

 

 

 

 

 

 

 

 

 

Table: Variables included (total × value) to yield result, with the career-model prediction and the 95th-percentile possible total

Player        HR   Year   BPK/2    HR for yr.   Adj. seas.   Race adj.   Ball adj.   Career pred.   Career max (95th%)
Bonds         73   2001   1.0485   0.864        1            1           1           33             46
McGwire       70   1998   1.005    0.862        1            1           1           40             59
Sosa          66   1998   0.985    0.862        1            1           1           40             48
McGwire       65   1999   1        0.83         1            1           1           38             58
Sosa          64   2001   1.02     0.864        1            1           1           39             47
Sosa          63   1999   0.966    0.83         1            1           1           40             48
Maris         61   1961   1.025    0.922        1            1           1           27             40
Ruth          60   1927   0.985    1.047        1.0519       0.88        1.06        44             57
Ruth          59   1921   1.01     0.826        1.0519       0.88        1.06        37             47
Foxx          58   1932   0.98     0.974        1.0519       0.88        1           39             52
Greenberg     58   1938   0.97     0.995        1.0519       0.88        1           27             49
McGwire       58   1997   1.005    0.897        1            1           1           40             58
Gonzalez      57   2001   0.97     0.864        1            1           1           23             36
Rodriguez     57   2002   0.94     0.867        1            1           1           49             56
Griffey Jr.   56   1998   0.971    0.862        1            1           1           40             59
Griffey Jr.   56   1997   0.971    0.897        1            1           1           40             58
Wilson        56   1930   0.995    0.81         1.0519       0.88        1           29             40
Kiner         54   1949   0.98     1.09         1.0519       0.9         1           42             59
Mantle        54   1961   1.025    0.922        1            1           1           37             56

 

 

In the table below, the second column gives each player's total once the variables already discussed are adjusted to create a balanced field across the years. Of course, that total does not account for variables like steroids. The last column is the number of home runs by which the adjusted total exceeds the value the career model predicted for that player at that point in his career. It is understandable that these numbers are fairly large, since most players' careers do not conform exactly to a quadratic model. The largest differences, though, such as Bonds', Sosa's and McGwire's, are likely attributable to their use of steroids. Remember that the likely steroid years were excluded from the data used to fit each player's quadratic model.

 

It is also interesting to see how high Roger Maris' and Luis Gonzalez' totals are compared with what the rest of their careers predicted. Maris' total can be attributed in part to the changes to the game that occurred in 1961, and there is a chance Gonzalez' total was affected by steroids, but these appear to be simply stellar years by good baseball players.

 

 

 

Player        Year   Avg. var.   Adj. total   Career max minus adj.   Adj. minus career pred.
Bonds         2001   0.9125      66.613       -20.61                  33.613
McGwire       1998   0.867       60.69        -1.69                   20.69
Sosa          1998   0.847       55.902       -7.902                  15.902
McGwire       1999   0.83        53.95        4.05                    15.95
Sosa          2001   0.884       56.576       -9.576                  17.576
Sosa          1999   0.796       50.148       -2.148                  10.148
Maris         1961   0.947       57.767       -17.77                  30.767
Ruth          1927   1.0239      61.437       -4.437                  17.437
Ruth          1921   0.8279      48.849       -1.849                  11.849
Foxx          1932   0.8859      51.385       0.615                   12.385
Greenberg     1938   0.8969      52.023       -3.023                  25.023
McGwire       1997   0.902       52.316       5.684                   12.316
Gonzalez      2001   0.834       47.538       -11.54                  24.538
Rodriguez     2002   0.807       45.999       10.001                  -3.001
Griffey Jr.   1998   0.833       46.648       12.352                  6.648
Griffey Jr.   1997   0.868       48.608       9.392                   8.608
Wilson        1930   0.7369      41.269       -1.269                  12.269
Kiner         1949   1.0219      55.185       3.8148                  13.185
Mantle        1961   0.947       51.138       4.862                   14.138
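The adjusted total and the two difference columns follow from the combined multiplier and the career model: adjusted = actual × avg. var., then career max minus adjusted (negative when the adjusted total is above the ceiling) and adjusted minus career prediction. A minimal sketch, with hypothetical class and field names, that reproduces two rows of the table up to rounding.

```python
from dataclasses import dataclass


@dataclass
class SeasonRow:
    player: str
    year: int
    actual: int          # home runs actually hit
    avg_var: float       # combined multiplier ("avg. var.")
    career_pred: float   # career-model prediction for that season
    career_max: float    # 95th-percentile ceiling from the career model

    @property
    def adjusted(self) -> float:
        return self.actual * self.avg_var

    @property
    def max_minus_adjusted(self) -> float:
        # Negative when the adjusted total is above the player's career ceiling.
        return self.career_max - self.adjusted

    @property
    def adjusted_minus_pred(self) -> float:
        return self.adjusted - self.career_pred


rows = [
    SeasonRow("Bonds", 2001, 73, 0.9125, 33, 46),
    SeasonRow("Rodriguez", 2002, 57, 0.807, 49, 56),
]
for r in rows:
    print(r.player, r.year, round(r.adjusted, 3),
          round(r.max_minus_adjusted, 3), round(r.adjusted_minus_pred, 3))
```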

 

 

Several of the top totals after adjustment still belong to the big three: McGwire, Sosa and Bonds. If these three were genuinely clean home run hitters, their feats could be recognized as incredible. If the steroid allegations are truly false, then the adjusted totals above are the final numbers for this study, and Bonds, McGwire and Sosa would hold four of the top six spots. Most evidence, though, points to steroid use. The fact that these three players' totals deviate so sharply from the pattern of their early-career home runs supports the widely publicized claims.

 

Until the extent of their steroid use, and of its effect on their play, can be made public and clearly calculated, it is my contention that a mathematical study needs a second list, alongside the adjusted totals above, that leaves out the players involved in the scandals.

 

With them included, the adjusted list of the top single-season home run totals is as follows:

 

Rank  Player        Actual  Year   Adjusted
1     Bonds         73      2001   66.6125
2     Ruth          60      1927   61.43688
3     McGwire       70      1998   60.69
4     Maris         61      1961   57.767
5     Sosa          64      2001   56.576
6     Sosa          66      1998   55.902
7     Kiner         54      1949   55.18519
8     Ruth          54      1920   54.05119
9     McGwire       65      1999   53.95
10    Mantle        52      1956   52.676
11    McGwire       58      1997   52.316
12    Greenberg     58      1938   52.02299
13    Foxx          58      1932   51.38499
14    Mantle        54      1961   51.138
15    Mays          52      1965   50.232

 

When the steroid suspects are removed, the list looks like this:

 

Rank  Player        Actual  Year   Adjusted
1     Ruth          60      1927   61.43688
2     Maris         61      1961   57.767
3     Kiner         54      1949   55.18519
4     Ruth          54      1920   54.05119
5     Mantle        52      1956   52.676
6     Greenberg     58      1938   52.02299
7     Foxx          58      1932   51.38499
8     Mantle        54      1961   51.138
9     Mays          52      1965   50.232
10    Ruth          59      1921   48.84894
11    Griffey Jr.   56      1997   48.608
12    Mays          51      1955   47.98835
13    Kiner         51      1947   47.78435
14    Gonzalez      57      2001   47.538
15    Killebrew     49      1964   46.844
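Producing the second ranking from the first is a filter followed by a sort. A minimal sketch using six of the adjusted totals listed above; the suspect set mirrors the article's "big three", and the function and variable names are hypothetical.

```python
Season = tuple[str, int, float]  # (player, year, adjusted total)

STEROID_SUSPECTS = frozenset({"Bonds", "McGwire", "Sosa"})

SEASONS: list[Season] = [
    ("Bonds", 2001, 66.6125),
    ("McGwire", 1998, 60.69),
    ("Ruth", 1927, 61.43688),
    ("Maris", 1961, 57.767),
    ("Sosa", 2001, 56.576),
    ("Kiner", 1949, 55.18519),
]


def rank(seasons: list[Season], excluded: frozenset[str] = frozenset()) -> list[Season]:
    """Sort seasons by adjusted total, highest first, leaving out excluded players."""
    kept = [s for s in seasons if s[0] not in excluded]
    return sorted(kept, key=lambda s: s[2], reverse=True)


for position, (player, year, adjusted) in enumerate(rank(SEASONS, STEROID_SUSPECTS), start=1):
    print(position, player, year, adjusted)
```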

So Ruth wins out. Of course, Ruth played in an era that is difficult to compare even to the era three decades later. In terms of the share of all home runs hit by one man, Ruth is far and away at the top of the list. In my mind, his 1921 total should be considered the most impressive ever, because it eclipsed the totals of so many teams in the league and so greatly overshadowed every other player of the time. Maris' one great year also stands out well above many of the other contenders. Ralph Kiner had most of his best seasons in what some call the best era ever for baseball (1947-1961), because it came after integration and before the leagues expanded.

 

These totals may seem low. That is largely because the adjustments remove more home runs than they add. Much of this rests on the assumption that the recent rise in home runs is just that, a rise caused by steroids, small ballparks and other factors. Some may fault the study for this, but the highest totals are more likely to have been helped than hurt by lurking variables, just as the same formula would lift the lowest totals. It is only natural that the highest totals come from players in the smallest parks, during the longest seasons, in the weakest pitching eras. These rankings cancel out those effects. On the other side, players who played in pitchers' parks during shorter seasons see their totals rise when adjusted.

 

A few other effects aren't accounted for by the rankings. There was no easy or clear way to determine the differences caused by night games without sifting through many box scores. It is also difficult to give a value to lineup position or equipment quality.

 

As with all baseball questions, there is no clear answer and plenty of room for debate. In the coming years, more light should be shed on the importance of steroids to the recent game. Whenever there is a scandal, trust is lost, even in a baseball record. Ultimately, though, the great players should stand out. Their performances inspire millions.

 
