
Basic data

Year, page count: 2005, 40 pages

Language: English

Downloads: 2

Uploaded: September 30, 2012

Size: 665 KB

Institution:
[STA] Stanford University

Note:

Attachment: -

Content extract

Source: http://www.doksinet

AVERAGE SALARY AND CONTRACT LENGTH IN MAJOR LEAGUE BASEBALL: WHEN DO THEY DIVERGE?

Josh Meltzer *
Stanford University
Department of Economics
May 2005

ABSTRACT

This paper investigates how various measures of player performance have different impacts on salary determination and contract length in Major League Baseball, using contract data from 2002. Because average salary and contract length are jointly determined, two-stage least squares is used to estimate each as a function of the other and of various performance metrics. The results show two primary areas of divergence between contract length and average salary. The first comes from young, improving players, who are likely to get long-term contracts at low annual salaries. The second comes from players with chronic injuries, whose salary is not affected by their injuries but who tend to get shorter contracts than they otherwise would. This second effect was not apparent from the
first-stage regression, demonstrating the importance of the two-stage least squares methodology for analyzing contract length and average salary.

* I would like to thank my adviser Roger Noll for his valuable advice and guidance throughout this process. I would also like to thank John Abbamondi for comments on an earlier draft and Edward Vytlacil and Geoffrey Rothwell for their helpful suggestions about the empirical model. All errors are my own.

INTRODUCTION

Major League Baseball provides a unique opportunity for examining the behavior of employers in offering contracts because of the wealth of data about the individual performance of players. Whereas for most industries it is very difficult to gauge the value of a worker and to compare one worker to another, every action of a baseball player is documented and factored into the employer’s decision. Researchers have paid a significant amount of attention to what
determines player salary, especially with the advent of free agency after the Basic Agreement in 1976. Much less attention has been devoted to the other half of baseball contracts: the number of guaranteed years.

In this paper, I will investigate what determines the length of player contracts in Major League Baseball (MLB). Typically, salary and contract length rise together, but what are the exceptions to this relationship? For what types of players would we expect to see long-term contracts for a relatively low average salary? For what types of players would we expect to see short-term contracts for large average salaries? What are the factors that are affecting salary and length differently? These are all interesting questions that have seen little scholarly attention up to this point.

Unlike contracts in other major sports leagues such as the National Football League, MLB contracts are guaranteed.1 Players must be paid by their teams even if they do not perform up to expectations or if they are injured and cannot play. If a player is released, the team must still pay the player’s salary unless another team picks up the player off of waivers, in which case the new team is responsible for paying the balance of the contract. The guaranteed nature of contracts means that baseball teams will be especially careful to make sure that the players they are signing are worth the investments that they are making.

1 Many contracts also make use of various forms of options. There are club options, which are like call options for a team on the player. Player options allow the player to have control over whether he returns at a previously agreed salary. There are also mutual options, to which both the player and the team have to agree. Many long-term contracts have options attached at the end, after a number of guaranteed years. However, there is usually just one option year, with rare cases of two option years. The options take various forms, and are sometimes automatically triggered if the player achieves certain performance incentives. Although options have become more prevalent in recent years, most players in Major League Baseball are still playing under a guaranteed contract.

The process of salary determination in baseball has received a considerable amount of attention from researchers. The motivations of both negotiating parties are pretty clear with respect to salaries. In general, players want to be as highly paid as possible, while the teams want to get the best possible talent for the least amount of money. The bargaining process will depend on the bargaining status of the player, which will be addressed later in this paper. As a general rule, however, we would expect the players to ask for high contracts, the teams to offer lower ones, and the outcome to be somewhere in the middle. Baseball teams are trying to win games, and they will try to allocate their payrolls in a way that will allow
them to field the best team. The best players will be able to help the team the most, and they will command the highest salaries. I will estimate a salary equation during the course of this analysis, but that is not the primary focus of this paper.

Contract length is an equally important consideration for both the player and the team, but it has received much less scholarly attention than salary size. The logic of contract length is not quite as straightforward as salary determination. Both the teams and the players are trying to manage their respective risks, and many competing factors go into determining the length of a contract. There is an enormous amount of uncertainty in the market for baseball players, and any analysis of baseball contracts must first consider how both teams and players are managing the risks associated with this uncertainty.

Performance and Market Uncertainty

There are two major sources of uncertainty:
performance and market uncertainty. Unlike the performance of a manufacturing employee, for example, which one would expect to remain relatively constant over a number of years, perhaps improving with experience, the performance of Major League Baseball players exhibits substantial variation. Players who are All-Stars in one year may not be the next year, and vice versa. Many players have a period of one season or several seasons when their performance varies dramatically from their career averages.

There are two major explanations for performance uncertainty. The first is that each player has a constant but unknown underlying level of performance, and that any changes in performance are mere statistical fluctuations around this level. For example, let us suppose that a given player has a 3 in 10 chance of getting a base hit in any individual at-bat. We would expect that player to be a career .300 hitter. However, we would expect to see fluctuations around this mean level. On a
short-term basis, we would expect to see substantial variability in the player’s performance. On a given day, the player may go 0 for 5 or 5 for 5 at the plate, but the underlying level remains the same. As the time period increases, we would expect to see a more constant level of performance. However, even over the course of a season, we may see substantial fluctuations in player performance that are simply due to chance. Albert and Bennett (2001) ran a simulation imagining that a player’s true on-base percentage was .380. Over the course of 100 seasons, there was an 88-point differential between the player’s best and worst seasons, assuming that the underlying level remained constant. As this illustrates, random chance is a major factor affecting variation in player performance.

The second explanation for performance uncertainty is that the player’s underlying level of performance changes over the course of his career. Some of this may be predictable. One would expect that experience would cause players to improve over the early parts of their careers and then decline as they age. However, other less predictable factors can also affect a player’s underlying level of expected performance. These include the effects of injury, coaching and personal upkeep.

Injury provides a powerful explanation for the uncertainty of player performance. Many players are forced to miss games during the season because of physical injuries sustained on the field. Injury history can be useful in attempting to predict the likelihood of a player becoming injured in the future, but there remains a significant element of chance. Some types of nagging injuries, like a shoulder injury for a pitcher, recur many times and can be taken into account when negotiating contracts. Other injuries are flukes, as when a player breaks his leg diving for a ball. These are the types of injuries that can happen to any player
regardless of physical fitness. Ken Griffey Jr.’s career provides a compelling case study of the possible effects of injury. In February 2000, he signed a nine-year, $116.5 million deal with the Cincinnati Reds after ten consecutive All-Star appearances with the Seattle Mariners. Over the three years before he signed his contract, he averaged more than 159 games played and more than 53 home runs per season. He signed his contract at the age of 30, generally considered the beginning of a player’s prime. In the five injury-plagued seasons since then, he has averaged just 92 games played and 21 home runs per season.

We would not expect injuries to affect only the number of games played, however. It is generally assumed that player health, although difficult to measure, has a tremendous effect on performance. Even if a player does not actually land on the disabled list (DL) and miss games, a decline in performance may be due to a milder injury. Over the course of a 162-game season,
players sustain many mild injuries or experience mild recurrences of more serious past injuries. Players may be reluctant to report these injuries in order to avoid being placed on the DL. Even if they are placed on the DL, their performance in the periods before being placed on the DL and after coming off the DL may be limited by the effects of the injury. Frequently, the extent of an injury can be accurately determined only through honest feedback from the player, so these types of injuries can be kept secret and are virtually impossible to document and analyze in any rigorous manner.

Another possible explanation for fluctuation in player performance is changes in the personal behavior of the player over the course of a career. This behavior may fall into several different categories. One may be changes in technique through coaching suggestions that could cause better or worse performance. Another would be whether the player
stays in good physical shape. Use of alcohol or recreational drugs could also affect performance. Finally, a factor that has received significant attention recently is the use of performance-enhancing drugs. An anonymous survey testing regimen began in 2003, and 2005 was the first season with drug testing that led to automatic suspensions for players. However, given that my data is from before 2003, there are no hard numbers about the prevalence of steroid use among the players in my sample. Anecdotal evidence tells us there was at least some steroid use, and allegations have followed players who have shown large increases in power numbers from one year to the next. Ken Caminiti, who won the National League Most Valuable Player award in 1996 in a season in which his power numbers surged, later acknowledged using steroids during that season. It is impossible to know how widespread steroid use is or to accurately gauge the impact of these drugs, but it may be another factor that causes
variation in player performance.

Ultimately, the performance of players is fundamentally uncertain. Past performance is a good predictor of future performance, but it provides no guarantees. The fundamental uncertainty of player performance is something that must be taken into account in contracts. Both the players and the teams face competing risks. If players sign short-term contracts, they risk getting injured and being unemployed in the future. Because even the minimum salary in baseball, at $300,000 in 2003, is much higher than the average salary in the country as a whole, loss of employment for a player is likely to lead to a very large loss in income. However, if players sign long-term contracts, they lose the opportunity to sign for more money in the future if their performance improves. Teams face the opposite set of risks. If they sign a player to a short-term deal, they risk having the player improve and being forced
to either sign that player to a higher contract in the future or have the player leave for another team. If they sign a player to a long-term deal, they risk having the player get injured or having his performance decline and being forced to continue to pay that player.

Traditionally, researchers have assumed that, for players, the risk of short-term deals is greater than the risk of long-term deals. Given the risk of career-ending injuries and the enormous fall in income that such an injury would entail, players want to guarantee a future stream of income. However, teams have a portfolio of players with which to diversify their risk and thus are more willing to bear this performance risk than players (Lehn 1982). Consequently, they are in a position to offer players additional years in exchange for a lower contract. However, they will still tend to offer shorter contracts to players with higher perceived performance uncertainty (Maxcy 2004).

It is important to note that teams may act
differently towards the risks associated with performance decline from injury as opposed to from other causes. Teams are able to protect themselves against the risk of injury by purchasing insurance. Teams often purchase injury insurance for their players, especially for the most expensive ones. It has recently become more difficult to get insurance policies that cover very long contracts, but teams can purchase a new insurance policy after the first one expires (although the new policy will have new premiums based on more recent injury history). Typically, the insurance policy pays the player’s salary, or a portion of the salary, if the player is on the DL. In this way, teams are insured against player injury. Fluctuations in performance, however, are not covered by these policies and thus may be a greater source of financial risk for the teams.

Maxcy (2004) hypothesized that better players will tend to get long-term
contracts because they are less risky than worse players. This is not because the level of their play fluctuates less but because even a downswing would probably keep them above a certain “replacement level” at which teams would want to substitute a different player but could not because of being saddled with a long contract. This logic is sensible, although evidence from baseball suggests that teams do not always follow it. Paul DePodesta, the general manager of the Los Angeles Dodgers, said:

“A very small percentage of the players in the big leagues actually are much better than everyone else, and deserve to be paid the millions. A slightly larger percentage of players are actually worse than players who are stuck in the minors, but those guys usually aren’t the ones getting the big money. It’s the vast middle where the bulk of the inefficiency lies -- the player who is a known player due to his major-league service time making millions of dollars who can be replaced at little to no cost in terms of production with a player making close to the league minimum” (quoted in Lewis, 2005).

Here, DePodesta suggests that many players who are below the replacement level are being kept in the Major Leagues. DePodesta is part of a new wave of general managers who rely heavily on statistics (he was a Harvard economics major). Theo Epstein of the Boston Red Sox, J.P. Ricciardi of the Toronto Blue Jays, and Billy Beane of the Oakland Athletics are examples of other general managers who place a heavy emphasis on statistical analysis. These new general managers may be trying to clear up many of these inefficiencies, but for now there remain substantial inefficiencies in the way contracts are assigned. DePodesta is discussing salary, but one would assume that the same inefficiencies are present with respect to length of contracts. This does not mean that Maxcy’s intuition about replacement level is invalid, but it may not be
widely considered by teams in signing players to contracts. The comfort of “known” players remains important.

The second source of uncertainty is market uncertainty. While some players in baseball are easily replaced by others from the minor leagues or from other teams, some players are much more difficult to replace. These are likely to be the very best players, especially the very good players at tough defensive positions like catcher or shortstop. These players are uniquely skilled, and thus cannot be easily replaced or substituted for one another. Because the MLB draft and the minor league system by extension are not always reliable in predicting the success of players at the Major League level, it is not easy to guess what the market for players will look like a few years down the line. In this way, signing a shortstop now to a long-term deal may look like a good deal if there are no comparable shortstops available in the future, or could look foolish if a strong young crop of
shortstops comes up from the minor leagues. This market uncertainty affects the teams differently from performance uncertainty. Whereas higher performance uncertainty makes teams less likely to offer long-term contracts, higher market uncertainty makes teams more likely to offer long-term contracts (Maxcy 2004). Maxcy found, interestingly, that low-revenue clubs are more likely to offer long-term contracts than middle- or high-revenue clubs. He hypothesized that low-revenue teams are more risk averse to market uncertainty and thus hope to lock up good players for a longer period of time. Market uncertainty may also vary by position. Given the uncertain pool of catchers or shortstops, we might expect to see longer contracts for these players.

Ultimately, players and teams may weigh the respective risks differently in different situations, and it is difficult to reach a clear theoretical conclusion about how players and teams
will act. Empirical results can help to reveal how they are acting in practice.

LITERATURE REVIEW

Much of the scholarly research on baseball came in the aftermath of the signing of the Basic Agreement in 1976, which allowed players to become free agents for the first time. Free agency essentially allowed players to receive the market value for their services, as opposed to the old system when they were forced to accept whatever their teams chose to pay them. This sudden change gave economists an opportunity to analyze the effects of allowing the free market to determine contracts, and they jumped at the opportunity. A great deal of research has gone into investigating the effect of free agency on the size of player salaries (Chelius and Dworkin, 1982; Hill and Spellman, 1983; Raimondo, 1983). This research has universally found that the advent of free agency led to a substantial rise in salary. These studies focused exclusively on salary, however, and ignored the importance of contract
length.

More recently, several researchers have investigated the length of baseball contracts. Kahn (1993) used a fixed effects approach on longitudinal data to look at the effect of arbitration eligibility and free agency on salary and contract length. He found that both arbitration eligibility and free agency raise average salary, but only free agency raises contract duration, while arbitration eligibility has no effect. There are several reasons to try to expand on his results, however. His vector of explanatory performance metrics relied entirely on career averages, which are probably poorer predictors of performance than more recent measures like three-year averages. Furthermore, in Kahn’s data, the average contract duration was 1.31 years. My more recent data has an average duration of 1.79 years, even excluding contracts over five years. The increase in contract length in the decade since Kahn’s work suggests that
there may have been changes in the process of contract negotiation over that period. Additionally, Kahn limited his focus to the effect of free agency and arbitration eligibility and did not investigate whether there are other factors that affect salary and contract duration differently. Finally, Kahn ran separate regressions for whites and blacks, although research has not shown any evidence of racial discrimination in baseball salaries (Kahn 1991; Sommers 1987).

Maxcy (2004) used a binary choice probit model to determine which players are getting long-term contracts. He regressed length of contract on a host of performance metrics, including slugging percentage. Maxcy found that players who are the most likely to be replaced are the least likely to get long-term contracts. These include older players, who will deteriorate more quickly, and mediocre players, who are more likely to fall below the replacement level. There are a couple of potential problems with his analysis, however.
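
For readers less familiar with the model, a binary choice probit of the kind described above can be sketched as follows. This is a generic textbook form, not Maxcy’s exact specification: the indicator LONG_i, the one-year cutoff used to define it, and the covariate vector x_i are notation assumed here purely for illustration.

```latex
% Generic binary probit; LONG_i, the one-year cutoff, and x_i are assumed
% notation for illustration and are not taken from Maxcy (2004).
\[
  LONG_i = \mathbf{1}\{LENGTH_i > 1\}, \qquad
  \Pr(LONG_i = 1 \mid x_i) = \Phi(x_i'\beta)
\]
```

Here \Phi is the standard normal cumulative distribution function and \beta is the coefficient vector to be estimated.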

First, the binary choice probit model indicated only whether a player got a long-term deal but did not distinguish between those deals. Given that a two-year deal is probably more similar to a one-year deal than to a five-year deal, this methodology seems somewhat lacking. Second, he did not include salary in his regression, which is likely to be an important factor and jointly determined, as I will discuss later.

Krautmann and Oppenheimer (2002) addressed the interaction of contract length and salary more explicitly. They looked for a compensating effect between salary and length. Given that players tend to prefer both larger and longer contracts, Krautmann and Oppenheimer hypothesized that players will accept a tradeoff between the two. In order to test this empirically, they regressed salary on contract length and other factors, using two-stage least squares to overcome the potential endogeneity problem between salary and
contract length. They also limited their sample to free agents in order to avoid dealing with players at different levels of bargaining power. They found that longer contract length reduces the monetary return to performance. In other words, players are willing to take a salary reduction to guarantee a longer contract. The existence of this compensating effect demonstrates the importance of including length in the salary regression.

My paper adds to the existing literature in several ways. Most importantly, it takes Krautmann and Oppenheimer’s two-stage least squares model and applies it not only to the regression of salary on length but also to the regression of length on salary. Using salary as an explanatory variable in the length regression has not been done before. By comparing the second-stage results to the first-stage results and to the results of other researchers, we can learn about the importance of including salary as an independent variable in the length regression. We
can also investigate the factors that affect length over and above their effects on salary. My model will also be able to distinguish between long-term deals instead of just noting long-term versus short-term, and I will be able to extend Kahn’s findings to see what factors besides free agency and arbitration act differently on salary and length. Finally, none of these researchers used any measure of injury history in their regressions. Injury history is likely to be very important to teams, especially in determining the length of contract, and my analysis will look closely at the effects of injury history.

METHODOLOGY

Data

I decided to limit my research to hitters. The primary reason for this is that pitching statistics are less universal than hitting statistics. There are two main types of pitchers, starting pitchers and relief pitchers. Within relief pitchers, there are middle relievers and closers. Although hitters are
composed of many different position players, at the plate they are all measured the same way. For pitchers, statistics like wins, innings pitched, and saves are highly dependent on the type of pitcher. Krautmann, Gustafson, and Hadley (2001) found that pitchers cannot be aggregated together. However, to run different regressions on different types of pitchers would lead to a dangerously small sample and less universal results. Furthermore, many pitchers act as both relief pitchers and starters over the course of a season, requiring an arbitrary cutoff to categorize them as one or the other. Given these difficulties with pitchers, I have opted to focus my analysis on hitters.

The data consist of all contracts of hitters as of the end of 2002. Although 73% of the contracts were signed in 2002, they go back as far as 1997. The data were gathered from a variety of sources.2 I dropped any players who had no Major League experience as of 2002, since they had no performance history at the
Major League level. I also dropped any players with contracts greater than five years, since these outliers might skew the overall results. There are three main categories of data: contract details, player characteristics and team characteristics.

2 The contract data were taken from the website http://www.bluemanc.demon.co.uk/baseball/mlbcontracts.htm. This website has since been taken down, but can still be accessed through http://www.archive.org/web/web.php. The contract information was confirmed by looking at newspaper reports of contract signings when available. The data on days on the disabled list were generously provided by Major League Baseball. The performance statistics were taken from The Baseball Archive Database Version 5.2, which is available at http://www.baseball1.com, and from the Baseball Guru database, which is available at http://baseballguru.com/bbguruol.html. A sampling of the statistics was confirmed with data from ESPN.com to ensure accuracy. Age was calculated as of opening day of 2003 based on the dates of birth available on ESPN.com. Payroll information was acquired from the USA TODAY Salaries Database, available online at http://asp.usatoday.com/sports/baseball/salaries/default.aspx.

Table 1: Variable Definitions

AVGSAL       Average salary over the duration of the guaranteed contract, including guaranteed bonuses and option buyouts
LENGTH       Number of guaranteed years in player’s contract
OPSAVG       Average of on-base plus slugging percentage over the three years prior to signing contract
OPSCHANGE    Change in OPS in final year before contract over OPSAVG
PAAVG        Average number of plate appearances over the three years prior to signing contract
PAUP         Dummy variable indicating if the plate appearances in the final year before the contract increased by more than 100 over PAAVG
ALLSTAR      Number of All-Star selections in three years before signing contract
GOLDGLOVE    Number of Gold Gloves won in three years before signing contract
DLFEW        Average number of days spent on the disabled list over the three years before signing contract if the average is less than or equal to 15 days, or 0 if the average is greater than 15 days
DLMANY       Average number of days spent on the disabled list over the three years before signing contract if the average is greater than 15 days, or 0 if the average is less than or equal to 15 days
HEALTHY      Dummy variable indicating if the player spent 0 days on the disabled list in the year before signing contract
AGE          Player’s age as of opening day on the first year of new contract
AGE2         Player’s age squared
CATCHER      Dummy variable indicating if the player is a catcher
SHORTSTOP    Dummy variable indicating if the player is a shortstop
OF           Dummy variable indicating if the player is an outfielder
FREEAGENT    Dummy variable indicating if the player was a free agent when the contract was signed
ARBITRATION  Dummy variable indicating if the player was arbitration eligible when the contract was signed
HIPAY        Dummy variable indicating if the team’s payroll was in the top five
LUX          Dummy variable indicating if the team’s payroll was in the top five and the contract was signed after the 2002 season
LOPAY        Dummy variable indicating if the team’s payroll was in the bottom five
POP          Population of team’s metropolitan area as of 2000

The contract details consist of the number of guaranteed years (LENGTH) and the average salary (AVGSAL) over those guaranteed years. Team options and mutual options were not counted, but player options were considered to be guaranteed years. (Player options allow the player to decide whether to exercise the option and remain on the team, so they are even better than a guaranteed year from the player’s standpoint.) Although team and mutual options were not included, option buyouts and signing bonuses were factored into the total compensation in order to calculate average
annual salary.

Player characteristics consisted of bargaining status, performance statistics, awards, position and age. A major determinant of the final outcome is the bargaining position of the player and the team. For the first three years of Major League service, players must accept the offers of the teams (one year of Major League service is defined as 172 days on a Major League roster).3 After three years, players are eligible for salary arbitration, a process whereby the player and team each submit an offer, and a panel of three arbitrators chooses one or the other if the player and team do not reach an agreement beforehand (note that a small subset of outstanding players, known as Super Twos, are eligible for arbitration after their second year). The dummy variable ARBITRATION indicates arbitration eligibility. After six years of Major League service, players are eligible for free agency, indicated by the dummy variable FREEAGENT. As free agents, they can sign a contract with
any Major League team. The player’s status largely determines his bargaining power with the team, which will have a significant effect on the outcome of contract negotiations. Krautmann and Oppenheimer (2002) limited their study to free agents in order to avoid dealing with the complications resulting from the different bargaining status of players. However, I decided to include all hitters in my sample, since one of the interesting aspects I want to investigate is which non-free agents may be getting long-term deals.

3 For more information about Major League service time, refer to the Basic Agreement 2003-2006, p. 77.

More than any other sport, baseball is a game that provides a field day for statisticians. Since baseball was invented more than 150 years ago, statistics have been a fundamental part of the game.4 Basic statistics like batting average and runs batted in have given way to a host of advanced statistics, each
vying to encapsulate the productivity of a hitter in one number. Given the likely multicollinearity of most offensive statistics, it is sensible to use only one statistical measure of productivity. I have chosen to use on-base plus slugging percentage (OPS). On-base percentage measures how often a player reaches base, a combination mostly of walks and hits. Slugging percentage is the number of total bases divided by total at-bats, so it takes power hitting into account. The sum of these two statistics provides one number which includes both hitting for power and ability to reach base, the two most important components of hitting. Slugging percentage has traditionally been a popular statistic to measure the productivity of hitters (Krautmann and Oppenheimer 2002; Maxcy 2004), but OPS is likely to be a better measure since it includes the on-base component. Albert and Bennett (2001) found that OPS produces a “far-superior” (p. 166) model for predicting team runs per game
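To make the definition of OPS concrete, the following minimal Python sketch computes it from a single batting line. The on-base formula used here (hits, walks, and hit-by-pitch over at-bats, walks, hit-by-pitch, and sacrifice flies) is the conventional one and is an assumption, since the text describes on-base percentage only informally. The function name and the sample stat line are hypothetical.

```python
def ops(singles, doubles, triples, home_runs, at_bats, walks,
        hit_by_pitch=0, sac_flies=0):
    """Return (OBP, SLG, OPS) for one batting line."""
    hits = singles + doubles + triples + home_runs
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    # Conventional OBP definition (assumed; not spelled out in the text).
    obp = (hits + walks + hit_by_pitch) / (
        at_bats + walks + hit_by_pitch + sac_flies
    )
    # Slugging percentage: total bases divided by at-bats.
    slg = total_bases / at_bats
    return obp, slg, obp + slg


# Hypothetical season: 150 hits in 500 at-bats with 50 walks.
obp, slg, total = ops(singles=100, doubles=30, triples=5, home_runs=15,
                      at_bats=500, walks=50, hit_by_pitch=5, sac_flies=5)
print(f"OBP={obp:.3f} SLG={slg:.3f} OPS={total:.3f}")
```

On this made-up line the hitter's OPS comes out at roughly .836, somewhat above the sample average of .754 reported in Table 2.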

than either slugging percentage or on-base percentage individually. Given that on the offensive end, teams are attempting to maximize the number of runs scored, it would follow that teams would value the OPS of individual players in the same manner. Students of the game could debate endlessly the single best statistic for measuring a hitter’s productivity. I have chosen OPS, which is generally agreed to be a very good measure.

For those not familiar with OPS, it will be useful to look at the summary statistics from the data. The average OPS in my sample is .754. An OPS of over 1.000 is typically considered an excellent year, and over .900 is still a very strong performance. An OPS below .650 would be considered a poor hitting season. The variable OPSAVG is a three-year average of OPS before the contract was signed.

[4] For an excellent discussion of the history of statistics in baseball, read The Numbers Game: Baseball’s Lifelong Fascination with Statistics by Alan Schwarz.

Table 2: Summary Statistics of Sample

Variable       Mean       S.E.
AVGSAL         2.437      3.069
LENGTH         1.789      1.241
OPSAVG         0.754      0.107
OPSCHANGE     -0.003      0.105
PAAVG        350.481    194.601
PAUP           0.237      0.426
ALLSTAR        0.209      0.512
GOLDGLOVE      0.081      0.389
DLFEW          3.078      4.488
DLMANY        35.728     18.098
HEALTHY        0.758      0.428
AGE           29.148      4.036
AGE2         865.829    244.648
CATCHER        0.165      0.372
SHORTSTOP      0.081      0.274
OF             0.338      0.474
FREEAGENT      0.433      0.496
ARBITRATION    0.290      0.454
HIPAY          0.155      0.363
LUX            0.112      0.316
LOPAY          0.160      0.367
POP            6.122      5.144

OPS, however, provides no indication of the amount of playing time a player gets. A player who has an excellent OPS but plays only a limited amount of time is not likely to be attractive to teams. For this reason, I have included plate appearances as well. Plate appearances is a somewhat ambiguous measure, since there

are really two factors that affect it. The first is whether a player is good enough to be starting regularly. Given that teams are trying to win as many games as possible, we would expect them to field their best team. Better players ought to receive more plate appearances. In this way, the number of plate appearances indicates the value a team is placing on the player. It is important to note that plate appearances also depend on the place in the lineup. Given that batters reach the plate in order one through nine, the difference between batting first and ninth in the lineup could lead to more than 100 additional plate appearances over the course of a 162-game season. In general, however, the better players bat earlier in the lineup, so they will tend to get more plate appearances by design (second hitters could be an exception to this rule, as they are sometimes weak hitters, but this should have little impact on these results). The existence of top-heavy lineups should only

strengthen plate appearances as a measure of player performance. The variable PAAVG is a three-year average of plate appearances.

However, the other factor that can affect plate appearances is injury. Even if a player is good enough to play regularly, injury may limit the player’s plate appearances. The standard measure of injury is the number of days spent on the disabled list (DL). Injured players can be placed on the DL for a minimum of fifteen days in order to open up a roster spot for another player. However, one might expect that days on the DL would not have a linear effect on a team’s assessment of players. For players who spend very little time on the DL, teams may largely ignore a short stay, which might just be a fluke. For players who spend a substantial amount of time on the DL, however, each additional day that the player is injured is likely to have a greater effect on the team’s assessment of the

player’s future health. For this reason, I created a piecewise function for days on the DL, using the two variables DLFEW and DLMANY. DLFEW represents days on the DL if the player averaged fewer than 15 days on the DL over the last three years. I expect that DLFEW will not be significant for the length regression, as teams will largely ignore rare injuries. DLMANY represents the days on the DL if the player averaged more than 15 days on the DL over the last three years. I expect that DLMANY will be significant, reflecting teams’ concerns about chronic injuries. [5] The dummy variable HEALTHY indicates if the player spent no time on the DL in the year before signing his contract. None of these measures will pick up minor injuries that may cause a player to miss a few days in the lineup or adversely affect a player’s performance without requiring the player to sit out any games. These are less likely to affect a team’s decision to sign a player, however, since their primary
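As an illustration only, the piecewise injury variables described above might be constructed along the following lines. This is a sketch under stated assumptions: the recorded value is taken to be the three-year average of DL days, a player averaging exactly 15 days is assigned to DLMANY (the text leaves that boundary case open), and the function and argument names are hypothetical.

```python
def injury_variables(dl_days_by_year, threshold=15.0):
    """Build DLFEW, DLMANY and HEALTHY from DL days in the three
    seasons before signing (oldest season first)."""
    avg = sum(dl_days_by_year) / len(dl_days_by_year)
    # Piecewise split: only one of the two DL variables is nonzero.
    dlfew = avg if avg < threshold else 0.0
    dlmany = avg if avg >= threshold else 0.0  # boundary case is an assumption
    # HEALTHY: no DL time in the season immediately before signing.
    healthy = 1 if dl_days_by_year[-1] == 0 else 0
    return {"DLFEW": dlfew, "DLMANY": dlmany, "HEALTHY": healthy}


# Hypothetical players: one short DL stint versus a chronic injury history.
vars_few = injury_variables([0, 15, 0])
vars_many = injury_variables([30, 0, 60])
print(vars_few)
print(vars_many)
```

Splitting the variable this way lets a regression assign a different slope to occasional and chronic injuries, which is the point of the piecewise specification.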

concern is likely to be chronic injuries that could cause a player to miss significant playing time and also because teams may not be aware of minor injuries if the player plays through them.

I have included All-Star selection over the previous three years as another performance measure. All-Stars are selected because they are among the top performers at their position in the first half of the season. Fans select the All-Star starters, while managers selected the reserves until 2003. [6] Popular players are sometimes elected even when their performances do not seem to merit selection. However, this popularity is a factor that teams may take into consideration when signing a player, given increased attendance at games, etc., so All-Star selection should be a powerful predictor of contract size and length. All-Star selection also takes into account the total player package, including speed and defense and perceived leadership qualities, all of which are attractive to teams but difficult to include as independent variables, for reasons I will discuss later.

[5] I also ran the regressions assuming that days on the disabled list had a linear effect on contract length and salary, and although the results held using either method, the results were stronger using the piecewise disabled list function, as would be expected from the manner in which teams evaluate players.

[6] Starting in 2003, players voted to select the All-Star reserves. All of my data is from before 2003, though, so this is not relevant to my research.

Another equally important consideration is how far back to consider past performance. Do teams value consistent performance or are they satisfied with a breakout contract year? In order to measure consistent performance, I have included three-year averages of performance statistics, which is fairly standard in baseball research (Krautmann and Oppenheimer, 2002). I have also included OPSCHANGE

as the deviation from OPSAVG in the last year of the contract. Additionally, I have included a dummy variable, PAUP, which indicates if plate appearances increased by more than 100 in the final year over PAAVG. Together, these will provide both a measure of a player’s historical averages and his trend in performance, both of which are likely to matter to teams signing him to a contract.

However, PAUP may also be useful in isolating a particular class of players that may be likely to get relatively long-term contracts at relatively low annual salaries: young, improving players. These players are not likely to get large salaries, often because they are not in a strong bargaining position. If the player is arbitration eligible, the team knows that it can sign the player to a one-year deal for a reasonable price. [7] However, teams may have an incentive to guarantee long contracts for two reasons. First, if they expect the player to improve significantly over the term of the
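The trend variables can be illustrated with a short Python sketch. It assumes that the three-year averages include the final season and that OPSCHANGE is the final-season OPS minus OPSAVG; both are readings of the text rather than confirmed details, and the function name and sample numbers are hypothetical.

```python
def trend_variables(ops_by_year, pa_by_year, pa_jump=100):
    """Compute OPSAVG, OPSCHANGE, PAAVG and PAUP from three seasons
    of data (oldest season first, final season last)."""
    ops_avg = sum(ops_by_year) / len(ops_by_year)
    pa_avg = sum(pa_by_year) / len(pa_by_year)
    return {
        "OPSAVG": ops_avg,
        # Deviation of the final season from the three-year average.
        "OPSCHANGE": ops_by_year[-1] - ops_avg,
        "PAAVG": pa_avg,
        # 1 if final-season plate appearances exceed PAAVG by more than pa_jump.
        "PAUP": 1 if pa_by_year[-1] - pa_avg > pa_jump else 0,
    }


# Hypothetical improving player: rising OPS and a jump in playing time.
improving = trend_variables([0.700, 0.750, 0.800], [300, 400, 650])
# Hypothetical established regular: steady OPS and steady playing time.
steady = trend_variables([0.800, 0.800, 0.800], [500, 500, 550])
print(improving)
print(steady)
```

Under these assumptions the improving player is flagged with PAUP = 1 while the established regular is not, even though the regular has the better three-year averages.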

contract, they can lock that player in at a lower rate and avoid paying raises every time the player comes up for arbitration. Second, if they can sign the player to a long enough contract that it extends into one or two free agent years, the team has avoided the possibility of losing the player during those years and has also managed to negotiate a deal before other teams can make competing bids to drive up the price.

[7] It is important to note that in practice, it is generally believed that some players get higher salaries under arbitration than they would as free agents. Although the economic theory would predict otherwise, arbitrators virtually always give players raises, while free agents sometimes take pay cuts. As a group, free agents still get substantially higher salaries than those eligible for arbitration, but it is important to note that free agency does not unambiguously raise a player’s salary over the salary that would have been given under arbitration.

From the player’s perspective, we might expect young players to be more willing to accept longer deals at lower average salaries, since they have just reached the Major Leagues, often after a long period in the minor leagues, which features much lower pay and a difficult road to the Majors. Having reached the Major Leagues, we might expect them to accept a deal at under market value for their services if it will guarantee them a stay in the Major Leagues. After controlling for injuries that season, in order to make sure that PAUP is not picking up an improvement in health rather than performance, PAUP ought to help us evaluate the way teams act towards young improving players in contract length determination.

I have also included the number of Gold Gloves (GOLDGLOVE) won over the last three years. A Gold Glove is awarded to the best defensive player at his position in his league. This measure notes only the most exceptional fielders, whom we would

expect to be valuable to teams and to be compensated accordingly. Unfortunately, my dataset has nothing in it to distinguish good defensive players from mediocre or bad ones. The only widely reported fielding statistics are fielding percentage and errors per season, which provide only evidence of mistakes but do not take into account exceptional plays, range, or other important factors. There are a number of more sophisticated fielding statistics that have been developed and are continuing to be developed, but many of these rely on subjective judgments and they are generally considered to be less reliable than offensive statistics. For this reason, I have not included those factors. I have, however, included dummy variables for catcher (CATCHER) and shortstop (SHORTSTOP), as these are the two positions for which defense is considered most important. There is of course a whole hierarchy of defensive importance, with center field

being more important than corner outfielders, and second base being more important than first base, and so on. I have isolated the two most important defensive positions, since I am guessing that these are the ones for which market uncertainty will play the biggest role. I have also included a dummy variable for outfielders (OF), as the outfield is considered a less defensively demanding position. However, because outfielders tend to be better hitters than infielders, the OF dummy may pick up some aspects of stronger performance that are not completely controlled for by my vector of performance metrics, so this coefficient should be interpreted cautiously. Lacking from my vector of performance measures is any measure of a player’s speed. OPS may pick up some minor aspects of speed, insofar as fast runners are more likely to beat out infield hits or stretch hits for extra bases. In practice, though, stolen bases are the only useful measure of speed. However, stolen bases are generally

negatively correlated with other variables like slugging percentage that are likely to lead to longer contracts and larger salaries. Although I have included a vector of performance measures, OPS and PAAVG cannot capture all of the aspects of strong hitting. As a result, stolen base coefficients may be correlated with these omitted variables and thus may not provide a useful measure of the value teams place on speed. For this reason, I have chosen to leave out any measure of speed in this analysis. Another potential source of value that is missing from my data is any measure of clubhouse qualities or leadership abilities that make a player a good teammate. “Team chemistry” is often cited as contributing to team success, but this sort of intangible is very difficult to quantify and analyze in any useful way. A look at the long contracts and very large salaries of two recently appointed team captains, Derek Jeter of the New York Yankees and Jason Varitek of the Boston Red Sox, would

suggest that teams do place a premium on leadership qualities. Lehn (1984) analyzed the information asymmetries in baseball’s free agent market between the team currently employing a player and other teams interested in the player’s services, finding that teams have inside knowledge about their own players’ characteristics. He focused mainly on the rate of disability among players that change teams, but one would expect the same type of result for players popular in the clubhouse. Given the virtual impossibility of quantifying these qualities, however, I have not included any measure of these intangibles in my regressions.

Another important factor is the player’s age. However, age could potentially work in two directions. As a proxy for experience, one would expect that age would be a positive attribute and would lead to higher salary and possibly longer contracts. But as players reach their mid to late 30s, their physical

abilities decline, so we would expect age to be a negative attribute. In order to model this trend, I have included both age (AGE) and age squared (AGE2) in my data. Finally, I included some basic team characteristics that have traditionally been used in baseball research. I included a dummy variable for if the team’s payroll was in the top five of all Major League teams (HIPAY), and another indicator if payroll was in the bottom five (LOPAY). I also included the population in the team’s metropolitan area (POP) to measure the size of the market the team is catering to. It is also important to note any changes that occurred in the market for players between 1997 and 2002, the last year in my dataset. In 2002, the owners and the Major League Baseball Players Association reached a new collective bargaining agreement. This agreement caused a number of changes, most notably the expansion of the system of revenue sharing that had begun in 1997 and a luxury tax for teams with high

payrolls, both phased in starting in 2003 and growing thereafter. We might expect the luxury tax to depress the payrolls of the already high-payroll teams. So far, however, the effects of the luxury tax seem somewhat ambiguous. The New York Yankees, notoriously big spenders, have seen their payroll swell from $126 million to $208 million in the three years since the new collective bargaining agreement. Other teams have had dramatic falls in payroll, well below the luxury tax threshold. Only the Boston Red Sox seem to have a payroll hovering around the luxury tax threshold, as one might have predicted many teams would do. The Collective Bargaining Agreement would affect any deals signed in 2002. To control for this effect, I have included an interaction term between high payroll teams and contracts signed in 2002 (LUX). However, the coefficient on LUX is likely to be biased by the fact that contracts signed in 2002 are

disproportionately one-year deals, since any one-year deals signed in earlier years have since been supplanted by deals signed in 2002. In other words, any contract signed before 2002 is a long-term deal, while most of the 2002 deals are one-year, not because of any inherent difference between the years but because of the way the data were chosen. Including an indicator for the year 2002 would clear up this problem but would add unnecessary noise to the regression. I have not included any year indicators for this reason, but as a result the coefficient on LUX should be interpreted with a good deal of caution.

Summary Statistics

A brief look at the summary statistics of the data, broken down by contract length in Table 3, reveals some interesting trends. The mean contract length is 1.789 years. Almost two-thirds of the players in this sample have one-year contracts, with the rest somewhat evenly distributed across two-, three-, four-, and five-year contracts. Average salary rises as the number

of years increases, with a dramatic increase from one-year to two-year contracts, largely because many of the players with one-year deals are receiving the league minimum contract of $300,000.

Table 3: Summary Statistics by Length of Contract
Entries are Mean (S.E.)

Variable       One-year          Two-year          Three-year        Four-year         Five-year
               (259 obs.)        (25 obs.)         (38 obs.)         (45 obs.)         (16 obs.)
AVGSAL         0.96 (1.23)       3.49 (2.76)       5.47 (2.60)       5.49 (3.57)       8.23 (4.61)
OPSAVG         0.72 (0.10)       0.78 (0.08)       0.83 (0.08)       0.80 (0.09)       0.89 (0.11)
OPSCHANGE     -0.01 (0.12)       0.02 (0.05)       0.02 (0.05)       0.02 (0.06)       0.03 (0.07)
PAAVG        273.75 (174.05)   464.66 (152.48)   473.69 (133.35)   530.05 (125.97)   545.22 (133.40)
PAUP           0.20 (0.40)       0.09 (0.28)       0.39 (0.50)       0.36 (0.48)       0.38 (0.50)
ALLSTAR        0.08 (0.30)       0.23 (0.49)       0.47 (0.76)       0.51 (0.73)       0.81 (0.83)
GOLDGLOVE      0.02 (0.15)       0.23 (0.60)       0.05 (0.23)       0.31 (0.82)       0.25 (0.58)
DLFEW          2.00 (3.83)       2.50 (3.87)       3.13 (4.73)       4.04 (5.24)       1.13 (3.40)
DLMANY         9.03 (18.06)      7.85 (15.90)      9.18 (20.73)      5.35 (13.57)      7.85 (15.49)
HEALTHY        0.76 (0.43)       0.66 (0.48)       0.71 (0.46)       0.76 (0.43)       1.00 (0.00)
AGE           28.86 (4.27)      31.97 (3.01)      29.89 (3.04)      27.96 (3.07)      29.19 (3.90)
AGE2         851.12 (260.78)  1031.00 (192.04)   902.68 (183.98)   790.76 (172.15)   866.19 (229.48)
CATCHER        0.19 (0.39)       0.17 (0.38)       0.13 (0.34)       0.09 (0.29)       0.13 (0.34)
SHORTSTOP      0.07 (0.26)       0.09 (0.28)       0.08 (0.27)       0.13 (0.34)       0.06 (0.25)
OF             0.32 (0.47)       0.29 (0.46)       0.45 (0.50)       0.38 (0.49)       0.44 (0.51)
FREEAGENT      0.35 (0.48)       0.80 (0.41)       0.61 (0.50)       0.44 (0.50)       0.50 (0.52)
ARBITRATION    0.27 (0.44)       0.20 (0.41)       0.29 (0.46)       0.47 (0.50)       0.31 (0.48)
HIPAY          0.15 (0.36)       0.17 (0.38)       0.18 (0.39)       0.11 (0.32)       0.19 (0.40)
LUX            0.15 (0.36)       0.09 (0.28)       0.00 (0.00)       0.02 (0.15)       0.00 (0.00)
LOPAY          0.19 (0.39)       0.09 (0.28)       0.11 (0.31)       0.09 (0.29)       0.19 (0.40)
POP            6.17 (5.37)       4.91 (3.61)       5.38 (3.57)       7.67 (5.92)       5.41 (4.45)

The fact that players with three-year and four-year deals are receiving almost identical mean salaries may be due to the small sample size for these years (and

note that the standard deviation of salary is much higher for the four-year deals). This general trend confirms the hypothesis that average salary and length tend to increase together, and highlights the difficulties researchers have had in attempting to locate a compensating differential between length and salary in baseball contracts. The three primary measures of a player’s performance, OPSAVG, ALLSTAR, and PAAVG, all trend upwards as the length of contract increases, showing that elite players are more likely to get long-term deals. For a number of other measures, however, the relationship with length seems less apparent from the summary statistics. Age seems to have no clear relationship with length that is evident from simply looking at the means. Arbitration eligibility seems similarly ambiguous. Players with deals greater than one year seem to be more likely to be free agents, but among the long-term deals there is no clear relationship with free agency. None of the injury

measures show an obvious trend, nor do the position dummy variables or any of the team characteristics. We must turn to econometric analysis to uncover these relationships.

Empirical Model

In order to investigate the relationship between salary and length of contract, and to isolate factors that may affect one variable but not the other, I employed a joint estimation of salary and length. Previous research has generally regressed either salary or length on a host of independent variables but omitted the other of the two. However, given that they are jointly determined during the contract negotiation process, it would seem important to include salary in the length regression and length in the salary regression. Krautmann and Oppenheimer (2002) demonstrated the importance of including length in the salary regression, and the reverse would seem equally important. However, doing a simple ordinary least squares regression and

including either length or average salary as an independent variable could lead to an endogeneity problem. In order to correct for this, I used two-stage least squares to estimate salary as a function of years and other variables, and did the same process in reverse to estimate length of contract as a function of average salary and other variables. [8]

The primary instrument for the average salary on length regression was DLMANY. Krautmann and Oppenheimer (2002) used days on the disabled list as the instrument in their analysis, and I expect DLMANY to be an even better instrument. The intuition behind this is that teams will care significantly about history of injury in offering long-term contracts, since injury-prone players are too risky to sign for an extended period. However, the teams will care less about injury history in the salary determination process as long as the player is putting up good numbers. For the length on average salary regression, there is no single instrument that is

very good on its own, but all of the performance metrics (OPSAVG, ALLSTAR, and PAAVG) as well as FREEAGENT are much more highly correlated with average salary than with length, so together they can serve to identify the second-stage contract length regression.

The relationship between contract length and salary is an interesting one. As discussed earlier, we might expect to see a tradeoff between the two, since players may be willing to substitute a lower salary for a longer contract. Empirical work is unlikely to uncover this relationship, however, because even if this tradeoff exists, better players tend to receive both higher salaries and longer contracts. Because it is impossible to control for all of the performance measures, this relationship is likely to appear in the regressions. Krautmann and Oppenheimer (2002) used a cross-term of performance and length in their salary regression, finding that this cross-term had a statistically significant negative coefficient, demonstrating the existence of a compensating effect. I have chosen instead to use the two-stage least squares method to get predicted values for length and salary but then to focus on the other factors that are affecting each of these variables differently instead of searching for the compensating effect.

[8] Given that length of contract is a discrete variable, one would ideally use an ordered probit and then put the estimated values gained from the ordered probit into the salary regression. This proved somewhat cumbersome, however, and some basic comparisons suggested that the results from the two-stage least squares would not be significantly different.

My empirical model involves two first-stage regressions, one of salary on the relevant independent variables and the other of length on the same variables. These equations appear as follows:

\begin{align*}
\mathrm{AVGSAL} ={}& \beta_0 + \beta_1\,\mathrm{OPSAVG} + \beta_2\,\mathrm{OPSCHANGE} + \beta_3\,\mathrm{PAAVG} + \beta_4\,\mathrm{PAUP} + \beta_5\,\mathrm{ALLSTAR} \\
&+ \beta_6\,\mathrm{GOLDGLOVE} + \beta_7\,\mathrm{DLFEW} + \beta_8\,\mathrm{DLMANY} + \beta_9\,\mathrm{HEALTHY} + \beta_{10}\,\mathrm{AGE} + \beta_{11}\,\mathrm{AGE2} \\
&+ \beta_{12}\,\mathrm{CATCHER} + \beta_{13}\,\mathrm{SHORTSTOP} + \beta_{14}\,\mathrm{OF} + \beta_{15}\,\mathrm{FREEAGENT} + \beta_{16}\,\mathrm{ARBITRATION} \\
&+ \beta_{17}\,\mathrm{HIPAY} + \beta_{18}\,\mathrm{LUX} + \beta_{19}\,\mathrm{LOPAY} + \beta_{20}\,\mathrm{POP} + \varepsilon_i \tag{1} \\
\mathrm{LENGTH} ={}& \beta_0 + \beta_1\,\mathrm{OPSAVG} + \beta_2\,\mathrm{OPSCHANGE} + \beta_3\,\mathrm{PAAVG} + \beta_4\,\mathrm{PAUP} + \beta_5\,\mathrm{ALLSTAR} \\
&+ \beta_6\,\mathrm{GOLDGLOVE} + \beta_7\,\mathrm{DLFEW} + \beta_8\,\mathrm{DLMANY} + \beta_9\,\mathrm{HEALTHY} + \beta_{10}\,\mathrm{AGE} + \beta_{11}\,\mathrm{AGE2} \\
&+ \beta_{12}\,\mathrm{CATCHER} + \beta_{13}\,\mathrm{SHORTSTOP} + \beta_{14}\,\mathrm{OF} + \beta_{15}\,\mathrm{FREEAGENT} + \beta_{16}\,\mathrm{ARBITRATION} \\
&+ \beta_{17}\,\mathrm{HIPAY} + \beta_{18}\,\mathrm{LUX} + \beta_{19}\,\mathrm{LOPAY} + \beta_{20}\,\mathrm{POP} + \varepsilon_i \tag{2}
\end{align*}

Using the predicted values of both AVGSAL and LENGTH, I used two-stage least squares to estimate the other equation. The second-stage equations appear as follows:

\begin{align*}
\mathrm{AVGSAL} ={}& \beta_0 + \beta_1\,\mathrm{LENGTH} + \beta_2\,\mathrm{OPSAVG} + \beta_3\,\mathrm{PAAVG} + \beta_4\,\mathrm{ALLSTAR} + \beta_5\,\mathrm{HEALTHY} + \beta_6\,\mathrm{AGE} \\
&+ \beta_7\,\mathrm{AGE2} + \beta_8\,\mathrm{FREEAGENT} + \beta_9\,\mathrm{LOPAY} + \beta_{10}\,\mathrm{LUX} + \varepsilon_i \tag{3} \\
\mathrm{LENGTH} ={}& \beta_0 + \beta_1\,\mathrm{AVGSAL} + \beta_2\,\mathrm{DLMANY} + \beta_3\,\mathrm{PAUP} + \beta_4\,\mathrm{AGE} + \varepsilon_i \tag{4}
\end{align*}

RESULTS AND DISCUSSION

Tables 4 and 5 show the first-stage regression results for both average salary and length on
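The mechanics of two-stage least squares can be illustrated with a deliberately tiny Python sketch. It is not the estimation used in this paper: it has one endogenous regressor, one instrument, and no other covariates, and the data are made-up centered numbers chosen so the arithmetic is exact. The function names and the variables z, u, x, and y are hypothetical. Here x plays the role of contract length, y the role of salary, z an instrument that shifts x, and u an unobserved quality term that raises both.

```python
def ols(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def two_stage_least_squares(z, x, y):
    """2SLS with one instrument z, one endogenous regressor x, outcome y."""
    a1, b1 = ols(z, x)                      # first stage: regress x on z
    x_hat = [a1 + b1 * zi for zi in z]      # predicted values of x
    return ols(x_hat, y)                    # second stage: regress y on x_hat


# Toy data (hypothetical): z is uncorrelated with u by construction.
z = [-1, 1, -1, 1]
u = [-1, -1, 1, 1]
x = [zi + ui for zi, ui in zip(z, u)]           # endogenous regressor
y = [2 * xi + 3 * ui for xi, ui in zip(x, u)]   # outcome, true slope on x is 2

_, naive_slope = ols(x, y)
_, iv_slope = two_stage_least_squares(z, x, y)
print(naive_slope, iv_slope)
```

In this toy example the naive regression overstates the slope because u moves x and y together, while the two-stage estimate recovers the slope built into the data. Standard errors taken from a hand-run second stage are generally not valid, so in practice a dedicated 2SLS routine would be used.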

the independent variables. Although some of the variables differ, these first-stage regressions are similar in form to those of previous research, especially for the salary regressions (Kahn 1993; Maxcy 2004). We would expect the results to be similar to those of other researchers. I will describe these results briefly, mainly as a point of comparison with the second-stage results. By looking at the second-stage results, we can discover which factors affect length of contract or salary differently from the other. Any differences between the second-stage and first-stage results can also highlight the dangers of traditional regressions that omit either salary or length as an independent variable.

Let us first look at the first-stage results for average salary (1), displayed in Table 4. As expected, the three primary performance metrics have positive coefficients and are significant at the .05 level (in fact all of these variables have t-statistics over 6, making them significant even at the

.001 level). Krautmann and Oppenheimer (2002), in their first-stage regression that excluded length, found slugging percentage and at-bats per year to be significant, with t-statistics of over 7 for both. Kahn (1993) found All-Star appearances to be significant, both for white and black players. My results confirm the conclusions of other researchers that performance metrics are a significant predictor of player salary.

Table 4: First-Stage Average Salary Estimates

Variable       Coefficient   t-statistic
OPSAVG           7.353         6.58*
OPSCHANGE        1.225         1.32
PAAVG            0.005         7.53*
PAUP            -0.317        -1.36
ALLSTAR          1.851         8.44*
GOLDGLOVE        0.529         2.08*
DLFEW            0.065         2.45*
DLMANY           0.021         3.27*
HEALTHY          1.052         3.99*
AGE              0.667         2.30*
AGE2            -0.012        -2.54*
CATCHER          0.662         2.26*
SHORTSTOP        0.189         0.50
OF               0.407         1.85*
FREEAGENT        0.668         1.81*
ARBITRATION     -0.012        -0.04
HIPAY            1.341         2.70*
LUX             -2.119        -3.67*
LOPAY           -0.219        -0.80
POP              0.027         1.20
Constant       -16.231        -3.64*

R-squared = 0.658
* Significant at the .05 level
* Significant at the .10 level

Other player characteristics also showed the expected relationship. Whereas AGE is a statistically significant positive factor, AGE2 is negative and statistically significant, reflecting the fact that initially experience outweighs the negative effects of aging, but eventually this relationship reverses itself. In this case, age starts to negatively affect the player after age 28. Being a catcher raises average salary by more than $600,000, while being a shortstop has no effect. Being an outfielder also leads to an increased salary, although it is unclear whether teams really value outfielders more, or whether this coefficient is manifesting the positive correlation between outfielders and performance metrics. If the signing team was in the top five in payroll, this was likely to significantly increase the player’s

salary. Krautmann and Oppenheimer (2002) used a measure of team revenue, which is highly correlated with payroll, and got similar results. This payroll effect reflects the fact that big-market teams with deep pockets may be willing to pay more for good players. This is because these big-market teams have a higher marginal revenue product for the same player; in other words, the same player is worth more to a big-market team than to a smaller-market team because of the greater potential of that player to increase the team’s revenue. However, this statistic should be interpreted with some skepticism, as there could be reverse causation (i.e., paying a player a high salary boosts the team payroll and may push it into the top five; this is unlikely for most players, but for the top players commanding $18 million annual salaries, we might see this effect). Additionally, the LUX variable is negative and statistically significant, although it is difficult to know whether the luxury tax or the

fact that the contract was signed in 2002 is primarily responsible for this result.

The injury variables seem a bit more difficult to understand. The HEALTHY indicator is positive and significant, as would be expected, given that teams are more likely to pay players high salaries if they think that they will be healthy the next season. However, both DLFEW and DLMANY show positive and significant coefficients, which seem counterintuitive. Given that both are measures that increase with incidence of injury, we would expect them to have negative coefficients. One possible explanation is that there may be some reverse causation. Players being paid high salaries represent important investments by the team, and the team may be concerned with protecting that investment by not risking further injury. Furthermore, since teams are paid insurance if the player is on the DL, they may have a greater incentive to place a player with a minor

injury on the DL if that player is earning a high salary but not performing up to expectations. Regardless, this result confirms that DLMANY is likely to be an effective identifier for the second-stage salary regression. Let us now examine the results of the first-stage length estimates (2), displayed in Table 5. As with the salary regressions, OPSAVG, PAAVG, and ALLSTAR all have positive and significant coefficients. This confirms the assumption that better players tend to receive longer contracts as well as larger salaries. HIPAY is also significant, as bigger-market teams also seem to be giving long-term deals. Maxcy (2004) found that total bases (which acts as a joint measure of OPS and PAAVG) was significant, as well as a high revenue dummy. Unlike Maxcy, however, I did not find that LOPAY is significant. Although Maxcy’s logic about low payroll teams wanting to insure themselves against market uncertainty seems sensible, it appears to be counterbalanced by the fact that low

payroll teams pay lower average salaries which are in turn associated with shorter contracts. PAUP is also significant, indicating that an improvement in plate appearances in one’s final year is a factor in the length of contract, extending the length by Source: http://www.doksinet Josh Meltzer, May 2005 Table 5: First-Stage Length Estimates Variable Coefficient t-statistic OPSAVG 2.045 3.63* OPSCHANGE 0.626 1.34 PAAVG 0.002 5.79* PAUP 0.400 3.39* ALLSTAR 0.389 3.52* GOLDGLOVE 0.288 2.25* DLFEW 0.021 1.59 DLMANY 0.000 0.06 HEALTHY 0.176 1.33 AGE 0.064 0.44 AGE2 -0.002 -0.92 CATCHER 0.228 1.54 SHORTSTOP 0.022 0.13 OF 0.038 0.20 FREEAGENT 0.416 2.24* ARBITRATION 0.104 0.70 HIPAY 0.876 3.50* LUX -1.473 -5.07* LOPAY 0.068 0.50 POP 0.026 2.26* Constant -1.238 R-squared = 0.464 * Significant at the .05 level * Significant at the .10 level -0.55 33 Source: http://www.doksinet Josh Meltzer, May 2005 34 0.4 years Neither
DLAVG nor HEALTHY is significant, which is somewhat surprising, given that one would expect a player’s health to be a very important factor for the team in offering a long-term contract. Also, the relationship with AGE and AGE2 does not hold up for the length regression, with neither coefficient significant. Although there are a few small differences, my first-stage length results also seem to be in line with prior research. Given the very strong relationship between average salary and length, however, I believe that these first-stage results suffer from omitted variable bias. Average salary and length are jointly determined, and each ought to be included as an independent variable in the other’s regression. In order to determine the relationship of other variables to length of contract beyond their relationship with average salary, and vice versa, we must look at the second-stage regression results.

The results of the second-stage salary regression are displayed in
Table 6. As expected, LENGTH is a significant independent variable. Even after controlling for contract length, the three main hitting measures (ALLSTAR, OPSAVG, and PAAVG) are still significant at the .05 level. Similarly, Krautmann and Oppenheimer (2002) found that slugging percentage and at-bats per year were significant even after including length in the regression. The fact that these measures are significantly positive even with the inclusion of length indicates that excellent performance on the field will lead to a salary premium over and above the length of the contract received. All three of these variables are three-year averages. In signing large contracts, teams seem to value consistent performance over a three-year period. The independent variables OPSCHANGE and PAUP, which measure deviation from average performance in the final year of the contract, are not significant when included in the regression (they were left out of this final version). This result suggests that the
impact of a breakout year just before signing a contract may be overstated in the media, as teams seem to value consistently strong performance over a great contract year. Both AGE and AGE2 are significant, with the expected positive coefficient on AGE and the negative coefficient on AGE2. The fact that this relationship holds even when including length in the estimate suggests that salary is especially dependent on a player being in his “prime,” which according to this regression occurs at age 31 (with the rounded coefficients in Table 6, the fitted quadratic peaks at 0.733 / (2 × 0.012) ≈ 30.5).

Table 6: Second-Stage Average Salary Estimates

Variable      Coefficient   t-statistic
LENGTH              1.105          3.79*
OPSAVG              4.722          3.13*
PAAVG               0.003          3.45*
ALLSTAR             1.420          4.23*
HEALTHY             0.328          1.61
AGE                 0.733          3.56*
AGE2               -0.012         -3.35*
FREEAGENT           0.480          1.83*
LOPAY              -0.312         -1.51
LUX                -0.200         -0.89
Constant          -16.087         -4.90*
R-squared = 0.712
* Significant at the .05 level
* Significant at the .10 level

Let us now consider the
results of the second-stage length regression on average salary (4), displayed in Table 7. This is a methodology that other scholars have not employed to this point, and it can help provide key insights in two areas. The first is determining the factors that affect length over and above their relationship to average salary. Perhaps more interestingly, the second is to determine the importance of including salary in the length regression by comparing the results of this second-stage length regression to the first-stage one that omitted salary.

Table 7: Second-Stage Length Estimates

Variable      Coefficient   t-statistic
AVGSAL              0.327         13.67*
DLMANY             -0.005         -2.15*
PAUP                0.463          4.25*
AGE                -0.038         -4.37*
Constant            2.029          7.77*
R-squared = 0.545
* Significant at the .05 level
* Significant at the .10 level

For the length regression, we do not see the same parabolic relationship with age. For length, age has a consistently negative
coefficient across all ages. We have seen that contract length and salary are highly correlated, but age is one factor that appears to drive a wedge between average salary and length of contract. A player in his prime at age 31 and getting paid a large salary would not be expected to get as many guaranteed years as the relationship between salary and length of contract would suggest. This finding seems to confirm the intuition that teams would be reluctant to give long guaranteed contracts to older players. Players in their prime, usually in their early 30s, may be playing as well as they ever have, but their performance is likely to deteriorate as they age. Teams are willing to pay them large salaries for their short-term performance, but are unwilling to guarantee long contracts if they expect a significant decline in performance.

Another variable that matters differently for contract length and average salary is PAUP, which is significant at the .05 level in
the second-stage length regression but insignificant in either salary regression. Given that HEALTHY was not significant (as tested in an earlier regression), it appears that PAUP is picking up not a healthy season after an injury-riddled one, but rather an increase in playing time, probably due to improvement or at least increased confidence in the player’s ability. We would expect this type of improvement among younger players who are coming into the league. The coefficient of 0.463 means that increased playing time in the year before the signing of a contract adds almost half a year to the length of the contract. This is a substantial increase, suggesting that significant improvement in plate appearances is highly valued when teams consider a long-term contract. This finding is notable for the contract length determination process, as it appears that teams are willing to take a bit of a gamble on young players
improving in the future based on improvement in the year before signing a contract. The combination of this result with the consistently negative age coefficient for the length regression seems to confirm the hypothesis that one major area of divergence between average salary and contract length would be found in young improving players eager to stay in the Majors and not in a strong bargaining position.

The most notable change from the first-stage to the second-stage length regression is that in the second-stage regression, DLMANY becomes significant at the .05 level. As we might expect, teams do take injury history into account when deciding how many years to offer a player. This finding was not apparent from the first-stage results, where DLMANY was not significant. Although salary is a very important determinant of length of contract, history of injury factors into the team’s decision and will lead to shorter contracts in order to limit the risk of
losing the services of a player for a long period due to injury. This finding clearly demonstrates the importance of including salary as an independent variable in the length regression in order to uncover important relationships like the one with injury history. It also highlights another area where average salary and contract length diverge, as injury history does not matter for size of salary.

Ultimately, these regressions yield a few interesting findings. Average salary and contract length seem to diverge most notably in the cases of young improving players, who are likely to get long contracts at low salaries, and chronically injured players, who are likely to get short contracts at a salary in keeping with their performance. This very important second effect was not apparent from the first-stage regression when average salary was omitted as an independent variable. This result suggests that future research on average
salary and contract length should recognize that the two are jointly determined and include each as an independent variable in the other regression.

While this paper has provided a positive analysis of how teams react toward players, it does not address the normative question of whether this behavior is rational. Further research could be done on various factors as predictors of future performance. Are the teams acting rationally in the way that they use different factors to determine contract length and average salary? An extension of this kind could provide useful recommendations to both teams and players as to which factors should matter in negotiating their contracts.

References

Albert, Jim and Jay Bennett. 2001. Curve Ball: Baseball, Statistics, and the Role of Chance in the Game. New York: Copernicus Books.
Chelius, James R. and James B. Dworkin. 1982. “Free Agency and Salary Determination in Baseball.” Labor Law Journal, August 33: pp. 539-545.
Hill, James Richard and William Spellman. 1983. “Professional Baseball: The Reserve Clause and Salary Structure.” Industrial Relations, Winter 22(1): pp. 1-19.
Kahn, Lawrence M. 1991. “Discrimination in Professional Sports: A Survey of the Literature.” Industrial and Labor Relations Review, April 44(3): pp. 395-418.
Kahn, Lawrence M. 1993. “Free Agency, Long-Term Contracts and Compensation in Major League Baseball: Estimates from Panel Data.” The Review of Economics and Statistics, February 75(1): pp. 157-164.
Krautmann, Anthony C., Elizabeth Gustafson, and Lawrence Hadley. 2001. “A Note on the Structural Stability of Salary Equations: Major League Baseball Pitchers.” Unpublished manuscript, DePaul University.
Krautmann, Anthony C. and Margaret Oppenheimer. 2002. “Contract Length and the Return to Performance in Major League Baseball.” Journal of Sports Economics, February 3(1): pp. 6-17.
Lehn, Kenneth. 1982. “Property Rights, Risk Sharing, and Player Disability in Major League Baseball.” Journal of Law & Economics, October 25(2): pp. 343-366.
Lehn, Kenneth. 1984. “Information Asymmetries in Baseball’s Free Agent Market.” Economic Inquiry, January 22(1): pp. 37-44.
Lewis, Michael. 2005. “Absolutely, Power Corrupts.” The New York Times Magazine (online edition), April 24.
Maxcy, Joel. 2004. “Motivating Long-term Employment Contracts: Risk Management in Major League Baseball.” Managerial and Decision Economics, March 25(2): pp. 109-120.
Raimondo, Henry J. 1983. “Free Agents’ Impact on the Labor Market for Baseball Players.” Journal of Labor Research, Spring 4(2): pp. 183-193.
Schwarz, Alan. 2004. The Numbers Game: Baseball’s Lifelong Fascination with Statistics. New York: Thomas Dunne Books.
Sommers, Paul M. 1990. “An Empirical Note on Salaries in Major League Baseball.” Social Science Quarterly, December 71(4): pp. 861-867.