Strike one. Strike two. Strike three. You’re out!
When you hear these words, images of the great American pastime of baseball come to mind. But warm summer days, hotdogs, and the seventh inning stretch at your favorite Major League Baseball (MLB) ballpark are sometimes marred by contract disputes and tense salary negotiations. This has led to numerous player strikes, most notably in 1994, when a strike brought an early end to the season and the cancellation of the World Series.
MLB player salaries have soared over the years and salary negotiations are often quite contentious. With guaranteed player contracts, MLB teams assume most of the risk if player performance doesn’t meet expectations.
The “Moneyball” approach, in which teams use analytics to assemble a competitive roster, was popularized by the 2003 book “Moneyball” and subsequent 2011 movie of the same name. Now imagine if you took a similar analytical approach to determine what a player should be paid.
That is what Jamie Wheeler, an information systems and operations management student who graduated in May 2016, did in his senior year with a research grant he secured through Mason’s Office of Student Scholarship, Creative Activities, and Research (OSCAR). The idea grew not out of a fascination with baseball, but out of his fascination with numbers.
With support from Dr. Pallab Sanyal, an associate professor of information systems and operations management in the School of Business, Wheeler analyzed data on 632 MLB players from the 1998-2012 seasons, singling out 146 attributes, including offensive performance statistics, contract information, age, team, and year. Using this data set, he created a model in which performance metrics correlate strongly with player salary.
By accounting for past performance and age to predict future performance, a team can determine the worth of a player. This can be used to negotiate player contracts (guaranteed salary and performance incentives) that reduce the risk to the team if a player underperforms.
Wheeler says, “It is possible to determine how performance correlates to pay. Players and teams alike will benefit from this by being able to justify salaries as well as match funds to the proper player.”
Some of the key variables for performance statistics included career offensive win percentage, home runs, hits, doubles, triples, total bases and strikeout percentage. To put this into perspective, Wheeler used his model to analyze historical players and found that those receiving the highest rating (who should command the highest salaries) included Barry Bonds, Babe Ruth, Ty Cobb, and Honus Wagner, to name a few.
Wheeler says, “By comparing the high correlation variables to historically great baseball players not part of the data set, I believe I get an additional indicator that the model is very good.”
There are some limitations to the model. It cannot be used for every position. The designated hitter and catcher positions each have unique attributes that cannot be accounted for in the model, such as a catcher’s runner intimidation and runner attempts avoided.
Additionally, the model uses performance to account for 61% of player salaries. The portion of the salary that is uncorrelated is most likely due to non-performance factors such as ticket sales, advertisement revenue, merchandising, popularity, and ability to get along with teammates. Unfortunately, Wheeler says that these factors, as well as defensive data, are more difficult to measure.
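To make the "accounts for 61%" idea concrete, here is a minimal sketch of how such a share is commonly measured: the R-squared of a least-squares fit. It is an illustration only. The article does not describe the technique Wheeler used, so the single-predictor linear fit, the function name r_squared, and the home run and salary numbers below are all assumptions invented for this example.

```python
from statistics import mean


def r_squared(x, y):
    """Share of the variation in y explained by a least-squares line fit on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Variation left over after the fit, versus total variation in y.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up numbers for illustration only (not from Wheeler's data set):
# season home runs and salary in millions of dollars for six hypothetical players.
home_runs = [5, 12, 18, 25, 31, 40]
salary_musd = [1.0, 4.5, 3.0, 9.0, 6.5, 12.0]

share = r_squared(home_runs, salary_musd)
print(f"Performance explains {share:.0%} of salary variation in this toy data")
```

A value of 0.61 from a calculation like this would mean that 61% of the variation in salaries moves with the performance measures, leaving the rest to factors outside the model.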
Despite its limitations, Wheeler says there are massive real-world applications here.
“Teams can see players who are over-performing for their salary and poach them. They can look to see which players are likely to be traded. A player may be underperforming for his salary but on a new team, based on certain conditions, he can perform better and the new team can grab him for an appropriate price.”
Wheeler says, “The language of the future in business is statistics and George Mason is doing a good job. People suddenly realize they need data scientists and there aren’t any out there. The more interest we can generate here in these data sciences, the happier I am.”