Are the Yankees simply buying championships?

Baseball fans frequently debate whether high-spending Major League Baseball teams can “buy” championships. The New York Yankees inspire hatred, envy, and even satire on this front. Unlike most other professional sports in the US, baseball has no salary cap: its weak “luxury tax” nudges teams to control spending rather than forcing them below a hard cap, leaving clubs free to break the bank for a roster of top talent, provided they have the top-line revenue to support that spending. But can a high-spending team trump a more efficient but poorer team by going all out on salary to win games? I’ll explore that in R using data envelopment analysis (PDF link).[1]

Measuring team efficiency in baseball is very difficult — and the media pundits who attempt to measure it typically use unsophisticated methodologies. Google “baseball team efficiency” and you’ll find a number of articles that mostly divide salary by wins to get a cost per win figure, or otherwise use simple correlations of money out the door to wins and championships. Data envelopment analysis is a tool that allows the user to optimize a production function over multiple inputs and outputs, perfect for assessing multiple measures of team performance vs. salary. I’ll quote from the Journal of Sports Economics paper that inspired this post to explain:
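To make the naive approach concrete, here is a minimal sketch of the cost-per-win calculation in R. The team labels, payrolls, and win totals are invented for illustration, and `teams_toy` is a hypothetical name, not part of the analysis in this post.

```r
# Hypothetical payroll and win totals, for illustration only
teams_toy <- data.frame(
  team    = c("A", "B", "C"),
  payroll = c(200e6, 90e6, 60e6),  # season payroll in dollars
  wins    = c(100, 90, 75)
)

# Naive efficiency: dollars of payroll spent per regular-season win
teams_toy$cost_per_win <- teams_toy$payroll / teams_toy$wins

# Rank from cheapest to most expensive win
teams_toy[order(teams_toy$cost_per_win), ]
```

A single ratio like this can only weigh one input against one output, which is the limitation that motivates DEA.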

DEA was introduced to measure the relative efficiency of decision-making units (DMUs) that change inputs into outputs (Charnes, Cooper, & Rhodes, 1978). The DEA model is a linear programming technique that compares the levels of inputs and outputs of one DMU with the rest of its peer group. The DMUs that produce the highest outputs with their inputs are deemed efficient, and these efficient DMUs form a piecewise linear frontier. The frontier surface is a hyperplane with as many dimensions as there are inputs and outputs. All inefficient DMUs are evaluated relative to the efficient surface.
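For intuition about the frontier, consider the simplest case: one input, one output, and constant returns to scale (an assumption on my part; neither the quote nor this post says which returns-to-scale variant applies). There the linear program collapses to a ratio, namely each DMU's output per unit of input divided by the best such ratio in its peer group. The sketch below uses made-up numbers, and `dea_ratio` is a hypothetical helper, not a function from the paper or from any package. With several inputs and outputs, as in this post, a linear program has to be solved for each team instead.

```r
# One-input, one-output DEA under constant returns to scale.
# In this special case the efficiency score is productivity (output / input)
# relative to the most productive unit in the peer group.
dea_ratio <- function(input, output) {
  stopifnot(length(input) == length(output), all(input > 0), all(output >= 0))
  productivity <- output / input
  productivity / max(productivity)
}

# Hypothetical DMUs: payroll in $ millions as the input, wins as the output
payroll <- c(A = 200, B = 90, C = 60, D = 120)
wins    <- c(A = 100, B = 90, C = 75, D = 60)

eff <- dea_ratio(payroll, wins)
round(eff, 2)
```

The unit with the highest wins per dollar defines the frontier and scores 1; every other unit's score says how far its payroll could shrink, proportionally, if it matched that best practice.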

That paper, “Is Winning Everything?: A Data Envelopment Analysis of Major League Baseball and the National Football League,” (PDF link) was published in 2004, with data leading right up to the introduction of baseball’s luxury tax. The author, Prof. Karl W. Einolf of Mount St. Mary’s University, proposes pitcher salaries (as a measure of investment in defense) and all other salaries (as a measure of offensive investment) as inputs, and measures how efficiently teams convert those inputs into team batting average, team ERA,[2] and wins as outputs. I replaced AVG with OPS in my model and am using slightly different data (from the Baseball Databank) but am otherwise following the paper’s methodology.
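The inputs and outputs described above might be assembled along these lines. This is a rough sketch, not my actual code: the rows are invented, and column names such as `pitcher_pay`, `other_pay`, and `ERA_gap` are placeholders rather than the Baseball Databank's schema. It also shows the ERA transformation from footnote 2, where each team's ERA is replaced by its distance from the worst ERA of the same season.

```r
# Hypothetical team-season rows; column names are placeholders, not the
# Baseball Databank's actual schema.
toy <- data.frame(
  yearID      = c(2012, 2012, 2012, 2013, 2013),
  teamID      = c("A", "B", "C", "A", "B"),
  pitcher_pay = c(80, 40, 30, 85, 45),    # $ millions
  other_pay   = c(120, 50, 30, 110, 55),  # $ millions
  OPS         = c(0.780, 0.740, 0.700, 0.760, 0.750),
  ERA         = c(3.50, 4.00, 4.60, 3.80, 4.20),
  W           = c(95, 88, 70, 85, 90)
)

# ERA is "smaller is better", so convert it to the gap between each team's ERA
# and the worst (maximum) ERA in the same season; larger gaps are better.
toy$ERA_gap <- ave(toy$ERA, toy$yearID, FUN = max) - toy$ERA

# Two inputs and three outputs per team-season, ready for a DEA routine
inputs  <- as.matrix(toy[, c("pitcher_pay", "other_pay")])
outputs <- as.matrix(toy[, c("OPS", "ERA_gap", "W")])
```

Note that the team with the worst ERA in a season ends up with a gap of exactly zero for that output.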

In this analysis, perfectly efficient teams will have an efficiency factor of 1; perfect inefficiency is denoted by a factor of 0. I found Major League Baseball to have a mean efficiency of 0.76 during the period 1985-2013 with a standard deviation of 0.23, less efficient and more variable than Prof. Einolf’s results from 1985-2002. Interestingly, I found the efficiency factor of World Series-winning teams to be 0.86 — higher than the MLB average — but the difference is not statistically significant given the high standard deviation of both league efficiency factors and World Series winners’ efficiency factors (0.23 and 0.22 respectively).

Comparing the distribution of league efficiency with the distribution of world champions’ efficiency factors is a little more illuminating:

[Figure: distribution of efficiency factors for all teams vs. World Series winners]

As you can see, the distribution of all teams is concentrated toward the high-efficiency (right-hand) end: even losing teams use their inputs relatively efficiently in most years. However, the world champions’ distribution sits higher still, with no champions falling below 0.2 and a much higher proportion of World Series winners than of all teams attaining perfect efficiency (57% vs. 35%).
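Shares like the ones quoted above can be computed directly from the efficiency scores. Here is a minimal sketch, assuming a data frame with the `WSWin` and `eff` columns that appear in the console output later in this post; `teams_demo` and its values are made up, so the numbers it produces are not the post's results.

```r
# Toy stand-in for the post's `teams` data frame; the values are made up
teams_demo <- data.frame(
  WSWin = c("Y", "N", "N", "N", "Y", "N", "N", "N"),
  eff   = c(1.00, 0.55, 1.00, 0.80, 0.62, 0.91, 1.00, 0.47)
)

# Treat scores within a small tolerance of 1 as perfectly efficient,
# since LP solvers can return values like 0.9999999
is_efficient <- abs(teams_demo$eff - 1) < 1e-8

share_all   <- mean(is_efficient)                           # all team-seasons
share_champ <- mean(is_efficient[teams_demo$WSWin == "Y"])  # champions only
min_champ   <- min(teams_demo$eff[teams_demo$WSWin == "Y"]) # least efficient champion
```

Comparing with a tolerance rather than `eff == 1` is a defensive choice; whether it matters depends on how the DEA routine rounds its output.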

Returning to the New York Yankees: in three of the five years in the sample in which they won the World Series, they were perfectly efficient, while in the other two (1996 and 2000) they fell below the league mean of 0.76:

> teams[teams$WSWin == "Y" & teams$franchID == "NYY",c("franchID", "yearID", "eff")]
         franchID yearID       eff
1996.496      NYY   1996 0.6067819
1998.498      NYY   1998 1.0000000
1999.499      NYY   1999 1.0000000
2000.500      NYY   2000 0.2975017
2009.509      NYY   2009 1.0000000

This suggests it is possible to throw a lot of money at payroll and still win championships despite being fairly inefficient. In fact, the 2000 New York Yankees’ 0.30 efficiency factor is the worst of any World Series winner from 1985 to 2013. The next most inefficient World Series winner was the 2006 St. Louis Cardinals, also acknowledged as a “come from nowhere” success: their regular-season winning percentage of .516 is the worst ever for a World Series champion.

However, as the distributions above suggest, these are outliers. Luck and uncharacteristically strong performance in the post-season are both established features of championship runs in Major League Baseball, but efficiency in converting payroll into runners on base, strong defensive performance, and regular-season wins pays off in the post-season more often than not.

Still, the truth may be that efficiency is necessary in many years but not sufficient. Other analyses suggest that teams can buy wins, at least up to a point, meaning poorer, smaller-market teams may struggle to compete no matter how efficient they are. It’s unlikely that the competitive balance tax (“luxury tax”) can help level the playing field. I’ll explore competitive balance in my next post.

  1. My code (admittedly not pretty) is available here on GitHub.
  2. Strictly speaking, the paper uses the distance between a team’s ERA and the maximum team ERA that year as the output. DEA assumes monotonicity, so ERA needs to be converted to a statistic where “bigger is better.”

Why This Blog and About Me

I’m an MBA currently working as an executive in the social sector (specifically K-12 education). Before I got my MBA and stopped being able to do simple tasks, I was a liberal arts guy (Slavic languages/literatures and pure mathematics, with a little computational linguistics mixed in), and then I worked in the hospitality industry. I’m interested in applying statistical modeling and other techniques to interesting problems and data sets.

I started this blog to catalog my progress from a moderately sophisticated amateur in the world of data science to…well…a slightly more sophisticated amateur. One of my MBA concentrations is in decision sciences, basically a mix of probability and statistics, decision models, game theory, and a little operations management. I haven’t applied these skills much directly in my professional life, but thanks to the Johns Hopkins Data Science Specialization on Coursera, I’m flexing my muscles a little and challenging myself to improve my skills.

I started this blog to share some of the things (some of them unique, most not) that I discover while fumbling my way through the field. I expect most of my posts here to fall outside my day-to-day world of K-12 education, though if I get my hands on some interesting public data I might explore that here as well. Currently I’m interested in baseball statistics and exploring health care and service operations data; anything related to sports or public policy is likely to catch my attention.

If you have a suggestion for how I can improve on something I’ve posted about, or you just want to say hi, please send me an email.