Friday, June 16, 2017

MLS Database

As promised, I've finally started organizing this somewhat cleaned MLS database for others to use.  It still needs work and I'll be updating it over the summer, so let me know if there's something specific you'd like to have or need explained in a readme file I haven't yet written.  Most of the older stats came from the kindness of Chris Edgemon, who scraped the original data from MLS's sites.  If you're looking for an interactive online database for stats and don't need seasons after 2014, you should just go to his site.

Chris told me of some known errors in the data set and that's what I've spent the last couple years combing through.  I also added more information such as whether a goal scored in minute 46 came in the first minute of stoppage time in the first half or the first minute of the second half.  My database also has information on Generation Adidas players, Designated Players, homegrown players, and salaries.  So if you're looking for something to import into Stata, R, etc., then this database is for you.

I don't have an official website yet to host the database so I'm linking it to an online drive.  Turns out, websites cost money.  Of course if the clunkiness of blogger.com and google drive offends you enough to throw money at me, I'll make it go away and replace it with a nice looking site to host the data.  If not, enjoy the data and please let me know if you find errors or have suggestions.

Click here for access to the folder that has the SQL database as well as the separate CSV for those of you that have as much trouble as I do getting SQL databases to "open" or "connect" or whatever it is they're supposed to do when other people who are not me use them.

Thursday, January 19, 2017

Holy break between posts Batman!  Do not take the absence of posts as a sign of giving up.  I've actually been working on the shootout idea formally and presented the paper at the North American Association of Sports Economists (NAASE) meeting (which is a subset of the Western Economic Association International meeting) in Portland a few weeks back. You can see the working paper, among others, here.

For this post, I want to walk through the last thought I had posted about the MLS shootout years.  I predicted that the shootout should have caused fewer games to end regulation in a tie and had decided to estimate the shootout era's effect on the likelihood of games ending regulation in a tie using a logit model.  The logit model estimates parameters when the dependent variable takes only two values.  For our purpose of estimating the likelihood of ties, this works well as our dependent variable can be expressed as 0 for games that did not end regulation in a tie, and 1 for those that did.

Mathematically, the probability that game i ends in a tie is modeled as

$$P_i = \Pr(\text{tie}_i = 1 \mid x_i) = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}$$

where P is the probability that the game ended regulation in a tie and x contains all game-level attributes including the era the game was played under (shootout, overtime, or current era), an indicator for interconference games, and the difference in the season goal differentials between the two teams to measure relative team strength. The Stata code to estimate the model is here and the data is here (I'm working on figuring out how to do this in R for those that don't have access to Stata).

Unlike with Ordinary Least Squares (OLS) coefficients, we cannot take the "beta hat" on the shootout indicator variable as the effect of the shootout era on the probability that the game ended regulation in a tie.  Here, the effect of the shootout era is measured by the difference in probability when the shootout indicator is 1 and when it is 0, holding constant all other variables.  That is, we ask what is the probability of a game ending in a tie in the shootout era?  Then ask, what if we played the same game without the shootout rules?  Whatever the difference in likelihood of tying is between the two scenarios is what we deem to be the shootout effect.  Mathematically, this is just

$$\Delta P = \Pr(\text{tie} = 1 \mid \text{shootout} = 1, x) - \Pr(\text{tie} = 1 \mid \text{shootout} = 0, x)$$


Now when we do this, we have to choose values for all the other control variables in x since they will be in the equation.  We could choose them to be anything but here it makes sense to set the difference in goal differentials between the two teams to be zero.  This causes the effect to be calculated for a game contested between two equally strong teams.  I also chose the interconference indicator to be zero but this has little effect since its coefficient was not significant. The estimated effects of the shootout and overtime eras are in the table below.



The second column shows how much each era changed the likelihood of any game between equally strong teams in the same conference ending regulation in a tie.  The shootout era is estimated to have decreased the likelihood of games ending regulation in a tie by 5.13% while the overtime era is estimated to have decreased regulation ties by about 1.65%. However, only the marginal effect of the shootout is estimated with any significance.  That is, the small negative effect estimated for the overtime era may just be due to chance and the actual effect may be zero.  The single star on the shootout coefficient means that the p-value was less than .05; that is, we can reject at the 5% level the hypothesis that the true effect of the shootout on the likelihood of tying is zero.
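The effect calculation described above (predicted tie probability with an era indicator switched on, minus the same prediction with it switched off) can be sketched in a few lines of Python. This is only an illustration: the coefficient values, the `BETA` dictionary, and the function names are made up for the example and are not the estimates from the paper or the Stata code.

```python
import math

def logistic(z):
    """Logistic function: maps a linear index x'beta to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for illustration only (NOT the paper's estimates).
BETA = {
    "const": -0.90,
    "shootout": -0.30,
    "overtime": -0.10,
    "interconference": 0.05,
    "gd_diff": -0.02,
}

def tie_probability(shootout, overtime, interconference, gd_diff, beta=BETA):
    """Predicted probability that a game ends regulation in a tie."""
    index = (beta["const"]
             + beta["shootout"] * shootout
             + beta["overtime"] * overtime
             + beta["interconference"] * interconference
             + beta["gd_diff"] * gd_diff)
    return logistic(index)

def era_effect(era, interconference=0, gd_diff=0.0, beta=BETA):
    """Change in tie probability from switching one era indicator on,
    holding the other controls fixed at the chosen values."""
    with_era = tie_probability(int(era == "shootout"), int(era == "overtime"),
                               interconference, gd_diff, beta)
    without_era = tie_probability(0, 0, interconference, gd_diff, beta)
    return with_era - without_era

# Equally strong teams (gd_diff = 0) from the same conference.
print(round(era_effect("shootout"), 4))
print(round(era_effect("overtime"), 4))
```

Because the logistic function is nonlinear, the size of the effect depends on where the other controls are set, which is why the values of the goal differential gap and the interconference indicator have to be chosen explicitly.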

The purpose of the shootout and overtime rules was to make the game more exciting for American fans.  Although we didn't think much of the shootout as fans, golden goal overtime is undeniably an exciting way to end a match.  The video below of Eddie Gaven's golden goal (after a shady substitute as a keeper...more on that in a future post) is great evidence of the excitement.  Just listen to the announcer's voice crack when Tim Howard saves a potential game winning goal and again when Gaven scores.

To test the effect of the overtime golden goal rule on the likelihood of ties including overtime, I re-coded the tie variable to be 1 for the overtime era games only if that game ended overtime in a tie.  That is, if a game ended with a golden goal, I did not count it as a tie.  Of course I could not recode the shootout games in a similar manner since all shootouts ended with a winner.  After presenting this research at the NAASE conference, I  realized that I should have just left out the shootout games but for now I leave the results estimated as is.

The estimates in the first column suggest that when not counting games ending in golden goal overtime as tied, the overtime era actually decreased the likelihood of ties by almost 10%!  Since we did not pick up a significant effect in regulation, this means that all of the decrease in ties was due to super exciting (or heartbreaking depending on your loyalties) golden goals scored in overtime.  Clearly, the policy worked as planned.  Whether or not the fans beyond myself preferred games to be ended this way remains to be tested.



Wednesday, August 24, 2016

Quick Thoughts on MLS Attendance

Today I saw a tweet from @TotalMLS showing the average attendance of MLS in 2016 thus far.  Along with other twittazens, I wondered what the rankings would look like in terms of population and stadium capacity.  I recreated the histogram using Excel below for comparison. I would like to note that every time I have to use Excel or Word I die a little on the inside so I hope someone finds this useful.

  
The usual comments and questions that come up after seeing attendance numbers are things like 

  • Seattle invented attendance!  Let our capo come to your town and show you how to do it.
  • Why is team X so terrible and why doesn't the league move team X to my city Y?
  • Of course your big city team has great attendance; you have a billion people.
  • My stadium doesn't even hold that many people.
  • The poor attendance is because the stadium is far from the city.

While I refuse to even acknowledge the first two points, the last three we can at least start to investigate.  Below are the average attendance numbers expressed as a percent of each team's stadium capacity.  I see this as a short-run test of the front office's ability since they are at least temporarily stuck with their capacity.
 

Kudos to San Jose, Sporting Kansas City, and Montreal, who are defying reality and filling up their stadiums beyond capacity.  Having never visited these stadiums, my guess is this means they sold all their seats and had "standing room only" type tickets as well.  Portland is hitting 100%, which may not be surprising since the stadium has, as far as I know, the best atmosphere in MLS.  It is a little surprising, however, since I was at the inaugural game this season and noticed empty seats.  Maybe some fans were walking around.  Maybe teams are back to reporting tickets distributed and not butts in seats.  Who knows?

Looking at the teams with the most issues filling up their stadiums, the bottom 6 all reside in football stadiums whose capacity is artificially restricted to prevent the fans from feeling like they're in a mostly deserted fish bowl.  I attended Rapids games at Mile High stadium when there were fewer than 5,000 people in a 76,000 capacity stadium.  No matter how much you love soccer that's just not fun.  The percents here do not use the artificial capacity because that would be cheating.  These teams account for 4 of the top 6 when it comes to average attendance and occupy the top 3 spots.  So not only do they (outside of DC) have the best attendance in MLS, they have much room to grow.

The teams not packing their stadiums and selling less than 80% probably have some work to do.  I'm going to throw Columbus and Colorado on the problem list as well because I don't believe Colorado's attendance numbers and Columbus somewhat cheated by replacing its north end seats with a stage, thereby reducing its capacity.  So let's look at what percent of each team's metropolitan area population attends its games (2016 July population from the U.S. Census and 2015 population from Stats Canada).  This is a more long-run test of success since stadium size is a choice but your metropolitan area population is not.



RSL is killing it. The smallest market in MLS is getting the largest proportion of residents to cheer on its team. Is this due to the lack of sports entertainment competition?  RSL only compete with the Jazz in professional sports so that may be part of the story.  Cynical urbanophiles will say it's because there's nothing else to do in Salt Lake but these concrete jungle types have no idea just how many ways there are to be entertained in the beautiful Rocky Mountain Region.

The lack of professional sports competition is common among many of the other top teams (ORL, SEA, SJ, POR, VAN, CLB) while the abundance of other sports and entertainment options is clearly part of the story for the bottom dwellers located in NY/NJ, Chicago, and Los Angeles.  So who is doing poorly no matter which way we slice the data?  My unofficial list of troubled front offices would include

  1. Dallas
  2. Chicago
  3. DC
All three have lots of professional sports competition.  Dallas and DC have teams in the other 4 major sports and Chicago has 2 MLB teams plus an NBA team, an NFL team, and an NHL team.  Both Dallas and Chicago have stadiums that are far from their city centers and are sponsored by Toyota.  Maybe because a Prius is the only affordable way to get to games?  DC is working on building its own stadium but does not have the excuse of being far from the city.  I'll give them a pass until the new stadium arrives and continue to shake my head at Dallas and Chicago's bad life choices.

Sunday, January 31, 2016

"Any given Sunday"

Back in September, I made the case that MLS was more of an any-given-Sunday league than perhaps the NFL.  I did so using methods from sports economics journals that are kind of statistics-heavy for the average reader.  Some have commented to me that my posts are a bit too technical but...that's the point!  However, I wanted to provide evidence that everyone can digest via an infographic.   And by the grace of graphic designer Todd Mason, I give you your evidence that MLS is indeed a league where anyone can beat anyone.


Wednesday, December 16, 2015

MLS Begins Era of Free(er) Agency

After taking a long firm stance against free agency, has MLS finally caved and given the players what they've been fighting for?  Sort of.  Although the players who qualify for free agency are indeed free to sign a contract with any team, the league has placed strict limits on the amount a player's salary may increase.  In other leagues such as Major League Baseball, players who reach free agency can see their salaries more than double for an annual increase in the millions.  This is exactly what MLS wants to avoid.

Expensive Pennies

According to the MLS Players' Union press release in July, the most a player's salary can increase is 25% and only if he was earning less than $100,000.  The limit on the increase drops to 20% if the player makes $100,000 or more and 15% if he makes $200,000 or more.  A note below this information reads, "The above percentage increases may be raised for players who significantly outperform their contracts".  But exactly how one goes about outperforming his contract is not explicitly spelled out.  To be eligible for free agency, a player must be out of contract, at least 28 years old, have at least 8 years of MLS experience, and be paid less than the league maximum (currently $436,250).

I'm hoping the wording in the union's press release is poorly chosen and that the percent caps on salary increase are actually on a sliding scale.  Otherwise, a player like Chad Barrett listed below makes a penny too much at exactly $100,000.  At that salary, if Chad were eligible for free agency, he could only negotiate a salary of $100,000*1.2  = $120,000.  However, if he made a penny less, he could negotiate a 25% increase in salary, $99,999.99*1.25 = $124,999.99. That last penny could potentially be costing him $4,999.99!  The drop in max salary is even worse for someone like Bobby Boswell who made exactly $200,000.  He can negotiate for a salary up to $200,000*1.15 = $230,000 but if he made only a penny less, could negotiate up to $199,999.99*1.2 = $239,999.99 or $9,999.99 more! Again, since the collective bargaining agreement is not yet posted on the players' union site, we cannot be 100% certain that this is how the system works but according to the union's press release, these odd drops exist.

Interesting Incentives

Assuming the situation is exactly as laid out in the players' union press release, there are some interesting incentives introduced with this flavor of free agency.  There are seemingly two salary ranges where a player could potentially negotiate a higher salary in the future if he were actually paid less in the current season; the ranges $96,000 to $99,999 and $191,667 to $199,999  shown below.


Consider a soon-to-be free agent whose salary is just below either cutoff shown above.  His team has an incentive to increase his current salary past the cutoff point.  Doing so would give the team more bargaining power later as potential suitors would not be able to increase the player's salary as much.  It may also save the team considerable money in the future if they retain the free agent.
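The tiered caps from the union's press release, and the two salary ranges they create, can be sketched in Python as follows. The function names are mine, and the sketch assumes the caps apply as hard steps exactly as the press release reads (no sliding scale, no allowance for players who "outperform their contracts", and no check against the league maximum).

```python
def max_raise_rate(salary):
    """Maximum raise as a fraction of current salary, per the press release tiers."""
    if salary < 100_000:
        return 0.25
    if salary < 200_000:
        return 0.20
    return 0.15

def max_potential_salary(salary):
    """Highest salary a free agent could negotiate under the stepped caps."""
    return salary * (1 + max_raise_rate(salary))

def break_even_salary(cutoff):
    """Salary just below `cutoff` whose cap equals the cap at exactly `cutoff`.
    Anyone paid between this value and the cutoff can negotiate MORE than a
    player paid exactly the cutoff."""
    rate_below = max_raise_rate(cutoff - 0.01)
    return max_potential_salary(cutoff) / (1 + rate_below)

for cutoff in (100_000, 200_000):
    print(cutoff,
          round(max_potential_salary(cutoff), 2),
          round(break_even_salary(cutoff), 2))
```

The break-even values are where the lower ends of the two ranges come from: 1.25 times $96,000 equals the $120,000 cap at $100,000, and 1.2 times roughly $191,667 equals the $230,000 cap at $200,000.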

Free Agents, Potential Free Agents, and Mike Magee

According to mlssoccer.com, there are 27 players listed as potential free agents.  However, there are many more who satisfy the age, MLS experience, and salary restrictions.  Since contract information is not publicly available, I assume those that weren't listed as free agents are currently under contract.  Since soccer contracts are usually not many years long, it is likely that these players can be free agents relatively soon.  Below is the list of potential free agents I've compiled from MLS player data from mlssoccer.com (which removed all but 2 of the players on the free agent list from the stats page!) and the 2015 salaries according to the players' union along with some notes.  MLS announced free agents are in bold at the top.  Potential free agents (conditional on being out of contract) are below and not in bold.


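| team | player | age | season | guaranteed_salary | max_potential_salary | 2016 FA |
|---|---|---|---|---|---|---|
| CHI | Jeff Larentowicz | 32 | 11 | 271,000 | 311,650 | signed LA |
| CHI | Jon Busch | 31 | 9 | 90,000 | 112,500 | (NASL-IND) |
| CHI | Ty Harden | 31 | 8 | 75,078 | 93,848 | FA |
| COL | Bobby Burling | 31 | 9 | 140,000 | 168,000 | FA |
| COL | Drew Moor | 31 | 11 | 270,500 | 311,075 | signed TOR |
| COL | James Riley | 33 | 11 | 83,750 | 104,688 | FA |
| COL | Michael Harrington | 29 | 9 | 130,000 | 156,000 | signed CHI |
| COL | Nick LaBrocca | 31 | 9 | 180,000 | 216,000 | FA |
| FCD | Stephen Keel | 32 | 11 | 60,000 | 75,000 | FA |
| HOU | Nathan Sturgis | 28 | 10 | 75,375 | 94,219 | FA |
| HOU | Ricardo Clark | 32 | 11 | 337,750 | 388,413 | re-signed HOU |
| LA | Alan Gordon | 34 | 11 | 175,000 | 210,000 | signed LA |
| LA | Edson Buddle | 34 | 14 | 106,250 | 127,500 | FA |
| MTL | Kenny Cooper | 31 | 9 | 285,625 | 328,469 | FA |
| NYCFC | Ned Grabavoy | 32 | 12 | 215,000 | 247,250 | signed POR |
| NYRB | Kyle Reynish | 32 | 8 | 90,317 | 112,896 | FA |
| ORL | Corey Ashe | 29 | 9 | 189,750 | 227,700 | signed CLB |
| ORL | Eric Avila | 28 | 8 | 77,000 | 96,250 | (MX -Santos) |
| PHI | Brian Carroll | 34 | 13 | 150,000 | 180,000 | signed PHI |
| PHI | Conor Casey | 34 | 9 | 180,000 | 216,000 | signed CLB |
| POR | Andrew Weber | 32 | 11 | 60,000 | 75,000 | FA |
| SEA | Chad Barrett | 30 | 11 | 100,000 | 120,000 | signed SJ |
| SEA | Troy Perkins | 34 | 10 | 136,663 | 163,995 | retired |
| SJ | Khari Stephenson | 34 | 8 | 71,646 | 89,557 | FA |
| SKC | Paulo Nagamura | 32 | 11 | 230,000 | 264,500 | signed SKC |
| MTL | Justin Mapp | 31 | 15 | 199,225 | 239,070 | signed SKC |
| CHI | Patrick Nyarko | 29 | 8 | 215,750 | 248,113 | |
| CLB | Tyson Wahl | 31 | 11 | 99,667 | 124,583 | |
| COL | Marc Burch | 31 | 11 | 110,000 | 132,000 | |
| COL | Sam Cronin | 29 | 8 | 202,500 | 232,875 | |
| DC | Bobby Boswell | 32 | 11 | 200,000 | 230,000 | |
| DC | Chris Rolfe | 32 | 10 | 225,000 | 258,750 | |
| DC | Davy Arnaud | 35 | 14 | 212,500 | 244,375 | |
| DC | Fabian Espindola | 30 | 9 | 175,000 | 210,000 | |
| DC | Sean Franklin | 30 | 8 | 234,167 | 269,292 | |
| FCD | Atiba Harris | 30 | 11 | 130,000 | 156,000 | |
| FCD | Chris Seitz | 28 | 9 | 130,000 | 156,000 | |
| HOU | David Horst | 30 | 8 | 81,500 | 101,875 | |
| LA | Dan Gargan | 33 | 12 | 125,000 | 150,000 | |
| LA | Dan Kennedy | 33 | 8 | 233,000 | 267,950 | |
| LA | Robbie Rogers | 28 | 8 | 191,667 | 230,000 | |
| MTL | Dominic Oduro | 30 | 13 | 251,667 | 289,417 | |
| MTL | Eric Kronberg | 32 | 10 | 132,000 | 158,400 | |
| NE | Brad Knighton | 30 | 8 | 82,005 | 102,506 | |
| NE | Chris Tierney | 29 | 8 | 113,333 | 136,000 | |
| NYCFC | Chris Wingert | 33 | 13 | 215,000 | 247,250 | |
| NYCFC | Jason Hernandez | 32 | 11 | 185,000 | 222,000 | |
| NYCFC | Josh Saunders | 34 | 10 | 90,000 | 112,500 | |
| NYCFC | Mehdi Ballouchy | 32 | 13 | 83,250 | 104,063 | |
| NYRB | Dax McCarty | 28 | 11 | 262,500 | 301,875 | |
| PHI | Sebastien Le Toux | 31 | 8 | 285,228 | 328,012 | |
| POR | Jack Jewsbury | 34 | 13 | 137,500 | 165,000 | |
| POR | Nat Borchers | 34 | 11 | 245,000 | 281,750 | |
| POR | Will Johnson | 28 | 9 | 334,333 | 384,483 | |
| RSL | Jamison Olave | 34 | 8 | 300,000 | 345,000 | |
| RSL | Javier Morales | 35 | 9 | 300,000 | 345,000 | |
| RSL | Nick Rimando | 36 | 16 | 370,000 | 425,500 | |
| RSL | Tony Beltran | 28 | 8 | 205,950 | 236,843 | |
| SEA | Brad Evans | 30 | 9 | 302,666 | 348,066 | |
| SEA | Chad Marshall | 31 | 12 | 291,667 | 335,417 | |
| SJ | Marvell Wynne | 29 | 11 | 200,625 | 230,719 | |
| SJ | Quincy Amarikwa | 28 | 10 | 100,000 | 120,000 | |
| SJ | Shea Salinas | 29 | 8 | 148,333 | 178,000 | |
| SJ | Steven Lenhart | 29 | 8 | 159,083 | 190,900 | |
| SKC | Chance Myers | 28 | 8 | 195,000 | 234,000 | |
| SKC | Jacob Peterson | 29 | 11 | 130,375 | 156,450 | |
| TOR | Herculez Gomez | 33 | 8 | 261,000 | 300,150 | |
| VAN | Jordan Harvey | 31 | 11 | 150,000 | 180,000 | |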
 
You may notice that I left off Mike Magee. Although he's included on mlssoccer.com's list, his guaranteed salary was above the maximum so according to the rules listed on the players' union website, he's not eligible.  EDIT: and yet he was just announced as a free agent signing with LA!  MLS isn't known for following its own rules but there may be an explanation for this.  It may be that allocation money was used to "pay down" what Magee's salary was considered to be on paper to bring him below the limit.  I'll look into this hypothesis.

It'll be very interesting to see where the first free agents go.  The literature on free agency predicts they should be more likely to go to teams in bigger cities where their financial impact is larger, teams on the verge of playoffs or Champions' League where their productivity could bump the team into that pool of revenue, and/or more amenable locations.



Saturday, October 17, 2015

MLS Shootout (part 2) - One Table to Rule Them All

The MLS shootout was introduced in 1996 to prevent American sports fans from suffering through a tied soccer game.  The assumption was that we would prefer an NHL hockey-style shootout to a game without a winner and that we wouldn't mind becoming the laughing stock of the rest of the soccer world.

Assuming a competitively balanced league where any team has an equal probability of winning, losing, or tying any game, I showed in the last post that teams have less incentive to fight for a win in a late tied game now that there is no shootout.  That is, teams are more likely to settle for a tie now that it guarantees a point.  However, when looking at the percent of tied games each season, it became clear that the jump in the percent of tied games didn't occur immediately after the shootout era, which ended after the 1999 season (see the reproduced graph below).  If we take a closer look at the data, we can see that the percent of games ending in a tie has more to do with conferences and the rules of playoff entry than shootouts.



The reason for this is simple: since the top teams in each conference gain entry to the post-season, teams are less averse to sharing points when playing non-conference opponents.  For example, when Columbus Crew SC tied the Vancouver Whitecaps this past April, the Crew didn't mind sharing points with a Western team. The point given away to Vancouver had no bearing on the Crew's playoff chances since Vancouver competes for spots in the Western Conference.  However, the 3-3 draw Columbus had with Toronto was less beneficial as it gave away a point to Toronto, which remained 3 points behind Columbus in the Eastern standings.  So on net, Columbus gained no ground on Toronto whereas the tie against Vancouver gained Columbus a point against all Eastern teams.

It stands to reason then that seasons with more intra-conference games should have fewer ties as teams fight harder to net 3 points (3 to them - 0 to opponent = 3 net points) instead of 0 (1 to them - 1 to opponent = 0 net points) against conference opponents.  This has already been shown to be the case in the NHL by Shmanske and Lowenthal (2007).  But what about MLS?

Taking a look at the percent of tied games plotted against the percent of games played between opponents of the same conference for all non-shootout seasons, it seems that this is indeed the case.  There is a clear downward trend indicating that the greater the percent of intra-conference games, the lower the percent of tied games.


Use the data and R script to reproduce this graph and all others below.
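For readers without R, here is a rough Python sketch of how the two quantities on that plot (share of games between same-conference opponents, and share of games tied) could be computed per season from game-level records. The `GAMES` records and the tuple layout are hypothetical stand-ins, not the actual data set, and this is not a translation of the R script.

```python
from collections import defaultdict

# Hypothetical game records for illustration:
# (season, home_conference, away_conference, home_goals, away_goals)
GAMES = [
    (2005, "East", "East", 1, 1),
    (2005, "East", "West", 2, 0),
    (2005, "West", "West", 0, 0),
    (2005, "West", "East", 3, 1),
    (2006, "East", "East", 2, 2),
    (2006, "West", "West", 1, 0),
    (2006, "East", "West", 1, 1),
    (2006, "West", "West", 2, 1),
]

def season_shares(games):
    """Return {season: (share_same_conference, share_tied)}."""
    counts = defaultdict(lambda: [0, 0, 0])  # games, same-conference games, ties
    for season, home_conf, away_conf, home_goals, away_goals in games:
        tally = counts[season]
        tally[0] += 1
        tally[1] += home_conf == away_conf   # True counts as 1
        tally[2] += home_goals == away_goals
    return {season: (same / n, ties / n)
            for season, (n, same, ties) in counts.items()}

for season, (same_share, tie_share) in sorted(season_shares(GAMES).items()):
    print(season, same_share, tie_share)
```

Each season then contributes one point to the scatter plot, with the same-conference share on the horizontal axis and the tie share on the vertical axis.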

What is up with those outlier years with 100% intra-conference games, you ask?  Although MLS actually had 3 "conferences" (East, Central, and West) for the 2000-2001 seasons, entry into the playoffs was granted to the top two-thirds of all teams regardless of conference.   This was also true in 2002 although the league reverted to 2 conferences (East and West).  So in practice the conferences only served as lists of teams that were geographically close and played each other more often.  This is why I'm treating those years as if 100% of the games played were intra-conference; for the sake of playoffs, every opponent was a direct competitor.

You have to appreciate the irony here.  In the year where MLS seemed to rebel once more against European football and create 4-team conferences a la the NFL, what it really had was a single table with an unbalanced schedule.  Single-table purists should add this fact to their arguments; a single table should decrease the number of tied games in MLS.

Back to the shootout era.  Adding these years to the plot, we see that outside of that crazy 1999 outlier season, the pattern still holds.  Additionally, we see the effect the shootout had on tied games independent of conference effects.  Again, outside of 1999, the shootout seasons saw a lower percentage of ties compared to other seasons with the same percent of intra-conference games.


So what can MLS do to minimize the number of tied games we see? There are two changes it could implement: adopt a single table for playoffs and/or reintroduce the shootout.  Although the latter would explicitly eliminate ties as well as reduce the number of games that ended regulation in a tie, I don't actually want to return to the shootout days. Neither does MLS and neither do you.  But a single table?  When asked about the possibility of a single table back in 2010, Don Garber replied

...every year we do deeply analyze whether or not it makes sense for us to have a single table and no playoffs. We also evaluate whether it makes sense to have a single table and playoffs, or whether it makes sense to have conferences and playoffs.

...We're going to change it if we believe we could have a more compelling format and one that might be perhaps more balanced. ...It should be a format where there is more media coverage, more television ratings and more attendance as we get down to an event that is a single, stand-alone event, our championship game, the MLS Cup.

...All of those can happen within a single table format. The single table discussion is whether there is a single table or whether there are conferences?

I have no idea what goes on when the top brass of MLS meet to deeply analyze using a single table.  What I do know now is that the evidence above strongly suggests that doing so would reduce the number of ties.  It would also appease the Eurosnobs out there who use words like "capo" and refer to games as "matches" and fields as "pitches" and....ok I'm getting nauseous, enough.

And if a single table is not in the works, we could theoretically baby-step our way to one by returning to the days when the last playoff positions were given to the best teams regardless of conference (2007-2011).  For example, in 2008 the top 3 teams in each conference were granted access to the playoffs along with the remaining top 2 teams regardless of conference.  Although this yielded a strange situation where a team from New Jersey, the "New York" Red Bulls (NYRB), entered the playoff bracket in the West and was crowned the Western Conference Champions, it should have provided incentive for NYRB to give more effort against Western Conference teams during the regular season since they were competing for that last playoff spot with Western Conference teams.

2008 MLS Standings

| Eastern Conference | Points | Western Conference | Points |
|---|---|---|---|
| Columbus Crew* | 57 | Houston Dynamo* | 51 |
| Chicago Fire* | 46 | Chivas USA* | 43 |
| New England Revolution* | 43 | Real Salt Lake* | 40 |
| Sporting Kansas City* | 42 | Colorado Rapids | 38 |
| New York Red Bulls* | 39 | FC Dallas | 36 |
| D.C. United | 37 | San Jose Earthquakes | 33 |
| Toronto FC | 35 | LA Galaxy | 33 |

* admitted to playoffs

Both Sporting Kansas City and NYRB  had to earn more points than the 4 lowest ranked teams in the Western Conference to make the playoffs in 2008.  Thus, their intra-conference games should have been given the same importance as their inter-conference games.  The same should be true for all teams ranked around the last playoff spots and we should see more teams push for wins in closely fought games.  Although this incentive should exist in theory, it's not blatantly obvious at first glance.
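The 2008 entry rule (top 3 per conference, then the best 2 remaining teams regardless of conference) can be sketched in Python using the points from the standings. The function and variable names are mine, and the sketch ignores tiebreakers, which did not matter for the playoff line that year.

```python
EAST_2008 = [
    ("Columbus Crew", 57), ("Chicago Fire", 46), ("New England Revolution", 43),
    ("Sporting Kansas City", 42), ("New York Red Bulls", 39),
    ("D.C. United", 37), ("Toronto FC", 35),
]
WEST_2008 = [
    ("Houston Dynamo", 51), ("Chivas USA", 43), ("Real Salt Lake", 40),
    ("Colorado Rapids", 38), ("FC Dallas", 36),
    ("San Jose Earthquakes", 33), ("LA Galaxy", 33),
]

def playoff_teams(east, west, auto_per_conference=3, wildcards=2):
    """Return (automatic qualifiers, wildcard qualifiers) by points only."""
    by_points = lambda team: team[1]
    east_sorted = sorted(east, key=by_points, reverse=True)
    west_sorted = sorted(west, key=by_points, reverse=True)
    automatic = east_sorted[:auto_per_conference] + west_sorted[:auto_per_conference]
    # Everyone else is ranked on a single table for the remaining spots.
    remaining = east_sorted[auto_per_conference:] + west_sorted[auto_per_conference:]
    wild = sorted(remaining, key=by_points, reverse=True)[:wildcards]
    return [name for name, _ in automatic], [name for name, _ in wild]

automatic, wild = playoff_teams(EAST_2008, WEST_2008)
print("Automatic:", automatic)
print("Wildcards:", wild)
```

The wildcard step is the single-table comparison: the fourth and fifth Eastern teams are ranked directly against the fourth through seventh Western teams, so points dropped to a Western opponent count against them just as much as points dropped to an Eastern one.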



The years for which some teams are ranked on a single table to gain entry into the playoffs are represented by blue circles.  The areas of the circles represent the proportion of teams that will make it to the playoffs via a single table comparison.  The absolute number has been as few as 2 teams and as many as 10 teams.  If the single table comparison caused more teams to push for wins in late tied games, we would see fewer ties when the circles are bigger and the circle years should be below the dots representing no single table comparisons.  Unfortunately, the years that used a single table for some teams are not close enough to many comparison years in the plot for us to draw a conclusion.

Moving forward, I think things are complex enough that it's time to carry out a logistic regression on game level data so that we can simultaneously control for all the moving parts influencing tied games in MLS.  These include, but are not limited to, overtime structure, conference structure, playoff structure, and of course competitive balance within the league.  Maybe then we can find out just what was going on in that 1999 season.

This project started out in my mind as a simple investigation of the incentive effects of the MLS shootout, a rare gem in soccer history.  Upon diving into the data, it has become so much more.  That is how research usually goes.  And for all of my projects, you can replace the word "usually" with "always".  But this is why you should choose topics you love.  Then you don't mind so much when it takes 10 times as long as you thought it would.

Friday, September 25, 2015

Competitive Balance in MLS

Last time I talked about how I obtained the MLS game data I'll be using in my research.  The Wayback Machine is incredibly useful for finding things that used to be online.  It's also a great reminder that whatever you put on the internet is likely permanent in some way.  In this entry, I'm going to test how competitive MLS is by comparing the distribution of team outcomes to an "ideal" measure.

The motivation for this experiment comes from a similar analysis in Pay Dirt and Fort and Quirk (1995), both of which derive ideal distributions of win percentages for teams in equally balanced leagues. If we want to take their methods and apply them to soccer leagues, we must derive our own distribution of outcomes since the authors only consider leagues that have no ties.  Thus, instead of an ideal distribution for winning percentages, I'll derive an ideal distribution for percentage of possible points earned.

With 3 points awarded for a win, 1 for a tie, and 0 for a loss, the most a team could earn in a season with 30 games is 90.  Therefore, if a team only earned 45 points in a season, it would have earned 45/90 x 100% = 50% of points possible.  Let's see what an equally balanced or "ideal" league would look like below.

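If you're having trouble viewing the equations below, try not using Firefox.  Some security setting is preventing them from showing up.

Consider an MLS season where each team has a certain number of games played (gp).  If each team is of equal playing strength, then the probability of winning, losing, or tying any game is $P_W = P_L = P_T = 1/3$.  That is, the outcome of each game follows a trinomial distribution.   Writing $X$ for the points earned in a single game, the expected number of points earned from gp games for any team is then

$$E[\text{points}] = gp \cdot E[X] = gp\left(3 \cdot \tfrac{1}{3} + 1 \cdot \tfrac{1}{3} + 0 \cdot \tfrac{1}{3}\right) = \frac{4}{3}\,gp$$

and the variance in points is

$$Var(\text{points}) = gp\left(E[X^2] - (E[X])^2\right)$$

where

$$E[X^2] = 3^2 \cdot \tfrac{1}{3} + 1^2 \cdot \tfrac{1}{3} + 0^2 \cdot \tfrac{1}{3} = \frac{10}{3}$$

and

$$(E[X])^2 = \left(\frac{4}{3}\right)^2 = \frac{16}{9}$$

so that

$$Var(\text{points}) = gp\left(\frac{10}{3} - \frac{16}{9}\right) = \frac{14}{9}\,gp$$

Converting to percentage of points possible, the expected value becomes

$$E\left[\frac{\text{points}}{3\,gp}\right] = \frac{\tfrac{4}{3}\,gp}{3\,gp} = \frac{4}{9}$$

and the variance becomes

$$Var\left(\frac{\text{points}}{3\,gp}\right) = \frac{\tfrac{14}{9}\,gp}{9\,gp^2} = \frac{14}{81\,gp}$$

Since the percentage of points earned is an average of independent draws from a trinomial distribution, it will be distributed approximately normally (by the central limit theorem) with mean 4/9 and standard deviation

$$\sigma = \sqrt{\frac{14}{81\,gp}} = \frac{\sqrt{14}}{9\sqrt{gp}}$$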
If you'd like to run a simulation of such an ideal league with any number of teams and games played, I've written an R script for you to do so.
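For those who prefer Python, here is a minimal sketch of the same idea (it is not a port of the R script, and the function names are mine). It assumes, as the derivation does, that every game for every team is an independent draw with win, tie, and loss each having probability 1/3, and compares the simulated share of possible points to the ideal mean 4/9 and standard deviation sqrt(14 / (81 gp)). The standard deviation follows from a per-game points variance of 14/9.

```python
import math
import random
import statistics

def simulate_share_of_points(games_played, team_seasons, seed=0):
    """Simulate the share of possible points for many independent team-seasons."""
    rng = random.Random(seed)
    shares = []
    for _ in range(team_seasons):
        # Win = 3 points, tie = 1, loss = 0, each with probability 1/3.
        points = sum(rng.choice((3, 1, 0)) for _ in range(games_played))
        shares.append(points / (3 * games_played))
    return shares

def ideal_mean_and_sd(games_played):
    """Mean and standard deviation of the share of points in a balanced league."""
    return 4 / 9, math.sqrt(14 / (81 * games_played))

shares = simulate_share_of_points(games_played=30, team_seasons=10_000)
mean, sd = ideal_mean_and_sd(30)
print("simulated mean:", statistics.mean(shares), "ideal:", mean)
print("simulated sd:  ", statistics.pstdev(shares), "ideal:", sd)
```

One simplification to keep in mind: in a real schedule one team's win is another team's loss, so team results are not fully independent of each other. The sketch ignores that, just as the ideal distribution does.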

One issue with comparing actual distributions of percentage of points earned to the ideal distribution is that the latter depends on the number of games in a season, which differs across seasons in MLS.  For this reason, I've combined outcomes for all seasons with the same number of games.  Also, since games in the first four years of MLS could not end in a tie but instead went to a shootout, the formula above does not apply to those years.  Therefore, for 1996-1999, I use "hypothetical points", which are the points teams would have earned had there been no shootout.  The unit of observation is then the percentage of points earned for a particular team in a season with the same number of games as all other observations.  So how competitive is MLS exactly?


In the figure above, I use a kernel density estimate of the actual percentages earned (blue line) and compare this to the ideal distribution (black line).  For the years where 32 games were played, it seems that the actual distribution of percentage of points earned is a little too heavy in the tails to be considered ideal.  There are too many teams at the low end that earned around 20% of the possible points and too many at the high end that earned 60% or more.  A Kolmogorov-Smirnov test rejects that the actual distribution of percentages came from the ideal distribution (D = 0.265, p-value = 2.699e-05), although admittedly the test is not perfect since the percentages are not empirically from a continuous distribution but rather can only take steps of 1/96.
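To make the D statistic concrete, here is a small Python sketch of the one-sample Kolmogorov-Smirnov distance between a set of observed shares and the ideal normal distribution. The `observed` values are made up for illustration (they are not the MLS data), the function names are mine, and the sketch computes only D, not the p-value.

```python
import math

def normal_cdf(x, mean, sd):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ks_statistic(sample, mean, sd):
    """Largest gap between the sample's empirical CDF and the normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mean, sd)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

games_played = 32
ideal_mean = 4 / 9
ideal_sd = math.sqrt(14 / (81 * games_played))

# Hypothetical shares of possible points, for illustration only.
observed = [0.20, 0.31, 0.38, 0.42, 0.45, 0.47, 0.52, 0.58, 0.63, 0.66]
print(round(ks_statistic(observed, ideal_mean, ideal_sd), 3))
```

A larger D means the observed shares sit further from the ideal curve somewhere along the range, which is what a heavy-tailed sample like the 32-game seasons produces.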

For the years with 30 games played, the actual distribution lines up quite nicely with the ideal! Interestingly, this includes the designated player era, which started in 2007.  Many fans bemoaned the arrival of David Beckham as the arrival of an imbalanced league where only the big cities would be able to compete by paying millions of dollars for global stars.  However, there doesn't seem to be much evidence of this being the case in the figure below. The Kolmogorov-Smirnov test here cannot reject the claim that the observed percentages did in fact come from the ideal distribution (D = 0.09, p-value = 0.513).  Comparing this figure to the others in this post as well as those of the other sports leagues (NFL, NBA, MLB, NHL) in Pay Dirt and Fort and Quirk (1995) suggests that MLS is the most competitive professional sports league in all of US/Canada history and the only one that I am aware of to conform, at least temporarily, to a balanced ideal!

And then the bad news for people who like an any-given-Sunday league.  In the most recent seasons where 34 games were played, the distribution of percentages is no longer ideal (D = 0.20505, p-value = 0.003648).  In fact, this may be the least competitive group of seasons.  While there were minor tweaks to salary and designated player rules, nothing immediate stands out in this time period to suggest why the league has become so much less competitive.  However, this has not had any noticeable effect on league attendance or revenues as some may fear it would.

 Overall, we see that MLS is a relatively competitive league with at least one time frame of ideal levels of competition, at least as far as percentage of points earned goes.  However, this level of competitiveness seems to be diminishing as time goes on as the most recent time period deviates the most from what we may consider an ideal league to be.

For anyone interested in how the above was carried out, I've uploaded the data set and R script to recreate the Figures and Kolmogorov-Smirnov tests.  You could easily extend the code to match your favorite soccer league once you've downloaded the appropriate data set. 

Moving forward, I'm going to begin my investigation on the effects of the MLS shootout on team strategies.  In the meantime, if you have any MLS related question you think a bunch of data could answer, let me know and I'll see what I can come up with.  Until then friends.