Wednesday, December 16, 2015

MLS Begins Era of Free(er) Agency

After taking a long, firm stance against free agency, has MLS finally caved and given the players what they've been fighting for?  Sort of.  Although the players who qualify for free agency are indeed free to sign a contract with any team, the league has placed strict limits on how much a player's salary may increase.  In other leagues such as Major League Baseball, players who reach free agency can see their salaries more than double, an annual increase in the millions.  This is exactly what MLS wants to avoid.

Expensive Pennies

According to the MLS Players' Union press release in July, the most a player's salary can increase is 25% and only if he was earning less than $100,000.  The limit on the increase drops to 20% if the player makes $100,000 or more and 15% if he makes $200,000 or more.  A note below this information reads, "The above percentage increases may be raised for players who significantly outperform their contracts".  But exactly how one goes about outperforming his contract is not explicitly spelled out.  To be eligible for free agency, a player must be out of contract, at least 28 years old, have at least 8 years of MLS experience, and be paid less than the league maximum (currently $436,250).
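The four eligibility conditions are simple enough to write down as a check. Below is a minimal Python sketch of the rules as I read them in the press release. The `Player` fields and `is_fa_eligible` are hypothetical names. Whether a player is out of contract has to be supplied by hand because contract data isn't public, and MLS seasons played stand in for years of MLS experience.

```python
from dataclasses import dataclass

# League maximum salary quoted in the union's July 2015 press release.
LEAGUE_MAX = 436_250


@dataclass
class Player:
    name: str
    age: int
    mls_seasons: int       # assumed proxy for "years of MLS experience"
    salary: float          # 2015 guaranteed salary in dollars
    out_of_contract: bool  # must be supplied by hand; contracts aren't public


def is_fa_eligible(p: Player) -> bool:
    """Apply the four free-agency conditions listed in the press release."""
    return (
        p.out_of_contract
        and p.age >= 28
        and p.mls_seasons >= 8
        and p.salary < LEAGUE_MAX
    )


barrett = Player("Chad Barrett", age=30, mls_seasons=11, salary=100_000, out_of_contract=True)
print(is_fa_eligible(barrett))
```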

I'm hoping the wording in the union's press release is poorly chosen and that the percentage caps on salary increases are actually on a sliding scale.  Otherwise, a player like Chad Barrett (listed below) makes a penny too much at exactly $100,000.  At that salary, if Chad were eligible for free agency, he could only negotiate a salary of $100,000 × 1.2 = $120,000.  However, if he made a penny less, he could negotiate a 25% increase, $99,999.99 × 1.25 ≈ $124,999.99.  That last penny could potentially be costing him $4,999.99!  The drop in max salary is even worse for someone like Bobby Boswell, who made exactly $200,000.  He can negotiate a salary of up to $200,000 × 1.15 = $230,000, but if he made only a penny less, he could negotiate up to $199,999.99 × 1.2 ≈ $239,999.99, or $9,999.99 more!  Again, since the collective bargaining agreement is not yet posted on the players' union site, we cannot be 100% certain that this is how the system works, but according to the union's press release, these odd drops exist.
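To make the penny arithmetic concrete, here is a minimal Python sketch of the cap *assuming* the flat-tier reading of the press release, not a sliding scale. `max_new_salary_cents` and the tier list are hypothetical names. Everything is in integer cents so a one-penny difference is exact, and results are rounded half up to the nearest cent.

```python
# Assumed flat tiers, keyed to the player's current salary (in cents):
# (lowest salary in the tier, maximum raise in percent). Checked top-down.
TIERS_CENTS = [
    (20_000_000, 15),  # $200,000 or more: up to 15%
    (10_000_000, 20),  # $100,000 to $199,999.99: up to 20%
    (0, 25),           # under $100,000: up to 25%
]


def max_new_salary_cents(current_cents: int) -> int:
    """Highest salary (in cents) a free agent could negotiate under flat tiers."""
    for floor, pct in TIERS_CENTS:
        if current_cents >= floor:
            # Multiply by (100 + pct) / 100, rounding half up to the nearest cent.
            return (current_cents * (100 + pct) + 50) // 100
    raise ValueError("salary must be non-negative")


for cents in (9_999_999, 10_000_000, 19_999_999, 20_000_000):
    print(f"${cents / 100:,.2f} -> ${max_new_salary_cents(cents) / 100:,.2f}")
```

Run as written, it prints $99,999.99 -> $124,999.99 and $100,000.00 -> $120,000.00, then $199,999.99 -> $239,999.99 and $200,000.00 -> $230,000.00, matching the Barrett and Boswell examples.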

Interesting Incentives

Assuming the situation is exactly as laid out in the players' union press release, there are some interesting incentives introduced with this flavor of free agency.  There are seemingly two salary ranges where a player could potentially negotiate a higher salary in the future if he were actually paid less in the current season: $96,000 to $99,999 and $191,667 to $199,999, shown below.  These bounds come from matching the best deal available just above each cutoff: $96,000 × 1.25 = $120,000 = $100,000 × 1.2, and $191,667 × 1.2 ≈ $230,000 = $200,000 × 1.15.
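The same two ranges can be derived for any set of tiers. This is a minimal Python sketch, again assuming the flat-tier reading. `TIERS` and `inversion_ranges` are hypothetical names. For each cutoff, it finds the lowest salary below the cutoff whose capped raise beats the capped raise available exactly at the cutoff.

```python
from fractions import Fraction

# Assumed tiers from the press release: (lowest salary in tier in $, max raise).
TIERS = [(0, Fraction(25, 100)), (100_000, Fraction(20, 100)), (200_000, Fraction(15, 100))]


def inversion_ranges(tiers):
    """Salary ranges just below each cutoff that allow a *higher* capped salary
    than earning exactly the cutoff would."""
    ranges = []
    for (_, rate_below), (cutoff, rate_at) in zip(tiers, tiers[1:]):
        best_at_cutoff = cutoff * (1 + rate_at)
        # A salary s below the cutoff wins when s * (1 + rate_below) > best_at_cutoff.
        lower = best_at_cutoff / (1 + rate_below)
        ranges.append((lower, cutoff))
    return ranges


for lower, cutoff in inversion_ranges(TIERS):
    print(f"${float(lower):,.2f} < salary < ${cutoff:,}")
```

Exact fractions avoid float noise at the boundary. The second lower bound is $575,000/3 ≈ $191,666.67, which is why the range starts at $191,667 in whole dollars.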


Consider a soon-to-be free agent whose salary is just below either cutoff shown above.  His team has an incentive to increase his current salary past the cutoff point.  Doing so would give the team more bargaining power later as potential suitors would not be able to increase the player's salary as much.  It may also save the team considerable money in the future if they retain the free agent.

Free Agents, Potential Free Agents, and Mike Magee

According to mlssoccer.com, there are 27 players listed as potential free agents.  However, there are many more who satisfy the age, MLS experience, and salary restrictions.  Since contract information is not publicly available, I assume those who weren't listed as free agents are currently under contract.  Since soccer contracts are usually not many years long, it is likely that these players could become free agents relatively soon.  Below is the list of potential free agents I've compiled from MLS player data from mlssoccer.com (which removed all but 2 of the players on the free agent list from the stats page!) and the 2015 salaries according to the players' union, along with some notes.  Free agents announced by MLS are listed at the top, with their 2016 status in the last column.  Potential free agents (conditional on being out of contract) are listed below them, without a 2016 status.


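Team | Player | Age | MLS seasons | 2015 guaranteed salary ($) | Max potential salary ($) | 2016 FA status
CHI | Jeff Larentowicz | 32 | 11 | 271,000 | 311,650 | signed LA
CHI | Jon Busch | 31 | 9 | 90,000 | 112,500 | (NASL-IND)
CHI | Ty Harden | 31 | 8 | 75,078 | 93,848 | FA
COL | Bobby Burling | 31 | 9 | 140,000 | 168,000 | FA
COL | Drew Moor | 31 | 11 | 270,500 | 311,075 | signed TOR
COL | James Riley | 33 | 11 | 83,750 | 104,688 | FA
COL | Michael Harrington | 29 | 9 | 130,000 | 156,000 | signed CHI
COL | Nick LaBrocca | 31 | 9 | 180,000 | 216,000 | FA
FCD | Stephen Keel | 32 | 11 | 60,000 | 75,000 | FA
HOU | Nathan Sturgis | 28 | 10 | 75,375 | 94,219 | FA
HOU | Ricardo Clark | 32 | 11 | 337,750 | 388,413 | re-signed HOU
LA | Alan Gordon | 34 | 11 | 175,000 | 210,000 | signed LA
LA | Edson Buddle | 34 | 14 | 106,250 | 127,500 | FA
MTL | Kenny Cooper | 31 | 9 | 285,625 | 328,469 | FA
NYCFC | Ned Grabavoy | 32 | 12 | 215,000 | 247,250 | signed POR
NYRB | Kyle Reynish | 32 | 8 | 90,317 | 112,896 | FA
ORL | Corey Ashe | 29 | 9 | 189,750 | 227,700 | signed CLB
ORL | Eric Avila | 28 | 8 | 77,000 | 96,250 | (MX - Santos)
PHI | Brian Carroll | 34 | 13 | 150,000 | 180,000 | signed PHI
PHI | Conor Casey | 34 | 9 | 180,000 | 216,000 | signed CLB
POR | Andrew Weber | 32 | 11 | 60,000 | 75,000 | FA
SEA | Chad Barrett | 30 | 11 | 100,000 | 120,000 | signed SJ
SEA | Troy Perkins | 34 | 10 | 136,663 | 163,995 | retired
SJ | Khari Stephenson | 34 | 8 | 71,646 | 89,557 | FA
SKC | Paulo Nagamura | 32 | 11 | 230,000 | 264,500 | signed SKC
MTL | Justin Mapp | 31 | 15 | 199,225 | 239,070 | signed SKC
CHI | Patrick Nyarko | 29 | 8 | 215,750 | 248,113 | -
CLB | Tyson Wahl | 31 | 11 | 99,667 | 124,583 | -
COL | Marc Burch | 31 | 11 | 110,000 | 132,000 | -
COL | Sam Cronin | 29 | 8 | 202,500 | 232,875 | -
DC | Bobby Boswell | 32 | 11 | 200,000 | 230,000 | -
DC | Chris Rolfe | 32 | 10 | 225,000 | 258,750 | -
DC | Davy Arnaud | 35 | 14 | 212,500 | 244,375 | -
DC | Fabian Espindola | 30 | 9 | 175,000 | 210,000 | -
DC | Sean Franklin | 30 | 8 | 234,167 | 269,292 | -
FCD | Atiba Harris | 30 | 11 | 130,000 | 156,000 | -
FCD | Chris Seitz | 28 | 9 | 130,000 | 156,000 | -
HOU | David Horst | 30 | 8 | 81,500 | 101,875 | -
LA | Dan Gargan | 33 | 12 | 125,000 | 150,000 | -
LA | Dan Kennedy | 33 | 8 | 233,000 | 267,950 | -
LA | Robbie Rogers | 28 | 8 | 191,667 | 230,000 | -
MTL | Dominic Oduro | 30 | 13 | 251,667 | 289,417 | -
MTL | Eric Kronberg | 32 | 10 | 132,000 | 158,400 | -
NE | Brad Knighton | 30 | 8 | 82,005 | 102,506 | -
NE | Chris Tierney | 29 | 8 | 113,333 | 136,000 | -
NYCFC | Chris Wingert | 33 | 13 | 215,000 | 247,250 | -
NYCFC | Jason Hernandez | 32 | 11 | 185,000 | 222,000 | -
NYCFC | Josh Saunders | 34 | 10 | 90,000 | 112,500 | -
NYCFC | Mehdi Ballouchy | 32 | 13 | 83,250 | 104,063 | -
NYRB | Dax McCarty | 28 | 11 | 262,500 | 301,875 | -
PHI | Sebastien Le Toux | 31 | 8 | 285,228 | 328,012 | -
POR | Jack Jewsbury | 34 | 13 | 137,500 | 165,000 | -
POR | Nat Borchers | 34 | 11 | 245,000 | 281,750 | -
POR | Will Johnson | 28 | 9 | 334,333 | 384,483 | -
RSL | Jamison Olave | 34 | 8 | 300,000 | 345,000 | -
RSL | Javier Morales | 35 | 9 | 300,000 | 345,000 | -
RSL | Nick Rimando | 36 | 16 | 370,000 | 425,500 | -
RSL | Tony Beltran | 28 | 8 | 205,950 | 236,843 | -
SEA | Brad Evans | 30 | 9 | 302,666 | 348,066 | -
SEA | Chad Marshall | 31 | 12 | 291,667 | 335,417 | -
SJ | Marvell Wynne | 29 | 11 | 200,625 | 230,719 | -
SJ | Quincy Amarikwa | 28 | 10 | 100,000 | 120,000 | -
SJ | Shea Salinas | 29 | 8 | 148,333 | 178,000 | -
SJ | Steven Lenhart | 29 | 8 | 159,083 | 190,900 | -
SKC | Chance Myers | 28 | 8 | 195,000 | 234,000 | -
SKC | Jacob Peterson | 29 | 11 | 130,375 | 156,450 | -
TOR | Herculez Gomez | 33 | 8 | 261,000 | 300,150 | -
VAN | Jordan Harvey | 31 | 11 | 150,000 | 180,000 | -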
 
You may notice that I left off Mike Magee.  Although he's included on mlssoccer.com's list, his guaranteed salary was above the maximum, so according to the rules listed on the players' union website, he's not eligible.  EDIT: and yet he was just announced as a free agent signing with LA!  MLS isn't known for following its own rules, but there may be an explanation for this.  It may be that allocation money was used to "pay down" what Magee's salary was considered to be on paper to bring him below the limit.  I'll look into this hypothesis.

It'll be very interesting to see where the first free agents go.  The literature on free agency predicts they should be more likely to go to teams in bigger cities where their financial impact is larger, teams on the verge of the playoffs or the Champions League where their productivity could bump the team into that pool of revenue, and/or more amenable locations.



Saturday, October 17, 2015

MLS Shootout (part 2) - One Table to Rule Them All

The MLS shootout was introduced in 1996 to spare American sports fans from suffering through a tied soccer game.  The assumption was that we would prefer an NHL-style shootout to a game without a winner and that we wouldn't mind becoming the laughingstock of the rest of the soccer world.

Assuming a competitively balanced league where any team has an equal probability of winning, losing, or tying any game, I showed in the last post that teams have less incentive to fight for a win in a late tied game now that there is no shootout.  That is, teams are more likely to settle for a tie now that it guarantees a point.  However, when looking at the percent of tied games each season, it became clear that the jump in the percent of tied games didn't occur immediately after the shootout era, which ended after the 1999 season (see the reproduced graph below).  If we take a closer look at the data, we can see that the percent of games ending in a tie has more to do with conferences and the rules of playoff entry than shootouts.


The reason for this is simple: since the top teams in each conference gain entry to the post-season, teams are less averse to sharing points when playing non-conference opponents.  For example, when Columbus Crew SC tied the Vancouver Whitecaps this past April, the Crew didn't mind sharing points with a Western team. The point given away to Vancouver had no bearing on the Crew's playoff chances since Vancouver competes for spots in the Western Conference.  However, the 3-3 draw Columbus had with Toronto was less beneficial as it gave away a point to Toronto, which remained 3 points behind Columbus in the Eastern standings.  So on net, Columbus gained no ground on Toronto whereas the tie against Vancouver gained Columbus a point against all Eastern teams.

It stands to reason, then, that seasons with more intra-conference games should have fewer ties, as teams fight harder against conference opponents to net 3 points (3 to them - 0 to opponent = 3 net points) instead of 0 (1 to them - 1 to opponent = 0 net points).  This has already been shown to be the case in the NHL by Shmanske and Lowenthal (2007).  But what about MLS?
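The net-points logic can be tabulated directly. The following is a minimal Python sketch under the current 3-1-0 points system. `net_gain_vs_rivals` is a hypothetical helper that counts the points a team gains on its playoff rivals. An opponent's points count against you only when that opponent is in your conference.

```python
# Points awarded to a team, and to its opponent, for each result (no shootout).
POINTS = {"win": 3, "tie": 1, "loss": 0}
OPPONENT_POINTS = {"win": 0, "tie": 1, "loss": 3}


def net_gain_vs_rivals(result: str, same_conference: bool) -> int:
    """Points gained on the teams you are competing with for a playoff spot.

    Against a same-conference opponent, the opponent's points offset yours;
    against a team from the other conference, they do not matter.
    """
    mine = POINTS[result]
    theirs = OPPONENT_POINTS[result] if same_conference else 0
    return mine - theirs


for same in (True, False):
    gains = {result: net_gain_vs_rivals(result, same) for result in POINTS}
    print("same conference " if same else "other conference", gains)
```

Turning a tie into a win is worth 3 net points against a conference opponent but only 2 against a team from the other conference, which is the incentive gap the argument relies on.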

Taking a look at the percent of tied games plotted against the percent of games played between opponents of the same conference for all non-shootout seasons, it seems that this is indeed the case.  There is a clear downward trend: the greater the percent of intra-conference games, the lower the percent of tied games.



Use the data and R script to reproduce this graph and all others below.

What is up with those outlier years with 100% of games being between same-conference teams, you ask?  Although MLS actually had 3 "conferences" (East, Central, and West) for the 2000-2001 seasons, entry into the playoffs was granted to the top two-thirds of all teams regardless of conference.  This was also true in 2002, although the league reverted to 2 conferences (East and West).  So in practice the conferences only served as lists of teams that were geographically close and played each other more often.  This is why I'm treating those years as if 100% of the games played were intra-conference; for the sake of playoffs, they were.

You have to appreciate the irony here.  In the years when MLS seemed to rebel once more against European football and create four-team conferences a la the NFL, what it really had was a single table with an unbalanced schedule.  Single-table purists should add this fact to their arguments: a single table should decrease the number of tied games in MLS.

Back to the shootout era.  Adding these years to the plot, we see that outside of that crazy 1999 outlier season, the pattern still holds.  Additionally, we see the effect the shootout had on tied games independent of conference effects.  Again, outside of 1999, the shootout seasons saw a lower percentage of ties compared to other seasons with the same percent of intra-conference games.


So what can MLS do to minimize the number of tied games we see?  There are two changes it could implement: adopt a single table for playoff qualification and/or reintroduce the shootout.  Although the latter would explicitly eliminate ties as well as reduce the number of games that ended regulation in a tie, I don't actually want to return to the shootout days.  Neither does MLS and neither do you.  But a single table?  When asked about the possibility of a single table back in 2010, Don Garber replied:

...every year we do deeply analyze whether or not it makes sense for us to have a single table and no playoffs. We also evaluate whether it makes sense to have a single table and playoffs, or whether it makes sense to have conferences and playoffs.

...We're going to change it if we believe we could have a more compelling format and one that might be perhaps more balanced. ...It should be a format where there is more media coverage, more television ratings and more attendance as we get down to an event that is a single, stand-alone event, our championship game, the MLS Cup.

...All of those can happen within a single table format. The single table discussion is whether there is a single table or whether there are conferences?
I have no idea what goes on when the top brass of MLS meet to deeply analyze using a single table.  What I do know now is that the evidence above strongly suggests that doing so would reduce the number of ties.  It would also appease the Eurosnobs out there who use words like "capo" and refer to games as "matches" and fields as "pitches" and... OK, I'm getting nauseous, enough.

And if a single table is not in the works, we could theoretically baby-step our way to one by returning to the days when the last playoff positions were given to the best teams regardless of conference (2007-2011).  For example, in 2008 the top 3 teams in each conference were granted access to the playoffs along with the 2 best remaining teams regardless of conference.  Although this yielded a strange situation where a team from New Jersey, the "New York" Red Bulls (NYRB), entered the playoff bracket in the West and was crowned Western Conference Champion, it should have given NYRB an incentive to put more effort into games against Western Conference teams during the regular season, since they were competing with Western Conference teams for that last playoff spot.

2008 MLS Standings

Eastern Conference | Points | Western Conference | Points
Columbus Crew* | 57 | Houston Dynamo* | 51
Chicago Fire* | 46 | Chivas USA* | 43
New England Revolution* | 43 | Real Salt Lake* | 40
Sporting Kansas City* | 42 | Colorado Rapids | 38
New York Red Bulls* | 39 | FC Dallas | 36
D.C. United | 37 | San Jose Earthquakes | 33
Toronto FC | 35 | LA Galaxy | 33
*admitted to playoffs
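The 2008 entry rule is easy to state as code. Below is a minimal Python sketch using the 2008 points from the standings above. `playoff_teams` is a hypothetical helper. It ignores MLS's real tiebreakers and simply keeps teams level on points in their listed order.

```python
from operator import itemgetter

# 2008 regular-season points, as listed in the standings table.
EAST = [("Columbus Crew", 57), ("Chicago Fire", 46), ("New England Revolution", 43),
        ("Sporting Kansas City", 42), ("New York Red Bulls", 39), ("D.C. United", 37),
        ("Toronto FC", 35)]
WEST = [("Houston Dynamo", 51), ("Chivas USA", 43), ("Real Salt Lake", 40),
        ("Colorado Rapids", 38), ("FC Dallas", 36), ("San Jose Earthquakes", 33),
        ("LA Galaxy", 33)]


def playoff_teams(east, west, per_conference=3, wildcards=2):
    """Top `per_conference` teams in each conference, then the best
    `wildcards` remaining teams compared on a single table."""
    by_points = itemgetter(1)
    east_sorted = sorted(east, key=by_points, reverse=True)
    west_sorted = sorted(west, key=by_points, reverse=True)
    qualified = east_sorted[:per_conference] + west_sorted[:per_conference]
    # Everyone else competes for the wildcard spots regardless of conference.
    leftovers = east_sorted[per_conference:] + west_sorted[per_conference:]
    qualified += sorted(leftovers, key=by_points, reverse=True)[:wildcards]
    return [team for team, _ in qualified]


print(playoff_teams(EAST, WEST)[-2:])  # the two single-table qualifiers
```

The single-table comparison is what puts Kansas City (42) and NYRB (39) in, and Colorado (38) out, even though Colorado never played most of its games against either of them.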

Both Sporting Kansas City and NYRB had to earn more points than the 4 lowest-ranked teams in the Western Conference to make the playoffs in 2008.  Thus, their inter-conference games should have been given the same importance as their intra-conference games.  The same should be true for all teams ranked around the last playoff spots, and we should see more teams push for wins in closely fought games.  Although this incentive should exist in theory, it's not obvious at first glance.

The years in which some teams were ranked on a single table to gain entry into the playoffs are represented by blue circles.  The areas of the circles represent the proportion of teams that made it to the playoffs via the single-table comparison; the absolute number has been as few as 2 teams and as many as 10 teams.  If the single-table comparison caused more teams to push for wins in late tied games, we would see fewer ties when the circles are bigger, and the circle years should sit below the dots representing seasons without single-table comparisons.  Unfortunately, too few comparison seasons lie close to the single-table years in the plot for us to draw a conclusion.

Moving forward, I think things are complex enough that it's time to carry out a logistic regression on game-level data so that we can simultaneously control for all the moving parts influencing tied games in MLS.  These include, but are not limited to, overtime structure, conference structure, playoff structure, and of course competitive balance within the league.  Maybe then we can find out just what was going on in that 1999 season.

This project started out in my mind as a simple investigation of the incentive effects of the MLS shootout, a rare gem in soccer history.  Upon diving into the data, it has become so much more.  That is how research usually goes.  And for all of my projects, you can replace the word "usually" with "always".  But this is why you should choose topics you love.  Then you don't mind so much when it takes 10 times as long as you thought it would.








Monday, October 5, 2015

The MLS Shootout (Part I) - Fast Kicking, Low Scoring, and Ties? You Bet!

American sports fans aren't known for their love of tied games.  Other than MLS, the NFL is the only major US league that currently allows games to end in a tie, and we usually go an entire season without even one tied game.  Fearing that Americans couldn't support a soccer league with games that frequently ended in a draw, MLS decided to end all tied games with a shootout starting in its inaugural season.  Not the penalty shootouts you're used to seeing in the World Cup or UEFA competitions, mind you, but something more like an NHL-style shootout, where players charge at the keeper and try to get a shot off before the 5-second clock runs out.  Because that would be more American.  For those of you who didn't have the pleasure of seeing one of these in person, by the magic of YouTube, you still can.




Now this is clearly a sub-optimal way to decide a winner.  MLS did at least try to discourage games from ending this way by awarding 3 points for a win in regulation and only 1 point for a shootout victory.  So were Americans more entertained by an NHL-style shootout?  Did fewer games end in a tie?  Should we have expected them to?

The answer to the first question is of course subjective, but my guess is that the answer is a resounding "no".  Otherwise, we'd still have the shootout.  To answer whether fewer games ended in a tie and whether we should have expected them to, we can turn to the data and probability theory.  Let's first see what we should expect to happen with a shootout.

Consider, as we did in the last post, a league where all teams are of equal strength and the likelihood of any team winning, losing, or tying a game is equal to 1/3.  Now with shootouts, when a game is tied at the end of regulation, each team has a 50% chance to win the shootout.  So we have 4 possible outcomes for any given team: they can win, lose, win in a shootout, or lose in a shootout.  The probabilities for the outcomes are $P_W = P_L = 1/3$ and $P_{SOW} = P_{SOL} = 1/6$.  So we can model the outcomes of the games as draws from a multinomial distribution with four outcomes.  Of course these probabilities are somewhat arbitrary.  We could just as easily assume any numbers where $P_W = P_L$ and $P_{SOW} = P_{SOL}$ and still have an ideally competitive league.  The probabilities chosen are simply for ease of illustration.

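To be specific, the probability mass function (pmf) of the points $X$ a team earns in a single game is

$$P(X = x) = \begin{cases} P_W = \tfrac{1}{3} & x = 3 \\ P_{SOW} = \tfrac{1}{6} & x = 1 \\ P_L + P_{SOL} = \tfrac{1}{2} & x = 0 \end{cases}$$

Before any game, a team can expect

$$E[X] = 3\cdot\tfrac{1}{3} + 1\cdot\tfrac{1}{6} + 0\cdot\tfrac{1}{2} = \tfrac{7}{6} \approx 1.167$$

with a variance of

$$\operatorname{Var}(X) = E[X^2] - E[X]^2 = 9\cdot\tfrac{1}{3} + 1\cdot\tfrac{1}{6} - \left(\tfrac{7}{6}\right)^2 = \tfrac{19}{6} - \tfrac{49}{36} = \tfrac{65}{36} \approx 1.806$$

so that

$$\sigma_X = \tfrac{\sqrt{65}}{6} \approx 1.344.$$

If a game is tied in regulation, then each team has a 50% chance of winning or losing the shootout, and the conditional expectation becomes

$$E[X \mid \text{tie}] = 1\cdot\tfrac{1}{2} + 0\cdot\tfrac{1}{2} = \tfrac{1}{2}$$

with a variance of

$$\operatorname{Var}(X \mid \text{tie}) = \tfrac{1}{2} - \left(\tfrac{1}{2}\right)^2 = \tfrac{1}{4}$$

so that

$$\sigma_{X \mid \text{tie}} = \tfrac{1}{2}.$$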
The expected points earned in a shootout are much lower than in regulation, but they also have a lower variance.  So although the expected payoff in a shootout is lower, it is more certain.  However, given the large difference in expected outcomes, a team would have to be severely risk averse to prefer to end a game in a shootout.  Or, relaxing the assumption that both teams are of equal strength, one team would have to believe it had a very low chance of winning in regulation to prefer the shootout.  There may be evidence that this is what the San Jose Clash were thinking in 1999 with their record 13 shootouts (more on that later).

So did this setup give teams an incentive to try to win in regulation and avoid shootouts altogether?  Let's look at the expected points and their variance now that there is no shootout.  We can repeat the process above for the situation where teams can win, lose, or tie with equal probability (1/3).  Or we could just plug in a value of 1 game played (gp = 1) in the calculation we did in the last post.  Either way, we'll get

$$E[X] = 3\cdot\tfrac{1}{3} + 1\cdot\tfrac{1}{3} + 0\cdot\tfrac{1}{3} = \tfrac{4}{3} \approx 1.333, \qquad \operatorname{Var}(X) = \tfrac{10}{3} - \left(\tfrac{4}{3}\right)^2 = \tfrac{14}{9} \approx 1.556$$
and of course if a game ends in a tie, each team gets a guaranteed 1 point with 0 variance. Now let's look at this all in a convenient table.

Statistic | Regulation (SO era) | Shootout | Difference (SO era) | Regulation (ties) | Tie | Difference (ties)
Expected value | 1.167 | 0.500 | -0.667 | 1.333 | 1.000 | -0.333
Variance | 1.806 | 0.250 | -1.556 | 1.556 | 0 | -1.556
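Every entry in the table can be checked with exact fractions. This is a minimal Python sketch under the same assumptions as above: equal-strength teams, $P_W = P_L = 1/3$, and coin-flip shootouts. `mean_var` is just an illustrative helper.

```python
from fractions import Fraction as F


def mean_var(dist):
    """Mean and variance of a discrete points distribution {points: probability}."""
    mean = sum(p * x for x, p in dist.items())
    var = sum(p * (x - mean) ** 2 for x, p in dist.items())
    return mean, var


# Shootout era: regulation win 3, shootout win 1, any loss 0.
so_regulation = {3: F(1, 3), 1: F(1, 6), 0: F(1, 3) + F(1, 6)}
so_tied = {1: F(1, 2), 0: F(1, 2)}   # tied after regulation: coin-flip shootout
# No shootout: win 3, tie 1, loss 0, each with probability 1/3.
no_so_regulation = {3: F(1, 3), 1: F(1, 3), 0: F(1, 3)}
no_so_tied = {1: F(1)}               # tied after regulation: a sure point

for label, before, after in [("shootout era", so_regulation, so_tied),
                             ("ties allowed", no_so_regulation, no_so_tied)]:
    (m0, v0), (m1, v1) = mean_var(before), mean_var(after)
    print(f"{label}: E {float(m0):.3f} -> {float(m1):.3f} (diff {float(m1 - m0):.3f}), "
          f"Var {float(v0):.3f} -> {float(v1):.3f} (diff {float(v1 - v0):.3f})")
```

The exact values are 7/6 and 65/36 before a shootout, and 4/3 and 14/9 without one. The drop in expected points is 2/3 of a point versus 1/3, while the drop in variance is the same 14/9 in both regimes.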

As we can see from the first 3 columns of data, the expected points earned when going from regulation to a shootout decrease by two-thirds of a point (-0.667), and the outcome is much more certain.  In the last 3 columns, we see that the decrease in expected points after regulation is half that of the shootout era (-0.333 vs. -0.667)!  Furthermore, it's a guaranteed outcome of 1 point each.  So a game tied at the end of regulation costs fewer expected points without the shootout.  Put another way, teams have less incentive to avoid tied games when there is no shootout.  So the answer to the question of whether we should expect fewer ties in the shootout era seems to be yes.  The payoff structure did give teams an incentive to avoid games that were tied at the end of regulation.  But did it actually work?

You can recreate the graphs using this data and this R script.

While the graph above indicates that games were less likely to end in a tie after 90 minutes in the initial years, the switch doesn't seem to perfectly coincide with the end of the shootout era, which lasted from 1996 to 1999.  So although the answer to our question of whether games were actually more likely to be decided in regulation during the shootout era is "yes", we now have new questions that need to be answered.  What was going on in 1999 that so many games went to a shootout?  And more importantly, why does the jump in games tied in regulation occur in 2003 and not immediately after the shootout era?

For a hint, below is the same graph with lines drawn around the period of time for which MLS essentially had a single table.  I say "essentially" because there were actually multiple conferences during this time but entry into the playoffs was granted to the top 8 teams regardless of conference. This led to a somewhat odd situation where the entire Western Conference made the playoffs in 2002.


We'll dig into why divisions matter when it comes to whether or not games end in a tie next time.  Until then, I'll leave you with pop culture's initial reaction to MLS a la The Simpsons.


Friday, September 25, 2015

Competitive Balance in MLS

Last time I talked about how I obtained the MLS game data I'll be using in my research.  The Wayback Machine is incredibly useful for finding things that used to be online.  It's also a great reminder that whatever you put on the internet is likely permanent in some way.  In this entry, I'm going to test how competitive MLS is by comparing the distribution of team outcomes to an "ideal" measure.

The motivation for this experiment comes from a similar analysis in Pay Dirt and Fort and Quirk (1995), both of which derive ideal distributions of win percentages for teams in equally balanced leagues. If we want to take their methods and apply them to soccer leagues, we must derive our own distribution of outcomes since the authors only consider leagues that have no ties.  Thus, instead of an ideal distribution for winning percentages, I'll derive an ideal distribution for percentage of possible points earned.

With 3 points awarded for a win, 1 for a tie, and 0 for a loss, the most a team could earn in a season with 30 games is 90.  Therefore, if a team earned only 45 points in a season, it would have earned 45/90 × 100% = 50% of the points possible.  Let's see what an equally balanced or "ideal" league would look like below.

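Consider an MLS season where each team has a certain number of games played (gp).  If each team is of equal playing strength, then the probability of winning, losing, or tying any game is $P_W = P_L = P_T = 1/3$.  That is, a team's numbers of wins $W$, ties $T$, and losses over gp games follow a trinomial distribution, and its points are $P = 3W + T$.  The expected number of points earned from gp games for any team is then

$$E[P] = 3E[W] + E[T] = 3\cdot\tfrac{gp}{3} + \tfrac{gp}{3} = \tfrac{4\,gp}{3}$$

and the variance in points is

$$\operatorname{Var}(P) = 9\operatorname{Var}(W) + \operatorname{Var}(T) + 6\operatorname{Cov}(W, T)$$

where

$$\operatorname{Var}(W) = \operatorname{Var}(T) = gp\cdot\tfrac{1}{3}\cdot\tfrac{2}{3} = \tfrac{2\,gp}{9}$$

and

$$\operatorname{Cov}(W, T) = -gp\cdot\tfrac{1}{3}\cdot\tfrac{1}{3} = -\tfrac{gp}{9}$$

so that

$$\operatorname{Var}(P) = \tfrac{18\,gp}{9} + \tfrac{2\,gp}{9} - \tfrac{6\,gp}{9} = \tfrac{14\,gp}{9}.$$

Converting to percentage of points possible (out of $3\,gp$), the expected value becomes

$$E\!\left[\tfrac{P}{3\,gp}\right] = \tfrac{4\,gp/3}{3\,gp} = \tfrac{4}{9}$$

and the variance becomes

$$\operatorname{Var}\!\left(\tfrac{P}{3\,gp}\right) = \tfrac{14\,gp/9}{9\,gp^2} = \tfrac{14}{81\,gp}.$$

Since the percentage of points earned is an average of gp independent per-game outcomes, by the central limit theorem it is approximately normally distributed with mean 4/9 and standard deviation

$$\sigma = \sqrt{\tfrac{14}{81\,gp}} = \tfrac{\sqrt{14}}{9\sqrt{gp}}.$$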
If you'd like to run a simulation of such an ideal league with any number of teams and games played, I've written R code for you to do so.
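For readers without R, here is a minimal Python sketch of the same kind of simulation (it is not the R code mentioned above). `simulate_ideal_league` is a hypothetical helper. Like the derivation, it treats every game for every team as an independent win/tie/loss draw with probability 1/3 each. A real schedule pairs teams, so one team's win is another's loss, and this sketch ignores that.

```python
import math
import random
import statistics


def simulate_ideal_league(n_teams: int, gp: int, seed: int = 0) -> list[float]:
    """Share of possible points for each team when every game is an independent
    win/tie/loss draw (3/1/0 points) with probability 1/3 each."""
    rng = random.Random(seed)
    shares = []
    for _ in range(n_teams):
        points = sum(rng.choice((3, 1, 0)) for _ in range(gp))
        shares.append(points / (3 * gp))
    return shares


gp = 30
shares = simulate_ideal_league(n_teams=10_000, gp=gp)
print("mean:", round(statistics.fmean(shares), 4), "theory:", round(4 / 9, 4))
print("sd:  ", round(statistics.stdev(shares), 4),
      "theory:", round(math.sqrt(14 / (81 * gp)), 4))
```

With many simulated teams, the sample mean and standard deviation should land close to 4/9 and $\sqrt{14/(81\,gp)}$ (about 0.076 for a 30-game season).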

One issue with comparing actual distributions of percentage of points earned to the ideal distribution is that the latter depends on the number of games in a season, which differs across seasons in MLS.  For this reason, I've combined outcomes for all seasons with the same number of games.  Also, since tied games in the first four years of MLS were decided by a shootout, the formula above does not apply to those years.  Therefore, for 1996-1999, I use "hypothetical points", which are the points teams would have earned had there been no shootout.  The unit of observation is then the percentage of points earned by a particular team in a season with the same number of games as all other observations.  So how competitive is MLS exactly?


In the figure above, I use a kernel density estimate of the actual percentages earned (blue line) and compare this to the ideal distribution (black line).  For the years where 32 games were played, it seems that the actual distribution of percentage of points earned is a little too heavy in the tails to be considered ideal.  There are too many teams at the low end that earned around 20% of the possible points and too many at the high end that earned 60% or more.  A Kolmogorov-Smirnov test rejects the hypothesis that the actual percentages came from the ideal distribution (D = 0.265, p-value = 2.699e-05), although admittedly the test is not perfect, since the percentages do not come from a continuous distribution but can only take steps of 1/96.
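The D statistic itself is just the largest vertical gap between the empirical CDF of the observed percentages and the ideal normal CDF. Here is a minimal Python sketch of the one-sample statistic. `normal_cdf` and `ks_statistic` are illustrative helpers, not the R routine behind the numbers above. It returns D only. For p-values you would need the Kolmogorov distribution, which R's `ks.test` or SciPy's `scipy.stats.kstest` provide.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """CDF of a Normal(mean, sd) distribution via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


def ks_statistic(sample, mean: float, sd: float) -> float:
    """One-sample Kolmogorov-Smirnov D against a Normal(mean, sd) CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mean, sd)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


gp = 30  # games per season for this group of years
ideal_mean, ideal_sd = 4 / 9, math.sqrt(14 / (81 * gp))
# `observed` would be the teams' shares of possible points, e.g. 45/90 = 0.5.
# print(ks_statistic(observed, ideal_mean, ideal_sd))
```

Repeated values (which the 1/96 steps guarantee) don't break the D computation, since the largest gap still occurs at one side of a jump. They do make the usual p-value only approximate, which is the caveat noted above.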

For the years with 30 games played, the actual distribution lines up quite nicely with the ideal!  Interestingly, this includes the designated player era, which started in 2007.  Many fans bemoaned the arrival of David Beckham as the arrival of an imbalanced league where only the big cities would be able to compete by paying millions of dollars for global stars.  However, there doesn't seem to be much evidence of this in the figure below.  The Kolmogorov-Smirnov test here cannot reject the hypothesis that the observed percentages came from the ideal distribution (D = 0.09, p-value = 0.513).  Comparing this figure to the others in this post as well as those of the other sports leagues (NFL, NBA, MLB, NHL) in Pay Dirt and Fort and Quirk (1995) suggests that MLS is the most competitive professional sports league in all of US/Canada history and the only one that I am aware of to conform, at least temporarily, to a balanced ideal!
And then there's the bad news for people who like an any-given-Sunday league.  In the most recent seasons, where 34 games were played, the distribution of percentages is no longer ideal (D = 0.20505, p-value = 0.003648).  In fact, this may be the least competitive group of seasons.  While there were minor tweaks to salary and designated player rules, nothing in this time period stands out to explain why the league has become so much less competitive.  However, this has not had any noticeable effect on league attendance or revenues, as some may fear it would.

 Overall, we see that MLS is a relatively competitive league with at least one time frame of ideal levels of competition, at least as far as percentage of points earned goes.  However, this level of competitiveness seems to be diminishing as time goes on as the most recent time period deviates the most from what we may consider an ideal league to be.

For anyone interested in how the above was carried out, I've uploaded the data set and R script to recreate the Figures and Kolmogorov-Smirnov tests.  You could easily extend the code to match your favorite soccer league once you've downloaded the appropriate data set. 

Moving forward, I'm going to begin my investigation of the effects of the MLS shootout on team strategies.  In the meantime, if you have any MLS-related question you think a bunch of data could answer, let me know and I'll see what I can come up with.  Until then, friends.

Saturday, September 19, 2015

Greetings! Here are 2 tools to help you free soccer data.

As I work on a "revise and resubmit" for an academic journal, I come across many new (at least to me) sports economics models and ideas.  My first thought is always to apply these new concepts to my drug of choice, Major League Soccer (MLS).  Rather than wait a year or so until many small ideas coalesce into a paper that will be presented to a room of perhaps 20 sports economists, I think this time I'll share the little steps along the way with anyone who will listen.  I needed a place to put the thoughts and ideas I've been having about soccer.  So here we are.  My soccer scratch pad online.  An electronic version of what I've been doing on my Fridays most of the summer.

The path to understanding and carrying out soccer analytics is not straightforward; it's sort of the wild west at the moment.  One surprising problem I came across was the difficulty in obtaining simple MLS data.  Want to see the goalscorers' names and minutes for the 4-3 goal-fest of Columbus vs. New England in August of 2012?  When you try to get the box scores from mlssoccer.com, you'll get an "access denied" page.  Is this a technical problem?  Or is Opta restricting access to data so they can charge us for the information?  Your guess is as good as mine, as my emails go unanswered.

Luckily, we have at least two weapons in our arsenal to free the data.  First, the Wayback Machine, brought to you by the beautiful people at the Internet Archive.  It's an amazing tool that lets you view webpages as they were back in time.  So we can see what the box score page for the Columbus vs. New England game looked like in 2012.  The data is freed!

Second, we have a great database of soccer knowledge at SoccerStats.us.  There are a handful of issues with the data.  Some goals are listed twice, and many goals scored in stoppage time are listed as being scored in the 45th or 90th minute when they were actually scored in the 46th or 93rd minute.  I'm guessing this has to do with the way the data was scraped from the original sites (many goals may be listed as 45+ or 90+).  However, with the first tool I was able to figure out what the real minutes were, but it was a grueling process.
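When a source does keep the added time as text, such as "90+3", a small parser can preserve it instead of collapsing it to 90. Below is a minimal Python sketch. The input format and `actual_minute` are assumptions for illustration, not SoccerStats.us's actual export format. Entries already flattened to a bare "90" can't be recovered this way, which is where the Wayback Machine comes in.

```python
import re

# Assumed input format: "67", "45+1", "90+3", optionally with spaces or a trailing '.
MINUTE_RE = re.compile(r"^\s*(\d+)\s*(?:\+\s*(\d+))?\s*'?\s*$")


def actual_minute(raw: str) -> int:
    """Convert a listed minute such as "90+3" into the minute the goal was
    actually scored in (93), matching the 46th/93rd-minute convention above."""
    match = MINUTE_RE.match(raw)
    if match is None:
        raise ValueError(f"unrecognized minute: {raw!r}")
    base = int(match.group(1))
    added = int(match.group(2) or 0)  # no "+N" part means no stoppage time
    return base + added


print([actual_minute(m) for m in ("67", "45+1", "90+3", "90 + 2'")])
```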

Last night I used the data to get measures of how competitive MLS is.  It is a common opinion that the league is more competitive than the European leagues and perhaps more competitive than other American sports leagues.  But there's no need for conjecture!  This hypothesis can be tested using the simple methods described in the fantastic book by Rodney Fort and James Quirk, Pay Dirt, and, if you prefer a more academic treatment, in the Journal of Economic Literature article by the same authors.

The graphs are made and the results are in, but my Saturday morning has become Saturday afternoon and this apartment isn't going to clean itself.  So next time I'll show you what I came up with and include data and R code so that you can reproduce what I've done.  Right now the analytics community (which I do not consider myself a part of, nor would they recognize me) is very stingy with its data and procedures.  It is insanely frustrating to read articles using new data and techniques to analyze matches only to be hit with a paywall or "access denied" messages when trying to recreate the analysis.  As such, I will provide my data and code for any who wish to learn.  See you then.