Friday, June 16, 2017

MLS Database

As promised, I've finally started organizing this somewhat cleaned MLS database for others to use.  It still needs work and I'll be updating it over the summer, so let me know if there's something specific you'd like to have, or something you need explained in the readme file I haven't yet written.  Most of the older stats came from the kindness of Chris Edgemon, who scraped the original data from MLS's sites.  If you're looking for an interactive online database for stats and don't need seasons after 2014, you should just go to his site.

Chris told me of some known errors in the data set, and that's what I've spent the last couple of years combing through.  I also added more information, such as whether a goal scored in minute 46 came in the first minute of extra time in the first half or in the first minute of the second half.  My database also has information on Generation Adidas players, Designated Players, homegrown players, and salaries.  So if you're looking for something to import into Stata, R, etc., then this database is for you.

I don't have an official website yet to host the database, so I'm linking it to an online drive.  Turns out, websites cost money.  Of course, if the clunkiness of blogger.com and Google Drive offends you enough to throw money at me, I'll make it go away and replace it with a nice looking site to host the data.  If not, enjoy the data and please let me know if you find errors or have suggestions.

Click here for access to the folder that has the SQL database as well as a separate CSV for those of you who have as much trouble as I do getting SQL databases to "open" or "connect" or whatever it is they're supposed to do when other people who are not me use them.

Thursday, January 19, 2017

Holy break between posts Batman!  Do not take the absence of posts as a sign of giving up.  I've actually been working on the shootout idea formally and presented the paper at the North American Association of Sports Economists (NAASE) meeting (which is a subset of the Western Economic Association International meeting) in Portland a few weeks back. You can see the working paper, among others, here.

For this post, I want to walk through the last thought I had posted about the MLS shootout years.  I predicted that the shootout should have caused fewer games to end regulation in a tie and had decided to estimate the shootout era's effect on the likelihood of games ending regulation in a tie using a logit model.  The logit model estimates parameters when the dependent variable takes only two values.  For our purpose of estimating the likelihood of ties, this works well as our dependent variable can be expressed as 0 for games that did not end in a tie, and 1 for those that did.

Mathematically, the probability that game i ends in a tie is modeled as

$$P_i = \Pr(\text{tie}_i = 1 \mid x_i) = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}$$

where P is the probability that the game ended regulation in a tie and x contains all game-level attributes including the era the game was played under (shootout, overtime, or current era), an indicator for interconference games, and the difference in the season goal differentials between the two teams to measure relative team strength. The Stata code to estimate the model is here and the data is here (I'm working on figuring out how to do this in R for those that don't have access to Stata).
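
For readers without Stata, here is a minimal Python sketch of how a logit turns game attributes into a tie probability.  It is not the linked Stata code.  The function name, the variable ordering, and above all the coefficient values are made up for illustration and are not the estimates from the paper.

```python
import math

def tie_probability(x, beta):
    """Logit probability exp(x'b) / (1 + exp(x'b)) for one game."""
    index = sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-index))

# HYPOTHETICAL coefficients for illustration only, NOT the paper's estimates.
# Assumed order: intercept, shootout era, overtime era,
#                interconference game, gap in season goal differentials
beta = [-1.0, -0.3, -0.1, 0.05, -0.02]

# A shootout-era game between equally strong teams in the same conference
x = [1.0, 1.0, 0.0, 0.0, 0.0]

p = tie_probability(x, beta)
print(f"Predicted probability of a regulation tie: {p:.4f}")
```

Whatever values the index takes, the result always lands strictly between 0 and 1, which is the reason to use a logit rather than a straight line for a 0/1 outcome.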

Unlike Ordinary Least Squares (OLS) coefficients, we cannot take the "beta hat" on the shootout indicator variable as the effect of the shootout era on the probability that the game ended regulation in a tie.  Here, the effect of the shootout era is measured by the difference in probability when the shootout indicator is 1 and when it is 0, holding constant all other variables.  That is, we ask: what is the probability of a game ending in a tie under the shootout rules?  Then we ask: what if we played the same game without the shootout rules?  Whatever the difference in likelihood of tying is between the two scenarios is what we deem to be the shootout effect.  Mathematically, this is just

$$\Delta P = \Pr(\text{tie} = 1 \mid \text{shootout} = 1, x) - \Pr(\text{tie} = 1 \mid \text{shootout} = 0, x)$$

Now when we do this, we have to choose values for all the other control variables in x since they will be in the equation.  We could choose them to be anything but here it makes sense to set the difference in goal differentials between the two teams to be zero.  This causes the effect to be calculated for a game contested between two equally strong teams.  I also chose the interconference indicator to be zero but this has little effect since its coefficient was not significant. The estimated effects of the shootout and overtime eras are in the table below.
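
The calculation just described can be sketched in a few lines of Python.  This is only an illustration of the idea, not the Stata code behind the table: the function and key names are my own, and the coefficients are hypothetical placeholders rather than the estimated values.

```python
import math

def logistic(z):
    """Map a linear index to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def era_effect(beta, era_key, goal_diff_gap=0.0, interconference=0.0):
    """Change in tie probability when one era indicator goes from 0 to 1.

    The other controls are held at the chosen values; the defaults describe
    equally strong teams (gap of zero) from the same conference.
    """
    base = (beta["intercept"]
            + beta["interconference"] * interconference
            + beta["goal_diff_gap"] * goal_diff_gap)
    p_without = logistic(base)                 # era indicator set to 0
    p_with = logistic(base + beta[era_key])    # era indicator set to 1
    return p_with - p_without

# HYPOTHETICAL coefficients for illustration only, NOT the paper's estimates.
beta = {
    "intercept": -1.0,
    "shootout": -0.3,
    "overtime": -0.1,
    "interconference": 0.05,
    "goal_diff_gap": -0.02,
}

shootout_effect = era_effect(beta, "shootout")
print(f"Shootout coefficient: {beta['shootout']:.3f}")
print(f"Change in tie probability: {shootout_effect:.4f}")
```

Notice that the change in probability is not the same number as the coefficient, and that it would shift if we picked a different goal differential gap.  That is why the values of the other controls have to be stated along with the effect.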



The second column shows how much each era changed the likelihood of any game between equally strong teams in the same conference ending regulation in a tie.  The shootout era is estimated to have decreased the likelihood of games ending regulation in a tie by 5.13 percentage points, while the overtime era is estimated to have decreased regulation ties by about 1.65 percentage points.  However, only the marginal effect of the shootout is statistically significant.  That is, the small negative effect estimated for the overtime era may just be due to chance and the actual effect may be zero.  The single star on the shootout coefficient means that the p-value was less than .05, so we can reject at the 5% level the hypothesis that the true effect of the shootout on the likelihood of tying is zero.

The purpose of the shootout and overtime rules was to make the game more exciting for American fans.  Although we didn't think much of the shootout as fans, golden goal overtime is undeniably an exciting way to end a match.  The video below of Eddie Gaven's golden goal (after a shady substitute as a keeper...more on that in a future post) is great evidence of the excitement.  Just listen to the announcer's voice crack when Tim Howard saves a potential game-winning goal and again when Gaven scores.

To test the effect of the overtime golden goal rule on the likelihood of ties including overtime, I recoded the tie variable to be 1 for the overtime era games only if that game ended overtime in a tie.  That is, if a game ended with a golden goal, I did not count it as a tie.  Of course I could not recode the shootout games in a similar manner since all shootouts ended with a winner.  After presenting this research at the NAASE conference, I realized that I should have just left out the shootout games, but for now I leave the results estimated as is.
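
Here is a small Python sketch of that recoding rule as I read it from the paragraph above.  The field names (era, tied_after_regulation, tied_after_overtime) and the four example games are hypothetical and are not columns or rows from the actual data set.

```python
def tie_including_overtime(game):
    """Return 1 if the game counts as a tie under the recoded definition, else 0."""
    if game["era"] == "overtime":
        # Overtime era: a tie only if the game was still level after overtime,
        # so a golden goal game is NOT counted as a tie.
        return int(game["tied_after_overtime"])
    # Shootout and current eras: keep the regulation-tie coding unchanged.
    return int(game["tied_after_regulation"])

# HYPOTHETICAL example games, not real data.
games = [
    {"era": "overtime", "tied_after_regulation": True, "tied_after_overtime": False},  # golden goal
    {"era": "overtime", "tied_after_regulation": True, "tied_after_overtime": True},   # still level
    {"era": "shootout", "tied_after_regulation": True, "tied_after_overtime": None},
    {"era": "current", "tied_after_regulation": False, "tied_after_overtime": None},
]

recoded = [tie_including_overtime(g) for g in games]
print(recoded)
```

Only the first game changes under the new definition: it was level after regulation but was settled by a golden goal, so it no longer counts as a tie.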

The estimates in the first column suggest that when not counting games ending in golden goal overtime as tied, the overtime era actually decreased the likelihood of ties by almost 10 percentage points!  Since we did not pick up a significant effect in regulation, this means that all of the decrease in ties was due to super exciting (or heartbreaking, depending on your loyalties) golden goals scored in overtime.  Clearly, the policy worked as planned.  Whether or not the fans beyond myself preferred games to be ended this way remains to be tested.