Utilizing machine learning techniques in football prediction
From debates between football analysts to arguments between fans of opposing teams, predicting the winner of a match has always been intrinsically tied to the sport. In recent times, predicting which team will win, and by extension why it wins, has taken on added importance, both for clubs seeking a competitive advantage that leads to championships and increased revenue and for fans hoping to see a return on backing their team through betting. With the expanded availability of machine learning techniques, it is now possible to build multiple models that learn from and interpret a data set in order to predict the winner between any two football teams. The models in this paper are provided with base football statistics, meaning those gathered after every match, such as the number of shots a team had or how many fouls were committed, together with additional psychological and non-psychological factors. These additional factors are included because a football match is determined by more than base statistics alone: a team’s mentality and its ability to deal with external factors are a key part of the modern game. This paper aims not only to provide an accurate prediction of which team will win any given match, but also to offer some answer as to why, by showing which variables and features most determine why one team is selected to win over another and so providing some explanation and logic behind the predictions. Drawing on seminal works in the field, such as Razali et al. (2017) and Gangal et al. (2015), to establish which models have been used previously and how they performed, this paper seeks to build upon that prior work.
CRISP-DM was the methodology used to keep the research structured and focused, while a positivist research approach was adopted to ensure that only unbiased quantitative data was used, that is, data built entirely upon facts and figures. The quantitative data set was compiled and curated by the author from two seasons of Premier League football data and supplied to the eight selected models for training. The trained models were then applied to one final data set to assess the predictive power they had gained. The author successfully predicted the results of 72.37% of matches across the 380-game Premier League season, a figure comparable to the literature. The model that performed best returned 85% accuracy when trained on nothing but base statistics, and 75% accuracy when trained on the additional factors that were included. This resulted in a predicted final league table that closely resembles the real final league table, with most discrepancies between the two able to be understood and explained.
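The two evaluation steps described above, scoring predictions against real results and turning predicted results into a league table, can be illustrated with a minimal sketch. It is not the author's code: the team names, fixtures, outcome labels and the helper functions `accuracy` and `league_table` are hypothetical, and ties on points are broken alphabetically only to keep the output deterministic (the real competition uses goal difference). The only rule taken as given is the standard scoring of three points for a win and one for a draw. As a point of reference, 72.37% of a 380-game season corresponds to 275 correctly predicted matches.

```python
from collections import defaultdict

# Hypothetical toy fixtures as (home_team, away_team); not data from the study.
fixtures = [
    ("Reds", "Blues"),
    ("Blues", "Greens"),
    ("Greens", "Reds"),
    ("Blues", "Reds"),
]
# Outcome labels (an assumption): "H" home win, "D" draw, "A" away win.
actual = ["H", "D", "A", "A"]
predicted = ["H", "A", "A", "D"]


def accuracy(predicted, actual):
    """Fraction of matches whose predicted outcome equals the real one."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def league_table(fixtures, outcomes):
    """Rank teams by points: 3 for a win, 1 for a draw, 0 for a loss."""
    points = defaultdict(int)
    for (home, away), outcome in zip(fixtures, outcomes):
        # Touch both teams so that sides with zero points still appear.
        points[home] += 0
        points[away] += 0
        if outcome == "H":
            points[home] += 3
        elif outcome == "A":
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
    # Sort by points (descending), then by name for a deterministic order.
    return sorted(points.items(), key=lambda item: (-item[1], item[0]))


print(f"accuracy: {accuracy(predicted, actual):.2%}")
print("real table:     ", league_table(fixtures, actual))
print("predicted table:", league_table(fixtures, predicted))
```

Comparing the two tables position by position is one simple way to locate the discrepancies between a predicted and a real final table.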