Analysis, speculation and prediction seem to be becoming an ever-increasing part of the Eurovision build-up, and I love it!
As a result, there seem to be a lot of models floating about these days, so I think it's about time to show what the BetEurovision model is predicting and give an overview of how it works. If you want to skip to the spoilers, you can find our model prediction at the end.
I won't be going into the maths of the model here (maybe another time), but I think it will be of interest to explain the rationale behind how our model is built. The model is created with a 3-stage process:
Building a points projection based on historic data
Layering on contextual information
Running simulations
Building a Points Projection
Building a solid baseline of expected points that is backed by historic data removes any bias from inferring the impact of data metrics (at least in instances where we have the data to measure that impact). Additionally, having a baseline grounded in empirical data and statistics is invaluable in keeping our future estimations rooted in reality.
To come up with the baseline projection I have compiled a dataset covering a wide variety of metrics for every song since the last major voting change in 2016. The dataset includes metrics such as:
Streaming stats
Poll results
Running order
Country voting patterns/bias
Performer demographics
The song’s musical attributes
After analysing how these metrics correlate with jury and televote points from previous years, I built a baseline model to which I then apply the song data from this year's contest.
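To make this concrete, here is a minimal sketch of what a baseline fit of this kind could look like in Python. The file names, column names, and the choice of a plain linear regression are all illustrative assumptions rather than the actual model:

```python
# Minimal illustrative sketch of the baseline fit (not the real model).
# Assumes hypothetical CSVs with one row per entry: historic entries carry
# the metric columns plus the jury/televote points they actually scored.
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ["streams", "poll_score", "running_order",
            "voting_bias", "performer_age", "tempo"]  # hypothetical metric names

history = pd.read_csv("entries_2016_onwards.csv")

# Fit one model per scoring stream, since juries and televoters reward
# songs differently.
jury_model = LinearRegression().fit(history[FEATURES], history["jury_points"])
tele_model = LinearRegression().fit(history[FEATURES], history["televote_points"])

# Apply the fitted relationships to this year's songs to get the baseline.
current = pd.read_csv("entries_this_year.csv")
current["jury_baseline"] = jury_model.predict(current[FEATURES])
current["tele_baseline"] = tele_model.predict(current[FEATURES])
```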
Note that because this projection is entirely data-led, it could have been built without listening to any of the songs or hearing any opinions on them. Considering this, I think it has done a pretty good job, but we can do better…
Layering on Contextual Information
The previous phase was purely led by data, but this is where the subjectivity comes in.
If you have been following the season closely, you will see that there are several areas where the projection has come up with estimates that seem off. This could be some golden insight that the correlations have found, but it is more likely that these discrepancies can be explained by factors the baseline model could not capture. Where this is the case, we should make the appropriate adjustments.
What discrepancies jump out to you from the projection? These are the most prominent ones for me:
The Netherlands' and Finland's televote will likely be higher than projected, as their gimmicky acts can expect to pick up extra points from casual viewers.
Ukraine and Israel can also expect to take a higher share of the televote due to diaspora and political sympathy.
Ireland can be expected to pick up more points from both sides of the leaderboard due to its impressive staging improvements.
Belgium will likely perform worse on both sides of the leaderboard due to underwhelming live performances.
These are just a few examples of where I feel the baseline differs from my expectations. For optimal adjustments I try to crowdsource expectations by aggregating opinions, news, and trends that I see online. This allows for a multitude of minor adjustments which can properly shape the baseline projection into accurate estimations.
Not only does this adjustment process create good estimates of the expected points tally for each country, it also allows me to estimate upper and lower ranges for each country's points tally, which is important for the final stage of the process…
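As a toy illustration of the mechanics (continuing the hypothetical dataframe from the earlier sketch; the deltas and spread factors below are invented for the example, not my actual adjustments):

```python
# Illustrative adjustment layer: nudge the baseline per country, then
# attach upper/lower bounds that feed the simulation stage.
adjustments = {
    # country: (jury_delta, televote_delta) -- invented example values
    "Netherlands": (0, +40),    # gimmick appeal to casual viewers
    "Ukraine":     (0, +60),    # diaspora and sympathy voting
    "Ireland":     (+25, +30),  # staging improvements
    "Belgium":     (-20, -25),  # underwhelming live performances
}

for country, (jury_delta, tele_delta) in adjustments.items():
    row = current["country"] == country
    current.loc[row, "jury_baseline"] += jury_delta
    current.loc[row, "tele_baseline"] += tele_delta

# Simple multiplicative bounds as a stand-in for hand-set ranges.
current["tele_low"] = current["tele_baseline"] * 0.7
current["tele_high"] = current["tele_baseline"] * 1.4
```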
Running Simulations
The final step is to use the output of the previous stages to run thousands of simulations of the contest's scoring, which can be aggregated to give us an accurate indication of the probability of each country winning (or indeed of any particular outcome occurring).
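Here is a deliberately simplified Monte Carlo sketch of that idea, again continuing the hypothetical dataframe from above. A fuller simulation would redistribute fixed pools of 12, 10, 8… points per voting country; this one just samples each country's totals independently, and the distributions and spreads are assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
N_SIMS = 10_000

countries = current["country"].to_numpy()
jury_mean = current["jury_baseline"].clip(lower=0).to_numpy()
tele_mean = current["tele_baseline"].clip(lower=0).to_numpy()
# Treat the upper/lower range from the previous stage as roughly +/- 2 sd.
tele_sd = (current["tele_high"] - current["tele_low"]).clip(lower=0).to_numpy() / 4
jury_sd = jury_mean * 0.15  # assumed jury spread

wins = dict.fromkeys(countries, 0)
for _ in range(N_SIMS):
    jury = rng.normal(jury_mean, jury_sd).clip(min=0)
    tele = rng.normal(tele_mean, tele_sd).clip(min=0)
    totals = jury + tele
    wins[countries[totals.argmax()]] += 1

# Aggregate the simulated outcomes into win probabilities.
win_prob = {c: n / N_SIMS for c, n in sorted(wins.items(), key=lambda kv: -kv[1])}
```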
It's worth noting that the main output of our betting model is the probability of each outcome rather than a prediction of one exact scenario.
I re-run this process multiple times during the Eurovision season to keep the prediction up to date and to test out scenarios. For example, it can be used to answer useful questions such as: "if Israel's expected televote points were to increase by 50, how much would that affect their chance of winning?" (not as much as you might think, actually).
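As a purely hypothetical illustration of such a scenario test, using the sketch above:

```python
# Hypothetical scenario test: add 50 expected televote points for Israel
# and re-run the same simulation to see how the win probability moves.
scenario = current.copy()
scenario.loc[scenario["country"] == "Israel", "tele_baseline"] += 50
# ...then repeat the Monte Carlo loop on `scenario` and compare with win_prob.
```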
I will aim to publish an updated run of the model before the final next week. Anyway, here is how the model currently thinks the contest will play out on average next Saturday.