As an avid football fan and hobbyist data analyst, I’ve been fascinated by advanced metrics like expected goals (xG) and how they’ve revolutionized the way we look at the beautiful game. In this post, we’ll take a deep dive into xG, including the counterintuitive question – can it actually go negative?
A quick primer on expected goals
Let’s start with a quick refresher on what xG is and why it’s useful. xG models estimate the likelihood of a shot resulting in a goal based on thousands of historical examples with similar characteristics. The value ranges from 0 to 1, with 1 representing a 100% chance of scoring.
So for example, Opta gives a penalty kick an xG of around 0.79 on average. This means that, based on the data, about 79% of penalties have historically been scored. On the other hand, long range efforts tend to have very low xG values, sometimes as low as 0.01 or 0.02.
By quantifying chance quality, xG provides much more context than just looking at raw shot totals or even shots on target. It has quickly become indispensable for analysts and punters assessing a team’s performance.
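To make the idea concrete, here is a minimal sketch of how per-shot values roll up into a team figure. The shot list is made up for illustration (one penalty at the 0.79 quoted above, plus a few invented chances), and `total_xg` is my own helper name, not part of any provider's API.

```python
# Hypothetical shot list: each number is a model's scoring probability for one shot.
# 0.79 is the penalty value quoted in the post; the rest are invented for illustration.
team_shots = [0.79, 0.02, 0.01, 0.12, 0.35]


def total_xg(shot_xgs):
    """Sum per-shot probabilities into a team-level expected goals figure."""
    for p in shot_xgs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"shot xG must be between 0 and 1, got {p}")
    return sum(shot_xgs)


print(f"Shots: {len(team_shots)}, total xG: {total_xg(team_shots):.2f}")
# prints: Shots: 5, total xG: 1.29
```

Five shots tells you very little on its own; 1.29 xG tells you one of them was a big chance.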
The evolution of expected goals models
The idea of an expected goals model was first introduced in hockey, but really gained popularity in football starting around 2012. Companies like Opta, StatsBomb, and StatsPerform collected large datasets of historical shots to create xG models using techniques like logistic regression.
As the models have grown more advanced, they now incorporate things like defensive pressure, keeper position, angle of attack, and much more. The accuracy has improved to around 85-95% when looking at large samples.
Of course, football analytics is still an arms race, and debates continue on the best approach. Some analysts argue machine learning models like neural networks could provide even more accuracy as the datasets grow. But the fundamentals of xG remain widely agreed upon.
Explaining the math behind expected goals
For those interested in the mathematical nitty gritty, xG models use logistic regression to create sigmoidal curves like this one:
An example xG probability curve (credit: StatsBomb)
Based on historical data, the model fits a probability curve over shot features such as distance and angle. The xG value is simply the output of that sigmoidal curve for the shot’s own features, which is why it always lands between 0 and 1. Data scientists continue to find new ways to increase model accuracy.
But at its core, xG relies on wisdom of the crowds – using thousands of past examples to predict the likelihood of a goal.
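For anyone who wants to see the shape of the calculation, here is a rough sketch of a two-feature logistic model. The coefficients are numbers I picked by hand so the curve behaves sensibly; they are not fitted to real shots and do not come from Opta, StatsBomb, or anyone else.

```python
import math

# Hypothetical, hand-picked coefficients for illustration only.
# A real model would fit these to thousands of historical shots.
INTERCEPT = 1.0
DISTANCE_COEF = -0.15  # per metre from goal: farther shots score less often
ANGLE_COEF = 1.2       # per radian of visible goal mouth: wider angles help


def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def shot_xg(distance_m, angle_rad):
    """xG for one shot: the logistic curve evaluated at the shot's features."""
    z = INTERCEPT + DISTANCE_COEF * distance_m + ANGLE_COEF * angle_rad
    return sigmoid(z)


close_range = shot_xg(distance_m=6.0, angle_rad=1.0)
long_range = shot_xg(distance_m=30.0, angle_rad=0.2)
print(f"close range: {close_range:.3f}, long range: {long_range:.3f}")
```

Whatever the inputs, the sigmoid keeps the result strictly between 0 and 1, which matters for the negative xG question later on.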
When unusual events lead to negative xG
While individual xG values range from 0 to 1, a team’s cumulative xG across a game or season has no upper bound. A plain sum of shot values can never drop below zero, but in theory a model that books certain unusual events as negative entries could produce a negative total xG:
- Own goals – Credited as shots for the opposing team, so they subtract from the offending team’s xG.
- Saved penalties – Remove the high-xG penalty shot from the opponent’s total.
- Shot-saving blocks – Some models treat blocks as negative shots.
Consider this imaginary scenario:
- Team A takes 10 long range shots worth 0.05 xG each → 0.5 xG
- Team B has no normal shots, but scores 1 own goal → -0.05 xG
- Team B’s defender also makes 5 shot-saving blocks → -0.25 xG
Team B would end with -0.3 total xG even though one of their players put the ball in the net (albeit their own)! Again, this would be quite a bizarre circumstance in real life.
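The tally above is easy to check in code. This sketch assumes the imaginary accounting rules from the scenario, where an own goal and each block are booked as -0.05 against the defending team; that is the thought experiment's rule, not how any provider I know of actually records events.

```python
# Value of one long range shot in the imaginary scenario.
LONG_SHOT_XG = 0.05

# Team A: 10 ordinary long range shots.
team_a_events = [LONG_SHOT_XG] * 10

# Team B: no shots, 1 own goal and 5 blocks, each booked as a negative entry
# (an assumption of the thought experiment, not a real-world convention).
own_goal = [-LONG_SHOT_XG]
blocks = [-LONG_SHOT_XG] * 5
team_b_events = own_goal + blocks

team_a_xg = sum(team_a_events)
team_b_xg = sum(team_b_events)

print(f"Team A xG: {team_a_xg:.2f}")  # prints: Team A xG: 0.50
print(f"Team B xG: {team_b_xg:.2f}")  # prints: Team B xG: -0.30
```

Notice that the only way to get below zero is to feed negative entries in; no shot probability can do it by itself.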
Interpreting negative expected goals
A negative xG total over a season would be highly abnormal and likely indicates flaws in the model or inputs rather than team performance.
However, single match xG has more noise, and unusual events like own goals do happen. In almost all cases, negative xG probably means the model is limited, not that the team was unlucky. But once in a blue moon, the scoreline fundamentally contradicts the balance of play.
As analysts, we must avoid over-interpreting outliers and small samples with any advanced metric. xG truly comes into its own over larger samples of matches where noise averages out.
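Here is a back-of-the-envelope way to see why sample size matters. It assumes every shot is an independent coin flip with its xG as the probability, which is a simplification on my part, and the 12 shots at 0.10 per match are invented numbers.

```python
import math


def goals_mean_and_sd(shot_xgs):
    """Mean and standard deviation of goals, treating shots as independent Bernoulli trials."""
    mean = sum(shot_xgs)
    variance = sum(p * (1.0 - p) for p in shot_xgs)
    return mean, math.sqrt(variance)


# Hypothetical shot profile: 12 shots per match at 0.10 xG each.
one_match = [0.10] * 12
season = one_match * 38  # the same profile repeated over 38 matches

for label, shots in [("match", one_match), ("season", season)]:
    mean, sd = goals_mean_and_sd(shots)
    print(f"{label}: xG {mean:.1f}, sd {sd:.2f}, sd relative to xG {sd / mean:.2f}")
```

Under these assumptions the spread of actual goals is nearly as large as the xG itself for one match, but only a small fraction of it over a season.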
Comparing Messi and Ronaldo’s xG and goals
To illustrate the usefulness of xG, let’s compare Lionel Messi and Cristiano Ronaldo’s stats over the past 5 seasons:
Player | Goals | xG | xG Difference |
---|---|---|---|
Messi | 296 | 250 | +46 |
Ronaldo | 292 | 223 | +69 |
Both all-time greats exceed expected goals, but Ronaldo has outperformed xG more. This suggests he’s been the more clinical finisher recently, while Messi’s higher xG total points to more, or better, chances. xG gives us insight that raw totals alone do not.
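The difference column is just goals minus xG, and dividing instead gives a rate that is easier to compare across players. A quick sketch using the figures from the table (`overperformance` is simply my name for the helper):

```python
# Goals and xG taken from the table above.
players = {"Messi": (296, 250), "Ronaldo": (292, 223)}


def overperformance(goals, xg):
    """Return (goals minus xG, goals per unit of xG)."""
    return goals - xg, goals / xg


for name, (goals, xg) in players.items():
    diff, ratio = overperformance(goals, xg)
    print(f"{name}: {diff:+d} vs xG, {ratio:.2f} goals per xG")
# prints:
# Messi: +46 vs xG, 1.18 goals per xG
# Ronaldo: +69 vs xG, 1.31 goals per xG
```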
Where expected goals excels…and falls short
While xG has proven itself extremely useful, there are still limitations. Here are a few of the most common criticisms:
- Assumes all players have equal finishing skill
- Basic models fail to account for defending player positions
- Individual game xG can be noisy and misleading
Conversely, some strengths of xG:
- Quantifies chance quality rather than just volume
- Helps assess finishing skill and luck
- More predictive of future performance than past goals
- Improves with larger sample sizes
There are still gains to be made, but xG delivers excellent insights already when applied properly.
My own experience using xG for fantasy football
As someone who enjoys fantasy football, I’ve found expected goals metrics extremely helpful. By combining players’ xG and xA (expected assists) with subjective factors like opponent strength, I’ve been able to more accurately predict future hauls.
This season, I drafted James Maddison into my team after seeing his excellent xG and xA numbers, despite his modest goal contributions last season. Sure enough, he’s already got 7 goals and 2 assists at the halfway mark!
For me, xG has proven itself the most valuable predictive statistic compared to things like past goals and assists. It has really changed my approach to evaluating players.
Ideas to improve xG in the future
Football analytics will continue to evolve, and I’m excited to see how expected goals models could improve in the future. Here are some ideas that could make xG even more useful:
- Incorporate computer vision tracking data for player positions rather than averaged zones.
- Develop models custom-fit for specific leagues, teams, or players.
- Account for different goalkeeper strengths more granularly.
- Expand factors considered like field conditions and match importance.
- Provide live-updating xG feeds to better analyze game flow.
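The last idea is the easiest to prototype at home. As a rough sketch, assuming you already have a shot log of (minute, xG) pairs, a running total shows how a match unfolded; the log below is invented and `running_xg` is a hypothetical helper of mine.

```python
from itertools import accumulate

# Hypothetical shot log for one team: (minute, xG of the shot).
shots = [(4, 0.05), (17, 0.30), (44, 0.08), (61, 0.79), (88, 0.02)]


def running_xg(shot_log):
    """Return (minute, cumulative xG) pairs in chronological order."""
    ordered = sorted(shot_log)
    minutes = [minute for minute, _ in ordered]
    totals = accumulate(xg for _, xg in ordered)
    return list(zip(minutes, totals))


for minute, total in running_xg(shots):
    print(f"minute {minute:>2}: cumulative xG {total:.2f}")
```

A real live feed would need the model to score each shot as it happens; this only covers the bookkeeping once those values exist.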
There will always be room for improvement. But I’m thrilled data-driven insights like xG have gained widespread adoption in football. It allows us to see the game from a richer, more objective perspective.
Conclusion
While oddities like negative xG illustrate some model limitations, expected goals has proven one of the most valuable additions to the football analyst’s toolkit over the past decade.
Used judiciously over larger samples, it provides tremendous insight into chance quality and finishing that raw totals alone cannot. xG has transformed how experts, punters, and fans evaluate performance. And its potential to improve tactics and decision making has only just begun to be unlocked.
Football analytics still has untapped potential, but expected goals represents one of its biggest breakthroughs so far. I for one can’t wait to see how the next generation of xG models continues to evolve the beautiful game.
Let me know in the comments if you have any other questions about xG! I’m always happy to chat more about the intersection of data and football.