Using big data to analyze soccer
Complex networks expert shares his all-star 2018 World Cup team
Engineering professor Luís Amaral has investigated complex social and structural networks in areas ranging from healthcare and biology to gender discrimination and gun violence. His diverse research interests and innate curiosity eventually led him to study soccer — his favorite sport.
Amaral used his knowledge of network complexity to create an algorithm that objectively ranks professional soccer players. With the help of students in his lab, Amaral built a network for each team, which reflected who passed the ball to whom, how accurate those passes were, and how likely those passes were to end in a goal.
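The paragraph above describes three ingredients for each team's network: who passed to whom, how accurate those passes were, and how often they led to a goal. Below is a minimal sketch of how such a weighted, directed pass network might be tallied. It assumes each pass is recorded as a hypothetical `Pass` record with `passer`, `receiver`, `completed` and `led_to_goal` fields. This is an illustration of the general idea, not the lab's actual code or data format.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Pass:
    passer: str        # player who played the ball
    receiver: str      # intended recipient
    completed: bool    # True if the recipient controlled the ball
    led_to_goal: bool  # True if the possession containing this pass ended in a goal

@dataclass
class EdgeStats:
    attempts: int = 0
    completed: int = 0
    goals: int = 0

    @property
    def accuracy(self) -> float:
        # Share of attempted passes on this edge that were completed.
        return self.completed / self.attempts

    @property
    def goal_rate(self) -> float:
        # Share of attempted passes on this edge that were part of a scoring move.
        return self.goals / self.attempts

def build_pass_network(passes):
    """Aggregate one team's passes into a weighted, directed network.

    Keys are (passer, receiver) edges; values hold the counts behind each
    edge's weight (attempts), accuracy and goal likelihood."""
    network = defaultdict(EdgeStats)
    for p in passes:
        stats = network[(p.passer, p.receiver)]
        stats.attempts += 1
        stats.completed += p.completed    # bool counts as 0 or 1
        stats.goals += p.led_to_goal
    return dict(network)
```

For example, two passes from player "A" to "B" (one completed) and one completed pass from "B" to "C" that led to a goal give the A→B edge an accuracy of 0.5 and the B→C edge a goal rate of 1.0. Keeping the direction of each pass matters: A→B and B→A are separate edges, which lets the network show who feeds the ball to whom.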
Before Amaral’s algorithm, the only way to identify stellar soccer players was by listening to sports pundits. Amaral’s lab developed the first objective, data-driven system for understanding who to watch on soccer fields across the globe.
Using sophisticated coding techniques and analytical tools, Amaral’s team created what they termed an “Average Footballer Rating” (AFR) for each player, based on how influential that player is in matches. Taken together, the AFR values of all players on a given team indicate that team’s strength: its success at making passes that result in goals. For people who follow soccer, the top three players are no surprise: Lionel Messi, Neymar Jr. and Cristiano Ronaldo.
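The article does not give the AFR formula, so the following is only an illustrative stand-in for a path-based influence measure. It assumes each possession is recorded as an ordered list of the players who touched the ball plus a flag for whether it ended in a shot. A player's score is the fraction of shot-ending possessions they took part in. The team-level summary (a plain mean of player scores) is likewise a hypothetical way of "taking the values together," not the published method.

```python
from statistics import mean

def influence_scores(possessions):
    """Fraction of shot-ending possessions in which each player touched the ball.

    `possessions` is a list of (players_in_order, ended_in_shot) tuples.
    Illustrative path-based measure only; not the published AFR formula."""
    shot_paths = [set(players) for players, ended_in_shot in possessions if ended_in_shot]
    everyone = {p for players, _ in possessions for p in players}
    if not shot_paths:
        return {p: 0.0 for p in everyone}
    return {
        p: sum(p in path for path in shot_paths) / len(shot_paths)
        for p in everyone
    }

def team_strength(scores):
    """Hypothetical team-level summary: the mean of the players' scores."""
    return mean(scores.values())
```

With two shot-ending possessions, one through A, B and C and one through A and C, players A and C each score 1.0 and B scores 0.5, because B appeared in only one of the two moves that produced a shot. Measuring participation in scoring moves, rather than raw pass counts, rewards players who sit on the routes the ball actually takes toward goal.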
“A player with an AFR greater than 70 is pretty much superhuman,” Amaral says. “And an AFR above 70 over many seasons is god-like.”
In addition to generating AFR values, Amaral’s algorithm produces a network visual for each team, made up of nodes (one circle for each player) and lines connecting those players. Both vary in size: larger circles indicate more influential players, and wider lines reflect stronger connections between players.
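One simple way to turn such numbers into a drawing is to map them linearly onto a range of circle radii and line widths. The sketch below assumes hypothetical inputs (an influence value per player and a pass count per player pair) and arbitrary pixel ranges. It shows the visual encoding described above, not the lab's plotting code.

```python
def scale(values, lo, hi):
    """Linearly map a dict of numbers onto [lo, hi] for drawing."""
    vmin, vmax = min(values.values()), max(values.values())
    span = vmax - vmin
    if span == 0:
        # All values equal: draw everything at the middle size.
        return {k: (lo + hi) / 2 for k in values}
    return {k: lo + (v - vmin) / span * (hi - lo) for k, v in values.items()}

# Hypothetical inputs: influence per player, completed passes per (passer, receiver) pair.
influence = {"A": 0.9, "B": 0.5, "C": 0.1}
pass_counts = {("A", "B"): 12, ("B", "C"): 4, ("C", "A"): 8}

node_radius = scale(influence, lo=5.0, hi=20.0)  # bigger circle = more influential player
edge_width = scale(pass_counts, lo=0.5, hi=6.0)  # wider line = stronger connection
```

Here player A gets the largest circle (radius 20) and the A→B pair, with the most passes, gets the widest line (6.0). Scaling between a minimum and maximum keeps weak connections visible while still making the strongest ones stand out.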
Amaral can build similar networks and rating systems for basketball and hockey, so-called “flow sports.” American football and baseball, he says, are too static and prescribed: in those games you can determine probabilities from discrete situations, such as how many yards a team needs to get a first down or who’s on first.
But in flow sports, which are more complex, “things are always in motion. Players move all over. There are no breaks in play when the clock is running.”
Visualization of a match between Italy and Spain.
When Amaral used the AFR values to evaluate the Euro Cup teams in 2008, his method showed Spain was the strongest team and Xavier “Xavi” Hernández Creus, a player on Spain’s team that year, was statistically the best player.
Spain won the Euro Cup in 2008, and the Union of European Football Associations named Xavi the Player of the Tournament.
With the 2018 World Cup just weeks away, Amaral and his collaborator Jordi Duch used this algorithm to build the “World Cup Dream Team”: a May 2018 snapshot of the 10 best outfield players at their respective positions, drawn from players who will compete in the World Cup.
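Assembling a team like this amounts to filtering players by eligibility and then taking the highest-rated players for each position. The sketch below assumes a hypothetical `Player` record and a hypothetical formation passed in as `slots` (the article does not say which formation the Dream Team uses). Player names and ratings in any example are placeholders, not real AFR data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Player:
    name: str
    position: str       # e.g. "DF", "MF", "FW"
    rating: float       # a player rating such as AFR
    at_world_cup: bool  # True if named to a World Cup squad

def dream_team(players, slots):
    """Pick the highest-rated eligible players for each position.

    `slots` maps position -> number of places, e.g. {"DF": 4, "MF": 3, "FW": 3}
    for a hypothetical 10-player outfield."""
    team = {}
    for position, count in slots.items():
        eligible = [p for p in players if p.position == position and p.at_world_cup]
        eligible.sort(key=lambda p: p.rating, reverse=True)  # best first
        team[position] = [p.name for p in eligible[:count]]
    return team
```

Filtering on `at_world_cup` before ranking means a highly rated player who is not in a World Cup squad never takes a slot, which matches the Dream Team's restriction to players competing in the tournament.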
World Cup: Wait and see
So, can AFR scores be used to predict today who will win the upcoming World Cup? Not so fast, Amaral says.
“When you are estimating the winner of a World Cup, luck and chance play a huge role,” he says, a much bigger role than over the larger sample size of a long season.
What Amaral’s system can do is determine probabilities.
“Our algorithm tells us what the likelihood of outcomes are,” Amaral says. “It’s the difference, for instance, between asking what’s the likelihood that certain teams are going to make the playoffs, versus winning the Super Bowl.”
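A little arithmetic shows why a short tournament is so sensitive to luck. Assume, purely for illustration, that a strong team wins any single knockout match (including penalty shootouts) with probability p = 0.6, independently of other matches. Winning the four knockout rounds of the 2018 format then has probability 0.6⁴ ≈ 0.13. Over n matches, the spread of a team's observed win fraction shrinks like the square root of p(1 − p)/n, so a few games say much less about true strength than a full season does.

```python
import math

def knockout_win_prob(p, rounds=4):
    """Chance of winning `rounds` straight single-elimination matches,
    assuming each is won independently with probability p."""
    return p ** rounds

def win_fraction_sd(p, n_matches):
    """Standard deviation of the observed win fraction over n independent matches."""
    return math.sqrt(p * (1 - p) / n_matches)
```

With p = 0.6, `knockout_win_prob(0.6)` is about 0.13, so even the favorite most likely does not lift the trophy. `win_fraction_sd(0.6, 4)` is about 0.24, while `win_fraction_sd(0.6, 38)` is about 0.08: the longer the run of games, the more closely results track underlying strength.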
Lionel Messi, Neymar Jr. and Cristiano Ronaldo all have AFR scores of 73. Based on those values, we can’t say Argentina, Brazil or Portugal will win the tournament, but we do know which players — and teams — to watch at the World Cup.
Luís Amaral
Luís Amaral is the Erastus Otis Haven Professor of Chemical and Biological Engineering in the McCormick School of Engineering and Applied Science. He is also the co-director of the Northwestern Institute on Complex Systems (NICO), and a fellow of both the American Physical Society and the American Association for the Advancement of Science.
Published: May 08, 2018. Updated: January 28, 2019.