Youth soccer: Stay in position!
A web app to track and visualize player movements, prototyped with HTML, CSS, JavaScript, and Python.
One in a series of projects highlighting my progress as a self-taught programmer.
I began teaching myself Python, SQL, and GIS in spring 2023 — starting from zero. I therefore welcome feedback on these projects and review for errors. And I’d be interested in taking a crack at your own data and geospatial questions too. Please get in touch.
* Update, 15 November 2023: I added event categorization to my app (e.g. pass, shot) and relaunched at https://matchlogger.com.
* Update, 13 October 2023: I improved my web app.
5 October 2023
Background
Comprehensive datasets capturing soccer player movements, passes, shots, and tackles (in all their permutations) are hard to come by, or expensive. While I have messed around with old La Liga-grade data, the sheer amount of AI and/or labor required to collect fresh, quality match data typically puts soccer analytics in the realm of triple-digit subscription fees and “call for pricing” services. After all, logging precisely where players are on a pitch, the myriad things they could be doing (dribbling, one-touch passing, pressuring…), and when they are doing it is as complex and time-consuming as it is necessary for meaningful analytics. Or is it?
The bane of many a youth soccer coach is keeping kids in position. Left midfielders hug the right sideline, defenders stray into the opposition’s box, and, especially in the early years, everyone chases the ball. Professionals are far more disciplined, and squad analysts capture this—and make strategic adjustments—with heat maps that highlight areas of high and low involvement in game action. I made such a map with StatsBomb data from an old Barcelona match, focusing on left midfielder Ousmane Dembélé, who appears to take corner kicks from both sides, and goalkeeper Marc-André ter Stegen. See image at right.
I accomplished this with Python, loading, filtering and plotting data with json, matplotlib, and mplsoccer libraries—and then realized that, as dense as the dataset I wrangled was, I really only needed a bunch of (x, y) points to generate the heat map effect.
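To make that last point concrete, here is a minimal sketch of the load-and-filter step using only the json module. It assumes events shaped like StatsBomb's open-data event files, where each event may carry a "player" object with a "name" and a "location" list of [x, y]. The function name player_locations and the sample records are hypothetical stand-ins, not my original script or real match data.

```python
import json

def player_locations(events, player_name):
    """Return (x, y) tuples for every located event belonging to player_name."""
    points = []
    for event in events:
        player = event.get("player", {})   # some events (e.g. half start) have no player
        location = event.get("location")   # ...and some have no location
        if player.get("name") == player_name and location:
            points.append((location[0], location[1]))
    return points

# Tiny hand-made sample standing in for a real match file (illustrative values only).
sample_events = json.loads("""
[
  {"type": {"name": "Pass"},  "player": {"name": "Player A"}, "location": [60.0, 40.0]},
  {"type": {"name": "Shot"},  "player": {"name": "Player B"}, "location": [102.5, 35.1]},
  {"type": {"name": "Half Start"}},
  {"type": {"name": "Carry"}, "player": {"name": "Player A"}, "location": [75.2, 12.3]}
]
""")

points = player_locations(sample_events, "Player A")
print(points)
```

Everything else in an event record (pass recipient, body part, pressure flags and so on) is discarded; the heat map only ever sees the list of points.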
Perhaps a homemade soccer analytics tool was within reach.
Project outputs
Here is my son playing striker during a scrimmage at his U9 practice last night:
The heat map matches my recollection: The vast majority of my son’s ball touches and tackles were in the center of the opposition’s half of the field; for kickoffs, and when the other team attacked, he dropped to the bottom of the center circle. The “emerald” color gradation allows that he could have roamed elsewhere on the pitch, but the bright highlights confirm his heavy involvement high on offense.
The data source? This project is actually a two-for-one: My Python script produced the pitch heat map, but I also created a web app to capture the data behind it. Below is a screenshot of the interface for my very first soccer analytics tool, the one-and-only Match Spatial Logger:
It’s just a counter, available on my GitHub page, and should load fine on a phone, tablet, or desktop. When the action kicks off, all that’s required of users is to tap the area of the field corresponding to their target player’s location when s/he partakes in any kind of event (pass, shot, tackle, whatever). Last night I watched my son from the sidelines and tapped 16 times over five minutes, apparently hitting the center counters of the top two rows most frequently. (Those 0s tick over with each tap.)
The “export data” button downloads a csv log of the button taps to the user’s device that I can feed into my Python script to produce a genuine Match Spatial Logger Player Match Involvement Pitch Heat Map, trademark pending.
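As a rough sketch of how such a log could become (x, y) points, the snippet below maps each tapped button to the center of its rectangle. Everything about the layout here is an assumption for illustration: the column names timestamp, row, and col, the 4-row by 3-column grid, the 80-by-120 pitch, and the convention that x runs across the pitch and y along it. The real export may look different.

```python
import csv
import io

# Assumed layout (hypothetical): 4 rows x 3 columns of buttons on an
# 80-wide, 120-long pitch; x runs across the pitch, y along it,
# and row 0 is the top row of buttons on screen.
PITCH_WIDTH, PITCH_LENGTH = 80.0, 120.0
N_ROWS, N_COLS = 4, 3

def zone_center(row, col):
    """Return the (x, y) center of the button at the given row and column."""
    x = (col + 0.5) * PITCH_WIDTH / N_COLS
    y = (row + 0.5) * PITCH_LENGTH / N_ROWS
    return (x, y)

def read_taps(csv_text):
    """Turn a tap log with 'row' and 'col' columns into a list of (x, y) points."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [zone_center(int(rec["row"]), int(rec["col"])) for rec in reader]

# Made-up three-tap log, all on center-column buttons.
sample_log = "timestamp,row,col\n00:12,0,1\n00:47,1,1\n01:30,0,1\n"
points = read_taps(sample_log)
print(points)
```

With only center-column taps, every point in the output shares the same x value and differs only in y.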
While my pre-scrimmage product testing worked well, no MVP, naturally, survives first contact with youth soccer. Besides some UX issues (e.g. tap too quickly and your iPhone screen zooms), it turns out I only tapped central buttons during my son’s stint at striker. That likely means he held his position well, but it rendered the kernel density estimation behind the heat map useless: all x values in my list of (x, y) points were the same, limiting the spatial estimation to the y axis. If my initial map included a razor-thin vertical line, I didn’t even see it.
To fix, I (1) re-coded one center-of-box entry to the upper-right portion of the pitch (i.e. I doctored the data, though my son’s touches did lean to the right side of the box), and (2) added mild jittering, noise, to all x and y coordinates (a somewhat legit statistical technique). With those tweaks to the csv and Python script, I produced the fine heat map above.
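The jittering step can be sketched with the standard library alone. This is an illustration under stated assumptions, not my exact script: the jitter function, the uniform noise, the scale of 2 pitch units, and the 10/6 split of the 16 taps are all made up for the example.

```python
import random
import statistics

def jitter(points, scale=2.0, seed=0):
    """Add uniform noise in [-scale, scale] to each coordinate (seeded for repeatability)."""
    rng = random.Random(seed)
    return [(x + rng.uniform(-scale, scale), y + rng.uniform(-scale, scale))
            for x, y in points]

# 16 taps that all share one x value (the center column); the split is invented.
taps = [(40.0, 15.0)] * 10 + [(40.0, 45.0)] * 6

# Zero variance in x: a 2D density estimate has no spread to work with.
print(statistics.pvariance([x for x, _ in taps]))

jittered = jitter(taps)

# After jittering, x has a small but non-zero spread.
print(statistics.pvariance([x for x, _ in jittered]))
```

The noise is kept small relative to the size of a button's rectangle, so points stay close to where they were logged while no longer sitting on a single vertical line.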
But it’s not that fine, not that granular. Limiting player location to just twelve possible spots across the entire pitch simplifies data collection but sacrifices heat map fidelity and smoothness. Recalling the nice Dembélé and ter Stegen shapes in my first plots, I wrote another Python script to bin all 228 of Dembélé’s match events into my 12 rectangular areas. That effectively employs my rudimentary app to log his match, reducing the number of possible (x, y) combinations from nearly 1 million (if my math and understanding of StatsBomb’s precision are correct) down to 12.
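A minimal sketch of that binning, with the arithmetic behind the "nearly 1 million" figure, might look like the following. It assumes a 120-by-80 pitch with x along the length and y across the width, coordinates recorded to one decimal place, and a grid of 4 rows by 3 columns. The helper names zone_of and snap_to_zone_center are hypothetical, and this is not the script I actually ran.

```python
# Assumptions: 120 x 80 pitch (x along the length, y across the width),
# coordinates recorded to 0.1 units, and a 4 x 3 grid of rectangles.
PITCH_LENGTH, PITCH_WIDTH = 120.0, 80.0
N_ROWS, N_COLS = 4, 3

def zone_of(x, y):
    """Return the (row, col) of the rectangle containing (x, y)."""
    # min() keeps points exactly on the far touchline or goal line in the last zone.
    row = min(int(x / (PITCH_LENGTH / N_ROWS)), N_ROWS - 1)
    col = min(int(y / (PITCH_WIDTH / N_COLS)), N_COLS - 1)
    return row, col

def snap_to_zone_center(x, y):
    """Replace a precise location with the center of its rectangle."""
    row, col = zone_of(x, y)
    return ((row + 0.5) * PITCH_LENGTH / N_ROWS,
            (col + 0.5) * PITCH_WIDTH / N_COLS)

# Possible positions at 0.1 precision: 1200 steps long by 800 steps wide.
fine_grid = int(PITCH_LENGTH * 10) * int(PITCH_WIDTH * 10)
print(fine_grid)                        # 960000
print(snap_to_zone_center(60.0, 40.0))  # (75.0, 40.0)
```

Running every event location through snap_to_zone_center leaves at most 12 distinct points, which is exactly what the tap-based logger can record.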
The original heat map, reproduced below on the left, must have required a far more complex data collection process than my 12-point Match Spatial Logger, which yielded the map on the right—it’s splotchy, not smooth. But homemade.
Techniques
For libraries, my heat mapper Python script used pandas, matplotlib, and, crucially, mplsoccer. The last of these made pitch imaging a breeze, with plenty of customization options. The above heat maps are more accurately event distribution plots using kernel density estimation (KDE), for which the mplsoccer documentation includes sample code.
As for the web app, ChatGPT greatly helped with the bones, drafting index.html, script.js, and styles.css files and then walking me through the customizations (which I in turn, or first in turn, had to help it understand). It was my first taste of HTML, JavaScript, and CSS in a very, very long time. The app is hosted through GitHub.
Data sources
Apart from the StatsBomb match data, I collected the data for this project myself using, obviously, my spatial logger web app and my own bare hands.