In this post I wanted to give a quick bit of insight into how I've gone about creating SpatialJam's Lineup-O-Matic.
The Lineup-O-Matic essentially displays all the five-man combinations used by a team and calculates the statistics accumulated for and against while each combination is on the court together. This is a useful metric for coaches in assessing how well a particular group of players works together and in which in-game situations a player has historically been well suited.
As with any metric, it is a measure of what has happened in the past rather than a prediction, but lineup data can still aid decision making and help coaches make a 'best guess' at what may happen.
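To make "statistics for and against" concrete, here is a minimal Python sketch (not the actual Lineup-O-Matic code) that totals the points scored and conceded by each five-man lineup. The `Event` record, its field names and the `lineup_points` helper are hypothetical simplifications; a real play-by-play row carries far more detail.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified play-by-play row."""
    scoring_team: Optional[str]  # "home", "away", or None if no points were scored
    points: int
    home_five: frozenset[str]    # the five home players on court
    away_five: frozenset[str]    # the five away players on court


def lineup_points(events: list[Event], team: str) -> dict[frozenset[str], dict[str, int]]:
    """Total points scored ("for") and conceded ("against") by each lineup of `team`."""
    totals: dict[frozenset[str], dict[str, int]] = defaultdict(lambda: {"for": 0, "against": 0})
    for ev in events:
        lineup = ev.home_five if team == "home" else ev.away_five
        stats = totals[lineup]  # every lineup that appears gets an entry, even at 0-0
        if ev.scoring_team == team:
            stats["for"] += ev.points
        elif ev.scoring_team is not None:
            stats["against"] += ev.points
    return dict(totals)
```

Storing a lineup as a `frozenset` means the same five players always map to the same key regardless of the order in which they were subbed on, and a lineup's plus/minus is simply `for - against`.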
In the past I have measured lineups for NBL teams as part of statistics packages provided to coaches, but until last year this was a tricky process that relied heavily on time-consuming manual data entry. The Lineup-O-Matic, on the other hand, is fully automated and can process a whole season of data in just a few seconds, which makes it a far more feasible way of providing these metrics freely and to a wide audience.
As with the Shot Machine (see Blog Post One), the Lineup-O-Matic is built using a combination of Python (to pull and collate the data) and Tableau (to visualise the data) and is actually pretty simple once you get your head around the logic involved.
The script I've created for SpatialJam essentially creates a single row in a database for every event that occurs during a game, be it a made shot, a turnover, a shot clock violation or anything else. Regardless of the event, the row also contains the 10 players on the court when it occurred: 5 on Team 1 and 5 on Team 2. Below is an example of how a couple of minutes of (very messy) game time between Cairns and Perth last season looks in the SpatialJam database:
You'll see that, regardless of which team causes the event, all 10 players on the court are attributed to it. Also note that when Shaun Bruce is subbed out for Markel Starks at the 3:48 mark, the data reflects the change from that point onwards.
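One way to picture such a row is as a flat record with ten player columns. The sketch below assumes hypothetical input field names (`clock`, `team`, `action`) and output column names (`home_p1` to `home_p5`, `away_p1` to `away_p5`); it is not the real SpatialJam schema. It flattens one event and the two on-court groups into a single dictionary, sorting the names so that a given five-man group always fills the same columns in the same order.

```python
from __future__ import annotations


def flatten_row(event: dict, home_on_court: set[str], away_on_court: set[str]) -> dict:
    """Combine one play-by-play event with the ten players on court.

    The input keys ("clock", "team", "action") and the output column names are
    illustrative assumptions, not the real SpatialJam schema.
    """
    row = {key: event.get(key) for key in ("clock", "team", "action")}
    # Sort so the same five players always land in the same columns.
    for prefix, players in (("home", home_on_court), ("away", away_on_court)):
        for i, name in enumerate(sorted(players), start=1):
            row[f"{prefix}_p{i}"] = name
    return row
```

A list of rows in this shape can be written straight to a CSV file with the standard library's `csv.DictWriter`, which is one convenient way to hand a flat table to a visualisation tool.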
Creating the list of events is simple: Python just creates a new row for each new JSON event. The trick is adding the players who are on court at the time. To do this, Python builds a list of players from the substitution rows in the Play by Play feed. It looks for any event tagged as a substitution and adds players to, or removes them from, the 'on court' list based on what it finds.
For example, at the start of every game the starting five players are subbed onto the court:
Python creates two lists, one for the home team and one for the away team, each containing the five players initially subbed onto the court. From that point onwards it adds to and removes from these lists as players enter and leave the court, and every event along the way has both lists of players attached to it.
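The substitution logic described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the SpatialJam script: it assumes each JSON event has already been parsed into a dict with hypothetical `action`, `team` (`"home"` or `"away"`), `sub_type` (`"in"` or `"out"`) and `player` fields. Because the starting fives arrive as ordinary "in" substitutions, they need no special case.

```python
from __future__ import annotations


def annotate_feed(events: list[dict]) -> list[dict]:
    """Attach the home and away on-court groups to every event, in feed order."""
    on_court = {"home": frozenset(), "away": frozenset()}
    rows = []
    for ev in events:
        if ev["action"] == "substitution":
            team, player = ev["team"], ev["player"]
            if ev["sub_type"] == "in":
                on_court[team] = on_court[team] | {player}
            elif ev["sub_type"] == "out":
                on_court[team] = on_court[team] - {player}
            else:
                raise ValueError(f"unknown substitution type: {ev['sub_type']!r}")
        # frozensets are immutable, so each row keeps a snapshot of who was on
        # court at that moment, unaffected by later substitutions.
        rows.append({**ev, "home_five": on_court["home"], "away_five": on_court["away"]})
    return rows
```

Using immutable sets here, rather than one mutable list per team, avoids a subtle bug in which every row ends up referencing the same list and therefore shows the final lineup of the game.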
This creates a very rich Play by Play database, useful not just for lineup data but also for metrics that are not usually visible in a boxscore, such as a richer assist dataset showing who is passing to whom (as seen here), a more detailed record of turnovers (travels, ball-handling errors etc.), charges drawn and other statistics not usually provided by the league's official statistics.
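As an example of these extra metrics, a passer-to-scorer table can be tallied with `collections.Counter`. This sketch assumes, hypothetically, that an assist appears in the feed as its own event immediately after the made shot it belongs to, with `action` values named `"made_shot"` and `"assist"`; the real feed may structure assists differently.

```python
from __future__ import annotations

from collections import Counter


def assist_pairs(events: list[dict]) -> Counter:
    """Count how often each (passer, scorer) pair connects on an assisted basket."""
    pairs: Counter = Counter()
    previous = None
    for ev in events:
        # Credit the pair only when the assist directly follows a same-team made shot.
        if (
            ev["action"] == "assist"
            and previous is not None
            and previous["action"] == "made_shot"
            and previous["team"] == ev["team"]
        ):
            pairs[(ev["player"], previous["player"])] += 1
        previous = ev
    return pairs
```

Calling `most_common(5)` on the result lists the five most productive passing combinations.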
The next step for SpatialJam is linking this Play by Play database to the Shot Machine, which will give a deeper understanding of the game. For example, users will be able to see not just where the shot happened, but who else was on the court at the time, who was credited with the assist and in the case of a missed shot, who rebounded the ball.
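Linking a shot to what happened next is largely a matter of looking ahead in the event sequence. As a rough sketch of how the missed-shot case might work (an assumption about the approach, not a description of the planned Shot Machine integration), the function below pairs each missed shot with the first rebound that follows it, and stops early if another shot comes first. The `action` values and field names are hypothetical.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical action names for shot events.
SHOT_ACTIONS = {"made_shot", "missed_shot"}


def link_rebounds(events: list[dict]) -> list[tuple[int, Optional[str]]]:
    """Return (index of missed shot, rebounder or None) for every missed shot."""
    links = []
    for i, ev in enumerate(events):
        if ev["action"] != "missed_shot":
            continue
        rebounder = None
        for later in events[i + 1:]:
            if later["action"] == "rebound":
                # Team rebounds may have no individual player attached.
                rebounder = later.get("player")
                break
            if later["action"] in SHOT_ACTIONS:
                break  # another shot happened before any rebound was recorded
        links.append((i, rebounder))
    return links
```

Because each event row already carries the ten players on court, the returned shot index could then be used to look up who else was on the floor for that shot.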
I don't believe the Lineup-O-Matic has reached its full potential yet either. There are still a few UI issues I'm not completely happy with, such as the player drop-down lists and some of the data refresh speeds. Still, this type of dataset is a first for the NBL, and one I hope will become a powerful tool for coaches and fans alike in assessing the game Downunder.