I love receiving feedback from users of this website, whether it's positive or negative. It's great to hear how basketball fans have been interacting with the data and tools, and how you're using the information to deepen your understanding of the game. This is also a good chance to give a big shoutout to those who have done some quality checking on the data being served up. The site wouldn't be where it is today without your help, so thanks!
One of the most common questions I receive is around how the visualisations on the site are created, so I thought I'd go ahead and give a little background on how things are done here at SpatialJam, for those interested.
For me, the most important dataset on SpatialJam is the NBL Shot Machine, which holds all the shooting data. It was the original reason the site was set up and named SpatialJam, and it's still my favourite visualisation to play around with. So for this first behind-the-scenes blog, I'll focus on how it was created and how it is maintained.
I'd like to preface this by saying that this may not be the best or most efficient way of producing these tools. For me it's an ever-evolving process, and as much about learning and experimenting personally as it is about delivering the data to end users. I'd love to hear your thoughts if you can see ways to improve the process.
For years I'd been contemplating creating a shot database for the NBL. It's been done to death in the NBA, with analysts like Kirk Goldsberry (CourtVision Analytics and formerly Grantland) revolutionising shot analysis and bringing basketball shooting data into the mainstream over the past few years. The problem with creating these graphics for the NBL is access to the data, as we don't have the same databases, tools or APIs at our disposal down under.
Getting access to shot data is something I'd been trying to figure out for a few years, even to the point of tracking the data manually myself (with the help of a friend) using geospatial software. We used ArcGIS as a platform for storing all the spatial data we could capture during a basketball game. This solution did work, and it was the basis of my work during the 2012-13 NBL season, when I collected the locations of all shots, rebounds and turnovers and provided reports to the New Zealand Breakers as part of an overall stats package. An example of some of the turnover data collected is seen below:
It's a massive job though, as I'm sure you can imagine, and pretty soon I got tired of logging every game. The project was put on hold for a season while I looked into how best to automate the process (sadly, there's no SportVU for the NBL any time in the near future).
The above is an example of the JSON data available for tracking NBL shooting. It's the data behind what you see on the FIBA LiveStats platform online, and it controls how each shot is displayed in the shot chart section. You'll notice each line has a bunch of attributes, notably the X, Y coordinates of the shot, the player who took the shot and the shot result (r).
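To make the idea of "a bunch of attributes per shot" concrete, here is a minimal parsing sketch. Only the result key `r` comes from the post; the wrapper key `shots` and the keys `p` (player), `x`, `y` and `per` (period) are hypothetical stand-ins, not FIBA's documented schema, and the 1 = made / 0 = missed encoding is likewise an assumption.

```python
import json
from dataclasses import dataclass


@dataclass
class Shot:
    player: str
    x: float
    y: float
    made: bool
    period: int


# Hypothetical payload: every key except "r" is an assumed name.
SAMPLE_JSON = """
{"shots": [
  {"p": "Player A", "x": 12.5, "y": 48.0, "r": 1, "per": 1},
  {"p": "Player B", "x": 81.0, "y": 22.0, "r": 0, "per": 3}
]}
"""


def parse_shots(raw: str) -> list[Shot]:
    """Turn raw JSON text into a list of Shot records."""
    data = json.loads(raw)
    return [
        Shot(
            player=s["p"],
            x=float(s["x"]),
            y=float(s["y"]),
            made=s["r"] == 1,  # assumed encoding: 1 = made, 0 = missed
            period=int(s["per"]),
        )
        for s in data["shots"]
    ]
```

Calling `parse_shots(SAMPLE_JSON)` returns two `Shot` objects. Mapping the raw keys into named fields in one place keeps the rest of a pipeline independent of whatever the feed's real tag names turn out to be.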
There are plenty of ways to extract the data from these JSON files. You could copy and paste it into Excel and use the string-parsing functions to pull out the required fields (gross), or use Python or a similar language that can handle JSON to pull the data from the web and push it into a database hosted either locally or in the cloud. SpatialJam uses the latter option.
Python's standard library handles JSON natively, and Python can also pull the data from the web without any human interaction (such as opening a web browser). Once I got my head around FIBA's naming conventions for the JSON tags, the data was easily extracted and pushed into a simple .csv file.
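A minimal sketch of that download-and-append step, using only the standard library (`urllib.request`, `json`, `csv`). The URL pattern is a placeholder, and the keys `shots`, `p`, `x`, `y` and `r` are assumed names rather than FIBA's confirmed tags; the CSV columns are likewise hypothetical.

```python
import csv
import json
import os
import urllib.request

# Hypothetical URL pattern; substitute the real game-data address.
GAME_URL = "https://example.com/games/{game_id}/data.json"

# CSV columns; "shots", "p", "x", "y" and "r" are assumed JSON keys.
FIELDS = ["game_id", "player", "x", "y", "r"]


def fetch_game_json(game_id: str) -> dict:
    """Download one game's JSON, no browser involved."""
    url = GAME_URL.format(game_id=game_id)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def append_shots(path: str, game_id: str, data: dict) -> int:
    """Append one CSV row per shot, writing the header only for a new file."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        for s in data["shots"]:
            writer.writerow(
                {"game_id": game_id, "player": s["p"], "x": s["x"], "y": s["y"], "r": s["r"]}
            )
    return len(data["shots"])
```

Usage would look like `append_shots("nbl_shots.csv", game_id, fetch_game_json(game_id))`, with the file name and game ID being placeholders. Tagging every row with its game ID makes later merging and deduplication straightforward.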
There are a few nuances involved in making the FIBA data usable in the way you see it on SpatialJam at the moment. First, FIBA records data at both ends of the court when tracking live, meaning a team's shot locations are mapped to different ends depending on the half:
To make the data useful, I first needed to flip the second-half data so it appears at the same end as the first half. This is done with a scripted rule that flips the X and Y values of any point beyond a particular threshold so it matches the same relative location at the other end of the court. The reverse was also required for the second team, whose data was recorded 'correctly' in the second half but not in the first.
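One way to express that threshold rule is a 180-degree rotation about centre court, sketched below. The 0 to 100 coordinate scale on both axes is an assumption, not FIBA's confirmed system, and half court is used as the hypothetical threshold.

```python
# Assumed coordinate system: both axes normalised to 0-100, with x running
# from one baseline to the other. FIBA's real scale may differ.
COURT_LENGTH = 100.0
COURT_WIDTH = 100.0
HALF_COURT_X = COURT_LENGTH / 2  # hypothetical threshold: the halfway line


def normalise_end(x: float, y: float) -> tuple[float, float]:
    """Map a far-end shot onto the same relative spot at the near end."""
    if x > HALF_COURT_X:
        # Flipping both X and Y (a 180-degree rotation about centre court)
        # keeps the shooter's perspective: a right-corner three stays a
        # right-corner three.
        return COURT_LENGTH - x, COURT_WIDTH - y
    return x, y
```

Flipping only X would mirror the court and swap left and right corners, which is why both values change. A pure threshold rule will also flip the occasional heave launched from the far side of halfway; flipping by team and half instead avoids that edge case.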
The second piece of data manipulation was required for all shots tagged as dunks or tip-ins. I wanted these shots centred on the rim itself, but unfortunately the human-input (point-and-click) nature of the FIBA stat tracker means that in many cases this doesn't happen. In fact, tip-ins are almost always plotted incorrectly on the FIBA shot charts. In the image above, you'll notice a tiny blue dot in the bottom left corner of the court. Yup, that's all the shots tagged as tip-ins. Again, the fix was as simple as scripting a rule that takes these shots and automatically adjusts their X, Y values to coincide with the X, Y position of the hoop.
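A sketch of the snap-to-rim rule, assuming shots have already been moved to the near end on a 0 to 100 scale. The rim position is derived from FIBA court dimensions (a 28 m court, with the basket centre 1.575 m from the baseline); the shot-type labels `dunk` and `tipin` are hypothetical, not FIBA's actual tags.

```python
# Assumed: shots already normalised to the near (x < 50) end, 0-100 scale,
# on a 28 m x 15 m FIBA court.
HOOP_X = 5.625  # 1.575 m from the baseline / 28 m * 100
HOOP_Y = 50.0   # centre of the court's width

# Hypothetical shot-type labels; the real FIBA tags may differ.
SNAP_TO_RIM = {"dunk", "tipin"}


def snap_to_rim(shot_type: str, x: float, y: float) -> tuple[float, float]:
    """Move dunks and tip-ins onto the rim; leave other shots untouched."""
    if shot_type.lower() in SNAP_TO_RIM:
        return HOOP_X, HOOP_Y
    return x, y
```

Running this after the end-normalisation step means a single hoop coordinate is enough, since every shot then sits at the same end of the court.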
The SpatialJam script also accounts for a number of other errors in the NBL data, such as misspelt player names, incorrect or inconsistent player numbers, and overtime periods. I won't go into detail on these, but suffice to say I'm hoping the data you see on SpatialJam is a 'cleaner' version of that served up by the NBL at this stage.
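As an illustration of this kind of cleaning (not SpatialJam's actual rules), the sketch below snaps misspelt names to a known roster with the standard library's `difflib.get_close_matches` and labels overtime periods explicitly. The roster names and the 0.8 similarity cutoff are hypothetical; it assumes four regulation quarters per game, as in the NBL.

```python
import difflib

# Hypothetical canonical roster; names are placeholders.
ROSTER = ["Jordan Smith", "Alex Brown", "Sam Taylor"]


def clean_name(raw: str, roster: list[str] = ROSTER) -> str:
    """Snap a possibly misspelt name to the closest roster entry, if close enough."""
    name = raw.strip()
    matches = difflib.get_close_matches(name, roster, n=1, cutoff=0.8)
    return matches[0] if matches else name


def period_label(period: int, regulation_periods: int = 4) -> str:
    """Label periods so overtime is explicit: 1-4 -> Q1-Q4, 5 -> OT1, 6 -> OT2."""
    if period <= regulation_periods:
        return f"Q{period}"
    return f"OT{period - regulation_periods}"
```

Unmatched names are passed through unchanged rather than guessed, so a genuinely new player doesn't get silently merged with an existing one. A fairly high cutoff trades a few missed corrections for fewer false matches.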
Tableau Public is a free data visualisation tool which is, quite frankly, brilliant in many regards. There is a crazy amount of depth to the software, and I'm still very much in the learning stages with it. The NBL Shot Machine is made up of a number of Tableau worksheets pieced together to form the overall graphic. The main one is the court visualisation itself, which is basically an X, Y chart with basketball court lines mapped underneath. Unlike 'old-school' shot charts, which are just static images, Tableau lets the user interact with and query the chart. This gives the end user a huge amount of freedom and granularity when searching for information.
Due to the input limitations of the free Tableau interface, I can't yet store the data on a server or even in a locally hosted SQLite database as I would like, so at this stage Tableau reads the data directly from the .csv file created by SpatialJam's Python script. The site therefore can't yet update the data live as games happen, which means I still have to 'push' the data through to Tableau before fresh data appears on SpatialJam. This only takes a few seconds, but it is something I would like to address in the future. Unfortunately, at this stage the solution costs money, and I believe free access to the site is more important than being a couple of minutes quicker at updating. I don't think anyone's that desperate to get their hands on this stuff...
One part of the process that is now automated, however, is running the scripts that pull the data. The SpatialJam Python scripts live on an internet-connected Raspberry Pi running Linux and are scheduled to run in the early hours of each morning to grab the latest NBL game data. The newly grabbed data is then merged with the main dataset, ready for me to (manually...) push up to the Tableau servers and SpatialJam. This isn't really much of a time saver, but it's a geeky side note anyway.
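A sketch of the nightly merge step, assuming the new games and the main dataset are CSV files with the same columns, including a `game_id` column (a hypothetical name). The crontab line in the comment is an assumed schedule and path, not the actual Raspberry Pi configuration.

```python
import csv
import os

# Example crontab entry (assumed schedule and path) to run a merge script
# at 4 a.m. every day:
#   0 4 * * * /usr/bin/python3 /home/pi/merge_shots.py


def merge_new_games(main_path: str, new_path: str) -> int:
    """Append rows from new_path whose game_id isn't already in main_path."""
    seen: set[str] = set()
    fieldnames = None
    if os.path.exists(main_path):
        with open(main_path, newline="") as f:
            reader = csv.DictReader(f)
            seen = {row["game_id"] for row in reader}
            fieldnames = reader.fieldnames
    with open(new_path, newline="") as f:
        reader = csv.DictReader(f)
        new_rows = [row for row in reader if row["game_id"] not in seen]
        fieldnames = fieldnames or reader.fieldnames
    if not new_rows:
        return 0
    write_header = not os.path.exists(main_path) or os.path.getsize(main_path) == 0
    with open(main_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(new_rows)
    return len(new_rows)
```

Deduplicating on the game ID makes the job safe to re-run: if the nightly script fires twice, or a game is fetched again, the main file doesn't gain duplicate shots.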
The NBL Shot Machine is an ever-evolving project for me. I've loved hearing how you use the visualisation and what improvements you'd like to see in the future. As the project develops, I'm aiming to expand the data to include SEABL and NZ NBL shooting data for the upcoming seasons, as well as fine-tune the querying functionality on the site. I hope you enjoy using it as much as I've enjoyed building it!