Divots and dashboards
Golfing with a data-driven caddy
Data is everywhere, generated at incredible rates, and attached to nearly everything. The key to realizing its potential lies in data analysis and the communication of findings. Using geographic information systems (GIS), spatial analysis can be applied to nearly any phenomenon, whether it spans the globe or eighteen holes of golf, to reveal patterns and gain a greater understanding.
A greater understanding of my golf game is precisely what I set out to achieve this summer. I was determined to analyze every shot, every hole, and every game with the hopes of identifying areas of improvement. At the very least, I could establish some baseline metrics to inform any future improvement.
Equipped with a mobile app for data collection, I recorded the location of each swing I took on the course. Since distance varies with the loft, or angle, of the club’s face, it was important to also record the chosen club. Finally, at the end of each hole, I recorded the score, documenting how many shots were taken.
As each round progressed, a dataset of shots accumulated. With between seventy and, on a bad day, one hundred shots taken in a round of eighteen holes, the data was ripe with potential. A summer’s worth of rich data beckoned for analysis.
With data collected, it was time for some data processing and analysis. Some preliminary questions related to score, shot distance, and club performance were identified as measures of interest. This initial exploration yielded additional questions and datasets that would guide further analysis, such as directionality.
Processing and analysis also produced new features and datasets to support further analysis and reporting. As more data is submitted, these datasets will continue to grow and provide more data points to work with.
To derive useful measures from the data, I'd need to first process the data points.
This phase involved organizing and sequencing each shot location using the submission date and time.
Once organized, I used the coordinates of sequential points to construct line segments. These lines visualized the path taken from tee to green and I could use them to determine the distance and direction of each shot.
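As a rough illustration of this step, here is a minimal Python sketch that orders the shot points of a single hole by timestamp and turns each consecutive pair into a line with a distance and a compass bearing. The field names (`time`, `lat`, `lon`, `club`) and the use of the haversine formula are assumptions for the example, not a description of the tooling actually used.

```python
import math
from datetime import datetime

EARTH_RADIUS_YD = 6_371_000 / 0.9144  # mean Earth radius, metres converted to yards


def haversine_yards(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in yards."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_YD * math.asin(math.sqrt(a))


def bearing_degrees(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2 (0 = north, 90 = east)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    x = math.sin(dlmb) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return (math.degrees(math.atan2(x, y)) + 360) % 360


def build_shot_lines(points):
    """Sort one hole's shot points by time and link consecutive points into lines.

    Each point is a dict with hypothetical keys: time (ISO string), lat, lon, club.
    """
    ordered = sorted(points, key=lambda p: datetime.fromisoformat(p["time"]))
    lines = []
    for start, end in zip(ordered, ordered[1:]):
        coords = (start["lat"], start["lon"], end["lat"], end["lon"])
        lines.append({
            "club": start["club"],  # the club belongs to the swing at the start point
            "distance_yd": haversine_yards(*coords),
            "bearing_deg": bearing_degrees(*coords),
        })
    return lines
```

Note that the sketch only links points within one hole; the final shot into the cup would need the hole location as an extra end point.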
The final phase of processing was to record the score submissions. These were evaluated against par, the anticipated score for each hole, and were aggregated to calculate the score for the round.
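The scoring step can be sketched in a few lines of Python. The function name, the dictionary layout, and the sample numbers below are all hypothetical; the only idea taken from the text is that each hole's strokes are compared against par and then summed for the round.

```python
def summarize_round(strokes_by_hole, par_by_hole):
    """Compare strokes to par on each hole and total them for the round."""
    per_hole = {
        hole: strokes - par_by_hole[hole]
        for hole, strokes in strokes_by_hole.items()
    }
    total_strokes = sum(strokes_by_hole.values())
    total_par = sum(par_by_hole[hole] for hole in strokes_by_hole)
    return {
        "per_hole": per_hole,            # strokes relative to par, hole by hole
        "total": total_strokes,          # score for the round
        "over_par": total_strokes - total_par,
    }


# Made-up three-hole example, not real scorecard data.
example = summarize_round({1: 5, 2: 4, 3: 6}, {1: 4, 2: 3, 3: 5})
```

Keeping the per-hole differences alongside the totals makes it easy to see which holes contribute most to the score over par.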
The processing phase derived new layers, including a dataset to record rounds played, scores, and shot lines. With these new layers, I could assemble them on a map for visualization.
The map provides a complete visual of all the highs and lows of my games, some of them in front of the clubhouse.
As interesting as it is to explore the map, further analysis would yield even more valuable insights into my gameplay and hopefully lead to some potential improvements.
To visualize some of the performance metrics, I compiled the data layers in a dashboard that reports aggregate statistics. As more rounds are played, these reports will update, adding data points to the trends and performance metrics. With any luck, my performance might gradually improve over time.
Summary metrics describing the current extent of the dataset.
The cumulative score is the most obvious metric readily measured from this data analysis. In the same way that the score for each hole would be recorded on a physical scorecard, the data has been preserved in a database, safe from becoming lost in the depths of my golf bag.
The number of shots taken, including my numerous penalty strokes, quantifies a simple metric of my overall performance. Visualized in a straightforward table, this information can be compared against par for each hole and the round overall.
However, the real value comes from tracking this performance statistic over time. Monitoring my score relative to par for each course provides a barometer for my progress toward improvement.
Score (over par) | Course | Date |
---|---|---|
83 (+20) | Merry-Hill | June 21, 2023 |
93 (+22) | Rockway | June 29, 2023 |
78 (+16) | Merry-Hill | August 2, 2023 |
A selection of scores collected over several weeks. Some are good, and others are not so good.
The final score is sufficient to indicate overall performance, but the wealth of data can yield much more granular results. Throughout a game, I may make between 70 (I can dream) and 100 shots with a whole suite of clubs. Each shot and the distance achieved holds data that can be used in calculating metrics such as club yardages.
The club yardage metric is incredibly useful as it informs club selection and can be the difference between putting the next shot on the green or looking for your ball in the woods. As more shots feed this statistic over many rounds, the aggregated yardages for each club will become more accurate and useful.
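One way to aggregate yardages per club is sketched below in Python. It assumes each shot has already been reduced to a `(club, distance)` pair, and it uses the median as the "typical" yardage because it is less sensitive to the occasional mishit; both choices are assumptions for illustration, and the sample distances are invented.

```python
from collections import defaultdict
from statistics import median


def club_yardages(shots):
    """Group (club, distance_yd) pairs by club and summarize each group."""
    by_club = defaultdict(list)
    for club, distance_yd in shots:
        by_club[club].append(distance_yd)
    return {
        club: {
            "shots": len(distances),
            "median_yd": median(distances),
            "min_yd": min(distances),
            "max_yd": max(distances),
        }
        for club, distances in by_club.items()
    }


# Invented sample shots, for illustration only.
sample = [("Driver", 210), ("Driver", 220), ("Driver", 215),
          ("Pitching wedge", 60), ("Pitching wedge", 110)]
summary = club_yardages(sample)
```

The gap between `min_yd` and `max_yd` gives a quick sense of how consistent each club is.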
Here's a chart visualizing over five hundred recorded shots.
Visualizing each shot by club reveals the distance I can typically achieve with each one, which should help me pick the right club for the next shot.
This chart also holds other patterns that are worth exploring.
For example, shots with some clubs are concentrated in a narrow yardage band, while others, like those with my pitching wedge, vary greatly.
However, this isn't necessarily concerning. Reflecting on the data and my experience, I learned to play with a limited set of golf clubs and adapted by using this club in place of a diverse suite of other wedges (e.g., sand wedge, gap wedge, approach wedge).
As a result, I became quite flexible with it across a variety of situations and distances.
The driver, in comparison, is a club with the sole purpose of hitting the ball far, so deviation from the desired distance is undesirable.
For this club, shots are much more tightly distributed around 215 yds, with some fortunate and unfortunate outliers on either end.
Based on the courses I play, if I hit my driver well, I can follow up with another shot or two and be on the green. However, on those days when I'm not hitting my driver well, I'm forced to recover with a club that I don't use frequently and have little confidence using.
Hitting it super far doesn't earn you any accolades if your ball lands way off the fairway or deep in the woods. So, let's explore the location in conjunction with the distance of each shot. By mapping and analyzing the spatial distribution of shots, I can identify personal tendencies and pinpoint those areas where I need improvement.
For instance, am I consistently hitting the fairway on the right side but struggling on the left? The spatial analysis of dispersion can reveal this trend and guide ongoing adjustments to my game. As a case study, we can examine a specific example.
Here’s the 9th hole of one of my local courses. I’ve played it several times this summer, and even though I think I’ve lined up well, my ball is consistently landing off the right side of the fairway and occasionally underneath a tree.
Searching for a pattern in these errant shots, I harnessed the power of data processing to imagine that all 87 shots taken off the tee were shot from the same geographic location and aimed at the same target. In relocating my tee shots around a common origin and aiming them relative to a hypothetical fairway, the trajectories begin to tell a story.
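The relocation idea can be approximated with a little trigonometry. The Python sketch below assumes each tee shot is described by its distance and compass bearing, and that an intended target bearing (the direction of the fairway from the tee) is known for each hole; these inputs and the function name are hypothetical. It converts every shot into a position relative to a shared origin, with the fairway pointing straight ahead.

```python
import math


def normalize_shot(distance_yd, shot_bearing_deg, target_bearing_deg):
    """Express a shot relative to a common origin and a common target line.

    Returns (along_yd, across_yd): distance down the hypothetical fairway and
    sideways offset from its center line (positive = right, negative = left).
    """
    # Signed angle between the shot and the target line, wrapped to [-180, 180).
    offset_deg = (shot_bearing_deg - target_bearing_deg + 180) % 360 - 180
    offset_rad = math.radians(offset_deg)
    along_yd = distance_yd * math.cos(offset_rad)
    across_yd = distance_yd * math.sin(offset_rad)
    return along_yd, across_yd


# A 200 yd shot hit 10 degrees right of the intended line (invented values).
along, across = normalize_shot(200, 100, 90)
```

Treating each shot as a flat triangle is a simplification, but at the scale of a golf hole the error is negligible.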
It may look like a complete mess, and my game often is, but we might find a pattern in these trajectories by digging a little deeper.
Focusing on one of many variables, we’ll pull out the 39 tee shots hit with a driver.
The fan-like distribution of shots shifts, revealing a directional bias towards the right, and those three drives on hole 9 are no exception.
At first glance, it looks like most of my shots land in this grid cell, between 200 and 250 yds out and up to 50 yds to the right of the fairway.
The data can help quantify this inconsistency. We can visualize the distance and deviation from the center of the fairway. When those measures are aggregated, we find that, on average, I tend to hit the ball 204.7 yards and 12.4 yards to the right of center on our hypothetical fairway.
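A minimal sketch of this kind of aggregation follows, assuming each drive has already been reduced to an `(along, across)` pair in yards, where `across` is positive to the right of the fairway center. The 50-yard cell size mirrors the grid described above, while the function name and sample values are invented for illustration.

```python
from collections import Counter
from statistics import mean

CELL_YD = 50  # grid cell size in yards


def summarize_dispersion(offsets):
    """Average the shots and find the grid cell that holds the most of them.

    Each cell is identified by its lower-left corner, so (200, 0) means
    200-250 yds down the fairway and 0-50 yds right of center.
    """
    along = [a for a, _ in offsets]
    across = [c for _, c in offsets]
    cells = Counter(
        (int(a // CELL_YD) * CELL_YD, int(c // CELL_YD) * CELL_YD)
        for a, c in offsets
    )
    return {
        "mean_distance_yd": mean(along),
        "mean_offset_yd": mean(across),  # positive = right of center
        "busiest_cell": cells.most_common(1)[0][0],
    }


# Invented drives, not the recorded data.
drives = [(210.0, 20.0), (230.0, 10.0), (190.0, -15.0), (205.0, 30.0)]
stats = summarize_dispersion(drives)
```

A positive `mean_offset_yd` would indicate a bias to the right, which is the cue to start aiming further left.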
These types of analytics can help to identify patterns in my performance. Armed with these metrics, I can approach my next round more informed of what I should be working on, or at the very least, that I should start aiming further to the left.
This story is just one example of a GIS journey and its varied applications. The data collection, analysis, and presentation (missed shots and all) are laid bare in a visual story communicating the methodology and results.
While the results of this analysis don’t magically fix my golf game, the insights gained do help to better understand patterns in my performance. As I continue to collect data, these metrics will update, hopefully improving, and provide valuable feedback on my progress.
The dashboards below further explore the reported measures of score, yardage, and dispersion patterns for all the games recorded so far. These dashboards dynamically update as new data is submitted, so feel free to explore all the best and worst of my golf games.