Cleaning Catapult Data

A Simple Script to Clean Catapult Data

The Problem

When working with raw Catapult GPS or IMU data, you need to handle the first few lines of metadata, which include info like the date, time, and location of the session. This information is less structured than you would normally find, being written in point form. Below it are the properly formatted variables and their observations.

Enjoying this post and want to know when new ones come out? Subscribe at newsletters.midsprint.io

Getting Started

Step 1 is to load your packages and understand the data format.

Packages Loaded: tidyverse for clean code & janitor to handle variable names
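A minimal sketch of this first step. The real export is not reproduced here, so the code writes a small hypothetical stand-in file (8 point-form metadata lines, then a header row and two observations) and prints its first lines as plain text. The field names, values, and the `raw_path` variable are all assumptions made for illustration.

```r
library(tidyverse)  # attaches readr, dplyr, and the rest of the core tidyverse
library(janitor)    # used later to tidy variable names

# Hypothetical stand-in for a raw export: 8 metadata lines followed by
# a header row and observations. All names and values are invented.
sample_lines <- c(
  "Date: 01/01/2020",
  "Time: 10:00:00",
  "Location: Training Ground",
  "Athlete: Player A",
  "Device: 12345",
  "Sample Rate: 10 Hz",
  "Session: Practice",
  "Notes: none",
  "Timestamp,Velocity,Heart Rate",
  "0.0,1.2,120",
  "0.1,1.4,121"
)
raw_path <- tempfile(fileext = ".csv")
write_lines(sample_lines, raw_path)

# Read the first 10 lines as plain text to see where the real table starts
first_lines <- read_lines(raw_path, n_max = 10)
print(first_lines)
```

Reading the file as text first, before parsing it, is a cheap way to count how many lines sit above the header row.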

Because the first 8 rows hold the metadata, it’ll be much more efficient to remove them than to work around them.

The skip Argument

read_csv() from the readr package (loaded with the tidyverse package) has a skip argument that skips any number of rows before returning the data set.

Skip the first 8 rows to return a data set that is easier to work with.
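A short sketch of the skip step, assuming a file with exactly 8 metadata lines above the header row. The temporary file and its contents are hypothetical placeholders rather than a real export.

```r
library(readr)

# Hypothetical file: 8 placeholder metadata lines, then a header row
# and two observations. Replace raw_path with the path to your own export.
raw_path <- tempfile(fileext = ".csv")
write_lines(
  c(
    paste("Metadata line", 1:8),
    "Timestamp,Velocity,Heart Rate",
    "0.0,1.2,120",
    "0.1,1.4,121"
  ),
  raw_path
)

# skip = 8 drops the metadata lines, so line 9 becomes the header row
session <- read_csv(raw_path, skip = 8)
print(session)
```

If your own file has a different number of metadata lines, change the value passed to `skip` to match.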

Easily Clean Variable Names

The janitor package can be used to handle variable names. R isn’t a fan of names with spaces, like Heart Rate. Using janitor::clean_names() replaces spaces with underscores and converts uppercase text to lowercase. The end result is a simple variable name that R can handle, and you don’t have to worry about which letters are capitalized.
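As an illustration, here is a small sketch of clean_names() applied to a hypothetical tibble. The column names and values are invented for the example and are not taken from a real export.

```r
library(tibble)
library(janitor)

# Hypothetical data with names that contain spaces and capital letters
session <- tibble(
  `Heart Rate`  = c(120, 121),
  `Player Load` = c(0.5, 0.7),
  Velocity      = c(1.2, 1.4)
)

# clean_names() returns the same data with tidied column names
cleaned <- clean_names(session)
names(cleaned)
# "heart_rate" "player_load" "velocity"
```

The cleaned names can then be used directly, for example `cleaned$heart_rate`, without backticks.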


Do you want to work together? Email me directly at