Cleaning Catapult Data
A Simple Script to Clean Catapult Data
The Problem
When working with raw Catapult GPS or IMU data, you need to handle the first few lines of metadata, which include information such as the date, time, and location of the session. This information is less structured than the rest of the file, being written in point form. Below it are properly formatted variables and their observations.
Enjoying this post and want to know when new ones come out? Subscribe at newsletters.midsprint.io
Getting Started
Step 1 is to load your packages and understand the data format.
Packages loaded: tidyverse for clean code and janitor to handle variable names.
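A minimal sketch of this first step, assuming both packages are installed. The file contents below are made up for illustration (they are not a real Catapult export); only the shape matters: 8 point-form metadata lines followed by a regular header row and observations.

```r
library(tidyverse)  # loads readr, dplyr, and friends
library(janitor)    # provides clean_names()

# Made-up stand-in for a raw export: 8 metadata lines, then the actual table.
raw_lines <- c(
  "Session: Example Session",
  "Date: 2020-01-01",
  "Time: 10:00",
  "Location: Example Field",
  "Athlete: Example Athlete",
  "Device: Example Unit",
  "Firmware: Example Version",
  "Notes: none",
  "Timestamp,Velocity,Heart Rate",
  "0.0,1.2,110",
  "0.1,1.4,111"
)

# Write the lines to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".csv")
write_lines(raw_lines, path)

# Read the top of the file as plain text to see where the header row starts.
first_lines <- read_lines(path, n_max = 10)
first_lines
```

Printing the first lines as plain text, rather than parsing them, is a quick way to count how many rows sit above the real header.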
As you can see, it’ll be much more efficient to remove the first 8 rows rather than work around them.
The skip Argument
read_csv() from the readr package (loaded with the tidyverse package) has a skip argument that skips the number of lines you specify before reading the data set.
Skip the first 8 rows to return a data set that is easier to work with.
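As a rough sketch, the call could look like the following. The temporary file and its contents are invented placeholders standing in for a real export; swap in the path to your own file and adjust skip if your metadata block is a different length.

```r
library(tidyverse)

# Made-up stand-in for a raw export: 8 metadata lines, then the actual table.
path <- tempfile(fileext = ".csv")
write_lines(
  c(
    paste("Metadata line", 1:8),      # placeholder metadata, not real Catapult fields
    "Timestamp,Velocity,Heart Rate",  # the real header row sits on line 9
    "0.0,1.2,110",
    "0.1,1.4,111"
  ),
  path
)

# skip = 8 drops the metadata, so line 9 is used as the column names.
catapult_data <- read_csv(path, skip = 8)
catapult_data
```

The result is a regular tibble with one column per variable, although the names still carry spaces and capitals.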
Easily Clean Variable Names
The janitor package can be used to handle variable names. R isn’t a fan of names with spaces like Heart Rate. Using janitor::clean_names() replaces all spaces with underscores and converts all uppercase text to lowercase. The end result is a simple variable name that R can handle, and you don’t have to remember which letters are capitalized.
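A small illustration of the renaming, using a hand-built tibble rather than a real export. Heart Rate comes from the example above; the other column names and all values are made up.

```r
library(tidyverse)
library(janitor)

# Hand-built example data; backticks are needed because the names contain spaces.
messy <- tibble(
  Timestamp     = c(0.0, 0.1),
  `Heart Rate`  = c(110, 111),
  `Player Load` = c(0.5, 0.7)
)

# clean_names() converts the column names to lowercase snake_case.
tidy <- messy %>% clean_names()

names(tidy)
# "timestamp" "heart_rate" "player_load"
```

In practice the same step can be piped straight onto the import, for example read_csv(path, skip = 8) %>% clean_names(), where path points to your own file.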
Do you want to work together? Email me directly at aaron@midsprint.io