Creating the CSV Repositories
In the previous post I outlined the implementation of AbstractRepository
. With this in place I needed to create the concrete repositories to manage interactions with the csv files containing fitness assessment results and conditioning workouts.
Dummy Data
I started by creating two csv files with dummy fitness results and workouts. The first line of the csv files contains headers and is followed by athlete fitness assessment data or workout training variables.
# fitness_assessments.csv
athlete_name,sport,status,date,time_trial_name,time_trial_distance,time_trial_time
John Doe,Boxing,True,2022/05/12,2km time trial,2000,510
John Doe,Boxing,True,2023/06/18,5m flying sprint,5,0.67
Jane Smith,Hockey,True,2022/07/03,2km time trial,2000,460
Jane Smith,Hockey,True,2022/08/22,5m flying sprint,5,0.52
# conditioning_workouts.csv
workout_name,work_interval_time,work_interval_percentage_mas,work_interval_percentage_asr,rest_interval_time,rest_interval_percentage_mas,rest_interval_percentage_asr
Passive Long Intervals - Normal,2,100,0,2,0,0
Passive Long Intervals - Extensive,2,100,0,1,0,0
Passive Long Intervals - Intensive,2,100,0,3,0,0
CsvFitnessProfileRepository
The fitness_assessments.csv file contains multiple records of 2km time trial and 5m flying sprint records for each athlete. I needed to ingest the data and group by athlete name. To group the data I originally planned to use the groupby
function from itertools but went with a defaultdict
from the collections module due to groupby
requiring the data to be sorted (see link). With the data grouped, I needed to find the latest 2km time trial and 5m flying sprint records and use these to create a FitnessProfile object. The _load
method takes care of this functionality:
class CsvFitnessProfileRepository(AbstractRepository[FitnessProfile]):
"""CSV implementation of FitnessProfile repository."""
def __init__(self, filepath: str):
"""Initialise CsvRepository with fitness profiles.
Args:
filepath: The filepath for the fitness assessment CSV files.
"""
self._filepath = Path(filepath)
self._fitness_profiles: dict[str, FitnessProfile] = {}
self._load()
def _load(self):
with open(self._filepath) as f:
reader = csv.reader(f)
next(reader) # skip header row
# group reader by name
athlete_records = defaultdict(list)
for record in reader:
athlete_records[record[0]].append(record)
# find latest 2km and 5m for each athlete
for athlete_name, athlete_results in athlete_records.items():
latest_2km = max(
result
for result in athlete_results
if result[4] == "2km time trial"
)
latest_5m = max(
result
for result in athlete_results
if result[4] == "5m flying sprint"
)
# create FitnessProfile for each athlete
profile = FitnessProfile(
name=athlete_name,
time_trial_distance=int(latest_2km[5]),
sprint_distance=int(latest_5m[5]),
time_trial_time=int(latest_2km[6]),
sprint_time=float(latest_5m[6]),
)
self._fitness_profiles[athlete_name] = profile
With the data ingested and stored as FitnessProfile
objects I need to complete the AbstractRepository
interface by adding the get
and get_all
methods:
class CsvFitnessProfileRepository(AbstractRepository[FitnessProfile]):
"""CSV implementation of FitnessProfile repository."""
def __init__(self, filepath: str):
"""Initialise CsvRepository with fitness profiles.
Args:
filepath: The filepath for the fitness assessment CSV files.
"""
self._filepath = Path(filepath)
self._fitness_profiles: dict[str, FitnessProfile] = {}
self._load()
...
def get(self, id: str) -> FitnessProfile:
"""Get a single entity from the persistence layer."""
return self._fitness_profiles[id]
def get_all(self) -> Sequence[FitnessProfile]:
"""Get a sequence of entities from the persistence layer."""
return list(self._fitness_profiles.values())
The get
method is passed the athlete's name which is used for the key lookup in the _fitness_profiles
attribute to return a single FitnessProfile
. The get_all
method returns all the values from the _fitness_profiles
attribute as a list.
A Quick Note on Keeping Things Simple...
I haven't implemented the add
method as I removed it from the AbstractRepository
. As I'm moving through the project I've realised that I'm trying to implement functionality that I don't currently need to create an MVP. This is slowing down the project and moving my attention away from learning the design patterns. My goal is to create an MVP implemented with the minimum required functionality and the design patterns I'd like to learn. Once I have a working application I can revisit the additional functionality. See issue 8 and issue 9 for previous examples of functionality that I have moved to the backlog.
CsvWorkoutRepository
The conditioning_workouts.csv file contains the training variables for each conditioning workout. Again, I needed to ingest the csv data and create a Workout
object. One nice addition to the class is the _convert_types
method. This provides a cleaner way of converting the data types of each csv file prior to instantiating each Workout
instance. I was introduced to this approach through David Beazley's Practical Programming course. Specifically, see exercise 2.24.
class CsvWorkoutRepository(AbstractRepository[Workout]):
"""CSV implementation of Workout repository."""
def __init__(self, filepath: str):
"""Initialise CsvRepository with workouts.
Args:
filepath: The filepath for the workouts CSV file.
"""
self._file_path = Path(filepath)
self._workouts: dict[str, Workout] = {}
self._load()
def _convert_types(self, record: list[str]) -> list:
types = [str, int, float, float, int, float, float]
return [func(val) for func, val in zip(types, record)]
def _load(self):
with open(self._file_path) as f:
reader = csv.reader(f)
next(reader) # skip header row
for record in reader:
converted = self._convert_types(record)
workout = Workout(*converted)
self._workouts[workout.name] = workout
With the data stored as Workout
objects I implemented get
and get_all
methods. As with the previous repository, the get
method is passed a string containing the workout name to be used as the key in the _workouts
attribute and get_all
returns a list containing all Workout
objects from the _workouts
attribute.
class CsvWorkoutRepository(AbstractRepository[Workout]):
"""CSV implementation of Workout repository."""
def __init__(self, filepath: str):
"""Initialise CsvRepository with workouts.
Args:
filepath: The filepath for the workouts CSV file.
"""
self._file_path = Path(filepath)
self._workouts: dict[str, Workout] = {}
self._load()
...
def get(self, id: str) -> Workout:
"""Get a single entity from the persistence layer."""
return self._workouts[id]
def get_all(self) -> Sequence[Workout]:
"""Get a sequence of entities from the persistence layer."""
return list(self._workouts.values())
With the creation of the two repositories I now have the csv data stored in FitnessProfile
and Workout
objects and available in working memory. The next step is to create a service layer which outlines the use case for the data. For the MVP this will be calculating individual target distances for conditioning workouts.