Deep Lifelog Data Analysis

Introduction

This is my first serious data analysis project, spanning everything from data collection and formatting to presentation. The repo is intended for exhibition and demo hosting only, not for forking or application to other projects. Most of the functionality is data wrangling that handles idiosyncrasies specific to my dataset and does not generalize. Individual codebase elements, like the custom query language, may be adaptable to other purposes.

Since I started grad school in September 2017, I've collected continuous time series tabular data on my daily activities. It started as a small productivity exercise to track my academic tasks, but as my data obsession kicked in, the scope ballooned out of control.

I was inspired by r/dataisbeautiful content like this, but I wanted to track way more detail than a single categorical variable over fixed intervals.

Though it suited the original small productivity exercise fine, the scope of data features has blown way past what timesheet.io, the app I use for data collection, was intended for. Most of the juicy information, including that for all of the features listed under Data Features, is collected in a single string entry. So about 90% of this project has been feature engineering: designing data structures to represent one's day-to-day life, parsing that one string, unpacking the mess of shortcuts I've used to streamline collection, and fixing a litany of rookie mistakes.

I implemented a hybrid pandas/OOP framework for the project. My focus in development was on scalability and maintainability over performance. This is because the dataset is small, and I expect to continue adding features over time.
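
As a rough illustration of what a hybrid pandas/OOP layout can look like, here is a minimal sketch. It is not the repo's actual code: the `Podcast` dataclass, the column names, and the sample rows are all hypothetical. The idea shown is that ordinary tabular fields live in regular columns, while richer features are stored as Python objects in an object column.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True)
class Podcast:
    """Hypothetical stand-in for one kind of collectible item."""
    title: str


# Hypothetical entries: each row is one logged time interval.
entries = pd.DataFrame(
    {
        "start": pd.to_datetime(["2017-09-04 08:00", "2017-09-04 09:30"]),
        "end": pd.to_datetime(["2017-09-04 09:30", "2017-09-04 10:00"]),
        # Object column: each cell holds a list of Python objects.
        "media": [[Podcast("Example Show")], []],
    }
)

# pandas handles the tabular part...
entries["duration"] = entries["end"] - entries["start"]

# ...and ordinary Python handles the object part.
has_podcast = entries["media"].apply(
    lambda items: any(isinstance(item, Podcast) for item in items)
)
podcast_time = entries.loc[has_podcast, "duration"].sum()
```

Object columns give up vectorized speed, which fits the stated priority of maintainability over performance on a small dataset.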

Where's the Data?

Separating sensitive information from the public codebase has been one of the challenges of the project. Because the data is sensitive, only a limited amount is shared in this repo:

Small sample of raw data
- RAW_CSVs/sample_RAW_PUBLIC.csv
- Be aware that all the raw data is collected in Spanish, which was its own challenge
Intermediate Catalog yaml files for all implemented Collectible leaf classes except for Person and Location
- catalogs/Catalog_*.yaml
Graphics used by the Demo
- VS_Persistent/figs_PUBL.pkl

Data Features

Raw data features:

Time interval
Project
Tags
Description
Mood
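
To make the shape of the raw data concrete, the sketch below builds a tiny table with one column per raw feature and totals the hours per project. Everything in it is an assumption for illustration: the column names, the English placeholder values (the real raw data is in Spanish), the numeric mood scale, and the list-valued tags are hypothetical, not the app's actual export format.

```python
import pandas as pd

# Hypothetical rows mirroring the raw features listed above.
raw = pd.DataFrame(
    {
        "start": pd.to_datetime(
            ["2017-09-04 08:00", "2017-09-04 09:00", "2017-09-04 11:00"]
        ),
        "end": pd.to_datetime(
            ["2017-09-04 09:00", "2017-09-04 11:00", "2017-09-04 11:30"]
        ),
        "project": ["Study", "Research", "Study"],
        "tags": [["class"], ["lab", "reading"], []],
        "description": ["placeholder", "placeholder", "placeholder"],
        "mood": [3, 4, 2],  # assumed numeric scale
    }
)

# Length of each time interval in hours.
raw["hours"] = (raw["end"] - raw["start"]).dt.total_seconds() / 3600

# Total time logged per project.
hours_by_project = raw.groupby("project")["hours"].sum()
```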

Implemented in feature engineering:

Media (TVShow, Movie, Podcast, Audiobook)
Person
Food

Data is logged, but further feature engineering is needed:

Media (Software, VideoGame, TabletopGame)
Location
SubjectMatter: topics discussed and researched

Codebase Features

Custom query language built with antlr for readable and maintainable query and update statements
- tsqparser/independent.tsqy: Primary query file used for data cleaning
- tsqparser/TimesheetQuery.g4: Grammar file
Collectible class hierarchy structuring shared features of Audiobook, Podcast, Food, and Person classes, among others
HierarchicalEnum: an Enum-like abstract base class whose members are organized in a hierarchy
- Implemented via pure Python class inheritance to represent any directed acyclic graph hierarchy
- Ex: Genre, SocialGroup, and SubjectMatter
Internationalization and localization (partially implemented) to translate Spanish raw and Enum data into English.
Key libraries:
- pandas
- numpy
- antlr
- yamlable
- matplotlib
- streamlit
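
The HierarchicalEnum idea above, a hierarchy of members expressed through class inheritance, can be sketched roughly as follows. This is a simplified illustration rather than the repo's implementation: the member names (`Fiction`, `Comedy`, `SciFi`, `SciFiComedy`) and the `descendants` helper are hypothetical. Each member is a class, a child is a subclass, and multiple inheritance lets a member have more than one parent, which is what makes the hierarchy a directed acyclic graph rather than a tree.

```python
class Genre:
    """Root of a hypothetical hierarchy; every member is a subclass."""

    @classmethod
    def descendants(cls):
        """Return every class below cls, each listed once."""
        found = []
        stack = list(cls.__subclasses__())
        while stack:
            member = stack.pop()
            if member not in found:
                found.append(member)
                stack.extend(member.__subclasses__())
        return found


class Fiction(Genre): ...
class Comedy(Genre): ...
class SciFi(Fiction): ...
class SciFiComedy(SciFi, Comedy): ...  # two parents: a DAG, not a tree


# Membership in a branch is a plain issubclass check.
is_fiction = issubclass(SciFiComedy, Fiction)
is_comedy = issubclass(SciFiComedy, Comedy)
```

One appeal of this approach is that "is this member under that branch?" needs no custom lookup table, since Python's own class machinery already tracks ancestry.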

aaron-sandoval/lifelog
