A collaborative filtering music recommender built on the LFM-2b dataset.
The implementation uses a form of matrix factorization that accepts implicit feedback (such as the number of times a user has listened to a track), as described by Yifan Hu et al. This approach was chosen because the dataset lists listening events, not explicit ratings of tracks.
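As a rough illustration of the idea, the sketch below follows the formulation in the Hu et al. paper: play counts become a binary preference plus a confidence weight, and the two factor matrices are fitted with alternating least squares. It is a minimal sketch on a small dense matrix, not the code in src/sawatuma/model.py, which may optimize differently. The function name `implicit_als` and the values of `alpha`, `reg`, `factors`, and `iterations` are assumptions chosen for the example.

```python
import numpy as np


def implicit_als(counts, factors=2, alpha=40.0, reg=0.1, iterations=10, seed=0):
    """Factorize a dense (users x tracks) play-count matrix.

    Hypothetical helper for illustration; hyperparameters are assumptions.
    """
    rng = np.random.default_rng(seed)
    n_users, n_tracks = counts.shape

    # Preference p_ui: 1 if the user ever played the track, else 0.
    preference = (counts > 0).astype(float)
    # Confidence c_ui: grows linearly with the play count.
    confidence = 1.0 + alpha * counts

    users = rng.normal(scale=0.1, size=(n_users, factors))
    tracks = rng.normal(scale=0.1, size=(n_tracks, factors))

    def solve(fixed, conf, pref):
        # For each row u, solve (Y^T C_u Y + reg * I) x_u = Y^T C_u p_u,
        # where Y is the fixed factor matrix.
        out = np.empty((conf.shape[0], fixed.shape[1]))
        ridge = reg * np.eye(fixed.shape[1])
        for u in range(conf.shape[0]):
            weighted = fixed * conf[u][:, None]  # C_u Y
            a = fixed.T @ weighted + ridge
            b = weighted.T @ pref[u]
            out[u] = np.linalg.solve(a, b)
        return out

    for _ in range(iterations):
        # Alternate: fix the tracks and solve for users, then the reverse.
        users = solve(tracks, confidence, preference)
        tracks = solve(users, confidence.T, preference.T)
    return users, tracks


# Toy data: 3 users x 3 tracks, entries are listen counts.
counts = np.array([[5, 0, 1], [4, 0, 0], [0, 3, 2]], dtype=float)
users, tracks = implicit_als(counts)
scores = users @ tracks.T  # higher score = stronger predicted preference
```

The per-row loop keeps the sketch readable. On a dataset the size of LFM-2b, a sparse representation and a faster solver would be needed.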
The main code is spread across four files:
src/sawatuma/model.py
: the matrix factorization model
src/sawatuma/datasets.py
: the code responsible for downloading, extracting, and portioning the data
src/__main__.py
: describes the main parameters of the model
src/tui.py
: the basic tui that comes with the model
This project requires pdm and maturin, so install them first.
First, clone the repository:
git clone https://github.com/baanan/sawatuma.git
cd sawatuma
Then, install the pdm dependencies:
pdm install
Next, compile the Rust dependency:
pdm venv activate
pdm run python -m ensurepip
cd sawatuma_rs
maturin develop
cd ..
Optionally, download a precompiled model and track mapping from the latest release. If you do, place both files in a folder named data in the project root:
mkdir data; cd data
wget https://github.com/baanan/sawatuma/releases/latest/download/model.pickle
wget https://github.com/baanan/sawatuma/releases/latest/download/track_mapping.tsv
cd ..
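Before starting a long run, it can help to confirm that both downloads landed where the instructions above put them. The following is a small sketch that only checks for the two file names in the data folder. The helper name `missing_data_files` is hypothetical and not part of the project.

```python
from pathlib import Path

# File names taken from the release downloads described above.
EXPECTED_FILES = ("model.pickle", "track_mapping.tsv")


def missing_data_files(project_root="."):
    """Return the expected files that are absent from <project_root>/data."""
    data_dir = Path(project_root) / "data"
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("Precompiled model and track mapping found.")
```

Run it from the project root. An empty result means both files are present.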
Finally, run the __main__ file using pdm (this may take a very long time!):
pdm run src/__main__.py