A Temporal Loom for Models

We treat the timeline as a physical dimension, which it effectively is in deep learning model training.

Installation

Requirements

conda environment (conda>=4.12.0)
python>=3.8

Run the installation script

bash install.sh

Dataset

After cloning this repository, create a folder named data and download the TempLAMA dataset into it. After combining the data, there are 50,310 samples in total.

Analysis

Number of data samples that have changed over time (unique queries): 5,823

More analysis is available in the Google Colab notebook.

Execution of Code

Combining Data

Use the following scripts:

The first command converts the dataset to a diff format (containing only the entries where the subject changed over the years).
The second command converts the dataset to CSV in question, answer format.

python src/dataset_prep/combine_restructure_data.py --input ./data/train.json,./data/test.json,./data/val.json
python src/dataset_prep/finetuning_data_jsonl_to_csv.py --dataset_path ./data/restructured_data.json --year 2010-2018

Note: You can add more data by separating the data paths in the --input parameter with commas.
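
To illustrate the diff format, here is a minimal sketch of the idea. It is not the code in combine_restructure_data.py. It assumes each record has `query`, `date`, and `answer` fields (with `answer` a list of objects that have a `name` key), and it interprets "changed" as "the set of answers differs from the previous year for the same query". The function name `keep_changes` and the sample records are hypothetical.

```python
from collections import defaultdict


def keep_changes(records):
    """Keep a record only when its answers differ from the previous year's.

    Assumed (hypothetical) record layout:
    {"query": str, "date": "YYYY", "answer": [{"name": str}, ...]}
    """
    # Group all yearly records that share the same query text.
    by_query = defaultdict(list)
    for rec in records:
        by_query[rec["query"]].append(rec)

    kept = []
    for recs in by_query.values():
        previous = None
        # Walk each query's records in chronological order.
        for rec in sorted(recs, key=lambda r: int(r["date"])):
            names = sorted(a["name"] for a in rec["answer"])
            if names != previous:  # first year seen, or the answer changed
                kept.append(rec)
            previous = names
    return kept


# Hypothetical sample data, for illustration only.
sample = [
    {"query": "A plays for _X_.", "date": "2010", "answer": [{"name": "Team 1"}]},
    {"query": "A plays for _X_.", "date": "2011", "answer": [{"name": "Team 1"}]},
    {"query": "A plays for _X_.", "date": "2012", "answer": [{"name": "Team 2"}]},
]
print([r["date"] for r in keep_changes(sample)])  # ['2010', '2012']
```

In this sketch the first year of each query is always kept, so that later changes have a baseline to compare against.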

Paraphrasing of the input query (relation)

The list of paraphrased relations for the TempLAMA dataset is in the file utils/templama_relation_rephrase.jsonl.

Finetuning dataset

Number of samples in train: 9149
Number of samples in validation: 3000

For zero-shot there are two cases:

Entirely unseen data: information newly added in that particular year
Previously seen data where only the year in the query has changed

python src/dataset_prep/valdata_zeroshot.py --dataset-path ./data/restructured_data.json --val-year 2019-2020
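
The two zero-shot cases can be sketched as a simple partition of the validation rows. This is an illustrative sketch and not the logic of valdata_zeroshot.py. It assumes rows are `(query, year)` tuples and that a query counts as "seen" if its text appears anywhere in the finetuning years. The function name `split_zeroshot` and the sample rows are hypothetical.

```python
def split_zeroshot(train_rows, val_rows):
    """Partition validation rows into the two zero-shot cases.

    Rows are (query, year) tuples; this layout is an assumption.
    Returns (unseen, seen_new_year).
    """
    seen_queries = {query for query, _ in train_rows}
    # Case 1: the query never appeared in the finetuning years.
    unseen = [row for row in val_rows if row[0] not in seen_queries]
    # Case 2: the query appeared before, but is now asked for a new year.
    seen_new_year = [row for row in val_rows if row[0] in seen_queries]
    return unseen, seen_new_year


# Hypothetical sample data, for illustration only.
train = [("A plays for _X_.", 2017), ("B works for _X_.", 2018)]
val = [("A plays for _X_.", 2019), ("C is a member of _X_.", 2020)]
unseen, seen_new_year = split_zeroshot(train, val)
print(unseen)         # [('C is a member of _X_.', 2020)]
print(seen_new_year)  # [('A plays for _X_.', 2019)]
```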

For one-shot there is one case:

Sampled from the finetuning dataset: 1000 samples

python src/dataset_prep/valdata_oneshot.py --dataset-path ./data/ft-2010-2018.csv --val-sample 1000
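
As a rough sketch of the one-shot sampling step, the snippet below draws a fixed number of rows from a CSV in question, answer format. It is not the code in valdata_oneshot.py. The column names `question` and `answer`, the fixed seed, and the function name `sample_rows` are assumptions for illustration.

```python
import csv
import io
import random


def sample_rows(csv_text, n, seed=0):
    """Draw up to n rows, without replacement, from CSV text.

    Assumes a header row; a fixed seed keeps the sample reproducible.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))


# Hypothetical sample data, for illustration only.
csv_text = (
    "question,answer\n"
    "A plays for _X_ in 2010.,Team 1\n"
    "A plays for _X_ in 2012.,Team 2\n"
    "B works for _X_ in 2015.,Company 1\n"
)
picked = sample_rows(csv_text, 2)
print(len(picked))  # 2
```

Capping the sample size at the number of available rows avoids the `ValueError` that `random.sample` raises when asked for more items than exist.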

For the T5 model: We replace the _X_ mask provided in the dataset with <extra_id_0>, which is the default mask token for the T5 model.
For GPT2 model:
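
The T5 mask replacement described above can be sketched as a plain string substitution. This is a minimal illustration, not the repository's preprocessing code. The function name `to_t5_input` and the example query are hypothetical.

```python
DATASET_MASK = "_X_"       # mask placeholder used in the dataset queries
T5_MASK = "<extra_id_0>"   # T5's first sentinel (mask) token


def to_t5_input(query):
    """Replace the dataset's mask placeholder with the T5 sentinel token."""
    return query.replace(DATASET_MASK, T5_MASK)


print(to_t5_input("A plays for _X_."))  # A plays for <extra_id_0>.
```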

Finetune the model with the following command:

python src/seq2seq.py --model t5-base --train ./data/ft-2010-2018.csv --val ./data/ft-val-2010-2018.csv --cuda 3

TensorBoard

python -m tensorboard.main --logdir ./logs --port 9000
python -m tensorboard.main --logdir ./results --port 8000
