Facilitate easy division of dataset into training and validation sets #173

valedan · 2023-04-12T20:13:09Z

We want to make sure we're not evaluating the model on the training set because we're worried about memorization.

mivanit · 2023-06-16T05:51:06Z

The way to do this would be to use the filtering mechanics from #177. As of #181, it is also possible to specify a new validation dataset, or a number of items to sample from the testing dataset, by using TrainConfig.validation_dataset_cfg.
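For anyone who wants to see the general idea in isolation, here is a minimal sketch of a seeded, disjoint train/validation split. It is plain Python and does not use this repo's API: the function name `split_train_validation` and its parameters are hypothetical, and this is not how TrainConfig.validation_dataset_cfg is implemented. It also assumes the items are already unique, since duplicates would have to be filtered out first for the two sets to be truly disjoint.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_train_validation(
    items: Sequence[T],
    n_validation: int,
    seed: int = 0,
) -> tuple[list[T], list[T]]:
    """Hold out `n_validation` items, chosen with a seeded shuffle.

    Hypothetical helper for illustration only; not part of any library.
    """
    if not 0 <= n_validation <= len(items):
        raise ValueError("n_validation must be between 0 and len(items)")

    # Shuffle indices rather than items so the original order is preserved
    # inside each returned split.
    rng = random.Random(seed)
    indices = list(range(len(items)))
    rng.shuffle(indices)
    validation_indices = set(indices[:n_validation])

    train = [x for i, x in enumerate(items) if i not in validation_indices]
    validation = [x for i, x in enumerate(items) if i in validation_indices]
    return train, validation


# Usage: placeholder strings stand in for real dataset items.
mazes = [f"maze_{i}" for i in range(10)]
train, validation = split_train_validation(mazes, n_validation=3, seed=42)

# No item is shared between the two sets, so validation loss cannot be
# lowered just by memorizing training examples.
assert set(train).isdisjoint(validation)
```

Fixing the seed keeps the held-out set identical across runs, which matters if validation numbers from different training runs are going to be compared.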

traeuker · 2023-06-28T08:53:48Z

Does that mean this issue can be closed? Or do we want a better solution for this?

mivanit · 2023-08-06T21:39:16Z

I'm going to mark this as closed for my own peace of mind; it seems to work well enough for now. Happy to re-open if it is insufficient.

mivanit closed this as completed Aug 6, 2023
