Facilitate easy division of dataset into training and validation sets #173

Closed
valedan opened this issue Apr 12, 2023 · 3 comments

Comments

@valedan
Contributor

valedan commented Apr 12, 2023

We want to make sure we're not evaluating the model on the training set because we're worried about memorization.

@mivanit
Member

mivanit commented Jun 16, 2023

The way to do this is to use the filtering mechanics from #177. As of #181, you can also specify a new validation dataset, or a number of items to sample from the testing dataset, by using TrainConfig.validation_dataset_cfg.
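
For readers who want to see what a held-out split amounts to, here is a minimal, library-independent sketch. It does not use or reproduce TrainConfig.validation_dataset_cfg; the function name split_train_validation and its parameters are hypothetical and chosen only for illustration. The idea is to shuffle indices with a fixed seed and cut once, so the two sets are disjoint and reproducible.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_validation(
    items: Sequence[T],
    validation_fraction: float = 0.1,
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Return (train, validation) lists with no shared positions.

    Hypothetical helper for illustration; not part of any library API.
    """
    if not 0.0 <= validation_fraction <= 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")

    # Shuffle positions rather than items, using a private RNG so the
    # split is reproducible and global random state is left untouched.
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)

    # The first n_val shuffled positions form the validation set.
    n_val = int(round(len(items) * validation_fraction))
    validation = [items[i] for i in indices[:n_val]]
    train = [items[i] for i in indices[n_val:]]
    return train, validation


if __name__ == "__main__":
    train, validation = split_train_validation(list(range(100)), 0.2, seed=42)
    print(len(train), len(validation))
```

Splitting by position guarantees the two sets never share an index, but duplicate items in the source data could still appear on both sides; deduplicating first (or filtering, as suggested above) would be needed to rule out memorization in that case.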

@traeuker
Member

Does that mean this issue can be closed? Or do we want a better solution for this?

@mivanit
Member

mivanit commented Aug 6, 2023

I'm going to mark this as closed for my own peace of mind; it seems to work well enough for now. Happy to re-open if it is insufficient.

@mivanit mivanit closed this as completed Aug 6, 2023