A misprint in the "Big data? 🤗 Datasets to the rescue!" chapter of the NLP Course? #767

TopCoder2K · 2024-12-20T06:16:57Z

The "The magic of memory mapping" section contains the following code:

print(f"Number of files in dataset : {pubmed_dataset.dataset_size}")
size_gb = pubmed_dataset.dataset_size / (1024**3)
print(f"Dataset size (cache file) : {size_gb:.2f} GB")

It seems the label should read "Number of bytes in dataset" instead of "Number of files in dataset": the dataset has 15 518 009 rows, and dividing pubmed_dataset.dataset_size by 1024**3 only makes sense if the value is a size in bytes rather than a file count.
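For illustration, here is a minimal sketch of how the two print statements might look with the label corrected. The helper name `describe_dataset_size` and the byte count are hypothetical placeholders, not taken from the course; with the course's `pubmed_dataset` loaded, one would presumably pass `pubmed_dataset.dataset_size` instead.

```python
def describe_dataset_size(num_bytes: int) -> list[str]:
    """Return the two report lines, with the first label saying 'bytes'."""
    # Same divisor as the course snippet: bytes -> gibibytes (1024**3 bytes).
    size_gb = num_bytes / (1024**3)
    return [
        f"Number of bytes in dataset : {num_bytes}",
        f"Dataset size (cache file) : {size_gb:.2f} GB",
    ]


# Placeholder value for illustration only; NOT the real size of the PubMed dataset.
example_bytes = 5 * 1024**3

for line in describe_dataset_size(example_bytes):
    print(line)
```

Only the label text changes; the arithmetic is the same as in the course snippet.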
