From 5cd79f9cc957cf5648c6acd3e5d2b9bcd54fa0ae Mon Sep 17 00:00:00 2001
From: loubnabnl
Date: Sat, 25 Mar 2023 02:12:45 +0000
Subject: [PATCH] add readme

---
 pii/ner/README.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
 create mode 100644 pii/ner/README.md

diff --git a/pii/ner/README.md b/pii/ner/README.md
new file mode 100644
index 0000000..c0c8685
--- /dev/null
+++ b/pii/ner/README.md
@@ -0,0 +1,14 @@
+# Fine-tuning Bigcode-Encoder on an NER task for PII detection
+
+To run the training on the full dataset `bigcode/pii-full-ds`, use the following command:
+```bash
+python -m torch.distributed.launch \
+    --nproc_per_node number_of_gpus train.py \
+    --dataset_name bigcode/pii-full-ds \
+    --debug \
+    --learning_rate 2e-5 \
+    --train_batch_size 8 \
+    --bf16 \
+    --add_not_curated
+```
+Note that we use a global batch size of 64 (a per-device batch size of 8 on 8 GPUs). To train only on the curated data, remove the `--add_not_curated` flag.
\ No newline at end of file
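
The global batch size of 64 is the per-device batch size multiplied by the number of GPUs. Below is a minimal shell sketch of that arithmetic, assuming no gradient accumulation and a hypothetical `NUM_GPUS` environment variable (neither is taken from `train.py`), which derives a `--train_batch_size` value that keeps the global batch size at 64 on a different GPU count:

```bash
#!/usr/bin/env bash
# Sketch: derive the per-device batch size for a fixed global batch size.
# Assumption: no gradient accumulation, so global = per-device * number of GPUs.
set -euo pipefail

GLOBAL_BATCH_SIZE=64            # global batch size stated in the README
NUM_GPUS="${NUM_GPUS:-8}"       # hypothetical variable; defaults to the 8 GPUs used in the README

# Refuse GPU counts that cannot split the global batch evenly.
if (( GLOBAL_BATCH_SIZE % NUM_GPUS != 0 )); then
  echo "global batch size ${GLOBAL_BATCH_SIZE} is not divisible by ${NUM_GPUS} GPUs" >&2
  exit 1
fi

PER_DEVICE_BATCH_SIZE=$(( GLOBAL_BATCH_SIZE / NUM_GPUS ))

# Print the launcher arguments that would reproduce the global batch size.
echo "--nproc_per_node ${NUM_GPUS} --train_batch_size ${PER_DEVICE_BATCH_SIZE}"
```

For example, saving this as a hypothetical `batch_size.sh` and running `NUM_GPUS=4 bash batch_size.sh` would print `--nproc_per_node 4 --train_batch_size 16`.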