My effort to improve handwriting removal using the new DIS (Dichotomous Image Segmentation) model.
- Clone the DIS GitHub repository:
  `git clone https://github.com/xuebinqin/DIS`
- Install the requirements:
  `pip install -r requirements.txt`
- Download the `isnet.pth` file from my Hugging Face model repository and move it into the cloned DIS folder.
- Replace `Inference.py` in the cloned DIS folder with the `Inference.py` from this repository.
- Change the paths according to your own setup (the evaluation data path may differ); see the sketch after this list.
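For reference, here is a minimal sketch of what running the model looks like. The module path (`models.isnet`), class name (`ISNetDIS`), preprocessing, and output nesting follow the upstream DIS code as I understand it, and the input path is hypothetical; verify everything against your cloned copy and `Inference.py`.

```python
import torch
import torch.nn.functional as F
from torchvision.transforms.functional import normalize
from skimage import io
from models.isnet import ISNetDIS  # assumed location inside the cloned DIS folder

device = "cuda" if torch.cuda.is_available() else "cpu"
net = ISNetDIS()
net.load_state_dict(torch.load("./isnet.pth", map_location=device))
net.to(device).eval()

# Load an image and bring it to the 1024x1024 input size DIS is trained on.
im = io.imread("./my_page.jpg")  # hypothetical input image
im_t = torch.tensor(im, dtype=torch.float32).permute(2, 0, 1).unsqueeze(0)
im_t = F.interpolate(im_t, size=(1024, 1024), mode="bilinear") / 255.0
im_t = normalize(im_t, [0.5, 0.5, 0.5], [1.0, 1.0, 1.0]).to(device)

with torch.no_grad():
    pred = net(im_t)[0][0]  # assumed output nesting: first side output

# Min-max normalize and save the predicted handwriting mask.
mask = (pred - pred.min()) / (pred.max() - pred.min() + 1e-8)
io.imsave("./mask.png", (mask.squeeze().cpu().numpy() * 255).astype("uint8"))
```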
AndSonder has also done research and experimentation on the same subject, but using DeepLabV3+ to segment the handwriting.
His repo: https://github.com/AndSonder/HandWritingEraser-Pytorch
HUGE THANKS to them for providing the segmentation dataset, labeled with the background in blue, printed characters in green, and handwriting in red.
The original dataset lives on Baidu Web Storage and is a segmentation dataset, not a background-removal dataset. After some processing, I generated a background-removal dataset from it, available on Hugging Face: https://huggingface.co/datasets/Inoob/HandwritingSegmentationDataset.
The relevant contents of the dataset repo are:

```
|- train.zip
|- val.zip
```
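If you prefer scripting the download over the web UI, the `huggingface_hub` client can fetch the archives (a small sketch):

```python
from huggingface_hub import hf_hub_download

# Download both archives from the dataset repo; the returned paths
# point into the local Hugging Face cache.
for name in ("train.zip", "val.zip"):
    path = hf_hub_download(
        repo_id="Inoob/HandwritingSegmentationDataset",
        filename=name,
        repo_type="dataset",
    )
    print(path)
```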
After unzipping train.zip and val.zip, the file tree should look like:

```
|- train
|  |- gt
|  |  |- dehw_train_00714.png
|  |  |- dehw_train_00715.png
|  |  ...
|  |- im
|  |  |- dehw_train_00714.jpg
|  |  |- dehw_train_00715.jpg
|  |  ...
|- val
|  |- gt
|  |  |- dehw_train_00000.png
|  |  |- dehw_train_00001.png
|  |  ...
|  |- im
|  |  |- dehw_train_00000.png
|  |  |- dehw_train_00001.png
|  |  ...
```
The `gt` folder contains the masks (the ground-truth data): the background is black and the handwriting is white.
The `im` folder contains the original images of the handwriting dataset.
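A quick way to sanity-check the unzipped layout is to confirm that every image in `im` has a same-named mask in `gt` (a throwaway sketch; note that `train` mixes `.jpg` images with `.png` masks):

```python
import os

def check_pairs(split_dir: str) -> None:
    # Compare file stems so the differing extensions (.jpg vs .png) don't matter.
    ims = {os.path.splitext(f)[0] for f in os.listdir(os.path.join(split_dir, "im"))}
    gts = {os.path.splitext(f)[0] for f in os.listdir(os.path.join(split_dir, "gt"))}
    missing = ims - gts
    print(f"{split_dir}: {len(ims)} images, {len(missing)} without a mask")

check_pairs("./train")
check_pairs("./val")
```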
The code that was used to generate the dataset in the Hugging Face repo is `create_masks.py`; the sketch below shows the core idea.
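This is not the exact script, just the rule it applies: in the source labels the background is blue, printed text is green, and handwriting is red, so a pixel belongs to the handwriting mask when its red channel dominates.

```python
import numpy as np
from PIL import Image

def label_to_mask(label_path: str, out_path: str) -> None:
    # Read the color-coded segmentation label (blue/green/red classes).
    rgb = np.array(Image.open(label_path).convert("RGB")).astype(np.int16)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    handwriting = (r > g) & (r > b)  # red-dominant pixels = handwriting
    # White where there is handwriting, black everywhere else.
    mask = np.where(handwriting, 255, 0).astype(np.uint8)
    Image.fromarray(mask).save(out_path)
```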
I used `train_valid_inference_main.py` from DIS with my own dataset paths and training batch size. You can scale the batch size up if you have enough GPU memory.
- Clone the DIS GitHub repository:
  `git clone https://github.com/xuebinqin/DIS`
- Install the requirements:
  `pip install -r requirements.txt`
- Replace `train_valid_inference_main.py` in the cloned DIS folder with the `train_valid_inference_main.py` from this repository.
- Adjust the dataset paths and hyperparameters accordingly; see the sketch after this list.
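For orientation, the pieces you will most likely touch inside `train_valid_inference_main.py` look roughly like this. The key names follow the upstream DIS script as I recall it, so treat them as assumptions and verify against your copy:

```python
# Hypothetical excerpt: point DIS at the unzipped dataset and set the batch size.
dataset_train = {
    "name": "handwriting-train",
    "im_dir": "./train/im",   # input photos
    "gt_dir": "./train/gt",   # binary handwriting masks
    "im_ext": ".jpg",
    "gt_ext": ".png",
    "cache_dir": "./cache/handwriting-train",
}
dataset_val = {
    "name": "handwriting-val",
    "im_dir": "./val/im",
    "gt_dir": "./val/gt",
    "im_ext": ".png",
    "gt_ext": ".png",
    "cache_dir": "./cache/handwriting-val",
}

hypar = {}
hypar["batch_size_train"] = 8   # scale up if you have enough GPU memory
hypar["batch_size_valid"] = 1
hypar["model_path"] = "./saved_models"  # where checkpoints get written
```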
| Iteration | train Loss | train TarLoss | val Loss | val TarLoss | maxF1 | MAE |
|---|---|---|---|---|---|---|
| 11700 | 0.3089 | 0.019 | 0.3067 | 0.023 | 0.8982 | 0.0092 |

maxF1 is the maximum F1 score across thresholds, and MAE is the mean absolute error between the predicted and ground-truth masks, as reported by DIS's validation loop.
If you need any help, open an issue on this repository.
- Provide system information and a basic folder layout (a screenshot or a file tree is fine).
- Provide the error message.
- Provide the file that produced the error message.