Diff based culling #3

maxnth · 2021-07-19T14:14:14Z

Use case

When correcting ground truth in large datasets, it's often useful to check the diff between a very good prediction and the ground truth in LAREX and correct the ground truth where necessary. Culling the correction dataset of all files that contain no diff between prediction and ground truth makes this a lot easier.

Implementation

The CLI should accept:

a list of PAGE XML files and two indices (for TextEquiv/@index) which denominate prediction and ground truth
two lists of files with one index each, in case GT and Pred are stored in two different XML files
whether to apply Unicode normalization / regularization
an output directory
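
A rough sketch of what such a CLI could look like, using only the Python standard library. Everything named here is an assumption for illustration and not an existing interface: the option names (`--gt-files`, `--pred-index`, `--gt-index`, `--normalize`, `--output`), the default indices, the helper functions, pairing the two file lists by position, comparing text at TextLine level only, and copying the ground truth file of each differing pair into the output directory. Regularization is left out; only Unicode normalization is shown.

```python
import argparse
import shutil
import unicodedata
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag):
    """Strip the XML namespace so the sketch works across PAGE schema versions."""
    return tag.rsplit("}", 1)[-1]


def line_texts(root, index):
    """Map TextLine id -> Unicode text of the TextEquiv with the given @index."""
    texts = {}
    for line in root.iter():
        if local_name(line.tag) != "TextLine":
            continue
        for equiv in line:
            if local_name(equiv.tag) != "TextEquiv" or equiv.get("index") != str(index):
                continue
            unicode_el = next((c for c in equiv if local_name(c.tag) == "Unicode"), None)
            texts[line.get("id")] = (unicode_el.text or "") if unicode_el is not None else ""
    return texts


def has_diff(pred_root, gt_root, pred_index, gt_index, form=None):
    """Return True if any line differs between prediction and ground truth."""
    pred = line_texts(pred_root, pred_index)
    gt = line_texts(gt_root, gt_index)
    if form:  # optional Unicode normalization before comparing
        pred = {k: unicodedata.normalize(form, v) for k, v in pred.items()}
        gt = {k: unicodedata.normalize(form, v) for k, v in gt.items()}
    return pred != gt


def main(argv=None):
    parser = argparse.ArgumentParser(description="Keep only PAGE XML files with a Pred/GT diff.")
    parser.add_argument("files", nargs="+", type=Path,
                        help="PAGE XML files (prediction, or both if --gt-files is omitted)")
    parser.add_argument("--gt-files", nargs="*", type=Path, default=None,
                        help="separate ground truth files, paired with FILES by position")
    parser.add_argument("--pred-index", type=int, default=1)  # assumed default
    parser.add_argument("--gt-index", type=int, default=0)    # assumed default
    parser.add_argument("--normalize", choices=["NFC", "NFD", "NFKC", "NFKD"])
    parser.add_argument("-o", "--output", type=Path, required=True)
    args = parser.parse_args(argv)

    gt_files = args.gt_files or args.files
    if len(gt_files) != len(args.files):
        parser.error("prediction and ground truth lists must have the same length")

    args.output.mkdir(parents=True, exist_ok=True)
    kept = []
    for pred_path, gt_path in zip(args.files, gt_files):
        pred_root = ET.parse(pred_path).getroot()
        gt_root = ET.parse(gt_path).getroot()
        if has_diff(pred_root, gt_root, args.pred_index, args.gt_index, args.normalize):
            # Files without any diff are culled, i.e. simply not copied.
            shutil.copy2(gt_path, args.output / gt_path.name)
            kept.append(gt_path)
    return kept
```

Calling `main(["a.xml", "b.xml", "--normalize", "NFC", "-o", "to_correct"])` would cover the single-file case, while adding `--gt-files` covers GT and Pred stored in two different XML files. Matching elements by local name avoids hard-coding one PAGE namespace version, at the cost of not validating the schema.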

maxnth added the enhancement label Jul 19, 2021

maxnth added Type: Feature Priority: Medium and removed enhancement labels Mar 18, 2022
