This package includes efficient Python implementations of the following metrics:
- BLEU
  - reference implementation: nltk.translate.bleu_score (usage sketch after this list)
  - error: < 1%
  - speed: +189%
- ROUGE (in progress)
- METEOR (in progress)
- CIDEr/CIDEr-D
  - reference implementation: https://github.com/vrama91/cider
  - with the same tokenizer
    - error: < 1%
    - speed: +81%
  - with a different tokenizer (FormEval uses a regexp tokenizer by default; see the tokenizer sketch after this list)
    - error: ~15%
    - speed: +332%
- SPICE
  - placeholder wrapper around the reference implementation: https://github.com/tylin/coco-caption/tree/master/pycocoevalcap/spice
  - TODO: Python scene graph parser
*All statistics shown above are estimates.
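The BLEU error and speed figures above are measured relative to NLTK's implementation. For orientation, here is a minimal sketch of calling that reference baseline on pre-tokenized text (the sentences are illustrative only; formeval's own API is not shown here):

```python
from nltk.translate.bleu_score import corpus_bleu

# Pre-tokenized inputs: one hypothesis paired with two references.
references = [[['the', 'cat', 'sat', 'on', 'the', 'mat'],
               ['a', 'cat', 'sat', 'on', 'the', 'mat']]]
hypotheses = [['the', 'cat', 'sat', 'on', 'a', 'mat']]

# Default weights (0.25, 0.25, 0.25, 0.25) compute uniform BLEU-4.
print(corpus_bleu(references, hypotheses))
```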
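The CIDEr numbers split by tokenizer because CIDEr scores TF-IDF weighted n-gram overlap: tokenizers that split the same sentence differently produce different n-grams, and therefore different scores. A small illustration with NLTK's RegexpTokenizer; the pattern below is an assumed example, not necessarily FormEval's exact default:

```python
from nltk.tokenize import RegexpTokenizer

sentence = "A dog's life, isn't it?"

# Regexp tokenization on word characters (illustrative pattern, assumed here).
print(RegexpTokenizer(r"\w+").tokenize(sentence))
# -> ['A', 'dog', 's', 'life', 'isn', 't', 'it']

# A plain whitespace split keeps punctuation attached, yielding different n-grams.
print(sentence.split())
# -> ['A', "dog's", 'life,', "isn't", 'it?']
```

The reference implementation uses the Stanford PTBTokenizer, which splits differently again (e.g. "isn't" becomes "is" + "n't"), which is why scores computed with different tokenizers can diverge by roughly 15%.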
Requirements:
- Python 3.6+
- nltk 3.5+

Installation:
```
pip install formeval
python -m formeval.setup
```