Allow additional OCR languages #21

Open. Wants to merge 2 commits into base branch master.
paperless-ngx/CHANGELOG.md: 3 additions, 0 deletions

```diff
@@ -1,5 +1,8 @@
 # Changelog
 
+## 1.6.0-1
+- Expose functionality to install additional OCR languages
+
 ## 1.6.0-0
 - Swap to, and upgrade to paperless-ngx v1.6.0
 
```
paperless-ngx/DOCS.md: 10 additions, 1 deletion

```diff
@@ -34,7 +34,8 @@ Another way is to make a copy of the `data` and `media` directories.
 filename:
   format: "{created_year}/{correspondent}/{title}"
 ocr:
-  language: eng
+  language: eng+swe
+  languages: swe
 default_superuser:
   username: admin
   email: [email protected]
```
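The example pairs `language: eng+swe` with `languages: swe`: each code in the `+`-joined `ocr.language` string needs to be installed, either as one of the languages the option documentation lists for `ocr.language` (`eng`, `deu`, `fra`, `ita`, `spa`) or through `ocr.languages`. A minimal bash sketch of that consistency check follows; `check_ocr_languages` is a hypothetical helper, not part of the add-on or of paperless-ngx.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report every code in a "+"-joined ocr.language value
# (e.g. "eng+swe") that is neither a default language nor listed in the
# space-separated ocr.languages value (e.g. "swe lat").
check_ocr_languages() {
  local language="$1"   # value of ocr.language
  local extra="$2"      # value of ocr.languages (may be empty)
  local builtin="eng deu fra ita spa"
  local -a codes
  local code missing=0

  # Split on "+" only for this read; IFS is not changed for the caller.
  IFS='+' read -ra codes <<<"$language"
  for code in "${codes[@]}"; do
    # Pad with spaces so "swe" cannot match inside a longer code.
    if [[ " $builtin $extra " != *" $code "* ]]; then
      echo "not installed: $code" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_ocr_languages "eng+swe" "swe"` returns 0, while `check_ocr_languages "eng+swe" ""` prints `not installed: swe` to stderr and returns 1.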
```diff
@@ -51,6 +52,14 @@ Can be `eng`, `deu`, `fra`, `ita`, `spa`.
 This can be a combination of multiple languages such as deu+eng, in which case tesseract will use whatever language matches best.
 [Docs](https://paperless-ngx.readthedocs.io/en/latest/configuration.html#ocr-settings)
 
+### Option: `ocr.languages`
+
+Extra OCR languages to install, as a space-separated list of Tesseract language codes. Once installed, they can be used in `ocr.language` (e.g. `eng+swe`).
+
+e.g. `swe lat`
+
+Available languages are listed here: https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html
+
 ### Option: `default_superuser`
 
 When the addon starts up, if this user is not created, it will create it.
```
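Because `ocr.languages` is a space-separated list, each entry ends up as one language pack to install. As a rough illustration, the sketch below maps each code to a package name. It assumes the image installs Tesseract language data from Debian packages named `tesseract-ocr-<code>`, and that Debian spells codes with a hyphen where Tesseract uses an underscore (`chi_sim` becomes `tesseract-ocr-chi-sim`). `ocr_language_packages` is a hypothetical name; in this PR the add-on only exports the value as `PAPERLESS_OCR_LANGUAGES`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one Debian package name per language code in a
# space-separated list. Assumes packages are named tesseract-ocr-<code>,
# with "_" in the Tesseract code written as "-" in the package name.
ocr_language_packages() {
  local -a langs
  local lang
  # read -a splits on whitespace, so repeated spaces are harmless.
  read -ra langs <<<"$1"
  for lang in "${langs[@]}"; do
    printf 'tesseract-ocr-%s\n' "${lang//_/-}"
  done
}
```

For instance, `ocr_language_packages "swe lat"` prints `tesseract-ocr-swe` and `tesseract-ocr-lat` on separate lines; a list like that could then be passed to `apt-get install`.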
paperless-ngx/config.json: 5 additions, 3 deletions

```diff
@@ -1,6 +1,6 @@
 {
   "name": "Paperless-ngx",
-  "version": "1.6.0-0",
+  "version": "1.6.0-1",
   "slug": "paperless",
   "url": "https://github.com/paperless-ngx/paperless-ngx",
   "description": "Paperless is an application that manages your personal documents. With the help of a document scanner, paperless transforms your wieldy physical document binders into a searchable archive and provides many utilities for finding and managing your documents.",
@@ -22,7 +22,8 @@
       "format": "{created_year}/{correspondent}/{title}"
     },
     "ocr": {
-      "language": "eng"
+      "language": "eng",
+      "languages": null
     },
     "default_superuser": {
       "username": null,
@@ -35,7 +36,8 @@
       "format": "str"
     },
     "ocr": {
-      "language": "str"
+      "language": "str",
+      "languages": "str?"
     },
     "default_superuser": {
       "username": "str",
```
paperless-ngx/scripts/docker-entrypoint.sh: 2 additions, 1 deletion

```diff
@@ -12,6 +12,7 @@ echo "Entry script"
 # Load config
 export PAPERLESS_FILENAME_FORMAT=$(jq --raw-output ".filename.format" $CONFIG_PATH)
 export PAPERLESS_OCR_LANGUAGE=$(jq --raw-output ".ocr.language" $CONFIG_PATH)
+export PAPERLESS_OCR_LANGUAGES=$(jq --raw-output ".ocr.languages // empty" $CONFIG_PATH)
 
 export DEFAULT_USERNAME=$(jq --raw-output ".default_superuser.username" $CONFIG_PATH)
 export DEFAULT_EMAIL=$(jq --raw-output ".default_superuser.email" $CONFIG_PATH)
@@ -119,4 +120,4 @@ if [[ "$1" != "/"* ]]; then
 else
   echo Executing "$@"
   exec "$@"
-fi
+fi
```
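`jq --raw-output` prints the literal text `null` when `ocr.languages` is unset or `null` (its default in `config.json`), and that string would then be exported as though it were a language code. jq's alternative operator (`.ocr.languages // empty`) avoids this inside the filter; a bash-side guard is sketched below, with `normalize_option` as a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: jq --raw-output renders a JSON null as the text
# "null", so treat that text as "option not set" and print nothing.
normalize_option() {
  local value="$1"
  if [[ "$value" == "null" ]]; then
    value=""
  fi
  printf '%s' "$value"
}

# Possible use in the entry script (an assumption, not part of this PR):
# export PAPERLESS_OCR_LANGUAGES="$(normalize_option "$(jq --raw-output ".ocr.languages" "$CONFIG_PATH")")"
```

Any other value, such as `swe lat`, is passed through unchanged.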