German Language Detection #1233

Open
PhilippJindraBS opened this issue Dec 9, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@PhilippJindraBS

Description of the bug

Hello,
First of all, thank you for MinerU.

But the German language detection is miserable.

German special characters such as the umlauts (ö, ü, ä) and ß are not recognized.
I have set the language to "german", but nothing really changed.

For example:
This is in the text: Köln Dünnwald
This is the output: K8ln Dünnwald

In addition:
This is in the text: Köln-Dünnwald
This is the output: Koln-Dunnwald

Maybe you have a solution for this problem.

Thank you!

How to reproduce the bug

You can try it with any document written in German.
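
To make the reproduction measurable, here is a minimal sketch that compares the expected text with the OCR output and counts which German-specific characters went missing. It is not part of MinerU or PaddleOCR; the function name `lost_german_chars` and the constant `GERMAN_SPECIAL` are hypothetical names chosen for illustration, and the sample strings are the two cases reported above.

```python
from collections import Counter

# Characters that the report says are being dropped or replaced.
GERMAN_SPECIAL = "äöüÄÖÜß"


def lost_german_chars(expected: str, actual: str) -> dict[str, int]:
    """Count German-specific characters present in `expected` but missing from `actual`."""
    expected_counts = Counter(c for c in expected if c in GERMAN_SPECIAL)
    actual_counts = Counter(c for c in actual if c in GERMAN_SPECIAL)
    # Counter subtraction keeps only positive differences.
    return dict(expected_counts - actual_counts)


print(lost_german_chars("Köln Dünnwald", "K8ln Dünnwald"))  # {'ö': 1}
print(lost_german_chars("Köln-Dünnwald", "Koln-Dunnwald"))  # {'ö': 1, 'ü': 1}
```

Running this over a few pages with known ground truth would show whether the loss is consistent or varies between occurrences, as the two examples above suggest.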

Operating system

Windows

Python version

3.10

Software version (magic-pdf --version)

0.10.x

Device mode

cuda

@PhilippJindraBS PhilippJindraBS added the bug Something isn't working label Dec 9, 2024
Collaborator

myhloli commented Dec 9, 2024

Clearly, PaddleOCR does not perform well in scenarios other than Chinese and English. We plan to incorporate additional OCR methods in the future to improve recognition quality for non-Chinese and non-English texts. However, due to limited development resources, this process may take some time.
