
Fully offline use of tesseract.js results in "Error opening data file" (eng.traineddata) · Issue #970

Open
22480 opened this issue Oct 28, 2024 · 0 comments


22480 commented Oct 28, 2024

Describe the bug:
I want to use tesseract.js fully offline, so I point the worker at local copies of the worker script, the core, and the language data:

const recognizeText = async (imageUrl: string) => {
    const worker = await Tesseract.createWorker("chi_sim", undefined, {
        workerPath: "/tessdata/tesseract.js/dist/worker.min.js",
        corePath: "/tessdata/tesseract.js-core",
        langPath: "/tessdata/tesseract-lang",
        logger: m => console.log(m),
    })

    const {
        data: { text },
    } = await worker.recognize(imageUrl)
    setRecognizedText(text)

    await worker.terminate()
}

To Reproduce:
Steps to reproduce the behavior:

  1. Create a tessdata folder in the public folder
  2. Place the local resource files in this folder:
    tesseract-lang, tesseract.js, tesseract.js-core
  3. Run
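
Before running, it may help to confirm that the files placed in the tessdata folder are actually reachable at the URLs the worker will request. The sketch below is not part of the original report. It assumes that tesseract.js requests `<langPath>/<lang>.traineddata.gz` by default and `<langPath>/<lang>.traineddata` when the `gzip` option is set to false; the helper names `expectedLangUrl` and `findMissingAssets` are hypothetical.

```typescript
// Build the URL the worker is assumed to request for a language file.
// Assumption: ".gz" is appended unless the gzip option is disabled.
function expectedLangUrl(langPath: string, lang: string, gzip = true): string {
  const base = langPath.replace(/\/+$/, ""); // drop any trailing slash
  return `${base}/${lang}.traineddata${gzip ? ".gz" : ""}`;
}

// Return the URLs that do not answer a HEAD request with a 2xx status.
// Note: a dev server that falls back to index.html may answer 200 even
// for a missing file, so an empty result is a hint, not proof.
async function findMissingAssets(urls: string[]): Promise<string[]> {
  const missing: string[] = [];
  for (const url of urls) {
    try {
      const res = await fetch(url, { method: "HEAD" });
      if (!res.ok) missing.push(url);
    } catch {
      missing.push(url); // network error: treat as missing
    }
  }
  return missing;
}
```

For the paths in this report, `findMissingAssets(["/tessdata/tesseract.js/dist/worker.min.js", expectedLangUrl("/tessdata/tesseract-lang", "chi_sim")])` could be called from the browser console; any URL it returns is one the page cannot load locally.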

Complete code:
const inputRefOCR = useRef(null)
const [imageData, setImageData] = useState("")
const [recognizedText, setRecognizedText] = useState("")

const handleCapture = () => {
    if (inputRefOCR.current.files && inputRefOCR.current.files.length > 0) {
        const file = inputRefOCR.current.files[0]
        const reader = new FileReader()
        reader.onload = e => {
            setImageData(e.target.result)
            recognizeText(e.target.result)
        }
        reader.readAsDataURL(file)
    }
}

const recognizeText = async (imageUrl: string) => {
    const worker = await Tesseract.createWorker( {
        workerPath: "/tessdata/tesseract.js/dist/worker.min.js",
        corePath: "/tessdata/tesseract.js-core",
        langPath: "/tessdata/tesseract-lang",
        logger: m => console.log(m),
    })

    const {
        data: { text },
    } = await worker.recognize(imageUrl)
    setRecognizedText(text)

    await worker.terminate()
}
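
Note that the two snippets in this report call createWorker differently: the first passes the language ("chi_sim") as the first argument, while the complete code passes only the options object. Whether that difference explains the eng.traineddata in the error is unconfirmed. The sketch below is an illustration, not a verified fix. It uses the call shape from the first snippet, takes createWorker as a parameter so it does not depend on a particular tesseract.js version, and terminates the worker even if recognition fails. The names `OfflineOptions`, `WorkerLike`, `CreateWorkerFn`, and `recognizeOffline` are hypothetical.

```typescript
// Minimal local types covering only what this sketch needs (assumptions,
// not the library's own type definitions).
interface OfflineOptions {
  workerPath: string;
  corePath: string;
  langPath: string;
  logger?: (m: unknown) => void;
}

interface WorkerLike {
  recognize(image: string): Promise<{ data: { text: string } }>;
  terminate(): Promise<unknown>;
}

type CreateWorkerFn = (
  langs: string,
  oem: number | undefined,
  options: OfflineOptions,
) => Promise<WorkerLike>;

// Local paths taken from the report.
const OFFLINE_OPTIONS: OfflineOptions = {
  workerPath: "/tessdata/tesseract.js/dist/worker.min.js",
  corePath: "/tessdata/tesseract.js-core",
  langPath: "/tessdata/tesseract-lang",
  logger: m => console.log(m),
};

// Always pass the language explicitly so the worker is not left on a default.
async function recognizeOffline(
  createWorker: CreateWorkerFn,
  imageUrl: string,
  lang = "chi_sim",
): Promise<string> {
  const worker = await createWorker(lang, undefined, OFFLINE_OPTIONS);
  try {
    const { data: { text } } = await worker.recognize(imageUrl);
    return text;
  } finally {
    await worker.terminate(); // release the worker even on failure
  }
}
```

In the component above, this could be used as `setRecognizedText(await recognizeOffline(Tesseract.createWorker, imageUrl))`, possibly with a type cast depending on the installed tesseract.js typings.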

Console error display:
Screenshot 2024-10-28 144514

Expected behavior:
Tesseract.js should work fully offline, loading the worker, core, and language data from the local paths above.

Device Version:

  • Windows 11
  • Chrome