DOCX not being converted in docker? #243

Closed
philipbkemp opened this issue Jan 2, 2025 · 5 comments

@philipbkemp

I have a Word Document that I needed to convert, and was informed about this project to help speed up the process. But after 20 minutes it seems like nothing happened. So I ran a test, by creating a new word document (called simple.docx) with the following content as a paragraph. There are no titles, no links, no images - nothing complicated.

This is a blank document. One paragraph. Please convert to markdown.

Then I ran this command:

docker run --rm -i markitdown:latest < simple.docx > simple.md

It's been 20 minutes, and I still don't have anything in the "simple.md" file. The Word file is a mighty 13KB, so I was expecting this to be fairly quick compared to a document with things like titles, links, and images. The MD file exists, but it's 0KB and empty.

Can someone advise how long the conversion normally takes? Or if there are any issues with the docker based implementation of this tool? I tried looking inside the container for some useful logs, but I couldn't find anything - perhaps there is a file that can shed some light on what (if anything) is going on?

Thanks.

@l-lumin
Contributor

l-lumin commented Jan 2, 2025

Hi, this issue is related to PR #173. I ran into it myself.

~: docker run --rm -v ${pwd}:/src --workdir /src markitdown simple.docx


This is a blank document. One paragraph. Please convert to markdown.

@philipbkemp
Author

Thanks @l-lumin - but I don't see the generated .md file anywhere in the folder.

I'm running that command from the markitdown folder, which contains the Dockerfile and my simple.docx file - so I would have expected it to be there, or in the /src folder - but nothing. Any ideas where my markdown is?

@l-lumin
Contributor

l-lumin commented Jan 2, 2025

@philipbkemp
If you run my command above, it prints the result to the terminal and does not write a Markdown file by default. To save the output, add -o simple.md or > simple.md, which creates a Markdown file in the directory where you run the command.
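
For anyone following along in a POSIX shell, here is a rough sketch of the two variants side by side. It assumes a locally built image tagged markitdown, as used in this thread, and swaps the PowerShell-style ${pwd} for "$(pwd)". The helper name markitdown_cmd is hypothetical, and it only prints the command rather than running it, so nothing here depends on Docker being installed.

```shell
# Sketch only: assumes an image tagged "markitdown" and a POSIX shell.
# markitdown_cmd is a hypothetical helper; it prints a command, it does not run it.
markitdown_cmd() {
  mode=$1; input=$2; output=$3
  # Single quotes keep "$(pwd)" literal so the printed command can be pasted as-is.
  base='docker run --rm -v "$(pwd)":/src --workdir /src markitdown'
  case "$mode" in
    # -o: the tool writes the file inside the container, under the mounted /src.
    flag)     printf '%s\n' "$base $input -o $output" ;;
    # >: the host shell captures the container's stdout into the file.
    redirect) printf '%s\n' "$base $input > $output" ;;
    *)        printf 'unknown mode: %s\n' "$mode" >&2; return 1 ;;
  esac
}

markitdown_cmd flag simple.docx simple.md
markitdown_cmd redirect simple.docx simple.md
```

Both forms should leave simple.md in the current directory: with -o the file lands there because that directory is mounted at /src, and with > the host shell writes the file itself.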

@philipbkemp
Author

Thanks @l-lumin, I got the file using > simple.md at the end because when using -o simple.md I got this error (maybe my arguments were in the wrong order?):
[Image: screenshot of the error]

Either way, I got it working and a MD file with this:
docker run --rm --volume ${pwd}:/src --workdir /src markitdown:latest ./simple.docx > simple.md

Thanks for the assistance.

@l-lumin
Contributor

l-lumin commented Jan 2, 2025

Glad to help. I tried this command and it worked:

docker run --rm -v ${pwd}:/src --workdir /src markitdown simple.docx -o simple.md

Thank you for your detailed report.
