Hi team,
Thanks for migrating this gem to Tika and replacing the mimemagic gem. We're running the latest version in production and so far so good. Great work, and thank you for it!
We just found that some xlsx and docx files uploaded by our users are being misdetected as application/zip, the same as in issue #35.
It only happens with some files larger than 64 KB.
We looked at 3 xlsx files. The third case fails because the match for [Content_Types].xml is only attempted within the offset range 30:65536, while files from Google Docs/Sheets have the fingerprint items at the end of the file.
Can we implement a negative offset to read from the end of the file for these cases?
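To make the negative-offset idea concrete, here is a minimal sketch in plain Ruby. It is not Marcel's API: `read_tail` and `content_types_in_tail?` are hypothetical helpers, and the 65,536-byte window is an assumption chosen to mirror the forward range mentioned above. It relies on the ZIP format keeping its central directory, which lists entry names uncompressed, at the end of the archive.

```ruby
require "stringio"

# Hypothetical helper: return at most `window` bytes from the end of an IO.
def read_tail(io, window = 65_536)
  size = io.size
  io.seek([size - window, 0].max) # absolute seek, clamped for small files
  io.read.to_s.b
end

# Hypothetical check: does the entry name appear in the tail of the file?
def content_types_in_tail?(io, window = 65_536)
  read_tail(io, window).include?("[Content_Types].xml".b)
end

# Usage with a synthetic payload (not a real archive): the marker sits
# past the first 64 KB, so only a read from the end can see it.
fake = StringIO.new("PK\x03\x04".b + ("\x00".b * 100_000) + "[Content_Types].xml".b)
puts content_types_in_tail?(fake)
```

A fixed tail window can still miss the name in an archive whose central directory is larger than the window, so this should be read as an illustration of the request rather than a complete detector.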
I looked into the negative offset approach but I'm not seeing consistent patterns of the placement of [Content_Types].xml in hex dumped files... It seems it can appear at the beginning or end of the file (see the two files in the repo). We may have to scan the entire file for this pattern, but I don't see another example of that in the DB.
Hello @nvh0412 @gmcgibbon, did you ever find a solution or workaround for this?
I'm also having trouble detecting a docx file. Mine is 1.5 MB and is always detected as application/zip. The file was exported from Google Docs.
With a smaller file it works just fine, and I get application/vnd.openxmlformats-officedocument.wordprocessingml.document.
Update:
If I also pass the name argument when calling Marcel, changing the call from
Marcel::MimeType.for(docx)
into
Marcel::MimeType.for(docx, name: docx_path)
I get application/vnd.openxmlformats-officedocument.wordprocessingml.document instead of application/zip.
With this change, can I still safely detect whether a file is an actual docx? I'm using Marcel to reject files whose MIME types are not on our whitelist.
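My understanding, which I have not verified against the current Marcel source, is that the name argument acts as a hint that can refine a generic result such as application/zip. If so, a plain zip renamed to .docx might also pass. Below is a minimal sketch of a stricter guard that accepts a file only when the name and the content agree. `docx_content?` and `acceptable_docx?` are hypothetical helpers, not part of Marcel. The sketch assumes the main document part uses the conventional name word/document.xml, and it reads the whole file into memory.

```ruby
require "stringio"

# Hypothetical content check: ZIP signature plus the two entry names
# expected in a docx. Entry names are stored uncompressed in a ZIP,
# so a plain byte search can find them anywhere in the file.
def docx_content?(io)
  io.rewind
  data = io.read.to_s.b
  data.start_with?("PK\x03\x04".b) &&
    data.include?("[Content_Types].xml".b) &&
    data.include?("word/document.xml".b)
end

# Hypothetical guard: require the .docx extension AND matching content.
def acceptable_docx?(io, name)
  File.extname(name.to_s).downcase == ".docx" && docx_content?(io)
end

# Usage with synthetic payloads (not real archives):
padding   = "\x00".b * 70_000
docx_like = "PK\x03\x04".b + padding + "word/document.xml".b + "[Content_Types].xml".b
plain_zip = "PK\x03\x04".b + padding

puts acceptable_docx?(StringIO.new(docx_like), "report.docx")
puts acceptable_docx?(StringIO.new(plain_zip), "renamed.docx")
```

A byte search like this only shows that the expected entry names are present. It does not validate the archive, so for stricter checks a real ZIP parser would be the safer choice.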