Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add frame header bytes to MP3 score #601

Merged
merged 2 commits into from
Mar 22, 2023
Merged

Add frame header bytes to MP3 score #601

merged 2 commits into from
Mar 22, 2023

Conversation

Cephian
Copy link
Contributor

@Cephian Cephian commented Mar 16, 2023

Currently, when reading an mp3 file that has no ID3 tags and incorrect file extension, mutagen reports a score of 0 for every filetype. This PR adds binary header detection for MP3 files with no ID3 tags. Even if these files have no tags, we can still query mutagen.File('filename').info.length so it is useful to correctly detect files of this type.

This PR only supports MPEG-1 and MPEG-2 Layer III, not the nonstandard MPEG-2.5 (though I could add that if there is interest).

The possible first 2 bytes of a frame are: 0xFFF2, 0xFFF3, 0xFFFA, 0xFFFB.
This corresponds to the last and fourth-to-last bit being arbitrary, and all others 1.

Folder containing LAME-encoded MP3s of each type of header: https://drive.google.com/drive/folders/12NiivA0GQKBriE0M5yZEHE9QIeYDt2Zp?usp=sharing

Sources

On mp3-tech.org they describe
11111111111: for the frame sync
10/11: for MPEG Version 1/2
01: for Layer III (this is the 3 in MP3)
0/1: for checksum protected or not

This Wikipedia diagram, though note that it splits the sync word at 12 bits instead of 11 like the last source. In this diagram we may flip both the MPEG version and error protection bits, leading to our four possibilities.

In filetype.py we see a very similar implementation (though they have missed the 0xFFFA variant, the go version of this repo contains an abandoned PR)

@phw
Copy link
Collaborator

phw commented Mar 16, 2023

Thanks for this. That makes a lot of sense to me.

It would be great to have a test for this, though. Could you add a test? I think https://github.com/quodlibet/mutagen/blob/master/tests/test___init__.py#L471 would be the proper place to this.

Essentially we need an empty MP3 test file without ID3 tags and check whether it gets properly loaded as a MP3.

@Cephian
Copy link
Contributor Author

Cephian commented Mar 16, 2023

Essentially we need an empty MP3 test file without ID3 tags and check whether it gets properly loaded as a MP3.

I added a file no-tags.mp3 which lives up to its name and put it into _FILETYPES, let me know if you had something else in mind.

Copy link
Collaborator

@phw phw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants