Add frame header bytes to MP3 score #601
Merged
Currently, when reading an MP3 file that has no ID3 tags and an incorrect file extension, mutagen reports a score of 0 for every file type. This PR adds binary header detection for such files. Even when these files have no tags, we can still query
mutagen.File('filename').info.length
so it is useful to correctly detect files of this type.
This PR only supports MPEG-1 and MPEG-2 Layer III, not the nonstandard MPEG-2.5 (though I could add that if there is interest).
The possible first 2 bytes of a frame are: 0xFFF2, 0xFFF3, 0xFFFA, 0xFFFB.
This corresponds to the last and fourth-to-last bits being arbitrary, with every other bit fixed: the third-to-last bit is 0 and all the rest are 1.
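As a rough illustration of the check described above, the four accepted values can be matched with a single mask that ignores the two arbitrary bits. This is only a sketch: `looks_like_mp3_frame`, `HEADER_MASK` and `HEADER_VALUE` are hypothetical names chosen for this example and are not necessarily what the PR's diff uses.

```python
import struct

# Hypothetical constants for illustration; not taken from mutagen's source.
# The mask clears the fourth-to-last bit (0x0008) and the last bit (0x0001),
# which are the two bits allowed to vary.
HEADER_MASK = 0xFFF6
HEADER_VALUE = 0xFFF2


def looks_like_mp3_frame(header: bytes) -> bool:
    """Return True if the first two bytes are 0xFFF2, 0xFFF3, 0xFFFA or 0xFFFB."""
    if len(header) < 2:
        return False
    # Read the first two bytes as one big-endian unsigned 16-bit integer.
    (word,) = struct.unpack(">H", header[:2])
    return (word & HEADER_MASK) == HEADER_VALUE
```

Comparing against the four literal byte pairs would work equally well; the mask form just makes the "two free bits" observation explicit.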
Folder containing LAME-encoded MP3s of each type of header: https://drive.google.com/drive/folders/12NiivA0GQKBriE0M5yZEHE9QIeYDt2Zp?usp=sharing
Sources
On mp3-tech.org they describe the start of the frame header as:
11111111111: for the frame sync
10/11: for MPEG Version 2/1
01: for Layer III (this is the 3 in MP3)
0/1: for checksum protected or not
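To make the field layout above concrete, here is a small sketch that splits the first two bytes into the 11-bit sync, the 2-bit version, the 2-bit layer and the protection bit. The names `FrameHeaderPrefix` and `decode_prefix` are hypothetical and exist only for this example; the function rejects anything other than MPEG-1 or MPEG-2 Layer III, matching the scope of this PR.

```python
from typing import NamedTuple, Optional


class FrameHeaderPrefix(NamedTuple):
    """Fields carried by the first 16 bits of a frame header (illustrative type)."""
    mpeg_version: str
    layer: str
    crc_protected: bool


# Only the two version codes this PR supports; 00 (MPEG-2.5) is left out.
_VERSIONS = {0b10: "MPEG-2", 0b11: "MPEG-1"}


def decode_prefix(data: bytes) -> Optional[FrameHeaderPrefix]:
    """Decode the first two header bytes, or return None if unsupported."""
    if len(data) < 2:
        return None
    word = int.from_bytes(data[:2], "big")
    sync = word >> 5              # top 11 bits, must all be 1
    version = (word >> 3) & 0b11  # next 2 bits
    layer = (word >> 1) & 0b11    # next 2 bits, 01 means Layer III
    protection = word & 0b1       # last bit, 0 means a checksum is present
    if sync != 0x7FF or version not in _VERSIONS or layer != 0b01:
        return None
    return FrameHeaderPrefix(_VERSIONS[version], "Layer III", protection == 0)
```

For example, `decode_prefix(b"\xff\xfb")` describes an MPEG-1 Layer III frame without a checksum, while `decode_prefix(b"\xff\xe3")` returns `None` because the version bits are `00` (MPEG-2.5).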
This Wikipedia diagram shows the same layout, though note that it splits the sync word at 12 bits instead of 11 as in the previous source. In this diagram we may flip the MPEG version bit and the error protection bit independently, leading to our four possibilities.
In filetype.py we see a very similar implementation (though they have missed the 0xFFFA variant; the Go version of that repo contains an abandoned PR).