terminator: support multi-byte termination bytes #158

Open
davidhicks opened this issue Apr 24, 2017 · 5 comments

@davidhicks

davidhicks commented Apr 24, 2017

In the JPEG Interchange Format (including JFIF and SPIFF), the scan segment contains compressed data whose length is not known until the data has been read in full. It is possible, however, to scan the compressed data for a 0xFF byte: if it is followed by 0x00, the 0xFF is escaped and must not be treated as a marker; if it is followed by another byte (which can take several values), it denotes the start of the next segment of the file.
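To make the scanning rule concrete, here is a minimal Python sketch that locates the end of the compressed data in an in-memory buffer. The function name `find_scan_end` is hypothetical and is not part of any Kaitai Struct runtime. Beyond the 0x00 escape described above, it also skips restart markers (0xFFD0 to 0xFFD7) and 0xFF fill bytes, which the JPEG specification permits inside or before markers; treat that extra handling as an assumption about what a real spec would need.

```python
def find_scan_end(data: bytes, start: int = 0) -> int:
    """Return the offset of the 0xFF byte that begins the next segment
    marker, or len(data) if no such marker is found.

    Hypothetical helper for illustration only.
    """
    i = start
    n = len(data)
    while i + 1 < n:
        if data[i] != 0xFF:
            i += 1
            continue
        nxt = data[i + 1]
        if nxt == 0x00:
            # Escaped (stuffed) 0xFF: part of the compressed data.
            i += 2
        elif 0xD0 <= nxt <= 0xD7:
            # Restart marker RST0..RST7: stays inside the scan.
            i += 2
        elif nxt == 0xFF:
            # Fill byte: the marker may start at the next 0xFF.
            i += 1
        else:
            # Any other value starts the next segment.
            return i
    return n
```

For example, calling it on `bytes([0x12, 0xFF, 0x00, 0x34, 0xFF, 0xD9])` returns the offset of the final `0xFF`, so the compressed data is everything before that offset.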

Ideally there would be a construct similar to:

- id: compressed_data
  terminator:
    - [0xFF, 0xAA] #next_marker_1
    - [0xFF, 0xBB] #next_marker_2
  consume: false

Wildcard bytes, regular expressions, number ranges and other helpers could also be useful for defining terminators in other file formats.
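As a rough illustration of what the proposed construct might do at runtime, the Python sketch below reads from a seekable stream until any one of several multi-byte terminators is seen. The name `read_bytes_term_multi` and its signature are assumptions made for this example; they are not an existing runtime API. With `consume=False` the stream is rewound so the terminator is left for the next field, mirroring `consume: false` in the YAML above.

```python
import io


def read_bytes_term_multi(stream, terminators, consume=True):
    """Read until any terminator in `terminators` is found.

    Hypothetical helper: returns the bytes before the terminator.
    Raises EOFError if the stream ends first.
    """
    if not terminators or any(len(t) == 0 for t in terminators):
        raise ValueError("terminators must be non-empty byte strings")
    buf = bytearray()
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("end of stream reached before any terminator")
        buf += b
        for term in terminators:
            if buf.endswith(term):
                if not consume:
                    # Leave the terminator in the stream for the next read.
                    stream.seek(-len(term), io.SEEK_CUR)
                return bytes(buf[:-len(term)])


stream = io.BytesIO(b"\x01\x02\xFF\xBB\x03")
payload = read_bytes_term_multi(stream, [b"\xFF\xAA", b"\xFF\xBB"], consume=False)
```

This version rechecks every terminator after each byte, so it is simple rather than fast; it is only meant to show the intended semantics.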

@rodmartin30

Assign this to me

@dgutson

dgutson commented Mar 14, 2019

I suggest that, instead of supporting only multi-byte sequences as a terminator, we generalize by supporting a rule as a terminator, so that a constant multi-byte sequence would be a particular case.

@rodmartin30

rodmartin30 commented Mar 19, 2019

@GreyCat Can this serve as a temporary implementation until #538 is specified? If so, please assign this to me, since we need to finish the JPEG.

@rodmartin30

rodmartin30 commented Mar 25, 2019

I have been working on handling multi-byte terminators.

Let's suppose that the Scala changes that switch the type of terminator from int to Array[Byte] are made. (I just replaced the int type with Array[Byte] and made some minor changes, but I would like to write a separate issue about that.)

One useful thing to know is that the KMP (Knuth-Morris-Pratt) algorithm finds matches in O(N + M), where N is the length of the pattern and M is the length of the text. Because of this, the complexity of 'read_bytes_term()' doesn't change: it stays linear in the number of bytes read.
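For readers unfamiliar with KMP, here is a minimal, self-contained Python sketch of how a streaming terminator search could use it. This is not the code from the commit mentioned below; the names `build_failure_table` and `read_bytes_term_kmp` are hypothetical, and the terminator is always consumed in this version.

```python
import io


def build_failure_table(pattern: bytes) -> list:
    """For each prefix of `pattern`, store the length of its longest
    proper prefix that is also a suffix (the KMP failure function)."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def read_bytes_term_kmp(stream, term: bytes) -> bytes:
    """Read bytes until `term` is found; return the bytes before it.

    Each input byte is read once, so the work is O(N + M) for a
    pattern of length N and M bytes read.
    """
    if not term:
        raise ValueError("terminator must not be empty")
    table = build_failure_table(term)
    buf = bytearray()
    k = 0  # number of terminator bytes currently matched
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("end of stream reached before terminator")
        byte = b[0]
        buf.append(byte)
        # On mismatch, fall back to the longest shorter match.
        while k > 0 and byte != term[k]:
            k = table[k - 1]
        if byte == term[k]:
            k += 1
        if k == len(term):
            return bytes(buf[:-len(term)])


data = read_bytes_term_kmp(io.BytesIO(b"\x01\xFF\xFF\xD9\x02"), b"\xFF\xD9")
```

The failure table is what lets the search recover from a partial match (here, the first `0xFF`) without re-reading earlier bytes, which matters for non-seekable streams.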

Here is the python-runtime commit with the changes: python-runtime

@StefanRickli

Any progress on this?
