terminator: support multi-byte termination bytes #158

davidhicks · 2017-04-24T02:14:34Z

In JPEG Interchange Format (including JFIF and SPIFF), the scan segment contains compressed data whose length is not known until the data has been fully read from the file. It is possible, however, to scan the compressed data for a 0xFF byte: it is followed by 0x00 if the 0xFF is to be ignored (escaped), or by another byte (one of several possible values) that marks the start of the next segment of the file.

Ideally there would be a construct similar to:

- id: compressed_data
  terminator:
    - [0xFF, 0xAA] #next_marker_1
    - [0xFF, 0xBB] #next_marker_2
  consume: false

Wildcard bytes, regular expressions, number ranges and other helpers could also be of assistance in defining terminators in other file formats.
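To make the scan described above concrete, here is a minimal Python sketch of what such a terminator would have to do at runtime. The function name `read_scan_data` is hypothetical and is not part of any Kaitai Struct runtime. It is a simplification: it treats every 0xFF that is not followed by 0x00 as the start of the next segment, so restart markers and 0xFF fill bytes inside the scan are not handled.

```python
import io


def read_scan_data(stream):
    """Read compressed bytes up to, but not including, the next marker.

    0xFF 0x00 is an escaped 0xFF and is kept as part of the data.
    0xFF followed by any other byte is treated as the next marker, and
    the stream is rewound to that 0xFF (the "consume: false" behaviour).
    """
    out = bytearray()
    while True:
        b = stream.read(1)
        if not b:
            return bytes(out)  # end of stream, no marker found
        if b != b"\xff":
            out += b
            continue
        nxt = stream.read(1)
        if nxt == b"\x00":
            out += b"\xff\x00"  # escaped 0xFF stays in the data
        elif not nxt:
            out += b  # lone trailing 0xFF at end of stream
            return bytes(out)
        else:
            stream.seek(-2, io.SEEK_CUR)  # leave the marker unread
            return bytes(out)


if __name__ == "__main__":
    s = io.BytesIO(b"\x01\xff\x00\x02\xff\xd9")
    print(read_scan_data(s), s.read())
```

After the call, the stream is positioned on the marker, so the next segment can be parsed as usual.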

rodmartin30 · 2019-03-14T13:43:40Z

Assign this to me

dgutson · 2019-03-14T14:41:18Z

I suggest that, instead of supporting multi-byte sequences as a terminator, we generalize by supporting a rule as a terminator; a constant multi-byte sequence would then be a particular case.
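One possible reading of this suggestion, sketched in Python: a "rule" is any callable that reports whether a terminator starts at a given offset, and a constant byte sequence is just one such rule. All names here (`split_at_rule`, `const_seq`, `any_of`) are hypothetical illustrations, not existing Kaitai Struct features, and the search is deliberately naive.

```python
def const_seq(seq):
    """Rule matching a fixed byte sequence; returns the match length or 0."""
    def rule(data, pos):
        return len(seq) if data.startswith(seq, pos) else 0
    return rule


def any_of(*rules):
    """Rule matching when any of the given rules matches."""
    def rule(data, pos):
        for r in rules:
            n = r(data, pos)
            if n:
                return n
        return 0
    return rule


def split_at_rule(data, rule):
    """Split data at the first offset where rule matches.

    Returns (body, rest); rest starts with the terminator, i.e. the
    terminator is not consumed. If nothing matches, rest is empty.
    """
    for pos in range(len(data)):
        if rule(data, pos) > 0:
            return data[:pos], data[pos:]
    return data, b""


if __name__ == "__main__":
    markers = any_of(const_seq(b"\xff\xaa"), const_seq(b"\xff\xbb"))
    print(split_at_rule(b"abc\xff\xbbrest", markers))
```

With this shape, the two-marker example from the original post becomes `any_of` over two constant sequences, and wildcards or ranges would be further rule constructors.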

rodmartin30 · 2019-03-19T14:07:21Z

@GreyCat Can this be a temporary implementation until #538 is specified? If so, please assign this to me, since we need to finish the JPEG.

rodmartin30 · 2019-03-25T15:57:47Z

I have been working on handling multi-byte terminators.

Let's suppose that the Scala changes to switch the type of terminator from int to Array[Byte] have been made. (I just replaced the int type with Array[Byte] and made some minor changes, but I would like to open a separate issue about that.)

One useful thing to know is the KMP (Knuth-Morris-Pratt) algorithm, which finds matches in O(N + M), where N is the length of the pattern and M is the length of the text. Because of this, the complexity of 'read_bytes_term()' doesn't change: it stays linear in the number of bytes read.
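As an illustration of the idea, here is a minimal sketch of a KMP-based terminator search over a stream. It is not the code from the commit mentioned in this comment; the function name `read_bytes_term_multi` and its parameters are assumptions loosely modelled on the runtime's existing 'read_bytes_term()'.

```python
import io


def build_failure(pattern):
    """KMP failure table: fail[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def read_bytes_term_multi(stream, term, include_term=False, consume_term=True):
    """Read bytes until the multi-byte sequence `term` is found."""
    if not term:
        raise ValueError("terminator must not be empty")
    fail = build_failure(term)
    out = bytearray()
    k = 0  # number of terminator bytes matched so far
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("terminator not found before end of stream")
        c = b[0]
        out.append(c)
        while k > 0 and c != term[k]:
            k = fail[k - 1]  # fall back without re-reading the stream
        if c == term[k]:
            k += 1
        if k == len(term):
            if not consume_term:
                stream.seek(-len(term), io.SEEK_CUR)
            if not include_term:
                del out[-len(term):]
            return bytes(out)


if __name__ == "__main__":
    s = io.BytesIO(b"aab\xff\xff\xaaXYZ")
    print(read_bytes_term_multi(s, b"\xff\xaa"), s.read())
```

Each input byte is read exactly once, and the failure table lets the matcher recover from a partial match (such as the doubled 0xFF above) without seeking backwards.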

Here is the python-runtime commit with the changes: python-runtime

StefanRickli · 2022-05-16T17:14:17Z

Any progress on this?

GreyCat added the enhancement label May 1, 2017

dgelessus mentioned this issue Mar 14, 2019

Seek magic values on instances #530

Closed

This was referenced Mar 14, 2019

Problem parsing a buffer containing data from JPEG image #532

Closed

Additional use cases for custom (processing) routine: Searches #410

Open

Scanning (seek, search) for attribute start & end positions #538

Open

dgelessus mentioned this issue Jun 16, 2019

Add PHP serialized value and phar archive format kaitai-io/kaitai_struct_formats#173

Merged

KOLANICH mentioned this issue Jul 21, 2019

QSP format kaitai-io/kaitai_struct_formats#185

Closed

GreyCat mentioned this issue Jul 2, 2020

null termination where the null terminator is a int32 #766

Closed

generalmimon mentioned this issue Aug 5, 2020

Add methods for serialization kaitai-io/kaitai_struct_python_runtime#48

Closed

generalmimon mentioned this issue Jul 18, 2024

Strz type support for UTF-16 and UTF-32 #187

Closed
