Add built-in process `hex` and `base64` #668

Mingun · 2020-01-10T08:49:26Z

These are two widely used encoding schemes, so it would be great if Kaitai had built-in primitives for them.

GreyCat · 2020-01-10T10:17:10Z

We're not adding any more built-in process routines, given that we have pluggable modules now. Instead, we'll have a series of libraries that provide these widely used procedures. See https://github.com/kaitai-io/kaitai_compress, for example, for popular compression algorithms.

Please consider contributing something like that, but for hex and base64?

Mingun · 2020-01-10T10:57:52Z

OK, that is an appropriate solution (but once it is implemented, it would be good to have them available in WebIDE).

Are there any recommendations on how to create a processor for all supported languages and how end users should get these algorithms in their applications?

GreyCat · 2020-01-10T15:44:46Z

but once it is implemented, it would be good to have them available in WebIDE

Yep, that's the plan: all these "common" libraries will be automatically available in WebIDE, together with all their dependencies.

Are there any recommendations on how to create a processor for all supported languages

Custom processors are documented in http://doc.kaitai.io/user_guide.html#custom-process. Per-language specifics are supposed to be documented in the per-language notes on https://doc.kaitai.io, but in reality we're lagging behind on those documentation updates. Probably your best bet would be to copy the existing layout of kaitai_compress and start something like a "kaitai_common" or "kaitai_misc" collection of algorithms.
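
A minimal sketch of what such a processor might look like in Python. It assumes the runtime calls a `decode(data)` method on the processor object, which is how the kaitai_compress classes appear to be laid out; the class name `HexDecode` is hypothetical and not part of any existing library.

```python
import binascii


class HexDecode:
    """Hypothetical custom processor: ASCII hex digits in, raw bytes out."""

    def decode(self, data: bytes) -> bytes:
        # binascii.unhexlify accepts ASCII bytes directly and raises
        # binascii.Error on odd-length input or non-hex characters.
        return binascii.unhexlify(data)


# Stand-alone check, outside of any generated parser.
decoded = HexDecode().decode(b"deadbeef")
print(decoded)
```

How a .ksy file would refer to such a class (module path, naming) is language-specific and is not shown here.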

and how end users should get these algorithms in their applications?

Installation is obviously language-dependent and is outlined in the Usage section of kaitai_compress.

KOLANICH · 2020-01-10T21:10:08Z

process works on raw bytes. Hex and base64-encoded values are strings. I mean, they may be UTF-32BE, or UTF-16BE, or UTF-32LE... So I guess process is a bit unsuitable here.

GreyCat · 2020-01-10T23:06:35Z

Makes sense, but in reality 100% of hex dumps I've seen so far were in ASCII. I can imagine a hex dump in UTF-16, but we might just introduce a special parameter for that in the processing routine, or maybe a special routine for these purposes.

Even from the performance side, it doesn't make much sense to do a "real" conversion of that data to strings first, and then do a string-to-integer conversion.
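
To make the two cases concrete, here is a small Python sketch using only the standard library (the sample data is made up for illustration). ASCII hex digits can be decoded straight from the raw bytes, while a UTF-16 hex dump needs an extra text-decoding step first.

```python
import binascii

ascii_dump = b"cafe"                      # hex digits stored as ASCII bytes
utf16_dump = "cafe".encode("utf-16-be")   # same digits, two bytes per character

# ASCII case: the raw bytes can be fed to the hex decoder as they are.
from_ascii = binascii.unhexlify(ascii_dump)

# UTF-16 case: the bytes have to be decoded to text before unhexing.
from_utf16 = bytes.fromhex(utf16_dump.decode("utf-16-be"))

print(from_ascii, from_utf16)
```

Both paths yield the same two bytes; the only difference is whether a character encoding has to be known up front.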

KOLANICH · 2020-01-10T23:23:35Z

Even from the performance side, it doesn't make much sense to do a "real" conversion of that data to strings first, and then do a string-to-integer conversion.

From the performance side, decoding a sequence of bytes of known length into an ASCII/UTF-8 string should be an O(1) operation (it is just reinterpreting raw memory). If that is not the case, it is definitely a bug in the language.

but we might just introduce a special parameter for that in the processing routine

That is conceptually wrong. We have `str`ings and we have `encoding` for them. So we probably don't need processors, just support for externally defined functions (and we definitely should have interfaces for that, because we want to validate this stuff at transpile time). Or external opaque types could be used for that. Interfaces here are not just needed but mandatory, because props are involved.

Mingun · 2020-01-11T07:15:21Z

Hex and base64-encoded values are strings.

Not exactly. By definition, these conversions turn any byte sequence into a 7-bit byte sequence (i.e. an ASCII-encoded string) that can be safely transferred through some old protocols. They are represented as strings only for stupid humans (glory to robots!).

However, this problem can be solved if we represent those byte sequences in ksy as strings with a hex or base64 encoding, in the same way as we represent strings with ASCII or UTF-8 encoding (by the way, what encodings should be guaranteed to be supported by any kaitai-struct runtime?).

GreyCat · 2020-01-11T10:44:09Z

what encodings should be guaranteed to be supported

See #116 and #393.

dgelessus · 2020-01-11T15:13:17Z

process works on raw bytes.

Any reason not to support process for strings?

The performance of the bytes-to-string conversion is unlikely to be an issue for ASCII - any decent language has optimizations for that common case (I know at least Java and Python do).

Conceptually I think hex/base64-encoded data should count as text strings. Hex is usually used to store arbitrary binary data in a format that can be read by humans (i. e. text), and nowadays base64 is almost exclusively used to convert arbitrary binary data to printable, ASCII-compatible text.

(Yes, base64 was originally developed to transfer 8-bit data over channels that might only be 7-bit and could clobber the 8th bit, but if you're parsing that kind of data you probably need to strip the 8th bit beforehand anyway.)

KOLANICH · 2020-01-11T15:30:02Z

Any reason not to support process for strings?

Because process, by definition, works before any parsing of a field is done. The generated code:

1. carves the field
2. processes it
3. does parsing on the processing result

It is a bytes-level operation.
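
The three steps can be sketched by hand in Python. This is not what the compiler actually emits, just an illustration; the field layout (a 1-byte length followed by base64 text that decodes to a big-endian 4-byte integer) is invented for the example.

```python
import base64
import io
import struct

# Hypothetical layout: 1-byte length, then that many base64 characters
# which decode to a big-endian unsigned 32-bit value.
stream = io.BytesIO(b"\x08AAABAA==trailing")

size = stream.read(1)[0]
raw = stream.read(size)                     # 1. carve the field
processed = base64.b64decode(raw)           # 2. process it (bytes in, bytes out)
(value,) = struct.unpack(">I", processed)   # 3. parse the processing result

print(value)
```

Note that step 2 never sees a string: it receives and returns bytes, which is the point being made above.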

dgelessus · 2020-01-11T15:43:06Z

Good point, you still need to be able to use a regular byte process on string fields.

Perhaps the hex/base64 decoding should be done using string methods instead (i. e. something like string_field.decode_hex, which returns a byte array). There should be no need for an attribute ("process-str") here - a method call in a value instance would work just as well.

KOLANICH · 2020-01-11T15:55:32Z

Making it a method will require it to be a part of every runtime. It'd be better to make it a separate auxiliary package. So IMHO it is better to have it as just a function.

GreyCat · 2020-01-11T23:36:21Z

"Function" is actually the worst possible choice for such stuff: it's imperative, you basically show how to do the transformation one way, and it's very non-trivial to do it the other way around. Things like process make it much more declarative:

- you clearly determine that there is a transformation,
- it's always in one predetermined position,
- it's decoupled from the specification,
- it's easily reversible: to implement serialization, you need to provide not just a "decode" implementation, but also an "encode" implementation
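
The reversibility argument can be sketched as a processor that carries both directions in one place. The class name `Base64Process` and the `encode` method are assumptions made for illustration; nothing in this thread states that the runtimes call an `encode` method today.

```python
import base64


class Base64Process:
    """Hypothetical reversible processor: one class, both directions."""

    def decode(self, data: bytes) -> bytes:
        # Parsing direction: base64 text (as ASCII bytes) -> raw bytes.
        return base64.b64decode(data)

    def encode(self, data: bytes) -> bytes:
        # Serialization direction: raw bytes -> base64 text (as ASCII bytes).
        return base64.b64encode(data)


proc = Base64Process()
payload = b"\x00\x01\x02kaitai"
round_trip = proc.decode(proc.encode(payload))
print(round_trip == payload)
```

Because the transformation sits at a fixed point between carving and parsing, a serializer could apply `encode` at the same point in reverse order without knowing anything else about the format.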

Mingun · 2020-01-12T09:56:48Z

I think we can add another process phase. Right now, process should really be named pre-process. So we just need to add a post-process that transforms the parsed result into its final form.

Then, we can write:

  - id: mac
    doc: Message Authentication Code (HEX)
    size: 8
    post-process: hex
    expect: _.size == 4 # valid from #435 , but that name is better, IMHO

This means: read 8 bytes, then apply the hex transformation (which, by convention, actually applies the unhex transformation, from hex to bytes). Finally, assert that the size of the resulting array is 4 bytes, as it should be, just to be explicit.
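
Written out by hand in Python, the proposed behavior for the `mac` field would look roughly like this. Note that `post-process` and `expect` are keys proposed in this comment, not existing ksy features, and the sample bytes are made up.

```python
import binascii
import io

stream = io.BytesIO(b"0A1B2C3Drest")

raw_mac = stream.read(8)             # size: 8
mac = binascii.unhexlify(raw_mac)    # post-process: hex (i.e. unhex)
assert len(mac) == 4                 # expect: _.size == 4

print(mac.hex())
```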

KOLANICH · 2020-01-12T16:14:03Z

instances are already present.

Mingun · 2020-01-17T17:24:30Z

Yes. Actually, in the case of hex and base64, even post-process is not required, because:

1. the parser creates a stream of size bytes
2. the parser feeds it into the process function
3. the parser does the actual parsing of the process result (not needed in this case or, equivalently, it is a 1-to-1 transformation)

As you think, can that algorithms to be added to the https://github.com/kaitai-io/kaitai_compress (and maybe rename it to more generic kaitai_algorithms), or better implement them in a separate repository?

KOLANICH · 2020-01-18T08:22:13Z

As you think, can that algorithms to be added

you may have meant "How do you think, if that algorithms can be added".

I'm personally pretty sure that it will never be merged that way. I mean, IMHO we don't need hex and base64 in decoders. We need them, but on other layers. These other layers are custom types. So feel free to create a repo of custom types with processors that cannot be implemented in KS alone. Also look at my PRs to KSF; they contain code for some such types.

and maybe rename it to more generic kaitai_algorithms

I have thought about renaming the kaitai_compress repo to kaitai_processors (and I have my own extended and refactored fork of that repo, not yet merged), but we strictly need interfaces (#314) first because of serialization.

GreyCat added the enhancement label Jan 10, 2020

dgelessus mentioned this issue Aug 6, 2020: Add timestamp built-in type #793 (Open)

Mingun mentioned this issue Oct 26, 2022: Add ability to inject custom JS code for processors kaitai-io/kaitai_struct_webide#148 (Open)
