Add built-in process hex and base64 #668
Comments
We're not adding more built-in Please consider contributing something like that, but for
OK, that is an appropriate solution (but once it is implemented, it would be good to have them available in the WebIDE). Are there any recommendations on how to create a processor for all supported languages, and on how end users should get these algorithms into their applications?
Yep, that's the plan — like all these "common" libraries will be automatically available in WebIDE together with all their dependencies.
Custom processors are documented in http://doc.kaitai.io/user_guide.html#custom-process. Per-language specifics are supposed to be documented in the per-language notes on https://doc.kaitai.io, but in reality we're lagging behind on those documentation updates. Probably your best bet would be to copy the existing layout of kaitai_compress and start something like a "kaitai_common" or "kaitai_misc" collection of algorithms.
Installation is obviously language-dependent and is outlined in the Usage section of kaitai_compress.
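A minimal sketch of what such a collection might contain in Python, assuming the convention that a custom processor is a class exposing a `decode(data)` method, as the existing kaitai_compress processors do. The class names `Hex` and `Base64` are hypothetical and nothing like this is part of the project; how a .ksy file would refer to them is language-specific and covered by the user guide linked above.

```python
import base64
import binascii


class Hex:
    """Hypothetical processor: ASCII hex digits -> raw bytes."""

    def decode(self, data):
        # unhexlify accepts bytes directly and raises binascii.Error
        # on odd length or non-hex characters.
        return binascii.unhexlify(data)


class Base64:
    """Hypothetical processor: base64 text -> raw bytes."""

    def decode(self, data):
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently skipping them.
        return base64.b64decode(data, validate=True)
```

Both classes only wrap standard-library calls, so an equivalent for each other target language would have to be written separately.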
Makes sense, but in reality 100% of the hex dumps I've seen so far were in ASCII. I can imagine a hex dump in UTF-16, but we might just introduce a special parameter for that in the processing routine, or maybe a special routine for these purposes. Even from the performance side, it doesn't make much sense to do a "real" conversion of that data to strings first, and then do string-to-integer conversions.
Even from the performance side, it doesn't make much sense to do a "real" conversion of that data to strings first, and then do string-to-integer conversions.
From the performance side, decoding a sequence of bytes of known length into an ASCII/UTF-8 string should be an O(1) operation (it is just reinterpreting raw memory). If that is not the case, it is definitely a bug in the language.
but we might just introduce a special parameter for that in the processing routine
It is conceptually wrong. We have `str`ings and we have `encoding` for them. So we probably don't need processors, just support for externally defined functions (and we definitely should have interfaces for that, because we want to validate this stuff at transpile time).
Or external opaque types could simply be used for that. Interfaces here are not just needed but mandatory, because props are involved.
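To make the two positions concrete, here is a small Python sketch (Python is only an illustration, not something the thread prescribes) that decodes the same hex dump both ways: through a string first, and directly on the bytes. The sample value is made up.

```python
import binascii

raw = b"48656c6c6f"  # made-up sample: ASCII hex digits as read from a stream

# Route 1: treat the field as a string with an encoding, then parse the text.
via_str = bytes.fromhex(raw.decode("ascii"))

# Route 2: treat it as a byte-level operation, with no intermediate string.
via_bytes = binascii.unhexlify(raw)

assert via_str == via_bytes == b"Hello"
```

Both routes give the same result for ASCII input; the disagreement is about which one the format description should express.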
Not entirely. By definition, these conversions turn any byte sequence into a 7-bit byte sequence (i.e. an ASCII-encoded string) that can be safely transferred through some old protocols. They are represented as strings only for stupid humans (glory to robots!). However, it is possible to solve this problem if we represent those byte sequences as strings in ksy with defined
Any reason not to support The performance of the bytes-to-string conversion is unlikely to be an issue for ASCII - any decent language has optimizations for that common case (I know at least Java and Python do). Conceptually I think hex/base64-encoded data should count as text strings. Hex is usually used to store arbitrary binary data in a format that can be read by humans (i.e. text), and nowadays base64 is almost exclusively used to convert arbitrary binary data to printable, ASCII-compatible text. (Yes, base64 was originally developed to transfer 8-bit data over channels that might only be 7-bit and could clobber the 8th bit, but if you're parsing that kind of data you probably need to strip the 8th bit beforehand anyway.)
Because
It is a byte-level operation.
Good point, you still need to be able to use a regular byte Perhaps the hex/base64 decoding should be done using string methods instead (i.e. something like
Making it a method would require it to be part of every runtime. It'd be better to make it a separate auxiliary package. So IMHO it is better to have it as just a function.
"Function" is actually the worst possible choice for such stuff: it's imperative, you basically show how to do the transformation one way, and it's highly nontrivial to do it the other way around. Things like
I think we can add another process phase. Right now there is a situation when Then, we can write:

    - id: mac
      doc: Message Authentication Code (HEX)
      size: 8
      post-process: hex
      expect: _.size == 4 # valid from #435, but that name is better, IMHO

This means: read 8 bytes, then apply
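A rough Python sketch of the proposed semantics, under the assumption that `post-process: hex` means hex-decoding the bytes after they are read and that `expect` is checked against the decoded value. Neither key exists in Kaitai Struct, and the function name `read_mac` is hypothetical.

```python
import binascii
import io


def read_mac(stream):
    """Hypothetical equivalent of the proposed `mac` field."""
    raw = stream.read(8)              # size: 8
    if len(raw) != 8:
        raise EOFError("expected 8 bytes for mac")
    mac = binascii.unhexlify(raw)     # post-process: hex
    if len(mac) != 4:                 # expect: _.size == 4
        raise ValueError("decoded mac must be 4 bytes")
    return mac


# Made-up input: 8 hex digits followed by unrelated data.
mac = read_mac(io.BytesIO(b"0a1b2c3dtrailing"))
```

Since 8 hex digits always decode to 4 bytes, the size check here can never fail on its own; it only shows where such a check would sit relative to the decoding step.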
Yes. Actually, in case of
Do you think you can add these algorithms to your kaitai_compress or better implement them in a separate repository As you think, can that algorithms to be added to the https://github.com/kaitai-io/kaitai_compress (and maybe rename it to more generic
you may have meant
. I am personally pretty sure that it will never be merged that way. I mean, IMHO we don't need
I have thought about renaming the
These are two widely used encoding schemes, so it would be great if Kaitai had built-in primitives for them.