-
-
Notifications
You must be signed in to change notification settings - Fork 195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Protocol Buffers always flags the same field number #346
Comments
I can't imagine this pattern ever worked correctly. It uses global variables to display values that are calculated as the fie is read, but the values that are printed in the format functions will always be the last value the global takes. In the first example that value is 114 in your own example I don't really know by I venture that the last value that the global variable struct Key {
std::print("{:#x}",$);
type::uLEB128 keyDec;
field_number = u32(keyDec) >> 3;
std::print("{} {}",field_number, u32(keyDec) >> 3);
wire_type = u32(keyDec) & 7;
}[[format("format_key")]]; where i also print the address of |
I think this is what the pattern should be doing. Can you test it to make sure the results are what you expect? If it does feel free to open a PR or I can do it if you prefer not to. Let me know one way or the other. #pragma author WerWolv
#pragma description Google Protobuf wire encoding (.pb)
import std.core;
import std.io;
import std.mem;
import type.leb128;
struct ZigZag32 {
u32 value;
} [[sealed, format("format_zigzag32")]];
fn format_zigzag32(ZigZag32 zigzag) {
return s32((s32(zigzag.value) << 1) ^ (s32(zigzag.value) >> 31));
};
struct ZigZag64 {
u64 value;
} [[sealed, format("format_zigzag64")]];
fn format_zigzag64(ZigZag64 zigzag) {
return s64((s64(zigzag.value) << 1) ^ (s64(zigzag.value) >> 63));
};
enum WireType : u8 {
Varint = 0,
_64Bit = 1,
LengthDelimited = 2,
StartGroup = 3,
EndGroup = 4,
_32Bit = 5
};
struct Key {
type::uLEB128 keyDec;
u32 field_number = u32(keyDec) >> 3;
WireType wire_type = u32(keyDec) & 7;
}[[sealed, format("format_key")]];
fn format_key(Key keyObject) {
return std::format("{} with field number {}", keyObject.wire_type, keyObject.field_number);
};
union _64Bit {
u64 fixed64;
ZigZag64 sfixed64;
double dbl;
};
union _32Bit {
u32 fixed32;
ZigZag32 sfixed32;
float flt;
};
struct LengthDelimited {
type::uLEB128 length;
char data[length];
};
struct Entry {
Key key;
if (key.wire_type == WireType::Varint)
type::uLEB128 value;
else if (key.wire_type == WireType::_64Bit)
_64Bit value;
else if (key.wire_type == WireType::LengthDelimited)
LengthDelimited value;
else if (key.wire_type == WireType::_32Bit)
_32Bit value;
};
Entry entries[while(!std::mem::eof())] @ 0x00; |
That is significantly better yes! All the field numberings are coming through properly. It's still not parsing submessages but that's something I will raise in a separate issue and isn't a regression from the status quo.
I can confirm that these field numbers were the last in each file.
This was my first time reading the pattern files - that would have been this section here right? ImHex-Patterns/patterns/protobuf.hexpat Lines 35 to 37 in e779b88
Happy for you to raise - I don't think I will be able to adequately respond to any concerns that are raised in the PR review process. But I am encouraged to learn more about this pattern language for future issues! Thank-you for your speedy reply and patch. |
Not only the field numberings but also the wire types were messed up.
Makes sense and agrees with the observation that format functions are evaluated only at the end of the pattern file evaluation run.
Yes, exactly those were the lines where the global variables were defined. I just made them local variables of the struct and accessed them from there using the . operator. If you know a little c/c++ you will find the pattern language easy to get started with. There are some fundamental differences between pattern programming and standard language programming but after a while you start to think about the representations in a different way.
Ok, I'm on it and please consider my offer on improving this one. Im available in the ImHex discord channel anytime.
No problem and thank you for taking the time to help us make ImHex even better than it already is. |
In issue WerWolv#346 it is noted that the format functions return the same value repeatidly and erroneously. This is due to the use of global variables which result on only their last value being used in format functions due to their delayed evaluation. Fixed by using local variables instead. Also remove tabs from the file and an unused tags variable.
In issue #346 it is noted that the format functions return the same value repeatidly and erroneously. This is due to the use of global variables which result on only their last value being used in format functions due to their delayed evaluation. Fixed by using local variables instead. Also remove tabs from the file and an unused tags variable.
thanks all, especially @paxcut! Yeah, I was the one who implemented key decoding a while back, albeit without any first hand experience of protobuf... the main resource I relied on was this page though it's fairly obvious that I did make a few mistakes. Having submessage support would be interesting - I did think about implementing this, but I came to the conclusion that there would be a chicken-and-egg problem between the |
@applecuckoo I think the somewhat annoying part about Protobuf is that, without schema support, the parse is ambiguous - a Is there anyway to "attempt" a parse of each these variants in the Pattern Language, and trying the next if the parse fails? This would equivalent of using the @paxcut I'll take you up on that offer! I've joined the Discord - keen to find some time to figure this out. @applecuckoo @paxcut I'll flag 2 things that may be of help:
|
I have thought about this problem before too. try {
Entry nestedEntry;
} catch {
char data[length];
} The thing with that is, the content of these data can be absolutely everything. So it might happen that the content just happens to be accidentally in the right format for it to parse as a message, it would just be complete bogus. Maybe having it always parse as a byte array but additionally also as a nested member (if possible) would be an option? |
Pattern Language has a try {} catch{}that could deal with parsing failures.
Thats great!
I didn't check Katai but I did check 010 and they dont handle recursion either. |
Protobuf is vulnerable to this yes, and I have seen it happen on 2 seperate occasions. A few field numbers changed in a backwards incompatible manner and a message deserializer parsed gibberish into some of the fields. If it's any consolation this is fortunately quite rare - it typically only happens for shorter messages where length delimiters improbably line up. |
Let's load a protoscope test file: https://github.com/protocolbuffers/protoscope/blob/main/testdata/message.pb
We detect field
114
here:and here
and here
Let's try a file of my own - this time it thinks every field is
8
.The text was updated successfully, but these errors were encountered: