Local variables that depend on transformed values seem to accumulate memory from the start of the file. #113

Open
paxcut opened this issue Jun 30, 2024 · 1 comment

Comments

@paxcut
Contributor

paxcut commented Jun 30, 2024

My use case was to calculate crc32 checksums of strings that had to be converted to lowercase before calculating the hash. First, define a string, in this case null-terminated (I chose a while-array to avoid errors about data not being placed). Then embed the string in a container for the sole purpose of using transform. After the transform function is defined, the possibly upper-case strings are read from the file. We print one to verify that it works. The crc32 checksum is computed into a local variable, which is formatted and exported.

import std.string;
import std.core;
import std.hash;

u32 numFiles=1000;

struct Name {
    char name[while($[$]!=0)];
    padding[1];
}[[inline]];


struct Lower {
    Name name [[transform("lower")]];
}[[inline]];


fn lower(auto file) {
    Name result;
    result.name=std::string::to_lower(file.name);
    return result;
};

Name test @0x285679; //0x1126b
Lower testLower @0x285679; //0x1126b
std::print(" Uppercase: {} \n Lowercase: {}",test,testLower.name);


struct CRCLookup { 
    Lower fileName;
    u32 strCrc = std::hash::crc32(fileName.name, -1, 0x04C11DB7, -1, true, true) [[export,format("format_crc32")]];
};

fn format_crc32(auto val){
    return std::format("{:#x}",val);
};

CRCLookup names[numFiles]  @ 0x285679; //0x1126b

Input file: testMemory.zip
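
To cross-check the exported checksums outside ImHex, a small Python sketch like the one below can help. It assumes that the arguments passed to std::hash::crc32 above (init -1, polynomial 0x04C11DB7, xorout -1, both reflections enabled) describe the standard CRC-32 that zlib implements, and that std::string::to_lower only affects ASCII letters. The helper names read_names and lower_crc32 are made up for this illustration and are not part of ImHex.

```python
import zlib


def read_names(data: bytes, offset: int, count: int):
    """Yield up to `count` NUL-terminated byte strings starting at `offset`.

    Mirrors the Name struct above: read until a 0 byte, then skip that byte.
    """
    pos = offset
    for _ in range(count):
        end = data.find(b"\x00", pos)
        if end < 0:  # no terminator left in the buffer
            return
        yield data[pos:end]
        pos = end + 1


def lower_crc32(name: bytes) -> int:
    """Standard CRC-32 of the ASCII-lowercased name."""
    return zlib.crc32(name.lower()) & 0xFFFFFFFF


def format_crc32(value: int) -> str:
    """Same formatting idea as the pattern's format_crc32 ("{:#x}")."""
    return f"{value:#x}"


if __name__ == "__main__":
    # Hypothetical in-memory sample instead of the real input file.
    sample = b"Foo.DLL\x00BAR\x00"
    for raw in read_names(sample, 0, 2):
        print(raw.decode("ascii"), format_crc32(lower_crc32(raw)))
```

To compare against the pattern output, read the unzipped input file into `data` and call read_names with the same address and count used in the placement. If the values match at both addresses, only the memory usage differs between the two runs, not the computed checksums.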

So far so good. I am running the same code shown here on an input file that has two identical sets of strings, one located close to the beginning of the file and the other at the end (the addresses are in the code above). Even though we skip reading the preamble data and both sets of patterns should be the same size, the resulting patterns differ greatly in size. In fact, depending on how much memory you have, you should limit the number of strings being read, or else your computer may hang.

Not only do the patterns in the second case consume much more memory, but the size increments also grow larger as more strings are read. Both observations suggest that the patterns contain data from the start of the file or from some fixed position.

@paxcut
Contributor Author

paxcut commented Jun 30, 2024

Update: added the input file which I forgot for some reason.

To test the bug:

  1. Load the input file. Unzip it first, and don't worry: it is just libihmhex.dll.a with the symbol table duplicated at its end.
  2. Copy and paste the pattern listed above.
  3. Run the pattern and note the memory usage.
  4. Change the @ addresses to the commented ones and run it again.
  5. Using 1000 strings, I see a huge difference: from 100 MiB for the low address to 2.66 GiB for the high address. Larger numbers of strings give even bigger differences.
