elf2x: wrong ELF parsing leads to huge t_size, producing ~2GiB output #32

Open
EndoRPY opened this issue Apr 14, 2021 · 5 comments

@EndoRPY

EndoRPY commented Apr 14, 2021

Hi!

I followed your guide and have built mipsel-unknown-elf-gcc (experimental 11.0.1 20210411) successfully on MSYS2, including libc (it worked for me out-of-the-box). It compiles C programs, including the provided beginner/hello example, without warnings or errors.

When I ran the ELF through elf2x, though, the tool allocated a ~2GiB chunk of memory and then produced a binary of the same size. I've attached an example ELF (of the hello example) below. Whatever compile options (including those from the example makefiles) or program I tried, the resulting ELF causes the same issue. I also tried both my own modified elf32elmip.x and the one from psn00bsdk, to no avail.

I'm not familiar at all with the ELF format, but did some digging. I modified elf2x.c to add some printf debugging:

	// Load program headers and determine binary size and load address

	fseek( fp, head.prg_head_pos, SEEK_SET );
	for( i=0; i<head.prg_entry_count; i++ ) {
		printf("i = %d\n", i);
		
		fread( &prg_heads[i], 1, sizeof(PRG_HEADER), fp );

		if( prg_heads[i].flags == 4 ) {
			continue;
		}

		if( prg_heads[i].p_vaddr < exe_taddr ) {
			printf("p_vaddr < exe_taddr, p_vaddr = %d\n", prg_heads[i].p_vaddr);
			exe_taddr = prg_heads[i].p_vaddr;
		}

		if( prg_heads[i].p_vaddr > exe_haddr ) {
			printf("p_vaddr > exe_taddr, p_vaddr = %d\n", prg_heads[i].p_vaddr);
			exe_haddr = prg_heads[i].p_vaddr;
		}
		printf("exe_haddr: %d\n", exe_haddr);
	}
	exe_tsize = (exe_haddr-exe_taddr);
	printf("exe_tsize: %d\n", exe_tsize);
	exe_tsize += prg_heads[head.prg_entry_count-1].p_filesz;

The program's output (with main.Og.elf):

i = 0
i = 1
p_vaddr < exe_taddr, p_vaddr = 4194304
p_vaddr > exe_taddr, p_vaddr = 4194304
exe_haddr: 4194304
i = 2
p_vaddr > exe_taddr, p_vaddr = -2147418112
exe_haddr: -2147418112
i = 3
p_vaddr > exe_taddr, p_vaddr = -2147385100
exe_haddr: -2147385100
exe_tsize: 2143387892
pc:800115f0 t_addr:00400000 t_size:2143390364

It looks like either the p_vaddr values of the third and fourth program headers are corrupted somehow, or elf2x is reading them from a place in the ELF it's not supposed to read from. mipsel-unknown-elf-objdump -x parses the ELF just fine, so I think it's the latter.
main.Og.zip

@EndoRPY
Author

EndoRPY commented Apr 15, 2021

Update: I recompiled GCC and binutils again, this time as mipsel-none-elf instead of mipsel-unknown-elf and elf2x seems to work now with the recompiled toolchain.

However, elf2x still seems to parse an otherwise valid ELF incorrectly, so I'll leave this issue open for now. The problem in my earlier case doesn't seem to lie with the third or fourth program header but with the second, where p_vaddr = 4194304. In all example programs that converted correctly, this value is always a large negative integer (as formatted by printf, anyway).

@Lameguy64
Owner

Well, this is the first time I've seen anyone have an issue like this. All I can think of is that the ELF file format probably changed in the experimental version of the GNU toolchain you're trying to use with PSn00bSDK. Does it still occur when using the SDK with older versions of the GNU toolchain?

@EndoRPY
Author

EndoRPY commented Apr 21, 2021

The issue seems to be much weirder than I thought. The problem occurs with both the provided GCC version 10.2 (and ld version 2.35) as well as my experimental build.
Compilation with a single command, e.g.:
/c/psn00bsdk/gcc/bin/mipsel-none-elf-gcc -g -O2 -fno-builtin -fdata-sections -ffunction-sections -I/c/psn00bsdk/libpsn00b/include main.c -Wl,-g,-Ttext=0x80010000,-gc-sections,-T/c/psn00bsdk/gcc/mipsel-none-elf/lib/ldscripts/elf32elmip.x,-lpsxgpu,-lpsxgte,-lpsxspu,-lpsxetc,-lpsxapi,-lc,-L/c/psn00bsdk/libpsn00b

Should behave exactly the same as doing:
/c/psn00bsdk/gcc/bin/mipsel-none-elf-gcc -g -O2 -fno-builtin -fdata-sections -ffunction-sections -I/c/psn00bsdk/libpsn00b/include -c main.c -o build/main.o
... and then:
/c/psn00bsdk/gcc/bin/mipsel-none-elf-ld -g -Ttext=0x80010000 -gc-sections -T /c/psn00bsdk/gcc/mipsel-none-elf/lib/ldscripts/elf32elmip.x -L/c/psn00bsdk/libpsn00b build/main.o -lpsxgpu -lpsxgte -lpsxspu -lpsxetc -lpsxapi -lc -o hello.elf

But that's not true.
If you run mipsel-none-elf-objdump -h on both versions, you'll notice that the single-command version has .init, .fini, and .eh_frame sections, which the separately compiled and linked ELF doesn't have. I searched around and it seems no one on the web has run into this discrepancy before. Worse, I don't think it's something the GCC devs would label a regression or bug in the first place.
I tried to stop GCC from generating these sections, but with no luck. Neither -fPIC -mabicalls nor -fno-asynchronous-unwind-tables -fno-unwind-tables removed them (the former threw a bunch of warnings but compiled, the latter errored as it needed -shared). At first I thought these sections were the culprit, but after stripping them from the ELF manually and running elf2x, the same issue comes up. Odd. Forcefully stripping them, especially .init, probably breaks the binary anyway. That's why I think elf2x parses the ELF format wrong in some way.

People who use CMake or the provided makefiles won't run into this, but people who try to quickly compile a simple program on the command line will be stuck. I haven't tested this on any older versions of GCC yet, such as 7.

@Lameguy64
Owner

I never actually got building an ELF from a single gcc/g++ command line to work properly, so I never bothered supporting it in PSn00bSDK and it is therefore untested. The closest I ever got was that it would produce an ELF file, but for whatever reason the call to main() from _start() always pointed to address 0. Perhaps that is why elf2x looked like it was parsing the ELF file incorrectly: one of the objects was addressed to 0x0 rather than 0x80000000.

@EndoRPY
Author

EndoRPY commented Apr 22, 2021

Do you think this could be a linker script issue? (Side note: you can change the VMA/LMA of the .text section in the ld script instead of having to pass the address as a compile flag)
I only tinkered with the script for a short time, but moving the .text section to the top of SECTIONS (right after the PROVIDE) seems to fix the build: elf2x runs fine and doesn't throw any errors. However, running the EXE in No$PSX yields a black screen, and the program is stuck in an infinite loop on a single instruction (not just with hello.c, but also with system/tty/main.c).
(Another unrelated side note: UPX supports compressing PS-EXE, but you need to --force it because it assumes the padding of the header is 0x00. Elf2x EXEs work fine with No$PSX but not Mednafen for me, but compressing them with UPX fixes this. But different emulators are a whole 'nother can of worms)

If this issue doesn't find a solution, at least it will serve as documentation that compiling in one line doesn't work for now, so people won't be surprised. Thank you a lot for going through this hell to make this library. The linker scripts and ELF's intricacies gave me eye and brain cancer personally, and I can't make sense of why this particular case brings up so many issues.
