Windows 98 crash with RTL8029 #452

Open
Vort opened this issue Jan 10, 2025 · 24 comments

@Vort
Contributor

Vort commented Jan 10, 2025

When an RTL8029 (ne2k) card is installed in PCI slot #2, Windows 98 SE becomes unstable and reboots (crashes) during or after boot. When the card is installed in slot #3, no such problem occurs.

The difference appears because different slots have different default IRQ values assigned to them.
If I understand correctly, this behaviour is controlled by this map,
which means slot #2 uses IRQ 11 and slot #3 uses IRQ 9.

As for why exactly the crash happens, I have no idea yet, but I guess real hardware has no problem with using IRQ 11 and Bochs has some bug in its IRQ emulation, which should be fixed (I hope no one will suggest just using a different IRQ).

Here is the config file I use with Windows 98, although I doubt it matters: bochsrc_cirrus.zip

Version: d1e7ee2.

@vruppert
Contributor

Since I have an image with a Win98SE installation here, but with an E1000 network card, I decided to do a clean install on a new image. I can confirm that the crashes / reboots appear after installation. The original Win98 works fine with this setup. Since the Bochs BIOS doesn't have real plug&play support, I remember that the PCI bus cannot be detected and all of its devices are not found. Since ACPI is on in this config and Win98SE installs drivers for it, I thought that everything's okay this way, but it's not. So I disabled ACPI with the option "noacpi", booted the system, installed the PCI bus driver manually, and rebooted. Finally I turned on ACPI again and Win98SE boots without issues. Then I tried to do a clean install using @fysnet's new BIOS, since it has plug&play support. Unfortunately I got a blue screen at the device detection stage and I couldn't finish the installation.

@Vort
Contributor Author

Vort commented Jan 11, 2025

> I decided to do a clean install on a new image

With Win98 SE and RTL8029?

> The original Win98 works fine with this setup

Do you mean Win98 without the SE update, but with RTL8029 in slot 2?

> I remember that the PCI bus cannot be detected and all of its devices are not found.

I believe this problem was fixed with 0f738ed.
In my tests, installations made after this fix detect the PCI bus fine.
As for old installations, I consider them broken: they use 100% of the CPU even without actual load, whether or not I install the PCI bus manually. So I decided to just reinstall my Win98SE after this fix was made.

@vruppert
Contributor

Yes, I have installed Win98SE using your bochsrc device settings. The original Win98 (without SE) installation is old, and I remember that I had to add the PCI bus manually. The RTL8029 is attached to slot 2, with no issues. I guess the mentioned fix is not enough in all cases. The PnP BIOS needs to return device nodes, especially the one for the PCI BIOS. After installing the PCI bus manually, the driver for the PCI IRQ routing is installed automatically. I guess it was missing before, which caused PCI cards not to work in all slots. Please try to boot with the ne2k disabled in config and look at the device manager whether or not the PCI bus is present in the system devices section.

@Vort
Contributor Author

Vort commented Jan 11, 2025

> Please try to boot with the ne2k disabled in config and look at the device manager whether or not the PCI bus is present in the system devices section.

Here are results from device manager and system info (msinfo32):

ne2k=true | slot1=cirrus, slot2=ne2k ("crash" config)

image

image

ne2k=false | slot1=cirrus, slot2=none

image

image

ne2k=false | slot1=none, slot2=none

image

image

@vruppert
Contributor

I cannot read the device names from your screenshots, but that doesn't matter here. For case 1 I didn't have enough time to take a screenshot, since it reboots before I had a chance to do so. I finally performed another installation without disabling ACPI afterwards. I had a look at the Bochs BIOS acpi-dsdt.dsl and the related SeaBIOS file and found differences in the PCI section. So I tried setting romimage to the SeaBIOS image from the Bochs bios folder. It booted without issues and installed the missing drivers for the PCI IRQ routing. I'll look into how to fix this ACPI file soon.

@Vort
Contributor Author

Vort commented Jan 11, 2025

Oh, I forgot about SeaBIOS, because I initially ran tests with i440bx, and SeaBIOS showed a black screen in that case.
Here are screenshots taken after the additional drivers were installed with SeaBIOS:

ne2k=true | slot1=cirrus, slot2=ne2k + SeaBIOS

image

image

There is a minor problem here: CL-GD5446 complains about I/O ranges, but that is probably unrelated.
More importantly, the crashes/reboots are not happening anymore.
Also, the PCI IRQ related lines in system info are different now. However, I don't know which ones are more correct, or why.

@fysnet
Collaborator

fysnet commented Jan 11, 2025

Just to clarify, both of you are still using a chipset of i440bx not i440fx, correct?

My BIOS needs a little work to truly present the correct correspondence between the acpi-dsdt.dsl and the PnP stuff. I will see what I can do.

@Vort
Contributor Author

Vort commented Jan 11, 2025

> Just to clarify, both of you are still using a chipset of i440bx not i440fx, correct?

No.
This problem reproduces fine with i440fx, so to avoid complicating things, it is more convenient to fix it with i440fx.
Time will tell whether such a fix will affect i440bx or not.
In my opinion, chances are high that it will.

@fysnet
Collaborator

fysnet commented Jan 14, 2025

Please try my updated BIOS (https://github.com/fysnet/i440fx/blob/main/i440fx.bin)

Except for one issue (explained below), it boots your bochsrc_cirrus.zip file with the ne2k all the way to the desktop and seems to be stable.

The only issue I see is that just after the Windows Installation Verification entry (the Product Key five by five), it starts to install the PnP devices and a BSOD happens. If you simply hit reset and let it boot again, it then continues to install and goes to the desktop just fine.

Therefore, I have something wrong with my PnP stuff, but I think this new update fixes the IRQ issue here and in #448.

There is another issue with my PnP that I know of, which is currently skipped in the code, and there is an issue with how I initialize the serial port(s). Windows doesn't like the state of the serial port after I initialize it, therefore it is currently commented out.

I even did a clean install with PCI = i440bx (emphasis on the BX) and it installed just fine (minus the PnP issue stated above).
The i440bx boot showed a "PnP failsafe" in the hardware settings. The i440fx might have, I just didn't look.

Again, I think this update fixed the IRQ stuff here and in #448, I just need to look at my PnP stuff some more.

@Vort
Contributor Author

Vort commented Jan 14, 2025

> Please try my updated BIOS

I tried, but in most cases using your BIOS produces the following result:

Invalid system disk

image

The disk images that do not work are usually large, so I don't know how to help you reproduce this problem.
Either you guess what's wrong with it, or I can try uploading these files somewhere.

upd. Here is the geometry info, maybe it will help:

00000000000i[HD    ] HD on ata0-0: 'Windows98.vmdk', 'vmware4' mode
00000000000i[HD    ] ata0-0: image geometry: CHS=2031/16/63 (sector size=512)
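
For reference, the capacity implied by a CHS geometry line like the one above is cylinders × heads × sectors per track × sector size. The sketch below is only an illustration: the struct and function names are hypothetical and not taken from Bochs. It also reports whether the cylinder count exceeds the 1024 cylinders addressable through the classic INT 13h CHS interface, in which case a BIOS has to rely on LBA or geometry translation. Whether that has anything to do with the "Invalid system disk" failure is not established here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type for illustration; not part of Bochs. */
struct chs_geometry {
    uint32_t cylinders;
    uint32_t heads;
    uint32_t sectors_per_track;
    uint32_t sector_size;   /* bytes */
};

/* Total number of sectors addressed by the geometry. */
static uint64_t chs_total_sectors(const struct chs_geometry *g)
{
    assert(g != NULL);
    return (uint64_t)g->cylinders * g->heads * g->sectors_per_track;
}

/* Total capacity in bytes. */
static uint64_t chs_total_bytes(const struct chs_geometry *g)
{
    return chs_total_sectors(g) * g->sector_size;
}

/* Classic INT 13h CHS calls carry a 10-bit cylinder number (0..1023). */
static int chs_exceeds_int13_cylinders(const struct chs_geometry *g)
{
    assert(g != NULL);
    return g->cylinders > 1024;
}
```

For CHS=2031/16/63 with 512-byte sectors this gives 2,047,248 sectors, or 1,048,190,976 bytes (just under 1000 MiB), with a cylinder count above 1024.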

@fysnet
Collaborator

fysnet commented Jan 15, 2025

> I tried, but in most cases using your BIOS produces the following result:
> Invalid system disk
>
> The disk images that do not work are usually large, so I don't know how to help you reproduce this problem. Either you guess what's wrong with it, or I can try uploading these files somewhere.
>
> upd. Here is the geometry info, maybe it will help:
>
> 00000000000i[HD    ] HD on ata0-0: 'Windows98.vmdk', 'vmware4' mode
> 00000000000i[HD    ] ata0-0: image geometry: CHS=2031/16/63 (sector size=512)

Please give me a little more detail. Does it fail at boot? Are you installing Win98SE and it fails at a certain point?
Does this image have Win98SE already installed and fails trying to boot the MBR, the VBR, or shortly after boot?

I have a flat image with that geometry and Windows98SE installed exactly as I described before, without the failure you are describing.

I ran Win98SE's Scandisk with thorough set and it read/wrote all sectors without error.

I also created a vmdk file with

00000000000i[HD    ] HD on ata0-0: 'large.vmdk', 'vmware4' mode
00000000000i[HD    ] ata0-0: image geometry: CHS=2080/16/63 (sector size=512)

The whole installation process was just fine (minus the issues stated previously).

Please check again and if it is still in error, yes please find a way to upload your .vmdk file and the bochsrc.txt file you are using.

Thank you

@Vort
Contributor Author

Vort commented Jan 15, 2025

> Does it fail at boot?

Yes.

> Does this image have Win98SE already installed

Yes.

> Please check again and if it is still in error, yes please find a way to upload your .vmdk file and the bochsrc.txt file you are using.

Here is the image file (packed with 7-Zip): https://wdfiles.ru/NM9c
Here is the config file: Windows98_config.zip

> Please give me a little more detail.

I tried this image with VirtualBox and VMware, and they boot from it fine.

@fysnet
Collaborator

fysnet commented Jan 16, 2025

Please try it now. ( https://github.com/fysnet/i440fx/blob/main/i440fx.bin )

It boots to a message that there is something wrong with the CONFIGMG file. However, I am guessing that this install didn't have a working PCI Routing mechanism at the time of installation. Please try it again with the new BIOS.

@Vort
Contributor Author

Vort commented Jan 16, 2025

> However, I am guessing that this install didn't have a working PCI Routing mechanism at the time of installation.

I suspect this failure is related to ACPI.
If I boot in Safe Mode and delete the "Advanced Configuration and Power Interface (ACPI) BIOS" device, Windows loads without error after reboot.
If I re-add this device, the CONFIGMG error reappears.

I will try a fresh install next.

@Vort
Contributor Author

Vort commented Jan 16, 2025

Fresh install, with the same config, is broken as well

Image

It fails during Plug and Play setup step

Image

upd: I see this problem is explained above. Will try to continue installation next.

@Vort
Contributor Author

Vort commented Jan 16, 2025

> The i440bx boot showed a "PnP failsafe" in the hardware settings. The i440fx might have, I just didn't look.

I guess this mode is responsible for 100% CPU usage, which was happening with Bochs BIOS before 0f738ed as well.

> If you simply hit reset and let it boot again, it then continues to install and goes to the desktop just fine.

It produces IRQ assignments different from what I was observing previously, but I confirm that no crashes and reboots are happening:

IRQ list

Image

When I tried to convert the config from i440fx to i440bx, a BIOS-related device refused to work properly:

Device list for i440bx

Image

So there are definitely more things to fix there.

Going back to the original problem: what do you think is broken in BIOS-bochs-latest that produces the reboots and crashes?
Your BIOS has lots of changes, but how many of them need to be ported to fix the Bochs BIOS?

@fysnet
Collaborator

fysnet commented Jan 16, 2025

> When I tried to convert the config from i440fx to i440bx, a BIOS-related device refused to work properly:
> Device list for i440bx

Did you change i440fx to i440bx without doing a clean re-install? If so, again, my opinion is that this will produce undefined errors every time. The OS installs drivers for i440fx and then if the chipset is changed to i440bx without re-installing the drivers (a clean re-install), you will receive errors.

> Going back to the original problem: what do you think is broken in BIOS-bochs-latest that produces the reboots and crashes? Your BIOS has lots of changes, but how many of them need to be ported to fix the Bochs BIOS?

I have not been keeping track. Sorry.

> Fresh install, with the same config, is broken as well
> It fails during Plug and Play setup step

These two issues are exactly the ones I mentioned, along with how to work around them. I am still trying to fix the PnP stuff to remedy these two issues.

Since this issue is now fixed, along with #448, at least with the i440fx.bin BIOS, can these two issues be closed?

Thanks.

@Vort
Contributor Author

Vort commented Jan 16, 2025

> Did you change i440fx to i440bx without doing a clean re-install?

Yes.

> If so, again, my opinion is that this will produce undefined errors every time.

The question, as always, is how real hardware behaves. Usually, it has undefined behaviour only if it is broken.
If tests with real hardware are not available, then common sense and tests with other emulators can be used.
In this case, I saw such a swap almost working with the present Bochs BIOS, so I assume it will work even better when the bug is fixed.

> Since this issue is now fixed, along with #448, at least with the i440fx.bin BIOS, can these two issues be closed?

Of course not.
This issue is in the bochs-emu/Bochs repository, not in fysnet/i440fx.

> I have not been keeping track. Sorry.

OK, now we have two kinda-working BIOSes to compare with.
Let's see if that helps @vruppert (or me) come up with a solution.

@vruppert
Contributor

I have no solution for this issue yet. The main problem is that I'm not familiar with the code used in the DSL file. Using the files from SeaBIOS requires some more changes in the PCI BIOS init code and may break existing installations. I found out that the LNKA to LNKD entries of the Bochs version list some IRQ numbers already used by ISA. The SeaBIOS version only lists IRQ 5, 10 and 11, and its PCI init code assigns 10, 10, 11 and 11 (Bochs: 11, 9, 11, 9). So far I haven't found out what to change to fix this issue without breaking anything.
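
The kind of overlap described above can be checked mechanically by comparing the IRQs a PCI link device offers against the IRQs that legacy ISA devices conventionally occupy. The following is a minimal sketch, not code from Bochs or SeaBIOS. The helper names are hypothetical, and the ISA mask encodes the conventional PC/AT assignments (timer, keyboard, cascade, COM ports, floppy, LPT, RTC, PS/2 mouse, FPU, IDE), which is an assumption and may differ from what a given guest configuration actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One bit per IRQ 0..15. */
#define IRQ_BIT(n) ((uint16_t)(1u << (n)))

/* Conventional PC/AT legacy assignments (assumption, see text above):
 * 0 timer, 1 keyboard, 2 cascade, 3/4 COM, 6 floppy, 7 LPT,
 * 8 RTC, 12 PS/2 mouse, 13 FPU, 14/15 IDE. */
#define ISA_LEGACY_IRQS \
    (IRQ_BIT(0) | IRQ_BIT(1) | IRQ_BIT(2) | IRQ_BIT(3) | IRQ_BIT(4) | \
     IRQ_BIT(6) | IRQ_BIT(7) | IRQ_BIT(8) | IRQ_BIT(12) | IRQ_BIT(13) | \
     IRQ_BIT(14) | IRQ_BIT(15))

/* Convert a list of IRQ numbers into a bitmask. */
static uint16_t irq_list_to_mask(const uint8_t *irqs, size_t count)
{
    uint16_t mask = 0;
    for (size_t i = 0; i < count; i++) {
        assert(irqs[i] < 16);
        mask |= IRQ_BIT(irqs[i]);
    }
    return mask;
}

/* Return the subset of candidate IRQs that collide with legacy ISA use. */
static uint16_t irq_conflicts(uint16_t candidates)
{
    return candidates & ISA_LEGACY_IRQS;
}
```

With the SeaBIOS list quoted above (5, 10, 11), the conflict mask is empty under these assumptions. The actual Bochs link entries are not reproduced here and would have to be taken from acpi-dsdt.dsl.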

@Vort
Contributor Author

Vort commented Jan 17, 2025

Small update.
This problem may be closely related to #196.
Reboots occur when bx_acpi_ctrl_c::set_irq_level is called with level = true:

Bochs/bochs/iodev/acpi.cc

Lines 241 to 244 in fd86c3a

void bx_acpi_ctrl_c::set_irq_level(bool level)
{
  DEV_pci_set_irq(BX_ACPI_THIS s.devfunc, BX_ACPI_THIS pci_conf[0x3d], level);
}

For example, when the timer overflows.

As for why the problem is not visible with other BIOSes, I have these assumptions:

  1. i440fx.bin has broken ACPI, so the timer is just not ticking.
  2. bios.bin-1.13.0 gives IRQ 10 to Intel 82371EB Power Management, which means there is no conflict with RTL8029.

The question is why such a conflict is possible at all.

Side note: when diagnosing this problem, I tried to disable the PM device. It showed as disabled in Device Manager, but still produced interrupts and, as a consequence, reboots. I do not know whether this is how it works on real hardware.
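
One possible reading of the conflict, offered only as a hypothesis and not confirmed in this thread: PCI interrupts are level-triggered and shared, so an IRQ line stays asserted as long as any device routed to it keeps its level high. If one of those devices has nothing servicing it, the line never drops. The sketch below models that wired-OR behaviour. Every name in it is hypothetical and it is not the Bochs implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one shared, level-triggered IRQ line. */
struct shared_irq_line {
    uint32_t asserting;   /* bit i set: source i currently drives the line */
};

/* A source raises or lowers its own level; other sources are unaffected. */
static void line_set_level(struct shared_irq_line *l, unsigned source, int level)
{
    assert(source < 32);
    if (level)
        l->asserting |= (uint32_t)1 << source;
    else
        l->asserting &= ~((uint32_t)1 << source);
}

/* The line is active while at least one source asserts it (wired-OR). */
static int line_is_active(const struct shared_irq_line *l)
{
    return l->asserting != 0;
}
```

In this model, if the network driver clears its own source but nothing ever clears the power management source, `line_is_active` keeps returning 1 and the interrupt would be delivered again and again.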

@Vort
Contributor Author

Vort commented Jan 21, 2025

Looks like Windows has no actual drivers for "Intel 82371EB Power Management" device, but it gets IRQ 11 assigned nevertheless.
However, Windows has drivers for "SCI IRQ used by ACPI bus" device, which probably gets its IRQ 9 because of pm_sci_int assignment here:

Bochs/bochs/bios/rombios32.c

Lines 1053 to 1055 in f2b0ad0

// acpi sci is hardwired to 9
pci_config_writeb(d, PCI_INTERRUPT_LINE, 9);
pm_sci_int = pci_config_readb(d, PCI_INTERRUPT_LINE);

If I understand correctly, it is the same device in fact and should get the same IRQ number.
When I hack Bochs code to assign IRQ 9 to "Intel 82371EB Power Management", crashes/reboots disappear:

diff --git a/bochs/iodev/acpi.cc b/bochs/iodev/acpi.cc
index 2fa0168d8..bbef4c19e 100644
--- a/bochs/iodev/acpi.cc
+++ b/bochs/iodev/acpi.cc
@@ -143,7 +143,7 @@ void bx_acpi_ctrl_c::init(void)
   BX_ACPI_THIS s.sm_base = 0x0;
 
   // initialize readonly registers
-  init_pci_conf(0x8086, 0x7113, 0x03, 0x068000, 0x00, BX_PCI_INTA);
+  init_pci_conf(0x8086, 0x7113, 0x03, 0x068000, 0x00, BX_PCI_INTB);
 }
 
 void bx_acpi_ctrl_c::reset(unsigned type) 

SeaBIOS uses a more complicated method to achieve the same result:
https://github.com/coreboot/seabios/blob/1b598a1d79dcb9261295fd5b1aa2b65d1348c0c1/src/fw/acpi-dsdt.dsl#L188-L189

@vruppert, do you know why "power mgmt device can only use irq 9"?
This is what I was able to find: "In non-APIC systems (which is the default), the SCI IRQ is routed to one of the 8259 interrupts (IRQ 9, 10, or 11)"

The documentation for the 82371EB probably has the answer, but I haven't gotten there yet.
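
The effect of the INTA to INTB change in the diff above can be illustrated with a small model of PCI interrupt routing. This is a sketch under assumptions that should be checked against the Bochs sources: it assumes the PIRQ link is chosen as (device number + pin - 2) mod 4 with pins numbered INTA = 1 to INTD = 4, that the power management function sits on device 1, and that slots 2 and 3 are devices 3 and 4. The link-to-IRQ table 11, 9, 11, 9 is the one quoted earlier in this thread. The function names are hypothetical.

```c
#include <assert.h>

enum pci_pin { PCI_INTA = 1, PCI_INTB, PCI_INTC, PCI_INTD };

/* Link (PIRQA..PIRQD) to ISA IRQ, as quoted in this thread for Bochs. */
static const unsigned char bochs_pirq_to_irq[4] = { 11, 9, 11, 9 };

/* Assumed swizzle: link index = (device + pin - 2) mod 4. */
static unsigned pirq_link(unsigned device, enum pci_pin pin)
{
    assert(device < 32);
    return (device + (unsigned)pin - 2u) & 3u;
}

/* ISA IRQ that a device's interrupt pin ends up on in this model. */
static unsigned routed_irq(unsigned device, enum pci_pin pin)
{
    return bochs_pirq_to_irq[pirq_link(device, pin)];
}
```

Under these assumptions, device 3 (slot 2) and the power management function on INTA both land on IRQ 11, device 4 (slot 3) lands on IRQ 9, and switching the power management function to INTB moves it to IRQ 9, which matches the observations in this thread.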

@Vort
Contributor Author

Vort commented Jan 21, 2025

> The documentation for the 82371EB probably has the answer, but I haven't gotten there yet.

From the documentation, it looks like SCI hardwiring is not required:

Image

So an alternative solution is:

diff --git a/bochs/bios/rombios32.c b/bochs/bios/rombios32.c
index 75e23fb1a..763ed3594 100644
--- a/bochs/bios/rombios32.c
+++ b/bochs/bios/rombios32.c
@@ -1050,8 +1050,6 @@ static void pci_bios_init_device(PCIDevice *d)
         /* PIIX4 Power Management device (for ACPI) */
         pm_io_base = PM_IO_BASE;
         smb_io_base = SMB_IO_BASE;
-        // acpi sci is hardwired to 9
-        pci_config_writeb(d, PCI_INTERRUPT_LINE, 9);
         pm_sci_int = pci_config_readb(d, PCI_INTERRUPT_LINE);
         piix4_pm_enable(d);
         acpi_enabled = 1; 

However, there must be a reason why someone added this hardwiring,
which means such a fix will probably break something else.

@vruppert
Contributor

I just had a look at the history of the rombios32 code. It was added in 2006, when Qemu used the Bochs BIOS and the Qemu people contributed most of its code. In an older Qemu version I noticed that the ACPI SCI was directly connected to the PIC's IRQ 9 instead of going through the PCI IRQ routing. Unfortunately, the latest Qemu code is so complicated that I don't know whether this is still the case. The Bochs ACPI SCI is connected to the PCI IRQ router, so the BIOS code above should be modified this way. Please test this modification with several guest systems if possible. I did a quick test with Win98SE here and it worked fine.

@Vort
Contributor Author

Vort commented Jan 21, 2025

> Please test this modification with several guest systems if possible.

The same Windows 98 SE, but with chipset=i440bx, easily moved the power management device from IRQ 11 to IRQ 9, leaving SCI at IRQ 11.
I don't like this situation, despite the fact that I wasn't able to get a crash with this configuration (RTL8029 was moved to IRQ 10).

I wasn't able to find out whether it is possible to update SCI during such an event.
If not, then fixing it to IRQ 9 may be the only option.

It is worth checking how real BIOSes solve this problem
(and, probably, improving compatibility with them at the same time).
