[🐛 Bug]: java.lang.OutOfMemoryError #2528

Open
Doofus100500 opened this issue Dec 24, 2024 · 27 comments

@Doofus100500
Contributor

Doofus100500 commented Dec 24, 2024

What happened?

The event bus container is running out of memory (OOM).
[image: screenshot attached to the original issue]

Command used to start Selenium Grid with Docker (or Kubernetes)

helm

Relevant log output

{"class": "EventBusCommand","log-level": "INFO","log-message": "Started Selenium EventBus 4.26.0 (revision 69f9e5e): https:\u002f\u002f10.232.86.222:5557","log-name": "org.openqa.selenium.grid.commands.EventBusCommand","log-time-local": "2024-12-14T07:31:37.796Z","log-time-utc": "2024-12-14T07:31:37.796Z","method": "execute"}
Exception in thread "iothread-2" java.lang.OutOfMemoryError: Cannot reserve 8192 bytes of direct buffer memory (allocated: 501211210, limit: 501219328)
    at java.base/java.nio.Bits.reserveMemory(Bits.java:178)
    at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:121)
    at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:332)
    at zmq.io.coder.DecoderBase.<init>(DecoderBase.java:46)
    at zmq.io.coder.Decoder.<init>(Decoder.java:71)
    at zmq.io.coder.v2.V2Decoder.<init>(V2Decoder.java:18)
    at zmq.io.StreamEngine.handshake(StreamEngine.java:805)
    at zmq.io.StreamEngine.inEvent(StreamEngine.java:386)
    at zmq.io.IOObject.inEvent(IOObject.java:85)
    at zmq.poll.Poller.run(Poller.java:275)
    at java.base/java.lang.Thread.run(Thread.java:840)

Operating System

k8s

Docker Selenium version (image tag)

4.26.0-20241101

Selenium Grid chart version (chart version)

0.37.1

@VietND96
Member

It looks like the actual memory usage did not reach the range between the configured resource requests and limits.
In the latest change, I added a default SE_JAVA_OPTS for all components (in the server ConfigMap, which all components reference) that sets -Xmx and -Xms for the Selenium server JVM.

SE_JAVA_OPTS: "-XX:+UseG1GC -Xmx1024m -Xms256m -XX:MaxGCPauseMillis=1000 -Djdk.httpclient.keepalive.timeout=300 -Djdk.httpclient.maxstreams=10000"

Can you check whether it helps?

@VietND96
Member

@joerg1985, do you have any comment on this?

@Doofus100500
Contributor Author

-Xmx1024m -Xms256m

For all components, this is extremely low. In my opinion, it is necessary to make it possible to configure these parameters for each component individually. Under load, consumption increases significantly.

@VietND96
Member

I think you can override the global value via extraEnvironmentVariables in each component.

@Doofus100500
Contributor Author

But this is not reflected in the chart for the eventBus and other distributed components

@VietND96
Member

Oh, really? Can you share an example of the YAML values you are setting?

@Doofus100500
Contributor Author

For example, to address the issue with the event-bus mentioned in this issue, I added the following through k9s:

- name: SE_JAVA_OPTS  
  value: -Xmx2g

@VietND96
Member

I just checked: in the chart config, all distributed components refer to the same setting for extra env vars, components.extraEnvironmentVariables.

@Doofus100500
Contributor Author

That’s exactly what I’m saying. I want to set appropriate parameters for each component individually, rather than, for example, setting -Xmx16g for all of them.

@VietND96
Member

Yes, I understand the problem now. I will add that config for each component instead of a common one.

@VietND96
Member

Have you noticed anything else that you think should be fixed in chart 0.38.3 as well?

@Doofus100500
Contributor Author

Unfortunately, I haven’t even looked into it yet. If I find anything, I’ll definitely come back in the future.

VietND96 added a commit that referenced this issue Dec 26, 2024
@joerg1985
Member

@VietND96 I had a short look at the code of EventBusCommand. From reading it (without debugging), I would expect a leak in the /status call: it adds a listener but never removes it. I will put this on my to-do list.
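
To illustrate the kind of leak described here, below is a minimal sketch of a status check that registers a listener on every call. The `Bus` class and both `status` methods are hypothetical stand-ins written for this example; they are not Selenium's EventBusCommand code, and the real fix may look different.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

final class ListenerLeakDemo {
    // Hypothetical stand-in for an event bus; not Selenium's actual class.
    static final class Bus {
        private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

        void addListener(Consumer<String> listener) { listeners.add(listener); }
        void removeListener(Consumer<String> listener) { listeners.remove(listener); }
        void fire(String event) { listeners.forEach(l -> l.accept(event)); }
        int listenerCount() { return listeners.size(); }
    }

    // Leaky variant: every call adds a listener that is never removed,
    // so the bus keeps a reference to it for as long as the bus lives.
    static boolean leakyStatus(Bus bus) {
        boolean[] seen = {false};
        bus.addListener(event -> seen[0] = true);
        bus.fire("status-ping");
        return seen[0];
    }

    // Fixed variant: the listener is always removed once the check is done.
    static boolean fixedStatus(Bus bus) {
        boolean[] seen = {false};
        Consumer<String> listener = event -> seen[0] = true;
        bus.addListener(listener);
        try {
            bus.fire("status-ping");
            return seen[0];
        } finally {
            bus.removeListener(listener);
        }
    }

    public static void main(String[] args) {
        Bus leaky = new Bus();
        Bus fixed = new Bus();
        for (int i = 0; i < 1000; i++) {
            leakyStatus(leaky);
            fixedStatus(fixed);
        }
        System.out.println("leaky listeners: " + leaky.listenerCount()); // 1000
        System.out.println("fixed listeners: " + fixed.listenerCount()); // 0
    }
}
```

Each leaked listener is small, so a leak like this only becomes visible when the endpoint is polled frequently over a long uptime, for example by health probes.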

@joerg1985
Member

The leaking listeners have been fixed in SeleniumHQ/selenium@269a7f6, but I am not sure this is the root cause here: only a few bytes are leaked per call to /status, so the grid would have to be up for several days before this shows.

@Doofus100500
Contributor Author

Doofus100500 commented Dec 28, 2024

Actually, in our case, we expect the grid (except for the browser pods) to always be operational. Could you please check the other components for leaks as well?
[4 images: screenshots attached to the original comment]

@joerg1985
Member

@Doofus100500 I think the best approach would be to create a heap histogram with jmap and share it here.

@Doofus100500
Contributor Author

Unfortunately, I will only be able to take care of this after the 9th.

@VietND96
Member

VietND96 commented Jan 2, 2025

Via #2546, I added a way to enable HeapDumpOnOutOfMemoryError, or to get a heap dump on demand when the container is terminated/stopped. Dumps are written to the directory /opt/selenium/logs; mount a volume at that directory in the container to persist the output files.
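
For readers unfamiliar with on-demand heap dumps, here is a minimal sketch of triggering one from inside a running JVM with the JDK's `HotSpotDiagnosticMXBean`. This is an assumption-laden illustration only: it is not how #2546 implements the feature, and the `HeapDumpDemo` class and the temporary output directory are made up for this example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

final class HeapDumpDemo {
    // Writes a heap dump of the current JVM into the given directory.
    // The target file must not exist yet, and recent JDKs require
    // the file name to end in ".hprof".
    static Path dumpHeap(Path directory) throws IOException {
        Path target = directory.resolve("heap-" + System.nanoTime() + ".hprof");
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // live = true dumps only objects that are still reachable.
        diagnostics.dumpHeap(target.toString(), true);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output location; a real deployment would use a mounted volume.
        Path directory = Files.createTempDirectory("heapdump-demo");
        Path dump = dumpHeap(directory);
        System.out.println("Wrote " + dump + " (" + Files.size(dump) + " bytes)");
    }
}
```

The resulting .hprof file can be opened in a heap analyzer to see which object types dominate the heap. Note that a heap dump does not show direct (off-heap) buffer contents, only the buffer objects that reference them.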

@joerg1985
Member

@Doofus100500 please wait for the next release before testing; this might be the fix for your issue: SeleniumHQ/selenium#15011

@Doofus100500
Contributor Author

Hi @VietND96, have you considered using -XX:MaxRAMPercentage and -XX:MinRAMPercentage instead of -Xmx and -Xms? It seems like a good solution for the general configuration in:

SE_JAVA_OPTS: "-XX:+UseG1GC -Xmx1024m -Xms256m -XX:MaxGCPauseMillis=1000 -Djdk.httpclient.keepalive.timeout=300 -Djdk.httpclient.maxstreams=10000"

@Doofus100500
Contributor Author

I’m just unsure what percentage to set for MaxRAMPercentage, could you help me with that?

@VietND96
Member

Hi, I am not sure about this one either. I will try to understand it and let you know if I find something.

@VietND96
Member

I read something related: https://stackoverflow.com/questions/75025893/is-jvm-heap-memory-option-xxmaxrampercentage-only-valid-for-dockerized-applic

When you run the application in a dedicated container, together with a known set of programs or no other programs at all, you most probably want to specify the maximum amount of memory in relation to the container’s memory, so when you want to change the available memory, you only have to reconfigure the container instead of needing to adapt all programs’ start configurations

With docker-selenium, each component (Hub/Router/Distributor/SessionQueue/SessionMap/EventBus) runs in a dedicated container with a single program, so let it use the maximum amount with -XX:MaxRAMPercentage=100.
For the Node component, the browser also consumes memory besides the program, so let it use half with -XX:MaxRAMPercentage=50.
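
As a rough worked example of what these percentages mean, the sketch below computes the approximate heap ceiling for a container limit. The 2 GiB limit and the `HeapBudget` class are assumptions for illustration; the JVM's real value can differ slightly because of alignment and collector details, so the last line prints what the running JVM actually reports.

```java
final class HeapBudget {
    static final long MIB = 1024L * 1024L;

    // Approximate maximum heap the JVM would choose for a given container
    // memory limit and -XX:MaxRAMPercentage value.
    static long expectedMaxHeapBytes(long containerLimitBytes, double maxRamPercentage) {
        return (long) (containerLimitBytes * (maxRamPercentage / 100.0));
    }

    public static void main(String[] args) {
        long limit = 2048 * MIB; // hypothetical 2 GiB container limit
        for (double pct : new double[] {50.0, 75.0, 100.0}) {
            long heap = expectedMaxHeapBytes(limit, pct);
            System.out.printf(
                    "MaxRAMPercentage=%.0f: heap up to ~%d MiB, ~%d MiB left for everything else%n",
                    pct, heap / MIB, (limit - heap) / MIB);
        }
        // What this JVM actually settled on, for comparison.
        System.out.printf("This JVM reports Runtime.maxMemory() = %d MiB%n",
                Runtime.getRuntime().maxMemory() / MIB);
    }
}
```

The "left for everything else" column matters because the heap is not the only consumer inside the container: thread stacks, metaspace, and direct buffers also count against the container limit.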

@joerg1985
Member

@VietND96 the JVM should detect the container environment and adjust these values automatically; see https://bugs.openjdk.org/browse/JDK-8146115 for details.

@VietND96
Member

@joerg1985, yes, but in a few of the graph screenshots above, the OOM happened when the actual memory consumed had not reached the range between the allowed requests and limits. What is your view?

@joerg1985
Member

There are separate limits for the different memory areas of the JVM, not only for the heap. So setting MaxRAMPercentage might not help here. When setting it to 100%, the heap takes all the memory, but what about the other memory areas? They also need some memory.

I don't think we need to fine-tune the memory management; we need to find the root cause of the leak.
But this might already have been fixed, so let's wait for @Doofus100500's feedback after using version 4.28.0.
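
The error in the original log is about direct buffer memory, which lives outside the Java heap and has its own limit. A minimal sketch for watching that pool from inside a JVM, using the standard `BufferPoolMXBean`, is shown below. The `DirectMemoryReport` class is made up for this example, and the statement that the direct limit follows the maximum heap size unless -XX:MaxDirectMemorySize is set reflects my understanding of HotSpot defaults rather than anything confirmed in this thread.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

final class DirectMemoryReport {
    // Returns the bytes currently held by the JVM's "direct" buffer pool,
    // or -1 if the pool is not exposed by this JVM.
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = directBytesUsed();
        // Same size as the reservation that failed in the reported stack trace.
        ByteBuffer buffer = ByteBuffer.allocateDirect(8192);
        long after = directBytesUsed();

        System.out.println("direct pool before: " + before + " bytes");
        System.out.println("direct pool after:  " + after + " bytes");
        System.out.println("buffer capacity:    " + buffer.capacity() + " bytes");
        // Heap ceiling, for comparison with the direct pool figures above.
        System.out.println("max heap:           " + Runtime.getRuntime().maxMemory() + " bytes");
    }
}
```

Logging the direct pool over time alongside heap usage would show whether the growth is on-heap or off-heap, which is a distinction that container-level memory graphs cannot make.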
