
Behavior difference with Netty writing out HTTP data #29307

Open
isaacrivriv opened this issue Aug 6, 2024 · 0 comments

While working on HTTP Netty, we realized there is a functional difference in how data is written out as a response to an HTTP request.

Legacy (As Is)

When writing out a response to an HTTP request, the Liberty server starts by preparing the information here and calling formatHeaders, which adds the prepared headers to the list of outgoing buffers here; the buffers are later written all at once here. Captured with Wireshark, the response looks like this on the wire:

(screenshot: Wireshark capture of the legacy response)

Netty (To Be)

When writing out a response to an HTTP request, the Liberty server starts by preparing the data and queuing the Netty HTTP headers and HTTP content objects on the channel's write queue; once all data is queued, it flushes, sending each queued message down the wire over TCP. Here's how it looks on the wire:

(screenshot: Wireshark capture of the Netty response)

Difference

The main difference between the implementations is that, when possible, legacy Channelfw writes all of the data to the wire in a single buffer, whereas Netty flushes each queued message individually, as seen in the Wireshark captures.

Attachments: legacy.zip, netty.zip
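The contrast above can be sketched with plain JDK sockets (this is not Liberty or Netty code; the class and method names are hypothetical). One side writes the headers and body as a single coalesced buffer, the other writes and flushes them separately; the receiver sees the same byte stream either way, only the TCP segmentation on the wire differs:

```java
import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

public class WriteCoalescing {

    // Reads exactly n bytes from the socket.
    static byte[] readExactly(Socket s, int n) throws IOException {
        byte[] buf = new byte[n];
        new DataInputStream(s.getInputStream()).readFully(buf);
        return buf;
    }

    // Sends a tiny HTTP-shaped response either as one coalesced write
    // (legacy-style) or as separate flushed writes (Netty-style), and
    // returns the bytes the receiving side observed.
    static byte[] receive(boolean coalesce) throws Exception {
        byte[] headers = "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] body = "hello".getBytes(StandardCharsets.US_ASCII);
        try (ServerSocket server = new ServerSocket(0)) {
            Thread writer = new Thread(() -> {
                try (Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
                    client.setTcpNoDelay(true); // avoid Nagle batching the small writes
                    OutputStream out = client.getOutputStream();
                    if (coalesce) {
                        // Legacy-style: one buffer, one write, typically one TCP segment.
                        byte[] all = new byte[headers.length + body.length];
                        System.arraycopy(headers, 0, all, 0, headers.length);
                        System.arraycopy(body, 0, all, headers.length, body.length);
                        out.write(all);
                    } else {
                        // Netty-style: each queued message flushed separately,
                        // typically separate TCP segments on the wire.
                        out.write(headers);
                        out.flush();
                        out.write(body);
                        out.flush();
                    }
                } catch (IOException ignored) {
                }
            });
            writer.start();
            try (Socket peer = server.accept()) {
                byte[] got = readExactly(peer, headers.length + body.length);
                writer.join();
                return got;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Both strategies deliver an identical byte stream to the reader.
        System.out.println(java.util.Arrays.equals(receive(true), receive(false)));
    }
}
```

This is why the behavior is HTTP-compliant either way: the application-level byte stream is identical, and only a client that closes early (after reading just the headers) observes a difference.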

Problem

This is affecting many builds, specifically many of the TCKs. The TCK client reads the headers, and once it has everything it needs from them, it closes the connection; meanwhile the server is still writing the body and fails with a channel-closed exception. While testing locally, I disabled the IOException thrown when writing on a closed channel, and all of the related TCK issues were addressed. We remain compliant with the HTTP spec, which does not dictate whether data must be sent in one TCP segment or several, but this is nonetheless a by-design change in how data is written compared to Legacy, and we are seeing the consequences now.

It could also be a consequence of how TCP works: a socket in the CLOSE_WAIT state can still accept writes without throwing an exception. I've seen cases where writes are still attempted even while the server is shutting down, so this state could explain why writes don't fail immediately after the remote end closes the connection. See the resources below for similar issues with Java sockets.
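The CLOSE_WAIT point can be demonstrated with plain JDK sockets (a minimal sketch, not Liberty code; the class and method names are hypothetical). A peer's FIN only signals that the peer has stopped sending, not that it has stopped receiving, so the first write after the peer closes is accepted by the OS rather than failing immediately:

```java
import java.io.*;
import java.net.*;

public class CloseWaitWrite {

    // Returns true if the server-side write succeeds even though the
    // client has already closed its end of the connection.
    static boolean firstWriteAfterPeerCloseSucceeds() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Socket client = new Socket("127.0.0.1", server.getLocalPort());
            try (Socket peer = server.accept()) {
                client.close();    // peer receives a FIN; its socket enters CLOSE_WAIT
                Thread.sleep(200); // give the FIN time to arrive
                try {
                    // Typically accepted: a FIN means "peer stopped sending",
                    // not "peer stopped receiving". A later write may then fail
                    // with a broken pipe / connection reset once an RST arrives.
                    peer.getOutputStream().write("data after close".getBytes());
                    return true;
                } catch (IOException e) {
                    return false;
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstWriteAfterPeerCloseSucceeds());
    }
}
```

On common platforms only a subsequent write, after the peer has responded with an RST, raises the IOException, which matches the delayed failures described above.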

Update

After more investigation, it seems the real reason for the behavior change is that, by default, Netty does not allow a socket to remain half open after the peer closes the connection. In Legacy, we still attempt to continue writing/reading on the wire even after the peer closes; the specific socket errors that come up during I/O operations are handled here.

To match this behavior in Netty, there is an option to allow half-open sockets here, which should reproduce the legacy behavior, with some additional changes so that a read closes the connection if the socket was already closed.
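As a configuration sketch (assuming Netty 4.x; the group and handler names here are placeholders, not Liberty's actual wiring), the option is set per child channel, and with it enabled Netty fires a `ChannelInputShutdownEvent` on the peer's FIN instead of closing the channel, leaving the close decision to the application:

```java
// Sketch: enabling half-closure on accepted channels so a peer's FIN does not
// tear down our side, matching the legacy behavior described above.
ServerBootstrap bootstrap = new ServerBootstrap()
        .group(bossGroup, workerGroup) // bossGroup/workerGroup assumed defined elsewhere
        .channel(NioServerSocketChannel.class)
        .childOption(ChannelOption.ALLOW_HALF_CLOSURE, true)
        .childHandler(new ChannelInitializer<SocketChannel>() {
            @Override
            protected void initChannel(SocketChannel ch) {
                ch.pipeline().addLast(new ChannelInboundHandlerAdapter() {
                    @Override
                    public void userEventTriggered(ChannelHandlerContext ctx, Object evt) {
                        // Fired on peer FIN when half-closure is allowed; the
                        // channel stays open for outbound writes.
                        if (evt instanceof ChannelInputShutdownEvent) {
                            // finish pending writes, then close when done
                        }
                    }
                });
            }
        });
```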

Alternatives

Some alternatives to fix this issue:

  • Update our usage of Netty to follow the same behavior as the Legacy implementation. This could be costly, and it is not certain it is possible without changing Netty source code.
  • Discuss whether we should attempt to continue a write, or throw an exception, when a write finds the channel closed.
  • Update the TCK tests so that they close the client connection only once all the data is read. Costly, since we would have to examine each individual TCK test's behavior, update it, and get the changes merged upstream. As of now, around 30 TCKs are failing in the same way.
  • Investigate whether Netty can accumulate the outgoing data as bytes and write it all at once in a single flush.
  • Accept this as a behavior change, document it, and move forward. When stopping the TCK servers in a FAT, we can ignore these errors by adding them as accepted errors on server stop. We could also disable throwing IOException on inbound connections for the TCKs.
  • Look into the ChannelOption for half closure (see the update above).

Chosen solution

After some discussions, we decided the best approach is to use the channel option ALLOW_HALF_CLOSURE to keep the channel open even after the peer closes it. This matches legacy behavior: the remaining queued items are written to the wire, and the writes may eventually hit a broken-pipe exception if too much data is written (also matching legacy). However, with these changes we hit some issues with HTTP/2, due to channels not being properly closed after the peer closed the connection; in Legacy, H2 attempts to finish writing and reading on its streams when the remote closes the connection. See #29508 for the follow-up.

Resources
