While working on HTTP Netty, we realized there was a functional difference in how data is written as a response to an HTTP request.
Legacy (As Is)
When writing out a response to an HTTP request, the Liberty server starts by preparing the information here and calling formatHeaders, which adds the prepared headers to the list of outgoing buffers here; that list is later written all at once here. Captured with Wireshark, the write looks like this on the wire
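Separate from the capture above, a minimal plain-NIO sketch can help illustrate the "single buffer" write pattern. This is an approximation for illustration only, not the actual ChannelFW/formatHeaders code; the class and method names are hypothetical.

```java
// Illustrative only: headers and body are coalesced into one buffer and handed
// to the socket in a single write loop, so they typically go out together.
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class SingleBufferWrite {
    static void writeResponse(SocketChannel ch, byte[] body) throws IOException {
        String headers = "HTTP/1.1 200 OK\r\n"
                + "Content-Length: " + body.length + "\r\n"
                + "\r\n";
        ByteBuffer out = ByteBuffer.allocate(headers.length() + body.length);
        out.put(headers.getBytes(StandardCharsets.US_ASCII));
        out.put(body);
        out.flip();
        // One coalesced buffer, usually observed as a single TCP segment on the wire.
        while (out.hasRemaining()) {
            ch.write(out);
        }
    }
}
```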
Netty (To Be)
When writing out a response to an HTTP request, the Liberty server starts by preparing the data and writing the Netty HTTP headers and HTTP content objects onto the channel's write queue; once all the data has been queued, it flushes it down the wire so it is sent over TCP. Here's how it looks on the wire
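For illustration, here is a minimal sketch of the Netty-style write path described above. It is not the Liberty dispatcher code; the handler name is hypothetical, and it assumes an HttpObjectAggregator (or equivalent) sits ahead of it so the request arrives as a FullHttpRequest.

```java
// Each write() only appends an object to the channel's outbound queue; nothing
// reaches the socket until flush(), and the transport may emit the queued
// objects as separate TCP segments (which is what the Wireshark capture shows).
import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.handler.codec.http.*;

import java.nio.charset.StandardCharsets;

public class EchoResponseHandler extends SimpleChannelInboundHandler<FullHttpRequest> {
    @Override
    protected void channelRead0(ChannelHandlerContext ctx, FullHttpRequest req) {
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);

        HttpResponse head = new DefaultHttpResponse(HttpVersion.HTTP_1_1, HttpResponseStatus.OK);
        head.headers().set(HttpHeaderNames.CONTENT_LENGTH, body.length);

        // Queue the response head, the body content, and the trailer separately...
        ctx.write(head);
        ctx.write(new DefaultHttpContent(Unpooled.wrappedBuffer(body)));
        ctx.write(LastHttpContent.EMPTY_LAST_CONTENT);

        // ...then flush everything that was queued down to the TCP layer.
        ctx.flush();
    }
}
```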
Difference
The main difference between the implementations is that, when possible, Legacy Channelfw writes the data to the wire all at once in the same buffer, whereas Netty flushes the data individually as it sits in the queue, as seen in Wireshark. legacy.zip netty.zip
Problem
This is affecting many builds, specifically many of the TCKs. While the TCK client reads the headers, the server starts writing data, but that write fails with a channel closed exception because the client has already closed the connection, having read everything it needed from the headers. While testing locally, I disabled the IOException for writes on a closed channel and all the related TCK issues were resolved. We remain compliant with the HTTP spec, since it does not prescribe whether data must be written in the same frame or separately, but this is nevertheless a by-design change in how data is written compared to Legacy, and we're seeing the issues now.
It could also be a consequence of how TCP works: the socket can still accept writes without throwing an exception because it is in a CLOSE_WAIT state. I've seen issues where writes are still being attempted even while the server is shutting down, so this state could explain why the writes don't fail after the remote side closes the connection. See resources below for similar issues with Java sockets.
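As a small, self-contained illustration of that TCP behavior (hypothetical demo code, not from the TCKs or the server), writes to a socket whose peer has already closed often succeed for a while before an IOException finally surfaces.

```java
// After the peer closes its side, the local socket sits in CLOSE_WAIT and the
// first write() calls usually succeed; the failure (broken pipe / connection
// reset) tends to surface only on a later write, once the peer's RST arrives.
import java.io.IOException;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class HalfClosedWriteDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Socket client = new Socket("localhost", server.getLocalPort());
            Socket accepted = server.accept();

            client.close();          // peer closes; accepted socket moves to CLOSE_WAIT
            Thread.sleep(100);

            OutputStream out = accepted.getOutputStream();
            try {
                out.write(new byte[1024]);   // often succeeds despite the closed peer
                out.write(new byte[1024]);   // a later write may finally throw
                System.out.println("writes accepted after peer close");
            } catch (IOException e) {
                System.out.println("write failed: " + e);
            }
            accepted.close();
        }
    }
}
```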
Update
After more investigation, it seems the real reason behind the change in behavior is that Netty, by default, does not allow sockets to remain half open after a peer closes the connection. In Legacy, we still attempt to continue writing/reading on the wire even if the peer has closed the connection; as specific socket errors come up during IO operations, they are handled here.
To match this behavior in Netty, there is an option to allow this half-open state here, which should reproduce the Legacy behavior, with some additional changes so that a read closes the connection if the socket was already closed.
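A minimal sketch of how that option could be set on a server bootstrap (assumed wiring, port, and pipeline contents, not the actual Liberty transport configuration):

```java
// ALLOW_HALF_CLOSURE keeps an accepted channel open when the peer shuts down
// its side of the connection; Netty then signals the half-close with a user
// event instead of closing the channel outright.
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.handler.codec.http.HttpServerCodec;

public class HalfClosureBootstrap {
    public static void main(String[] args) throws InterruptedException {
        NioEventLoopGroup group = new NioEventLoopGroup();
        try {
            ServerBootstrap b = new ServerBootstrap()
                    .group(group)
                    .channel(NioServerSocketChannel.class)
                    // keep accepted channels open after the peer half-closes
                    .childOption(ChannelOption.ALLOW_HALF_CLOSURE, true)
                    .childHandler(new ChannelInitializer<SocketChannel>() {
                        @Override
                        protected void initChannel(SocketChannel ch) {
                            ch.pipeline().addLast(new HttpServerCodec());
                            // application handlers go here
                        }
                    });
            b.bind(8080).sync().channel().closeFuture().sync();
        } finally {
            group.shutdownGracefully();
        }
    }
}
```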
Alternatives
Some alternatives to follow up on to fix this issue:
Update our usage of Netty to follow the same behavior as the Legacy implementation. This could be costly, and we're not 100% sure it is possible without changing Netty source code behavior.
Discuss whether we should attempt to continue a write or throw an exception when a write is made on a closed channel.
Update the TCK tests so that they close the client connections once all the data is read. Costly, since we would have to go through each individual TCK test, figure out its behavior, update it, and try to get the changes merged upstream. As of now we have around 30 TCKs failing in the same way.
Investigate whether there is a way to accumulate data in Netty as bytes and write/flush it all at once.
Accept this as a behavior change, document it, and move forward. When stopping the server in the TCK FATs, we can ignore these errors by adding them as accepted errors on server stop. We could also disable throwing an IOException on inbound connections in the TCKs.
Look into the ChannelOption for half closure (see update above).
Chosen solution
After some discussion, we decided the best approach was to use the channel option ALLOW_HALF_CLOSURE to keep the channel open even if the peer closes it. This matches the Legacy behavior: the remaining queued items are written to the wire, and a broken pipe exception may eventually occur if too much data is written (the same behavior as Legacy). However, with these changes we hit some issues with H2 because channels were not properly closed after the peer closed the connection. In Legacy, H2 attempts to finish writing and reading on streams when the connection is closed by the remote end. See #29508 for follow-up.
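For reference, a sketch of how a handler might react to the half-closure event once ALLOW_HALF_CLOSURE is enabled. This is an assumed handler for illustration, not the Liberty implementation; it simply drains what is already queued before closing the channel.

```java
// With ALLOW_HALF_CLOSURE set, the peer's close arrives as a
// ChannelInputShutdownEvent instead of a channel close, so queued response
// data can still be written before the channel is finally closed.
import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelFutureListener;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.socket.ChannelInputShutdownEvent;

public class HalfClosureAwareHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
        if (evt instanceof ChannelInputShutdownEvent) {
            // The remote side has finished sending but our output is still writable:
            // flush anything queued, then close once that write completes. A broken
            // pipe can still surface here if too much data is written, matching Legacy.
            ctx.writeAndFlush(Unpooled.EMPTY_BUFFER).addListener(ChannelFutureListener.CLOSE);
        } else {
            super.userEventTriggered(ctx, evt);
        }
    }
}
```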