-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
op: add recv_multi #250
base: master
Are you sure you want to change the base?
op: add recv_multi #250
Conversation
Adds new types: bufring::{BufRing, Builder}, bufgroup:{BufX, Bgid, Bid} (BufX was chosen as the name of the buffer type of this pool simply to make it short and easy to grep for.) bufgroup may be better named provbuf, or provided_buffers, down the road, especially if we intent to support the other, older, much less efficient provided buffer mechanism. But the liburing team specifically recommends the buf_ring implementation for ease of use and performance reasons so we may not want to incur the overhead of supporting both forms. bufring could become a trait down the road, where different BufRing implementations can be provided. But to avoid the extra decision making that comes with designing traits, and to keep the code relatively easy to follow, these are all concrate types. Adds register and unregister functionality to BufRing and to the tokio_uring driver. Adds experimental recv, recv_provbuf methods to TcpStream. recv_provbuf is a purposefully ugly name. It will be replaced by something ... maybe a recv builder sometimme soon. Whatever is chosen would then be copied for Udp, Unix and for all the other operations that optionally take a provided buffer pool id. Adds 'net' unit tests. All are admittedly simple ping/pong tests where only the clients' received lengths are checked, not the actual data. Adds a tests/common area. Adds a test case that uses two std::threads, where each thread runs its own tokio_uring runtime and its own buf_ring provided buffer pool. The two-thread case is made long, with many clients, sending many large messages to the server and getting them back, in order to see gross performance impacts when changing things. It takes 3s on my machine. Before going into mainline, the numbers would be changed so it took no more than the other unit tests, so about 10ms. Many TODOs left to cleanup. Primarily Safety rationalizations. The buffer allocation is made as a single allocation, along with the ring. The buffer group id, bgid, also sometimes called the provided buffer group id, is manually selected through the builder api. There is no mechanism to pick one automatically. That could be added later but is not really necessary for this feature to be useful. This first implementation is without traits and without public interfaces that would let a user create a different kind of buf_ring or a different kind of `provided buffers` pool. There's a question to the liburing team outstanding about how to interpret an unexpected cqe result of res=0 and flags=4.
The first operation that supports streaming CQE results. Adds a Streamable trait, along the lines of the Completable trait. Comes with stream_next and stream_complete methods. Adds MultiCQEStream struct, along the lines of the MultiCQEFuture struct. Adds a submit_op_stream along the lines of submit_op. Adds poll_next_op along the lines of poll_op. The Lifecycle gets two additional methods: take_index, and data. take_index: returns the Lifecycle index and replaces it with usize::MAX to show the Lifecycle is no longer represented in the slab. This feature only used by the new poll_next_op. data: returns a reference to the Lifecycle's data, to be able to use it for the Streamable's stream_next calls which only require a reference. Ownership is still transferred in stream_complete. Adds the io/recv_multi operation. The tcp stream recv_multi is a function that bridges the private types with a public function. There is no cancel yet for the multishot command but the code can be written to break out of the loop. Also, when the connection is closed, the command should fail. It's not tested, but unregistering the buf_ring might cancel the command too - but maybe not. Unit tests: net unit tests and helper functions return io::Return add recv_multi cases add BufRingProps to net unit tests - to quiet empty buffer warnings which are intentional in some tests
This starts as one commit over So 245 should be reviewed first. |
|
||
pub(crate) type RecvMultiStream = Op<RecvMulti, MultiCQEStream>; | ||
|
||
pub(crate) struct RecvMulti { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you thought about the generalized verions recvmsg_multi
too? I'm curious how the API works when receiving a single packet into multiple buffers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow the question. You've seen Luca's excellent contribution to io-uring?
Each recvmsg result is going into a single buffer, pulled from the buf_ring pool by the kernel. What am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't know where you are coming from with this. Can you refer to the liburing documentation to point out what you think I'm missing? The 251 PR doesn't seem relevant to how kernel io_uring implements multishot recvmsg at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't know where you are coming from with this. Can you refer to the liburing documentation to point out what you think I'm missing? The 251 PR doesn't seem relevant to how kernel io_uring implements multishot recvmsg at all?
My apologies: #252 is relevant. 251 is completely irrelevant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question was, have you considered how to cope with streaming operations where multiple buffers are provided in the original msghdr: that implies a collections of buffers
I have considered that collections of buffers are not part of the io_uring multishot receiving APIs so we don't need a streaming API for such an animal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have considered that collections of buffers are not part of the io_uring multishot receiving APIs so we don't need a streaming API for such an animal.
Thats a shame: that's a difference in API vs the non multishot API and the linux syscall, and precludes some of the memory locality use cases i spelt out in #252
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are differences when multishot comes into play - the one you are focusing on here isn't the only one and your wishing the multishot for recvmsg worked differently while this PR is getting multishot for recv support in the crate first. The io_uring team provides extra APIs to suite certain specialized use cases for performance reasons - they don't claim a broad and general API with consistent look and feel across the board.
And we don't need to either. We're trying to make the io_uring APIs accessible for folks who want to use the tokio runtime while providing a sound crate. We can't force every of their APIs into a single representation and there's no need to.
I supported your introduction of the ideas in 252, just earlier today, but I don't understand why you lean on it so heavily in a review of this PR for multishot recv. If something needs to be a shame, please spell it out in 252.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't wish to detract from this pr: I like this a great deal, especially the provided ring buffers required to make it work.
I'm looking at this API, as with most, with my use case in mind, which is UDP packet processing on a woefully underpowered aarch64 platform. Outperforming recv_mmsg is what brought me to this crate in the first place.
Personally I'm not invested in tokio api compliance at all - I simply don't care if we completely diverge. The API I always compare to are the base syscalls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, if it outperforms what you are comparing it to, then it will do so without multishot. Or if you are invested in the io_uring interface, maybe have this discussion at liburing.
The first operation that supports streaming CQE results.
Adds a Streamable trait, along the lines of the Completable trait.
Comes with stream_next and stream_complete methods.
Adds MultiCQEStream struct, along the lines of the MultiCQEFuture
struct.
Adds a submit_op_stream along the lines of submit_op.
Adds poll_next_op along the lines of poll_op.
The Lifecycle gets two additional methods: take_index, and data.
Adds the io/recv_multi operation.
The tcp stream recv_multi is a function that
bridges the private types with a public function.
There is no cancel yet for the multishot command but the code can be
written to break out of the loop. Also, when the connection is closed,
the command should fail. It's not tested, but unregistering the buf_ring
might cancel the command too - but maybe not.
Unit tests:
net unit tests and helper functions return io::Return
add recv_multi cases
add BufRingProps to net unit tests - to quiet empty buffer warnings
which are intentional in some tests