-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add timeout handler and max rtt option #49
base: main
Are you sure you want to change the base?
Conversation
I noticed have some memory problem. I think this pr can fix it. the user should set timeout to resolve memory problem. |
https://github.com/prometheus-community/pro-bing/blob/2dcd76c30c10636979f44f178ed9538e5a6e7043/ping.go#LL714C17-L714C17 |
you can set MaxRtt and OnTimeout func Ontime func while be call when a request was not answered within a specified time Signed-off-by: zhangenyao <[email protected]>
I wrote this comment in a related issue but I think I should repeat it here. I think I get the issue here, but I'm wondering if the solution is to go farther, and not just have the 'native' pinger try to replicate /usr/bin/ping. If I interpret the man page correctly, when you request a certain number of packets, it sends -c, -w, and -W. If you don't specify a long enough -w (which the plugin calls deadline) then you might have gotten all the responses back, you just didn't wait long enough. The other thing that is confusing is -W (timeout) is only used if there are no responses. if there are responses, then that is ignored and it actually waits two times some RTT that it calculates - presumably from earlier pings, maybe from only the last one. I think the fundamental problem is how dynamic the 'ping' command line tool is, then 'native' tries to mimic that and brings in the same problems. I repeatedly get way worse results from this plugin than I get running ping from the command line on the same host. I think the reason for this is ping doesn't behave very well with -c, -w, -W when you have high variability in response time. I'm not sure how I would fix this. I think a total rewrite of 'native' could fix it, but it would want to take on a different strategy, such as launching -c n pings -W t seconds apart, then wait until either all of them come back or -w has passed, and count. I think I'd make a few changes. One idea is to space pings out at a fixed interval, regardless of RTT. The second would be to put serial numbers of some sort in every ping, then add a few fields to detect if you got any responses out of order and if you got any duplicates. Then, end the entire thing on either deadline or you get count responses. The down side to this is you might have multiple pings in flight at the same time. I think that's rarely a problem, but if we wanted to not break backward compatibility you could signal you want this new behavior by specifying some kind of backward compatibility flag, or replicate 'native' and make a new choice with a new name. Thoughts ? |
@@ -183,6 +184,8 @@ type Pinger struct { | |||
// OnRecvError is called when an error occurs while Pinger attempts to receive a packet | |||
OnRecvError func(error) | |||
|
|||
// OnTimeOut is called when a packet don't have received after MaxRtt. | |||
OnTimeOut func(*Packet) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this naming confusing: There's a Timeout
field, but OnTimeOut
is for MaxRtt
.
The spelling is also inconsistent with the comment on MaxRtt
– I believe Timeout
is more correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. There's a total runtime timeout, and this introduces a packet timeout.
Perhaps OnPacketTimeout
and PacketTimeout
instead of MaxRtt
.
@@ -129,6 +127,9 @@ type Pinger struct { | |||
// Timeout specifies a timeout before ping exits, regardless of how many | |||
// packets have been received. | |||
Timeout time.Duration | |||
// MaxRtt If no response is received after this time, OnTimeout is called | |||
// important! This option is not guaranteed. and if we receive the packet that was timeout, the function OnDuplicateRecv will be called | |||
MaxRtt time.Duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure either way: Should we capitalize RTT as in common style guidelines, or keep it as Rtt
for consistency with RecordRtts
?
@@ -520,20 +520,48 @@ func (p *Pinger) runLoop( | |||
|
|||
timeout := time.NewTicker(p.Timeout) | |||
interval := time.NewTicker(p.Interval) | |||
packetTimeout := time.NewTimer(time.Duration(math.MaxInt64)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I associate timeouts with contexts, would those be suitable here? Would it make a difference to how we would handle them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're trying to be very careful about how we use stdlib functions here. We want to avoid any cases where managing the timeout would spawn a goroutine. If you had an interval of 100ms, and a timeout of 15s, that would be creating and destroying a lot of goroutines in the background.
Maybe a context call would be efficient enough? It would be useful to benchmark.
p.PacketsRecvDuplicates++ | ||
if p.OnDuplicateRecv != nil { | ||
p.OnDuplicateRecv(inPkt) | ||
} | ||
return nil | ||
} | ||
// remove it from the list of sequences we're waiting for so we don't get duplicates. | ||
delete(p.awaitingSequences[*pktUUID], pkt.Seq) | ||
p.awaitingSequences.removeElem(e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What protects awaitingSequences
from concurrent access?
} | ||
}) | ||
|
||
t.Run("putElem", func(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These subtests mutate a shared map, and rely on it being in a specific state. Please make each test self-contained so it can be run in isolation. Also, I would recommend making the tests run in parallel (t.Parallell()
both for the main and the subtests), and try running the tests with -race -count 100
to make sure they behave well. This makes the tests much less brittle and tends to reveal unexpected race conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, please either document that seqMap
must not be used concurrently, or add a test that actually uses it concurrently.
I believe that developers should have a way to know when a data packet is lost or does not return within a specified time.
you can set MaxRtt and OnTimeout func
Ontime func while be call when a request was not answered within a specified time
you only need add:
It has good compatibility, as if MaxRtt is not set, it will have the same behavior as before.
However, there are still some issues remaining.
But I believe that this is acceptable.