In HyPerCol::advanceTime(), in the same spot where we call sigpending to check for SIGUSR1, we could run the curl command (or maybe wget), and if there is a termination warning we could set checkpointSignal to 2 (sending SIGUSR1 sets checkpointSignal to 1). I think we'd want to make sure we don't fetch the URL more often than the Amazon-recommended 5 seconds, but it should be pretty straightforward to add.
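A minimal sketch of the throttling idea, not PetaVision code: the class name, the enum names, and the `fetchNotice` callback are all hypothetical, and the actual HTTP fetch (curl, wget, or a library call) is left to the caller. Only the values 1 and 2 for checkpointSignal come from the discussion above.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical names; the values mirror the thread:
// 1 = SIGUSR1 received, 2 = termination warning seen.
enum CheckpointSignal { NO_SIGNAL = 0, USER_SIGNAL = 1, TERMINATION_NOTICE = 2 };

// Wraps a caller-supplied check so it runs at most once per minInterval.
class ThrottledTerminationCheck {
  public:
   using Clock = std::chrono::steady_clock;

   // fetchNotice is assumed to return true when a termination warning is posted.
   ThrottledTerminationCheck(std::function<bool()> fetchNotice, std::chrono::seconds minInterval)
         : mFetchNotice(std::move(fetchNotice)), mMinInterval(minInterval) {
      assert(minInterval.count() >= 0);
   }

   // Returns TERMINATION_NOTICE only if a fetch was made on this call and it
   // reported a warning. Returns NO_SIGNAL if the call was skipped or nothing was found.
   CheckpointSignal poll(Clock::time_point now = Clock::now()) {
      if (mHasFetched && now - mLastFetch < mMinInterval) {
         return NO_SIGNAL; // too soon since the last fetch; skip it
      }
      mLastFetch  = now;
      mHasFetched = true;
      return mFetchNotice() ? TERMINATION_NOTICE : NO_SIGNAL;
   }

  private:
   std::function<bool()> mFetchNotice;
   std::chrono::seconds mMinInterval;
   Clock::time_point mLastFetch{};
   bool mHasFetched = false;
};
```

The time point is passed in so the throttle can be exercised without waiting. Since a skipped call returns NO_SIGNAL, the caller would need to store the result in checkpointSignal as soon as a warning is reported.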
Alternatively, we could have PV_Init launch a simple script that runs the curl command every 5 seconds and sends SIGUSR1 to the PetaVision process when necessary. One drawback is that we might want to be able to see in the log file whether the job terminated because Amazon killed the instance or because the user ran killall -SIGUSR1, and with this approach both would arrive as the same signal.
The first approach seems easier to implement. Maybe we could keep track of the time of the last wget/curl AWS termination check to make sure we don't check too often. 2 minutes is a long time. Just ask Peyton Manning! Since we would at most only be checking as often as we check for SIGUSR1, there's no reason to check the termination condition more often than that.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html#spot-instance-termination-notices
The above link states that AWS provides a 2-minute warning before termination. Can we use this warning the same way we use
$: killall -SIGUSR1 <name_of_executable>
to write a final checkpoint before termination? In fact, we almost don't even have to formally checkpoint with the above mechanism.
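For reference, a rough sketch of how a SIGUSR1 check based on sigpending can work on a POSIX system. This is an assumption about the general technique, not the actual PetaVision code, and both function names are hypothetical. The signal is blocked so that it stays pending instead of triggering its default action (terminating the process), and the run loop then polls for it.

```cpp
#include <csignal>
#include <signal.h>

// Block SIGUSR1 for the calling thread so that a delivered signal stays
// pending. Call once at startup. In a multithreaded program pthread_sigmask
// would be the appropriate call instead of sigprocmask.
inline bool blockSigusr1() {
   sigset_t set;
   sigemptyset(&set);
   sigaddset(&set, SIGUSR1);
   return sigprocmask(SIG_BLOCK, &set, nullptr) == 0;
}

// Returns true if SIGUSR1 was pending, and consumes it so that the next
// call returns false until another SIGUSR1 arrives.
inline bool consumePendingSigusr1() {
   sigset_t pending;
   if (sigpending(&pending) != 0) {
      return false;
   }
   if (sigismember(&pending, SIGUSR1) != 1) {
      return false;
   }
   sigset_t waitSet;
   sigemptyset(&waitSet);
   sigaddset(&waitSet, SIGUSR1);
   int received = 0;
   // The signal is already pending, so sigwait returns without blocking.
   return sigwait(&waitSet, &received) == 0 && received == SIGUSR1;
}
```

A termination warning could be handled at the same point in the loop: where `consumePendingSigusr1()` returning true would request a checkpoint, a positive result from the AWS check would request one as well.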