Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: capture errors that shouldnt cause the whole plan to be discarded during network resolution #444

Open
wants to merge 9 commits into
base: transition-to-runkit
Choose a base branch
from

Conversation

jhonasiv
Copy link
Contributor

@jhonasiv jhonasiv commented Dec 6, 2024

We introduced a ResolutionError to mark errors during the new plan
network resolution shouldnt be raised, which causes the whole transation
to fail. Instead, we capture them and fail the deployment of the
specific tasks that caused them.

@jhonasiv jhonasiv requested review from doudou and wvmcastro December 6, 2024 13:34
@jhonasiv jhonasiv self-assigned this Dec 6, 2024
@jhonasiv jhonasiv force-pushed the capture-errors branch 3 times, most recently from d0b83eb to 63070d9 Compare December 6, 2024 17:17
@jhonasiv jhonasiv force-pushed the capture-errors branch 2 times, most recently from 72722c1 to d070e92 Compare December 23, 2024 21:12
@jhonasiv jhonasiv changed the title [WIP] feat: capture errors that shouldnt cause the whole plan to be discarded during network resolution feat: capture errors that shouldnt cause the whole plan to be discarded during network resolution Jan 29, 2025
InternalResolutionFailure is a helper struct that collects enough
information about a resolution failure so it can be turned into a
ResolutionError in another scope. The ResolutionError describes the
instance task and the toplevel task, so the system can clean them up and
propagate the error at the right scope. Finally, the
PartialNetworkResolution should be used as an exception, aggregating
multile resolution errors/failures to be raised at once.
Copy link

@wvmcastro wvmcastro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that is not clear to me, when trying to deploy the below composition there is an error with second deployment and on_error: commit. The deployer will still deploy first and second child?

class Parent < Compositions
  add Model1, as: "first"
  add Model2, as "second"
  add Model3, as "third"
  ...
end

lib/syskit/cli/doc/gen.rb Outdated Show resolved Hide resolved
Comment on lines 502 to 503
def validate_generated_network(toplevel_tasks: @toplevel_tasks,
merge_solver: @merge_solver)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some plugin will pass its own toplevel_tasks or merge_solver?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, any plugin should implement this version of the method themselves (that's where the super comes in). I will resolve these arguments, they arent needed.

@jhonasiv
Copy link
Contributor Author

jhonasiv commented Feb 3, 2025

One thing that is not clear to me, when trying to deploy the below composition there is an error with second deployment and on_error: commit. The deployer will still deploy first and second child?

class Parent < Compositions
  add Model1, as: "first"
  add Model2, as "second"
  add Model3, as "third"
  ...
end

If a child of a parent fails the deployment validation, itself and all of its parents also fail, therefore the other children will also fail. PS: on_error: commit is only used by profile_assertions tests, so in normal circumstances the deployer will do what I just said, but when on_error: commit the whole transaction would be commited, which would lead to all present errors being propagated at the end of the network generation process.

These errors shouldnt block the whole transaction from being applied.
Instead, they are captured so we know which are the badly defined tasks,
so we can deal with them (i.e. propagate errors for each individual task) later on.
This is the first step of taking the captured resolution failures. They
are processed into ResolutionErrors, which contain information about the
instance requirement task and toplevel task it relates to, and details
about the exception that caused the resolution failure.

When cleanup_after_resolution_errors is on, the instance requirement
tasks that have a resolution error during network generation are not
deployed and the related toplevel task is removed from the plan. That is
the case to ensure the work_plan is clean and can continue the
transaction.

We also use the on_error behavior in the #resolve method to signal when
not to cleanup, as it should never cleanup when the behavior is set to
commit the transaction on error.
This allows for handling of resolution errors on the network generation
This emits the failed event on the planning task of a plan resolution
that failed. The error emitted is a ResolutionError
This is more in line with the new resolution errors, where these
"exceptions" are now replicated for each failing tasks individually,
instead of grouped into one exception
They now emit a PartialNetworkResolution when any resolution error is
detected. This is propagated especially to keep the behavior of the
profile assertions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants