Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rel2.0): default removal option to REMOVE_ALL_EDGES_FROM_SOURCE #478

Merged
merged 7 commits into from
Nov 20, 2024

Conversation

jsdonn
Copy link
Contributor

@jsdonn jsdonn commented Nov 16, 2024

Summary

Big PR with several major changes to relationship ingestion.

Major

  1. Default relationship removal options to REMOVE_ALL_EDGES_FROM_SOURCE during ingestion.
    One caveat is the case where relationships are removed when an aspect is soft-deleted. For now, we will only allow soft-deleting an aspect which has relationships when the DAO is in RelationshipSource.ASPECT_METADATA mode. By default, the DAO is in RelationshipSource.RELATIONSHIP_BUILDERS mode which does NOT allow soft deleting aspects that have relationships.

  2. RelationshipV2 can be built and ingested using relationship builders. This is to bridge the gap for existing aspects with data currently ingested into the metadata graph but do not yet have any relationships. Since new relationship models can only be defined following the Relationship 2.0 spec, we need a way to support building relationshipV2s with relationship builders.

  3. Removed the neo4j module since no one is using that anymore.

Minor

  1. Remove destination field parameter from checkSameUrn. Now throws an exception if the removal option is not REMOVE_ALL_EDGES_FROM_SOURCE.

  2. Removed validateRelationshipSchema methods which don't specify v1 or v2. Now, relationship schema validation must include a boolean parameter indicating that the relationship is v1 or v2.

Testing Done

Added several unit tests covering multiple ingestion cases.

  1. relationshipv2 -> use relationship builders when relationship builders exist and return non-empty relationship updates
  2. relationshipv2 -> use aspect metadata when relationship builders don't return any relationship updates
  3. relationshipv2 -> default to relationship builders when both relationship builders and relationship fields exist

Update several other unit tests to expect IllegalArgumentExceptions when passing in removal options that are not REMOVE_ALL_EDGES_FROM_SOURCE.

Updated several unit tests with new checkSameUrn and validateRelationshipSchema method signatures.

Added a new aspect for testing (AspectFooBaz) so updated a few resource class unit tests.

Checklist

Copy link
Contributor

@yangyangv2 yangyangv2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add some comments, thanks for the quick solution and code cleanup!


throw new IllegalArgumentException(String.format("Removal option %s is not valid.", removalOption));
throw new IllegalArgumentException("Relationships can only be removed using either REMOVE_ALL_EDGES_FROM_SOURCE "
+ "when inserting new relationships or REMOVE_ALL_EDGES_FROM_SOURCE_TO_DESTINATION when soft deleting an aspect "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the consideration of differentiating soft-deleting an aspect from normal update? shouldn't soft-delete be considered as set empty to the relationships?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm going to clean this up in the next PR where I will add "aspect" column to the relationship tables.

localRelationshipUpdates.forEach(update -> update.setRemovalOption(BaseGraphWriterDAO.RemovalOption.REMOVE_ALL_EDGES_FROM_SOURCE));
}
// If no relationship updates were found using relationship builders, try to get them via the aspect.
if (localRelationshipUpdates.isEmpty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would there be a case that an Aspect has one or multiple graph builders and also contains ASPECT_METADATA?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An aspect cannot have multiple graph builders due to the way that the builders are registered.

An aspect can have a graph builder and also contain ASPECT_METADATA. The behavior in this PR will prioritize finding the relationships from the graph builders first. Otherwise, try the aspect metadata. Otherwise, no relationship updates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An aspect cannot have multiple graph builders due to the way that the builders are registered.

Can an aspect have multiple ASPECT_METADATA?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prioritize finding the relationships from the graph builders first.

what if a new ASPECT_METADATA added into an aspect with graph builder, will the ASPECT_METADATA be ignored?

Copy link
Contributor Author

@jsdonn jsdonn Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can an aspect have multiple ASPECT_METADATA?

Aspects can have multiple relationship-typed fields (either array or singleton) yes.

what if a new ASPECT_METADATA added into an aspect with graph builder, will the ASPECT_METADATA be ignored?

If the graph builder returns non-empty relationships, then yes the relationships from ASPECT_METADATA will be ignored. If the graph builder returns empty list, then it will try to look in ASPECT_METADATA for relationships.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be a limitation. You can include a comment to indicate this if we need more thought on its future behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you think it's a limitation? the alternative is that we can prioritize getting the relationship from aspect metadata first and then falling back on relationship builders if needed. But that is an easy change we can make in the future

Copy link
Contributor

@yangyangv2 yangyangv2 Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is possible that users add a new relationship model on this aspect and start filling relationship data with new relationships but want to keep existing graph builder working from existing attributes

Copy link
Contributor

@yangyangv2 yangyangv2 Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the consideration to not support both is we don't want to have graph builder and the default relationship parser produce duplicate relationships. It is a solvable problem IMO, but want to think more if there're other side impact

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. I added a TODO comment noting this gap.

Copy link
Contributor

@yangyangv2 yangyangv2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for addressing the comments.

The last comment is more for the future work, not necessarily block this PR. Please test this feature change carefully.

@jsdonn jsdonn merged commit 8ccc25e into linkedin:master Nov 20, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants