-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: enable handling of nested fields when injecting request_option in request body_json #201
base: main
Are you sure you want to change the base?
Conversation
to handle field_path in JSON body
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding some comments on this PR even if it is draft. Poke me when you want me to check it again
value: The value to inject | ||
config: The config object to use for interpolation | ||
""" | ||
if self._is_field_path: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we simplify the logic by having in post_init:
if self.field_name is not None:
self.field_path = [InterpolatedString.create(self.field_name, parameters=parameters)]
This way, we would only have one logic to maintain and it would be the field_path
one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a bit skeptical that this could potentially lead to conflicts in the Builder if we started mixing the attributes, but I think I see what you're saying. If we just modify the internal logic here in the CDK to allow field_name
, but only use it to update and use field_path
instead, we simplify the backend handling while still allowing the Builder to define them as entirely separate fields, correct?
My only hangup is wondering if there isn't still some scenario where by allowing field_path
to essentially override field_name
, we could accidentally send those updated values back to the frontend? And it would be picked up as a manifest change? I don't yet know enough about the full lifecycle of the back and forth between the CDK and Builder components to know if that's a valid concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a bit skeptical that this could potentially lead to conflicts in the Builder if we started mixing the attributes
If this is the case, we haven't made a good job at designing our interfaces and we will lack the benefits of easy maintenance. In an ideal world, we should be able to replace the whole CDK implementation by something else and the Connector Builder should continue to work.
we simplify the backend handling
Yes! Instead of knowing that there are two cases, we all rely on the cases where we handle a list and that's it!
we could accidentally send those updated values back to the frontend?
My understand is that the only case we send back manifest information to the frontend is when we do resolve_manifest
(or something like that) and it feels like this flow only work with the manifest directly and not with Python implementations so I would expect that we would be fine.
airbyte_cdk/utils/mapping_helpers.py
Outdated
that string. Raise errors in the following cases: | ||
* If there are duplicate keys across mappings | ||
Combine multiple mappings into a single mapping, supporting nested structures. | ||
If any of the mappings are a string, return that string. Raise errors in the following cases: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would precise that it is "If there only one mapping that is a string, it will return this mapping regardless of the other mappings".
That being said, I don't understand this behavior though. Maybe I'm wrong but before, I think it would fail if there was a string and a mapping, right?
airbyte_cdk/utils/mapping_helpers.py
Outdated
|
||
# If any mapping is a string, return it | ||
# Convert all mappings to flat (path, value) pairs for conflict detection | ||
all_paths: List[List[tuple[List[str], Any]]] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like the following logic is a bit complex. Would there be a way to simplify this? Right now, I see that we:
- Extract all paths
- Compare all paths to find conflicts
- Create a new mapping that we can return
Would it be simpler to do just the last one and validate for conflicts while we do that? Something like:
def combine_mappings(
mappings: List[Optional[Union[Mapping[str, Any], str]]],
) -> Union[Mapping[str, Any], str]:
"""
Combine multiple mappings into a single mapping, supporting nested structures.
If any of the mappings are a string, return that string. Raise errors in the following cases:
* If there are conflicting paths across mappings (including nested conflicts)
* If there are multiple string mappings
* If there are multiple mappings containing keys and one of them is a string
"""
if not mappings:
return {}
# Count how many string options we have, ignoring None values
string_options = sum(isinstance(mapping, str) for mapping in mappings if mapping is not None)
if string_options > 1:
raise ValueError("Cannot combine multiple string options")
# Filter out None values and empty mappings
non_empty_mappings = [
m for m in mappings if m is not None and not (isinstance(m, Mapping) and not m)
]
# If there is only one string option, return it
if string_options == 1:
if len(non_empty_mappings) > 1:
raise ValueError("Cannot combine multiple options if one is a string")
return next(m for m in non_empty_mappings if isinstance(m, str))
# Convert all mappings to flat (path, value) pairs for conflict detection
for other in mappings[1:]:
if other:
merge(mappings[0], other)
return mappings[0]
def merge(a: dict, b: dict, path=[]):
"""
Blindly and shamelessly taken from https://stackoverflow.com/a/7205107
"""
for key in b:
if key in a:
if isinstance(a[key], dict) and isinstance(b[key], dict):
merge(a[key], b[key], path + [str(key)])
elif a[key] != b[key]:
raise ValueError('Duplicate keys')
else:
a[key] = b[key]
return a
If we are afraid of modifying the mappings in memory, we can create a deepcopy of it as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've modified the logic to be more in line with your suggestion. I removed the flattening and rebuilding of data in favour of a single helper method that recursively merges mappings directly, so it should hopefully be cleaner. Updated the tests as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: A number of unit tests went red due to the updated mapping logic being more permissive in allowing duplicate keys to exist. I made some more changes to handle body_json
injections differently than other request types. So:
-
For header, parameter, and body_data requests:
- We still maintain strict validation where any duplicate key is considered a conflict
- This preserves existing behavior and prevents unintended header/parameter collisions
-
For body_json requests only:
- We support nested structures
- We allow duplicate keys if they have matching values
- We only raise conflicts if same path has different values
This distinction is controlled by an allow_same_value_merge
flag in combine_mappings
which is only set to True
for body_json
requests. But I'm wondering if this is leading to the logic being overly contrived again. Another option could be to create two separate methods for handling either scenario?
…other injection types
📝 WalkthroughWalkthroughThe pull request introduces comprehensive modifications to the Airbyte CDK's declarative source components, focusing on enhancing request option handling, authentication, and schema flexibility. The changes primarily involve updating type annotations, refactoring request option injection methods, and introducing new capabilities like nested field path support for request configurations. Changes
Sequence DiagramsequenceDiagram
participant RequestOption
participant Authenticator
participant HttpRequester
RequestOption->>Authenticator: Create with field_path/field_name
Authenticator->>HttpRequester: Inject request options
HttpRequester-->>Authenticator: Populate request parameters
RequestOption->>RequestOption: Validate injection type
Possibly related PRs
Suggested Labels
Suggested Reviewers
Hey there! 👋 I noticed some interesting changes in how request options are being handled. The introduction of Would you be interested in expanding the README or creating a migration guide to help users understand the new nested field injection feature? It could really help smooth the transition for existing declarative source implementations. 🚀 ✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (10)
airbyte_cdk/sources/declarative/requesters/request_option.py (1)
43-49
: Could we simplify the mutual exclusivity checks?Hi! I noticed that we're checking for mutual exclusivity between
field_name
andfield_path
with separate conditions. Perhaps we could simplify this by ensuring exactly one is provided using a single condition:if (self.field_name is None) == (self.field_path is None): raise ValueError("RequestOption requires exactly one of field_name or field_path")This way, we handle both cases in one check. Wdyt?
unit_tests/utils/test_mapping_helpers.py (2)
19-44
: Parameterizing string handling tests enhances clarity!Consolidating the string handling tests with parameterization makes the intent of each test case clearer and reduces code duplication. This approach is very effective. Wdyt?
97-106
: Consider providing more detailed error messages in tests?In the
test_body_json_requests
, the expected error message is "Duplicate keys found". Do you think adding more context to the error message could help in identifying issues faster during debugging? For example, specifying which keys are duplicated. Wdyt?airbyte_cdk/utils/mapping_helpers.py (2)
10-45
: LGTM! Consider adding doctest examples?The implementation is robust and handles nested structures well. Would you consider adding some doctest examples to demonstrate the different behaviors between
allow_same_value_merge=True
andFalse
? This could help future maintainers understand the nuances quickly, wdyt?Example:
def _merge_mappings( target: Dict[str, Any], source: Mapping[str, Any], path: Optional[List[str]] = None, allow_same_value_merge: bool = False, ) -> None: """ Recursively merge two dictionaries, raising an error if there are any conflicts. >>> target = {"a": 1} >>> source = {"a": 1} >>> _merge_mappings(target, source, allow_same_value_merge=True) # No error >>> target {'a': 1} >>> target = {"a": 1} >>> source = {"a": 1} >>> _merge_mappings(target, source, allow_same_value_merge=False) # Raises ValueError Traceback (most recent call last): ... ValueError: Duplicate keys found: a """
47-104
: LGTM! Consider enhancing type hints?The implementation is clean and well-documented. Would you consider making the type hints more specific for the return type? Instead of
Union[Mapping[str, Any], str]
, we could use a TypeVar to preserve the exact input type, wdyt?from typing import TypeVar T = TypeVar('T', bound=Union[Mapping[str, Any], str]) def combine_mappings( mappings: List[Optional[T]], allow_same_value_merge: bool = False, ) -> T:unit_tests/sources/declarative/requesters/request_options/test_request_options.py (1)
51-93
: LGTM! Consider adding edge cases?Great test coverage! Would you consider adding a few edge cases to
test_inject_into_request_cases
, wdyt?
- Empty path segments:
field_path=["data", "", "field"]
- Special characters in paths:
field_path=["data", "field.with.dots"]
- Integer keys:
field_path=["data", "0", "field"]
unit_tests/sources/declarative/auth/test_token_auth.py (1)
253-285
: LGTM! Consider adding error cases?Great test cases for the happy path! Would you consider adding some error scenarios to ensure proper validation, wdyt?
- Invalid field path (e.g.,
None
in path)- Circular interpolation in path
- Non-string interpolation results
airbyte_cdk/sources/declarative/auth/token.py (1)
58-60
: Consider adding a docstring for clarity, wdyt?The
_get_request_options
method has been updated to useMutableMapping
and the newinject_into_request
method. While the changes look good, adding a docstring would help explain the purpose of these changes, especially regarding nested field support.def _get_request_options(self, option_type: RequestOptionType) -> Mapping[str, Any]: + """ + Get request options for the specified option type. + + Args: + option_type: The type of request option (header, parameter, body_data, or body_json) + + Returns: + A mapping containing the request options, supporting nested fields for body_json + """ options: MutableMapping[str, Any] = {}airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
2819-2829
: Consider adding mutual exclusivity validation?The new
field_path
property looks great! Sincefield_name
andfield_path
serve similar purposes but in different contexts, would it make sense to add validation to ensure they're not used simultaneously? This could help prevent confusion and potential conflicts, wdyt?Example validation in the schema:
oneOf: - required: [field_name] not: required: [field_path] - required: [field_path] not: required: [field_name]airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)
131-133
: Consider adding a comment explaining the special handling of body_json.The introduction of
is_body_json
flag affects mapping behavior. Would you consider adding a brief comment explaining why body_json requests need different handling for merging mappings with same values? This would help future maintainers understand the intent better.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
airbyte_cdk/sources/declarative/auth/token.py
(2 hunks)airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(1 hunks)airbyte_cdk/sources/declarative/incremental/datetime_based_cursor.py
(1 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(3 hunks)airbyte_cdk/sources/declarative/partition_routers/list_partition_router.py
(2 hunks)airbyte_cdk/sources/declarative/partition_routers/substream_partition_router.py
(3 hunks)airbyte_cdk/sources/declarative/requesters/http_requester.py
(2 hunks)airbyte_cdk/sources/declarative/requesters/paginators/default_paginator.py
(2 hunks)airbyte_cdk/sources/declarative/requesters/request_option.py
(2 hunks)airbyte_cdk/sources/declarative/requesters/request_options/datetime_based_request_options_provider.py
(1 hunks)airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
(2 hunks)airbyte_cdk/utils/mapping_helpers.py
(1 hunks)unit_tests/sources/declarative/auth/test_token_auth.py
(1 hunks)unit_tests/sources/declarative/incremental/test_datetime_based_cursor.py
(5 hunks)unit_tests/sources/declarative/requesters/paginators/test_default_paginator.py
(1 hunks)unit_tests/sources/declarative/requesters/request_options/test_request_options.py
(1 hunks)unit_tests/sources/declarative/requesters/test_http_requester.py
(1 hunks)unit_tests/sources/declarative/retrievers/test_simple_retriever.py
(6 hunks)unit_tests/utils/test_mapping_helpers.py
(1 hunks)
✅ Files skipped from review due to trivial changes (2)
- unit_tests/sources/declarative/requesters/paginators/test_default_paginator.py
- unit_tests/sources/declarative/retrievers/test_simple_retriever.py
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (20)
airbyte_cdk/sources/declarative/requesters/request_option.py (2)
58-65
: Should we unify field handling by always usingfield_path
?I see that we're handling
field_name
andfield_path
separately. What do you think about convertingfield_name
into a single-elementfield_path
list during initialization? This might simplify the code by allowing us to work exclusively withfield_path
. For example:if self.field_name is not None: self.field_path = [InterpolatedString.create(self.field_name, parameters=parameters)]This could reduce conditional logic later on. Wdyt?
85-116
: Handlinginject_into_request
logic elegantly!The use of
inject_into_request
method to handle bothfield_name
andfield_path
cases is clean and enhances readability. It effectively encapsulates the injection logic. Nice work!airbyte_cdk/sources/declarative/requesters/request_options/datetime_based_request_options_provider.py (2)
85-87
: Should we handle missing time values before injection?Hi there! In the
_get_request_options
method, ifstart_time_value
orend_time_value
is missing from thestream_slice
, we might end up injectingNone
into the request options. Do you think it would be good to check if these values are notNone
before injecting them? This could prevent potential issues with requests. Wdyt?Also applies to: 89-91
83-91
: Great refactoring to useinject_into_request
method!Switching to
inject_into_request
for injecting options enhances modularity and keeps the codebase consistent. This improves maintainability and readability. Well done!unit_tests/utils/test_mapping_helpers.py (1)
6-17
: Nice use of parameterization for basic functionality tests!Refactoring the basic mapping tests into a parameterized function improves the test suite's structure and makes it easier to add new cases in the future. Great job!
airbyte_cdk/sources/declarative/partition_routers/list_partition_router.py (1)
103-105
: LGTM! Clean integration with the new request option handling.The changes properly utilize the new
inject_into_request
method while maintaining type safety withMutableMapping
.airbyte_cdk/sources/declarative/auth/token.py (1)
8-8
: LGTM! Enhanced type imports.The addition of
MutableMapping
to imports supports better type safety for request options handling.airbyte_cdk/sources/declarative/requesters/paginators/default_paginator.py (1)
190-190
: LGTM! Improved request option handling.The changes look good:
- Using
MutableMapping
for better type safety- Using
inject_into_request
for both page token and page size options- Clear separation of concerns between token and page size injection
Also applies to: 199-199, 200-200, 206-208
airbyte_cdk/sources/declarative/requesters/http_requester.py (1)
202-204
: Consider adding a test case for body_json merging, wdyt?The addition of
is_body_json
flag andallow_same_value_merge
parameter looks good for supporting nested fields in GraphQL queries. However, it would be beneficial to add test cases specifically for this scenario.Run this script to check existing test coverage:
Also applies to: 214-215
airbyte_cdk/sources/declarative/partition_routers/substream_partition_router.py (1)
121-121
: LGTM! Enhanced type safety and request option handling.The changes improve the code by:
- Using
MutableMapping
for better type safety- Using
inject_into_request
for cleaner parameter injectionAlso applies to: 131-131
unit_tests/sources/declarative/requesters/test_http_requester.py (1)
280-281
: Test cases updated to match new body_json behavior.The test expectations now correctly reflect that same values can be merged in body_json context. This aligns with the changes in the implementation.
unit_tests/sources/declarative/incremental/test_datetime_based_cursor.py (2)
785-787
: Great addition of nested field injection test case!The new test case thoroughly validates the nested field injection capability for GraphQL queries. The test parameters are well structured and cover the key scenarios.
Also applies to: 827-845
872-874
: Clean update to test helper code.The test helper code has been updated to support both field_name and field_path parameters, maintaining backward compatibility while adding support for the new functionality.
Also applies to: 879-884
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)
722-722
: LGTM! Nice improvement in clarity.Setting an explicit default value of
None
for the optionalcondition
field makes the intent clearer.
1173-1178
: Deprecation notice looks good.The field description clearly indicates the plan to deprecate
field_name
in favor offield_path
. This aligns with the previous discussion in PR comments.
1179-1183
: LGTM! Well-documented new field for nested paths.The new
field_path
field is well-documented with clear examples, and it aligns perfectly with the PR objective of supporting nested fields for GraphQL queries.airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
2801-2801
: LGTM! Clear description update.The updated description accurately reflects the addition of path-based field specification while maintaining backward compatibility.
2811-2812
: LGTM! Well-documented deprecation notice.The deprecation notice clearly explains that
field_name
will eventually be replaced byfield_path
while maintaining backward compatibility for the Builder. This helps users prepare for future changes.airbyte_cdk/sources/declarative/incremental/datetime_based_cursor.py (1)
368-376
: LGTM! Nice refactor of request option handling.The changes improve maintainability by separating value extraction from injection logic. The use of
inject_into_request
method provides a more robust way to handle request options.airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)
149-149
: Verify combine_mappings behavior with allow_same_value_merge.Let's ensure the combine_mappings utility correctly handles the new parameter.
✅ Verification successful
combine_mappings implementation verified ✅
The utility correctly handles the
allow_same_value_merge
parameter with comprehensive test coverage for both body_json and non-body-json scenarios. The implementation in simple_retriever.py is using it correctly!🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check combine_mappings implementation for allow_same_value_merge parameter # Search for combine_mappings implementation and its tests rg -A 10 "def combine_mappings"Length of output: 907
Script:
#!/bin/bash # Get the full implementation and tests rg -g "mapping_helpers*" -A 30 "def combine_mappings" # Look for test files fd "test.*mapping.*" --exec cat {}Length of output: 9606
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Can we just document that this is a "soft" breaking change meaning that Python code relying on RequestOption.field_name
might stop working but the alternative should be pretty simple i.e. using the method inject_into_request
?
value: The value to inject | ||
config: The config object to use for interpolation | ||
""" | ||
if self._is_field_path: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a bit skeptical that this could potentially lead to conflicts in the Builder if we started mixing the attributes
If this is the case, we haven't made a good job at designing our interfaces and we will lack the benefits of easy maintenance. In an ideal world, we should be able to replace the whole CDK implementation by something else and the Connector Builder should continue to work.
we simplify the backend handling
Yes! Instead of knowing that there are two cases, we all rely on the cases where we handle a list and that's it!
we could accidentally send those updated values back to the frontend?
My understand is that the only case we send back manifest information to the frontend is when we do resolve_manifest
(or something like that) and it feels like this flow only work with the manifest directly and not with Python implementations so I would expect that we would be fine.
…m/airbytehq/airbyte-python-cdk into christo/request-option-field-path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
2077-2079
: Consider passing the parameters from the model for better consistency
Right now, you are passing an empty dictionary asparameters
. Perhaps we could passmodel.parameters
if needed for a consistent approach with the rest of the code. Wdyt?- return RequestOption( - field_name=field_name, field_path=field_path, inject_into=inject_into, parameters={} + return RequestOption( + field_name=field_name, + field_path=field_path, + inject_into=inject_into, + parameters=model.parameters or {}, )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
2075-2076
: Add clarity on the type ignore directive?
It looks like you used# type: ignore
to handle a potential mismatch in the declared type or themodel.field_path
. Maybe it's beneficial to refine the type annotation or handle the mismatch if there's a known reason behind it, wdyt?
What
To support GraphQL queries natively in the CDK, we need to add handling of nested structures when injecting a
RequestOption
into a request body (specifically withbody_json
injection).How
RequestOption
class now includes an optionalfield_path
param, and each instance must declare afield_path
ORfield_name
.inject_into_request
method determines how to handle the injection based on the existense offield_path
and the specificRequestOptionType
. As stated above, onlybody_json
injections will support nested fields.combine_mappings
utility function now handles nested structures when checking for mapping conflicts.Suggested review order
request_option.py
: Has the main updated logic for validation and the new methodinject_into_request
mapping_helpers.py
: added a helper method and extended the main function for catching conflicts in nested structures.declarative_component_schema.yaml
anddeclarative_component_schema.py
: Updated schema + modeltoken.py
,datetime_based_cursor.py
,datetime_based_request_options_provider.py
,default_paginator.py
,list_partition_router.py
andsubstream_partition_router.py
all have minor updates to leverage the new method.Summary by CodeRabbit
Release Notes
New Features
field_path
for specifying nested structures in JSON requests.Improvements
Bug Fixes
Testing