fix(airbyte-cdk): Fix RequestOptionsProvider for PerPartitionWithGlobalCursor #254

Merged
tolik0 merged 4 commits into main from tolik0/fix-request-options-provider
Jan 23, 2025

Conversation

@tolik0 tolik0 commented Jan 23, 2025

Summary by CodeRabbit

  • New Features

    • Introduced a new manifest configuration for handling streams related to posts, comments, and votes, enhancing data synchronization capabilities.
    • Added a new method for dynamic cursor creation for specified partitions, improving cursor management.
  • Refactor

    • Updated internal logic for handling stream slicers in the simple retriever creation process.
    • Simplified conditional checks for stream slicer type determination.
    • Streamlined code by reducing redundancy in cursor creation methods.
  • Tests

    • Added a new test case to validate the functionality of the new manifest for incremental syncing.

@github-actions github-actions bot added the bug Something isn't working label Jan 23, 2025

coderabbitai bot commented Jan 23, 2025

Walkthrough

The pull request modifies the create_simple_retriever method of the ModelToComponentFactory class in the Airbyte CDK. The primary change simplifies the conditional logic for stream_slicer types: the method now checks only whether the stream_slicer is not a DatetimeBasedCursor, and the previous check for PerPartitionWithGlobalCursor is removed. This adjustment can change how request_options_provider is assigned, notably when the slicer is a PerPartitionWithGlobalCursor. In addition, the PerPartitionCursor class gains a new method, _create_cursor_for_partition, that creates the cursor for a partition on demand, so cursor creation is handled in one place.

Changes

Change summary by file

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
  • Simplified conditional logic in create_simple_retriever method
  • Removed check for PerPartitionWithGlobalCursor
  • Updated method's internal logic for request_options_provider assignment

airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py
  • Added _create_cursor_for_partition method
  • Simplified should_be_synced method by centralizing cursor creation logic
  • Updated several method signatures to incorporate new cursor creation logic

unit_tests/sources/declarative/incremental/test_concurrent_perpartitioncursor.py
  • Added new manifest SUBSTREAM_REQUEST_OPTIONS_MANIFEST
  • Introduced test function test_incremental_substream_request_options_provider to validate new manifest functionality

Suggested reviewers

  • maxi297
  • brianjlai

Hey there! 👋 I noticed the changes look like they're streamlining the stream slicer type checking. Curious about the motivation behind removing the PerPartitionWithGlobalCursor check? Would you mind sharing the context that led to this simplification? Wdyt? 🤔


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7795937 and f0f8c87.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Analyze (python)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

2377-2377: Consider adding a comment explaining the condition change.

The condition was simplified to only check for DatetimeBasedCursor. Would it be helpful to add a comment explaining why PerPartitionWithGlobalCursor check was removed and how it affects the request_options_provider assignment? This would help future maintainers understand the reasoning behind this change, wdyt?

+        # We only need to check for DatetimeBasedCursor as it has its own request_options_provider.
+        # All other stream slicers, including PerPartitionWithGlobalCursor, can act as request_options_provider.
         if (
             not isinstance(stream_slicer, DatetimeBasedCursor)
             or type(stream_slicer) is not DatetimeBasedCursor
         ):
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4459243 and 4444893.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Analyze (python)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)

2377-2379: 🛠️ Refactor suggestion

Consider simplifying the type check condition.

The current condition uses both isinstance and type checks with an or operator. This might be redundant and potentially confusing. Would it be clearer to use just isinstance? The current or condition means the block executes if either check passes, which might not be the intended behavior, wdyt?

-        if (
-            not isinstance(stream_slicer, DatetimeBasedCursor)
-            or type(stream_slicer) is not DatetimeBasedCursor
-        ):
+        if not isinstance(stream_slicer, DatetimeBasedCursor):

Likely invalid or redundant comment.
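
One possible reading of why the simplification above may not be equivalent: the two conditions disagree for subclasses of DatetimeBasedCursor. The sketch below uses stand-in classes (DatetimeBasedCursor here is an empty placeholder, and CustomDatetimeCursor and OtherSlicer are hypothetical names), so it illustrates only the Python semantics of isinstance versus an exact type check, not the CDK's actual behavior.

```python
class DatetimeBasedCursor:
    """Empty stand-in for the CDK class of the same name (assumption)."""


class CustomDatetimeCursor(DatetimeBasedCursor):
    """Hypothetical custom component that subclasses the cursor."""


class OtherSlicer:
    """Hypothetical slicer unrelated to DatetimeBasedCursor."""


def original_condition(stream_slicer: object) -> bool:
    # Condition as written in the PR: isinstance check OR exact type check.
    return (
        not isinstance(stream_slicer, DatetimeBasedCursor)
        or type(stream_slicer) is not DatetimeBasedCursor
    )


def suggested_condition(stream_slicer: object) -> bool:
    # Condition proposed in the review suggestion: isinstance check only.
    return not isinstance(stream_slicer, DatetimeBasedCursor)


for slicer in (DatetimeBasedCursor(), CustomDatetimeCursor(), OtherSlicer()):
    name = type(slicer).__name__
    print(name, original_condition(slicer), suggested_condition(slicer))
```

In this sketch the original condition is true for everything except an exact DatetimeBasedCursor instance, so it behaves like `type(stream_slicer) is not DatetimeBasedCursor`. The suggested form is additionally false for subclasses, which would treat custom cursors differently.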


Line range hint 2380-2385: Consider documenting the breaking change.

The comment indicates that custom components implementing DatetimeBasedCursor will need to implement their own RequestOptionsProvider. This is a breaking change that might affect users. Would it be helpful to:

  1. Add a migration guide or documentation?
  2. Expand the comment to provide an example of how to implement a custom RequestOptionsProvider?
  3. Add a deprecation warning for the old behavior?

Let's check for any custom components that might be affected:

@tolik0 tolik0 requested a review from maxi297 January 23, 2025 15:17

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py (2)

356-362: Consider adding error handling for partition key validation?

The new create_cursor_for_partition method nicely centralizes the cursor creation logic! However, should we add validation to ensure the partition key is not empty or None? This could prevent potential issues down the line, wdyt? 🤔

 def create_cursor_for_partition(self, partition_key: str) -> None:
+    if not partition_key:
+        raise ValueError("Partition key cannot be empty")
     partition_state = (
         self._state_to_migrate_from if self._state_to_migrate_from else self._NO_CURSOR_STATE
     )
     cursor = self._create_cursor(partition_state)
     self._cursor_per_partition[partition_key] = cursor

225-226: Extract common partition check into a helper method?

I notice we have the same partition check in all request-related methods. What do you think about extracting this into a helper method to make the code more DRY? Something like this perhaps? 🎯

+ def _ensure_partition_cursor_exists(self, partition: Mapping[str, Any]) -> None:
+     partition_key = self._to_partition_key(partition)
+     if partition_key not in self._cursor_per_partition:
+         self.create_cursor_for_partition(partition_key)

  def get_request_params(...):
      if stream_slice:
-         if self._to_partition_key(stream_slice.partition) not in self._cursor_per_partition:
-             self.create_cursor_for_partition(self._to_partition_key(stream_slice.partition))
+         self._ensure_partition_cursor_exists(stream_slice.partition)

Also applies to: 249-250, 273-274, 297-298
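
The helper proposed above can be tried in isolation. The following is a minimal sketch under stated assumptions: StubCursor and PerPartitionCursorSketch are hypothetical stand-ins rather than the CDK classes, and the JSON-based _to_partition_key is only an illustrative way to get a stable key, not the CDK's serializer.

```python
import json
from typing import Any, Dict, Mapping


class StubCursor:
    """Stand-in for a per-partition cursor; not the CDK's cursor class."""

    def __init__(self, partition_key: str) -> None:
        self.partition_key = partition_key

    def get_request_params(self) -> Dict[str, Any]:
        return {"cursor_for": self.partition_key}


class PerPartitionCursorSketch:
    """Hypothetical reduction of the per-partition cursor to the lookup logic."""

    def __init__(self) -> None:
        self._cursor_per_partition: Dict[str, StubCursor] = {}

    @staticmethod
    def _to_partition_key(partition: Mapping[str, Any]) -> str:
        # Assumption: a sorted JSON dump is used only to get a stable key.
        return json.dumps(partition, sort_keys=True)

    def create_cursor_for_partition(self, partition_key: str) -> None:
        self._cursor_per_partition[partition_key] = StubCursor(partition_key)

    def _ensure_partition_cursor_exists(self, partition: Mapping[str, Any]) -> None:
        # Create the cursor lazily, the first time a partition is seen.
        partition_key = self._to_partition_key(partition)
        if partition_key not in self._cursor_per_partition:
            self.create_cursor_for_partition(partition_key)

    def get_request_params(self, partition: Mapping[str, Any]) -> Dict[str, Any]:
        self._ensure_partition_cursor_exists(partition)
        cursor = self._cursor_per_partition[self._to_partition_key(partition)]
        return cursor.get_request_params()
```

Calling get_request_params twice with the same partition creates the cursor once and reuses it on the second call. The other request-related methods could call the same helper in the same way.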

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4444893 and 50a968e.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Analyze (python)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py (1)

352-362: Great refactoring! 🎉

The centralization of cursor creation logic and consistent handling across all methods makes the code more maintainable and robust. The changes look good to me!

@maxi297 maxi297 left a comment

Pre-emptively approve

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
unit_tests/sources/declarative/incremental/test_concurrent_perpartitioncursor.py (2)

2329-2602: Consider refactoring manifest definitions to reduce duplication

The SUBSTREAM_REQUEST_OPTIONS_MANIFEST shares many similarities with the existing SUBSTREAM_MANIFEST. Would it be helpful to refactor common components into shared definitions or functions to avoid duplication and improve maintainability? Wdyt?


2605-2875: Enhance test coverage by verifying request parameters

In the test_incremental_substream_request_options_provider function, we're verifying records and state transitions. Should we also include assertions to confirm that the post_id and comment_id parameters are correctly passed in the requests? This could help ensure that the RequestOptionsProvider is functioning as expected. Wdyt?
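
One way such an assertion could look, as a rough sketch only: the helpers below inspect recorded request URLs and report which ones lack a given query parameter. The URLs, the per_page parameter, and the helper names are all hypothetical, and how the real test records its requests is not shown in this conversation.

```python
from typing import Dict, Iterable, List
from urllib.parse import parse_qs, urlparse


def query_params(url: str) -> Dict[str, str]:
    """Return the query parameters of `url`, keeping the first value per key."""
    return {key: values[0] for key, values in parse_qs(urlparse(url).query).items()}


def urls_missing_param(urls: Iterable[str], name: str) -> List[str]:
    """Return the URLs whose query string does not contain `name`."""
    return [url for url in urls if name not in query_params(url)]


# Hypothetical URLs standing in for requests captured during a test run.
recorded_comment_requests = [
    "https://api.example.com/comments?post_id=1&per_page=100",
    "https://api.example.com/comments?post_id=2&per_page=100",
]

# Every comments request should carry the parent partition's post_id.
assert urls_missing_param(recorded_comment_requests, "post_id") == []
```

The same check could be repeated with comment_id for the requests of the votes stream.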

airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py (2)

225-226: Refactor repeated cursor initialization into a helper method

In the methods get_request_params, get_request_headers, get_request_body_data, and get_request_body_json, there's repeated logic for checking and creating cursors for partitions. Would it be beneficial to extract this logic into a helper method to reduce duplication and enhance code clarity? Wdyt?

Also applies to: 249-250, 273-274, 297-298


356-380: Consider adding unit tests for _create_cursor_for_partition

Since _create_cursor_for_partition plays a crucial role in initializing cursors, would adding unit tests for this method help ensure its correctness and prevent future regressions? Wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 50a968e and 7795937.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py (5 hunks)
  • unit_tests/sources/declarative/incremental/test_concurrent_perpartitioncursor.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Analyze (python)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py (1)

356-380: Add a comment explaining the temporary nature of _create_cursor_for_partition

The _create_cursor_for_partition method is introduced to dynamically create cursors for partitions. Could we add a comment to explain why this method is necessary and mention that it's a temporary workaround until the cursor is decoupled from the concurrent cursor implementation? This would help future maintainers understand its purpose. Wdyt?

@tolik0 tolik0 merged commit ec7e961 into main Jan 23, 2025
15 of 19 checks passed
@tolik0 tolik0 deleted the tolik0/fix-request-options-provider branch January 23, 2025 20:28
rpopov pushed a commit to rpopov/airbyte-python-cdk that referenced this pull request Jan 23, 2025
* remotes/airbyte/main:
  fix(airbyte-cdk): Fix RequestOptionsProvider for PerPartitionWithGlobalCursor (airbytehq#254)
  feat(low-code): add profile assertion flow to oauth authenticator component (airbytehq#236)
  feat(Low-Code Concurrent CDK): Add ConcurrentPerPartitionCursor (airbytehq#111)
  fix: don't mypy unit_tests (airbytehq#241)
  fix: handle backoff_strategies in CompositeErrorHandler (airbytehq#225)
  feat(concurrent cursor): attempt at clamping datetime (airbytehq#234)
  ci: use `ubuntu-24.04` explicitly (resolves CI warnings) (airbytehq#244)
  Fix(sdm): module ref issue in python components import (airbytehq#243)
  feat(source-declarative-manifest): add support for custom Python components from dynamic text input (airbytehq#174)
  chore(deps): bump avro from 1.11.3 to 1.12.0 (airbytehq#133)
  docs: comments on what the `Dockerfile` is for (airbytehq#240)
  chore: move ruff configuration to dedicated ruff.toml file (airbytehq#237)
Labels
bug Something isn't working
2 participants