feat(low-code concurrent): Allow async job low-code streams that are incremental to be run by the concurrent framework #228

maxi297 · 2025-01-17T19:28:43Z

What

Updates the concurrent declarative source so that async retrievers with incremental components run properly within the concurrent framework and still checkpoint correctly.

Note that this change might impact @tolik0's work here.

How

Modifies model_to_component_factory.py so that the right cursors are instantiated when the runtime components are created. concurrent_declarative_source.py also changes so that the correct cursors and partition routers are assigned.
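To make the idea concrete, here is a minimal sketch of a factory that receives a state manager and uses it to seed each cursor with the stream's saved state. It is not the CDK code: `ToyStateManager`, `ToyCursor`, and `ToyFactory` are hypothetical stand-ins, and only the `get_stream_state(stream_name, namespace)` call shape is taken from this PR.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping, Optional, Tuple


@dataclass
class ToyStateManager:
    """Hypothetical stand-in for a connector state manager."""

    # Maps (stream name, namespace) to that stream's saved state.
    _states: Dict[Tuple[str, Optional[str]], Mapping[str, Any]] = field(default_factory=dict)

    def get_stream_state(
        self, stream_name: str, namespace: Optional[str] = None
    ) -> Mapping[str, Any]:
        # An unknown stream yields an empty state rather than an error.
        return self._states.get((stream_name, namespace), {})


@dataclass
class ToyCursor:
    """Hypothetical cursor that only remembers where to resume from."""

    stream_name: str
    initial_state: Mapping[str, Any]


class ToyFactory:
    """Hypothetical factory that looks up state itself when building a cursor."""

    def __init__(self, state_manager: Optional[ToyStateManager] = None) -> None:
        # Fall back to an empty manager so callers without state still work.
        self._state_manager = state_manager or ToyStateManager()

    def create_cursor(self, stream_name: str, namespace: Optional[str] = None) -> ToyCursor:
        state = self._state_manager.get_stream_state(stream_name, namespace)
        return ToyCursor(stream_name=stream_name, initial_state=state)


manager = ToyStateManager({("orders", None): {"updated_at": "2025-01-01"}})
factory = ToyFactory(state_manager=manager)
orders_cursor = factory.create_cursor("orders")  # resumes from saved state
users_cursor = factory.create_cursor("users")  # no saved state, starts empty
```

Keeping the lookup inside the factory means callers no longer thread per-stream state through every `create_*` call.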

Summary by CodeRabbit

New Features
- Enhanced state management for concurrent and asynchronous data streams.
- Improved handling of incremental and async stream processing.
Bug Fixes
- Corrected error messaging for stream slice validation.
Refactor
- Restructured state management in declarative sources.
- Modified component factory to support more flexible state handling.
- Updated cursor and retriever implementations for async operations.
Tests
- Added new test case for validating async incremental streams with state.
- Expanded test coverage for state management and cursor behavior.

coderabbitai · 2025-01-23T01:23:12Z

Walkthrough

The pull request introduces modifications to the Airbyte CDK's declarative source components, focusing on enhancing concurrent stream processing and state management. The changes primarily affect the ConcurrentDeclarativeSource, ModelToComponentFactory, and related utility classes, with improvements in handling asynchronous retrievers, stream slicing, and state management. The modifications aim to provide more robust support for different retriever types and improve the flexibility of stream processing in declarative sources.

Changes

| File | Change Summary |
| --- | --- |
| `airbyte_cdk/sources/declarative/concurrent_declarative_source.py` | Added support for `AsyncJobPartitionRouter`; updated state management with `ConnectorStateManager`; modified stream grouping logic to utilize `self._connector_state_manager` |
| `airbyte_cdk/sources/declarative/manifest_declarative_source.py` | Removed `CheckStream` import |
| `airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py` | Added `connector_state_manager` parameter; updated cursor creation logic to reference `_connector_state_manager` |
| `airbyte_cdk/sources/types.py` | Modified `StreamSlice` hash method to use `SliceHasher` |
| `airbyte_cdk/utils/slice_hasher.py` | Added default value for `stream_name` in `hash` method |
| `airbyte_cdk/sources/declarative/retrievers/async_retriever.py` | Updated error message for `stream_slice` parameter to clarify its necessity |
| `unit_tests/sources/declarative/parsers/test_model_to_component_factory.py` | Updated tests to include state management with `ConnectorStateManager` |
| `unit_tests/sources/declarative/test_concurrent_declarative_source.py` | Added tests for async job handling with `ConcurrentCursor` |
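
The `StreamSlice` and `SliceHasher` rows describe hashing a slice with an optional stream name. A minimal sketch of that idea follows, assuming a slice is a JSON-serializable mapping. `hash_slice` is a hypothetical helper written for illustration and is not the CDK's `SliceHasher` implementation.

```python
import hashlib
import json
from typing import Any, Mapping, Optional


def hash_slice(stream_slice: Mapping[str, Any], stream_name: Optional[str] = None) -> int:
    """Hypothetical helper: derive a stable integer from a slice's content."""
    # Sorting keys makes the result independent of insertion order.
    payload = json.dumps(stream_slice, sort_keys=True)
    if stream_name is not None:
        # Prefix the stream name so identical slices of different streams differ.
        payload = f"{stream_name}:{payload}"
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return int(digest[:16], 16)


a = hash_slice({"start": "2025-01-01", "end": "2025-01-31"})
b = hash_slice({"end": "2025-01-31", "start": "2025-01-01"})
c = hash_slice({"start": "2025-01-01", "end": "2025-01-31"}, stream_name="orders")
```

Here `a` and `b` match because only key order differs, while `c` differs because the stream name is mixed into the payload.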

Possibly related issues

[Concurrent Low-Code] Allow low-code streams using AsyncRetriever to be run within the Concurrent CDK #168: Allows low-code streams using AsyncRetriever to be run within the Concurrent CDK
- The changes in this PR directly address the challenges mentioned in the issue, particularly around state management and async stream processing.

Possibly related PRs

chore(refactor): refactor partition generator to take any stream slicer #39: Modifications to ConcurrentDeclarativeSource stream handling
fix(Low-Code Concurrent CDK): Refactor the low-code AsyncRetriever to use an underlying StreamSlicer #170: Refactoring AsyncRetriever to include underlying StreamSlicer
feat(concurrent cursor): attempt at clamping datetime #234: Adding clamping functionality for datetime cursors

Suggested reviewers

Wdyt? Would you like to dive deeper into any specific aspect of these changes? 😊

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1)

70-73: Consider addressing the TODO comment on state initialization

There's a TODO comment suggesting that state may no longer need to be stored during initialization. Should we explore removing the state parameter now, or is there an edge case that still requires it? WDYT?

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

502-502: Update typing and documentation for connector_state_manager

We've added connector_state_manager as an optional parameter. Should we update the type hints and documentation to reflect its usage within the class, ensuring clarity for future maintenance? WDYT?

airbyte_cdk/sources/declarative/retrievers/async_retriever.py (1)

78-78: Clarify the error message in _validate_and_get_stream_slice_partition

The error message could be more precise. Instead of stating "stream_slice is not optional," perhaps we can say "stream_slice must be provided and contain a 'partition' key." Would this make the error clearer? WDYT?
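
A minimal sketch of the suggested wording, assuming the slice is a plain mapping with a `partition` key. `validate_and_get_partition` and `error_message_for` are hypothetical names used only for this illustration and do not reproduce the retriever's actual method.

```python
from typing import Any, Mapping, Optional


def validate_and_get_partition(stream_slice: Optional[Mapping[str, Any]]) -> Any:
    """Hypothetical validator: return the slice's partition or fail loudly."""
    if stream_slice is None or "partition" not in stream_slice:
        raise ValueError("stream_slice must be provided and contain a 'partition' key.")
    return stream_slice["partition"]


def error_message_for(stream_slice: Optional[Mapping[str, Any]]) -> str:
    """Demo helper: capture the error message instead of letting it propagate."""
    try:
        validate_and_get_partition(stream_slice)
    except ValueError as error:
        return str(error)
    return ""
```

Calling `error_message_for(None)` returns the full message, so the caller learns both that the slice is required and which key it must contain.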

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee49219 and 8a55d2c.

📒 Files selected for processing (8)

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (5 hunks)
airbyte_cdk/sources/declarative/manifest_declarative_source.py (0 hunks)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4 hunks)
airbyte_cdk/sources/declarative/retrievers/async_retriever.py (1 hunks)
airbyte_cdk/sources/types.py (2 hunks)
airbyte_cdk/utils/slice_hasher.py (1 hunks)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2 hunks)
unit_tests/sources/declarative/test_concurrent_declarative_source.py (3 hunks)

💤 Files with no reviewable changes (1)

airbyte_cdk/sources/declarative/manifest_declarative_source.py

👮 Files not reviewed due to content moderation or server errors (4)

airbyte_cdk/sources/types.py
unit_tests/sources/declarative/test_concurrent_declarative_source.py
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
airbyte_cdk/utils/slice_hasher.py

⏰ Context from checks skipped due to timeout of 90000ms (3)

GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (Fast)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)

🔇 Additional comments (2)

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1)

81-81: Verify the impact of passing connector_state_manager

By passing self._connector_state_manager to the ModelToComponentFactory, we're centralizing state management. Can we ensure that all dependent components are updated accordingly to handle this change? WDYT?

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

1609-1624: Reconsider returning ConcurrentCursor from create_concurrent_cursor_from_datetime_based_cursor

Returning a ConcurrentCursor may introduce typing inconsistencies since it doesn't implement the same interface as the low-code StreamSlicer. Should we address this discrepancy to ensure type safety and maintainability? WDYT?
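
The typing concern can be illustrated with a structural interface check. This is only a sketch: `SlicerLike`, `ToyConcurrentCursor`, and `ToyStateOnlyCursor` are hypothetical and say nothing about what the CDK's real classes implement.

```python
from typing import Any, Iterable, Mapping, Protocol, runtime_checkable


@runtime_checkable
class SlicerLike(Protocol):
    """Hypothetical interface: anything that can yield slices."""

    def stream_slices(self) -> Iterable[Mapping[str, Any]]: ...


class ToyConcurrentCursor:
    """Hypothetical cursor that happens to expose stream_slices()."""

    def stream_slices(self) -> Iterable[Mapping[str, Any]]:
        return [{"start": "2025-01-01", "end": "2025-01-31"}]


class ToyStateOnlyCursor:
    """Hypothetical cursor without slicing, so it does not fit the interface."""

    def observe(self, record: Mapping[str, Any]) -> None:
        pass


# runtime_checkable protocols only check that the method exists, not its signature.
fits = isinstance(ToyConcurrentCursor(), SlicerLike)
does_not_fit = isinstance(ToyStateOnlyCursor(), SlicerLike)
```

A return type expressed as a protocol like this would let a factory method return either kind of object while still giving the type checker something to verify.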

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (3)
70-72: Consider improving type annotation for state management.

The TODO comment provides good context about potential future improvements. However, the type ignore comment suggests we could improve type safety. WDYT about creating a proper type for the state parameter to avoid the type ignore?
-        self._connector_state_manager = ConnectorStateManager(state=state)  # type: ignore  # state is always in the form of List[AirbyteStateMessage]. The ConnectorStateManager should use generics, but this can be done later
+        self._connector_state_manager = ConnectorStateManager[TState](state=state)  # The generic type ensures type safety at compile time
231-234: Consider adding defensive programming for state retrieval.

The state retrieval looks good, but what if the stream state is None? WDYT about adding a defensive check?
             stream_state = self._connector_state_manager.get_stream_state(
                 stream_name=declarative_stream.name,
                 namespace=declarative_stream.namespace,
             )
+            if stream_state is not None:
                 retriever.cursor.set_initial_state(stream_state=stream_state)
+            else:
+                self.logger.debug(f"No state found for stream {declarative_stream.name}")
251-278: Consider extracting partition generator creation logic.

The partition generator setup is quite complex and repeated. What do you think about extracting this into a separate method to improve maintainability and reduce code duplication? Something like:
def _create_partition_generator(
    self,
    stream_name: str,
    json_schema: dict,
    retriever: Any,
    stream_slicer: Any
) -> StreamSlicerPartitionGenerator:
    return StreamSlicerPartitionGenerator(
        partition_factory=DeclarativePartitionFactory(
            stream_name,
            json_schema,
            retriever,
            self.message_repository,
        ),
        stream_slicer=stream_slicer,
    )

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8a55d2c and 05ead34.

📒 Files selected for processing (1)

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (5 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (4)

GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Pytest (Fast)
GitHub Check: Analyze (python)

🔇 Additional comments (4)

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (4)

34-34: LGTM! Clean import additions.

The new imports for AsyncJobPartitionRouter and AsyncRetriever are well-organized and necessary for the enhanced concurrent processing functionality.

Also applies to: 36-36

73-81: LGTM! Good component factory configuration.

The component factory initialization is well-structured with clear comments explaining the rationale behind disabling resumable full refresh for concurrent sources.

240-249: Fix formatting in the error message.

There's a missing space in the error message, causing received{cursor.__class__} to be concatenated without a space.

Apply this diff to fix the formatting:
-                                f"Expected AsyncJobPartitionRouter stream_slicer to be of type ConcurrentCursor, but received{cursor.__class__}"
+                                f"Expected AsyncJobPartitionRouter stream_slicer to be of type ConcurrentCursor, but received {cursor.__class__}"
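
A short, self-contained demonstration of the difference, using a hypothetical placeholder class in place of the real cursor:

```python
class CursorStandIn:
    """Hypothetical placeholder for whatever object was received."""


cursor = CursorStandIn()

# Without the space, the class representation is glued to the preceding word.
before = f"but received{cursor.__class__}"
# With the space, the message reads as intended.
after = f"but received {cursor.__class__}"
```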
349-352: LGTM! Good type validation extension.

The addition of AsyncJobPartitionRouter to the type validation is well-integrated and maintains consistency with the new async support.

brianjlai

I added the various fixes noted, which should put this in a good state to merge. Approving, since the fixes I pushed are what I would have commented to address.

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1)
250-251: ⚠️ Potential issue

Fix formatting in the error message

There's a missing space in the error message, causing received{cursor.__class__} to be concatenated without a space.

Apply this diff to fix the formatting:
- f"Expected AsyncJobPartitionRouter stream_slicer to be of type ConcurrentCursor, but received{cursor.__class__}"
+ f"Expected AsyncJobPartitionRouter stream_slicer to be of type ConcurrentCursor, but received {cursor.__class__}"

🧹 Nitpick comments (6)

unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)
3104-3116: Consider using a helper method for creating the state message.

The state message construction could be extracted into a helper method to improve readability and reusability across tests, wdyt?
def create_state_message(stream_name: str, stream_state: dict) -> AirbyteStateMessage:
    return AirbyteStateMessage(
        type=AirbyteStateType.STREAM,
        stream=AirbyteStreamState(
            stream_descriptor=StreamDescriptor(name=stream_name),
            stream_state=AirbyteStateBlob(stream_state),
        ),
    )
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (5)

73-75: Consider improving the TODO comment.

The TODO comment suggests that state could be removed from initialization, but it's not clear about the potential edge cases. Would it be helpful to document these edge cases in the comment to help future maintainers understand why we might need to keep it? WDYT?

Line range hint 1681-1689: Consider adding more context to the comments.

The comments explain the current limitations well, but could we add more context about when these limitations might be lifted? For example, what specific concurrent cursor support or CDK implementation would enable these features? WDYT?

73-75: Consider removing the TODO comment

The comment suggests that state could be removed from initialization, but notes there might be edge cases. Since this is a public-facing change, should we either address the TODO or document the edge cases that prevent its removal? WDYT?

223-242: Verify the cursor nullification approach

The code nullifies the cursor after state initialization to avoid duplicate state management. While this works, it feels a bit hacky. Would it make more sense to refactor this to avoid the need for nullification? For example, could we introduce a flag to disable state management in the low-code framework instead? WDYT?

Line range hint 1681-1690: Consider adding more detailed error messages

The error messages for unsupported configurations could be more helpful. Would it be useful to include:

What cursor types are currently supported

When per partition support is expected

Links to relevant documentation
WDYT?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 05ead34 and 92f0940.

📒 Files selected for processing (4)

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (7 hunks)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5 hunks)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2 hunks)
unit_tests/sources/declarative/test_concurrent_declarative_source.py (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

unit_tests/sources/declarative/test_concurrent_declarative_source.py

👮 Files not reviewed due to content moderation or server errors (1)

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py

⏰ Context from checks skipped due to timeout of 90000ms (4)

GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Pytest (Fast)
GitHub Check: Analyze (python)

🔇 Additional comments (10)

unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)

3118-3120: LGTM! Clear factory initialization with state management.

The factory initialization with state management is well-structured and the parameters are clear.
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (7)

37-37: LGTM! New imports look good.

The addition of AsyncJobPartitionRouter and AsyncRetriever imports aligns with the PR objectives to support async job low-code streams.

Also applies to: 39-39

223-242: LGTM! Smart optimization for cursor and state management.

The code optimizes state management by bypassing low-code cursor flows when using ConcurrentCursor. The comments effectively explain the rationale behind this optimization.

250-251: Fix formatting in the error message.

There's a missing space in the error message, causing received{cursor.__class__} to be concatenated without a space.

Apply this diff to fix the formatting:
- f"Expected AsyncJobPartitionRouter stream_slicer to be of type ConcurrentCursor, but received{cursor.__class__}"
+ f"Expected AsyncJobPartitionRouter stream_slicer to be of type ConcurrentCursor, but received {cursor.__class__}"
37-37: LGTM: New imports for async job support

The addition of AsyncJobPartitionRouter and AsyncRetriever imports aligns with the PR's objective of enabling async job low-code streams.

Also applies to: 39-39

405-408: LGTM: Extended stream slicer type check

The addition of AsyncJobPartitionRouter to the type check in _is_datetime_incremental_without_partition_routing is correct and aligns with the new functionality.

Line range hint 506-506: LGTM: Added connector state manager parameter

The addition of the optional connector_state_manager parameter with a default fallback to a new instance is a clean approach.

Also applies to: 518-518

Line range hint 936-937: LGTM: Simplified state retrieval

Using _connector_state_manager.get_stream_state directly is cleaner than passing state through parameters.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)

506-506: LGTM! Good addition of connector_state_manager.

The addition of the optional connector_state_manager parameter with a sensible default improves the flexibility of the factory class.

Also applies to: 518-518

936-937: LGTM! Proper state retrieval.

The code correctly retrieves stream state using the connector_state_manager, which aligns with the state management improvements.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (3)

73-75: Consider adding more detailed documentation for the state management change.

The TODO comment suggests that state initialization could be removed, but it might be worth documenting why we're keeping it for now and what edge cases we're concerned about, wdyt?

-        # todo: We could remove state from initialization. Now that streams are grouped during the read(), a source
-        #  no longer needs to store the original incoming state. But maybe there's an edge case?
+        # TODO: State initialization could potentially be removed since streams are grouped during read().
+        # However, we're keeping it for now to:
+        # 1. Handle potential edge cases with state management
+        # 2. Maintain backward compatibility
+        # 3. Support legacy Python streams that might depend on this behavior

465-467: Consider adding type hints for better code maintainability.

The _get_retriever method could benefit from more specific type hints for better code maintainability and IDE support, wdyt?

    @staticmethod
    def _get_retriever(
-        declarative_stream: DeclarativeStream, stream_state: Mapping[str, Any]
+        declarative_stream: DeclarativeStream,
+        stream_state: Mapping[str, Any],
    ) -> Retriever:

393-396: Consider extracting complex condition into a helper method.

The condition in _is_datetime_incremental_without_partition_routing is getting more complex with the addition of AsyncJobPartitionRouter. Would it be clearer to extract this into a helper method, wdyt?

+    def _is_valid_stream_slicer(self, stream_slicer: Any) -> bool:
+        return (
+            isinstance(stream_slicer, DatetimeBasedCursor)
+            or isinstance(stream_slicer, AsyncJobPartitionRouter)
+        )
+
     def _is_datetime_incremental_without_partition_routing(
         self,
         declarative_stream: DeclarativeStream,
         incremental_sync_component_definition: Mapping[str, Any] | None,
     ) -> bool:
         return (
             incremental_sync_component_definition is not None
             and bool(incremental_sync_component_definition)
             and incremental_sync_component_definition.get("type", "") == DatetimeBasedCursorModel.__name__
             and self._stream_supports_concurrent_partition_processing(declarative_stream=declarative_stream)
             and hasattr(declarative_stream.retriever, "stream_slicer")
-            and (
-                isinstance(declarative_stream.retriever.stream_slicer, DatetimeBasedCursor)
-                or isinstance(declarative_stream.retriever.stream_slicer, AsyncJobPartitionRouter)
-            )
+            and self._is_valid_stream_slicer(declarative_stream.retriever.stream_slicer)
         )

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 92f0940 and 03d1958.

📒 Files selected for processing (1)

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (7 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (4)

GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Pytest (Fast)
GitHub Check: Analyze (python)

🔇 Additional comments (2)

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (2)

229-239: Fix the error message formatting.

There's a missing space in the error message, causing received{cursor.__class__} to be concatenated without a space.
- f"Expected AsyncJobPartitionRouter stream_slicer to be of type ConcurrentCursor, but received{cursor.__class__}"
+ f"Expected AsyncJobPartitionRouter stream_slicer to be of type ConcurrentCursor, but received {cursor.__class__}"
223-225: Consider adding error handling for stream state retrieval.

The get_stream_state calls might benefit from error handling to gracefully handle cases where the stream state is not found or malformed, wdyt?

Also applies to: 330-332

aaronsteers · 2025-01-24T02:18:00Z

@maxi297 - FYI, I'm updating from main to resolve an issue in the Test Connectors workflow: ModuleNotFoundError: No module named 'pkg_resources'

maxi297 · 2025-01-24T15:24:27Z

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formatting issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)

Line range hint 510-522: Consider adding type hints and docstring updates?

The new connector_state_manager parameter could benefit from:

A docstring explaining its purpose and default behavior
Moving the type hint from the parameter to the instance variable

What do you think about adding something like this?

 def __init__(
     self,
     limit_pages_fetched_per_slice: Optional[int] = None,
     limit_slices_fetched: Optional[int] = None,
     emit_connector_builder_messages: bool = False,
     disable_retries: bool = False,
     disable_cache: bool = False,
     disable_resumable_full_refresh: bool = False,
     message_repository: Optional[MessageRepository] = None,
     connector_state_manager: Optional[ConnectorStateManager] = None,
 ):
+    """Initialize the ModelToComponentFactory.
+    
+    Args:
+        connector_state_manager: Optional manager for handling connector state.
+            If not provided, an empty state manager will be used.
+    """
     self._init_mappings()
     self._limit_pages_fetched_per_slice = limit_pages_fetched_per_slice
     self._limit_slices_fetched = limit_slices_fetched
     self._emit_connector_builder_messages = emit_connector_builder_messages
     self._disable_retries = disable_retries
     self._disable_cache = disable_cache
     self._disable_resumable_full_refresh = disable_resumable_full_refresh
     self._message_repository = message_repository or InMemoryMessageRepository(
         self._evaluate_log_level(emit_connector_builder_messages)
     )
-    self._connector_state_manager = connector_state_manager or ConnectorStateManager()
+    self._connector_state_manager: ConnectorStateManager = connector_state_manager or ConnectorStateManager()

1693-1708: Consider making the error messages more actionable?

The validation logic is good, but the error messages could be more helpful by suggesting what the user should do instead. What do you think about something like this?

-                    raise ValueError(
-                        "AsyncRetriever with cursor other than DatetimeBasedCursor is not supported yet"
-                    )
+                    raise ValueError(
+                        "AsyncRetriever currently only supports DatetimeBasedCursor. "
+                        f"Found cursor of type {type(model.incremental_sync).__name__}. "
+                        "Please use DatetimeBasedCursor or wait for support of additional cursor types."
+                    )
-                    raise ValueError("Per partition state is not supported yet for AsyncRetriever")
+                    raise ValueError(
+                        "AsyncRetriever does not support partition routers yet. "
+                        "Please remove the partition_router configuration or wait for the feature to be implemented."
+                    )

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 516f844 and 8108134.

📒 Files selected for processing (1)

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (8)

GitHub Check: Check: 'source-pokeapi' (skip=false)
GitHub Check: Check: 'source-the-guardian-api' (skip=false)
GitHub Check: Check: 'source-shopify' (skip=false)
GitHub Check: Check: 'source-hardcoded-records' (skip=false)
GitHub Check: Pytest (Fast)
GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Analyze (python)

🔇 Additional comments (1)

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

941-948: LGTM! Clear comments and good state handling.

The logic for handling stream state is well-documented and correctly implemented. The comments explain the purpose of the state retrieval logic clearly.

TMP development to show how things could work with concurrent cursor

68cd03b

maxi297 commented Jan 17, 2025

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (outdated)

maxi297 commented Jan 17, 2025

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (outdated)

maxi297 commented Jan 17, 2025

airbyte_cdk/sources/types.py (outdated)

maxi297 commented Jan 17, 2025

unit_tests/sources/declarative/test_concurrent_declarative_source.py

maxi297 and others added 3 commits January 22, 2025 11:29

Improve comments on async retriever cursor initialization

c3b9aed

making stream name optional when hashing slices

05af545

Merge branch 'main' into maxi297/asyncretriever-with-concurrent-cursor

fb4ba48

brianjlai reviewed Jan 22, 2025

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (outdated)

brianjlai added 2 commits January 22, 2025 16:00

Merge branch 'main' into maxi297/asyncretriever-with-concurrent-cursor

b1411f4

fix tests and clean up the code for readability and better typing, add some more tests to concurrent_declarative_source

07493d3

brianjlai changed the title ~~TMP development to show how things could work with concurrent cursor~~ feat(low-code concurrent) Allow async job low-code streams that are incremental to be run by the concurrent framework Jan 23, 2025

brianjlai changed the title ~~feat(low-code concurrent) Allow async job low-code streams that are incremental to be run by the concurrent framework~~ feat(low-code concurrent): Allow async job low-code streams that are incremental to be run by the concurrent framework Jan 23, 2025

formatting

8a55d2c

github-actions bot added the enhancement (New feature or request) label Jan 23, 2025

brianjlai marked this pull request as ready for review January 23, 2025 01:18

brianjlai reviewed Jan 23, 2025

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (outdated, resolved)

coderabbitai bot requested changes Jan 23, 2025

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (resolved)

Update airbyte_cdk/sources/declarative/concurrent_declarative_source.py

05ead34

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai bot approved these changes Jan 23, 2025

coderabbitai bot reviewed Jan 23, 2025

brianjlai approved these changes Jan 23, 2025

brianjlai added 2 commits January 23, 2025 13:58

Merge branch 'main' into maxi297/asyncretriever-with-concurrent-cursor

6e30a7e

fix test

92f0940

coderabbitai bot reviewed Jan 23, 2025

refactor back to old way

03d1958

coderabbitai bot reviewed Jan 23, 2025

Merge branch 'main' into maxi297/asyncretriever-with-concurrent-cursor

49690c7

Attempt to fix per partition concurrent tests

516f844

octavia-squidington-iii and others added 3 commits January 24, 2025 15:25

Auto-fix lint and format issues

f5c61c0

add clarifying comments

68fd8cb

Merge branch 'main' into maxi297/asyncretriever-with-concurrent-cursor

8108134

coderabbitai bot reviewed Jan 24, 2025

brianjlai merged commit c964574 into main Jan 24, 2025
21 of 23 checks passed

brianjlai deleted the maxi297/asyncretriever-with-concurrent-cursor branch January 24, 2025 20:56

brianjlai mentioned this pull request Jan 24, 2025

feat(AsyncRetriever): Allow for streams using AsyncRetriever and DatetimeBasedCursor to perform checkpointing #226

Closed

aaronsteers commented Jan 24, 2025

maxi297 commented Jan 24, 2025 (edited by github-actions bot)