feat(Low-Code Concurrent CDK): Add ConcurrentPerPartitionCursor #111
Conversation
📝 Walkthrough
The pull request introduces significant enhancements to the Airbyte CDK's concurrent declarative source handling, focusing on improving incremental sync capabilities for low-code connectors. The changes primarily revolve around extending support for different cursor types, particularly the new ConcurrentPerPartitionCursor.
Changes
Possibly related PRs
Suggested reviewers
Hey there! 👋 I noticed some interesting improvements in the concurrent declarative source handling. A few questions for you:
Would love to hear your thoughts! 🚀
✨ Finishing Touches
Actionable comments posted: 9
🧹 Outside diff range and nitpick comments (5)
airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (1)
59-66: Consider simplifying the constructor parameters
The __init__ method of ConcurrentPerPartitionCursor has many parameters, which can make it harder to maintain and understand. Would it be beneficial to encapsulate related parameters into data classes or reduce the number of parameters if possible? Wdyt?
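If it helps, here is a minimal sketch of the data-class idea — the field names are illustrative and not taken from the actual signature:

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class CursorStreamContext:
    """Bundles the stream-level arguments that always travel together (illustrative names)."""

    stream_name: str
    stream_namespace: Optional[str]
    stream_state: Mapping[str, Any]
    cursor_field: str


class PerPartitionCursorSketch:
    """Stand-in for the real cursor; it takes one context object plus the collaborators."""

    def __init__(self, context: CursorStreamContext, cursor_factory: Any, partition_router: Any) -> None:
        self._context = context
        self._cursor_factory = cursor_factory
        self._partition_router = partition_router
```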
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (2)
23-25: Unused import statement
We import PerPartitionWithGlobalCursor, but it's not used elsewhere in the code. Should we remove this import to keep the code clean? Wdyt?
309-361: Refactor duplicated code in the _group_streams method
There seems to be duplicated code in the conditional blocks handling DatetimeBasedCursorModel streams. Could we refactor these blocks into a helper function to reduce redundancy and improve readability? Wdyt?
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
915-970: Consider reducing code duplication
The methods create_concurrent_cursor_from_datetime_based_cursor and create_concurrent_cursor_from_perpartition_cursor share similar logic. Could we extract the common parts into a shared helper function to simplify maintenance? Wdyt?
airbyte_cdk/sources/streams/concurrent/cursor.py (1)
243-251: Consider extracting common logic between close methods?
The new close_partition_without_emit shares a lot of logic with close_partition. What do you think about extracting the common logic into a private method to reduce duplication? Something like this, wdyt?

```diff
+    def _close_partition_internal(self, partition: Partition) -> bool:
+        slice_count_before = len(self.state.get("slices", []))
+        self._add_slice_to_state(partition)
+        should_merge = slice_count_before < len(self.state["slices"])
+        if should_merge:
+            self._merge_partitions()
+        self._has_closed_at_least_one_slice = True
+        return should_merge
+
     def close_partition_without_emit(self, partition: Partition) -> None:
-        slice_count_before = len(self.state.get("slices", []))
-        self._add_slice_to_state(partition)
-        if slice_count_before < len(
-            self.state["slices"]
-        ):  # only emit if at least one slice has been processed
-            self._merge_partitions()
-        self._has_closed_at_least_one_slice = True
+        self._close_partition_internal(partition)

     def close_partition(self, partition: Partition) -> None:
-        slice_count_before = len(self.state.get("slices", []))
-        self._add_slice_to_state(partition)
-        if slice_count_before < len(
-            self.state["slices"]
-        ):  # only emit if at least one slice has been processed
-            self._merge_partitions()
+        if self._close_partition_internal(partition):
             self._emit_state_message()
-        self._has_closed_at_least_one_slice = True
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (7)
- airbyte_cdk/sources/declarative/concurrent_declarative_source.py (2 hunks)
- airbyte_cdk/sources/declarative/incremental/__init__.py (2 hunks)
- airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (1 hunks)
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3 hunks)
- airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1 hunks)
- airbyte_cdk/sources/declarative/stream_slicers/declarative_partition_generator.py (1 hunks)
- airbyte_cdk/sources/streams/concurrent/cursor.py (1 hunks)
🔇 Additional comments (5)
airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (2)
25-29: Question about returning only the first cursor
In the ConcurrentCursorFactory, the create method returns only the first cursor from _create_function. Since _create_function returns a tuple of cursors, should we ensure that we're not discarding any necessary cursors? Perhaps we should handle all cursors returned. Wdyt?
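To make the trade-off explicit, here is a sketch of the two options — the tuple shape returned by _create_function is an assumption here, not taken from the actual code:

```python
from typing import Any, Callable, Mapping, Tuple


class CursorFactorySketch:
    """Illustrative only: either document the discard or return the whole tuple."""

    def __init__(self, create_function: Callable[..., Tuple[Any, ...]]) -> None:
        self._create_function = create_function

    def create_first_only(self, stream_state: Mapping[str, Any]) -> Any:
        cursor, *_discarded = self._create_function(stream_state)
        # the unpacking makes it obvious that everything beyond the first element is dropped
        return cursor

    def create_all(self, stream_state: Mapping[str, Any]) -> Tuple[Any, ...]:
        # alternative: hand the full tuple back so the caller decides what to keep
        return self._create_function(stream_state)
```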
221-224: Ensure partition key serialization handles edge cases
In _to_partition_key and _to_dict, we use self._partition_serializer. Do we need to ensure that partition keys are properly sanitized or encoded to handle special characters and prevent serialization issues? Wdyt?
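For reference, a small sketch of one way to get stable, collision-free keys regardless of key order or special characters (the real _partition_serializer may already behave like this):

```python
import json
from typing import Any, Mapping


def to_partition_key(partition: Mapping[str, Any]) -> str:
    # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} serialize identically,
    # and ensure_ascii escapes special characters deterministically
    return json.dumps(partition, sort_keys=True, ensure_ascii=True)


def from_partition_key(key: str) -> Mapping[str, Any]:
    return json.loads(key)


assert to_partition_key({"a": 1, "b": "é"}) == to_partition_key({"b": "é", "a": 1})
```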
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1)
317-318: Type checking with isinstance
We use isinstance(declarative_stream.retriever.stream_slicer, PerPartitionWithGlobalCursor) to check the type. Should we consider using duck typing or interface checks to allow for subclasses or alternative implementations that fulfill the same protocol? Wdyt?
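If duck typing is preferred, a runtime-checkable Protocol could express the requirement without naming a concrete class — a sketch, with an illustrative method name:

```python
from typing import Any, Iterable, Mapping, Protocol, runtime_checkable


@runtime_checkable
class SupportsPerPartitionSlicing(Protocol):
    """Anything that can produce partition slices qualifies, not just PerPartitionWithGlobalCursor."""

    def stream_slices(self) -> Iterable[Mapping[str, Any]]:
        ...


def supports_perpartition_cursor(stream_slicer: Any) -> bool:
    # isinstance on a runtime_checkable Protocol only checks that the method exists
    return isinstance(stream_slicer, SupportsPerPartitionSlicing)
```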
airbyte_cdk/sources/declarative/stream_slicers/declarative_partition_generator.py (1)
41-66: Clarify the purpose of DeclarativePartitionFactory1
The new class DeclarativePartitionFactory1 appears to be similar to DeclarativePartitionFactory but with a different approach to handling the retriever. Is this intended to replace the original factory or serve a different purpose? Providing more context or renaming the class for clarity might help. Wdyt?
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)
181-181: LGTM!
The change to use request_option_provider instead of stream_slicer improves modularity by separating concerns.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (5)
airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (1)
152-156: Could we simplify the partition limit condition in the _ensure_partition_limit method?
Currently, the condition is len(self._cursor_per_partition) > self.DEFAULT_MAX_PARTITIONS_NUMBER - 1. Would it improve readability to use len(self._cursor_per_partition) >= self.DEFAULT_MAX_PARTITIONS_NUMBER instead? This change might make the maximum partition limit clearer. Wdyt?
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
910-965: Consider extracting common validation logic and fixing indentation.
A few suggestions to improve this new method:
- The component type validation logic (lines 922-926) is duplicated from create_concurrent_cursor_from_datetime_based_cursor. Should we extract it into a helper method to follow DRY?
- The indentation in the return statement (lines 955-964) seems inconsistent with the rest of the file. Should we align it with the standard indentation?
- Instead of using type ignore on line 961, we could properly type message_repository. Wdyt?

```diff
 def create_concurrent_cursor_from_perpartition_cursor(
     self,
     state_manager: ConnectorStateManager,
     model_type: Type[BaseModel],
     component_definition: ComponentDefinition,
     stream_name: str,
     stream_namespace: Optional[str],
     config: Config,
     stream_state: MutableMapping[str, Any],
     partition_router,
     **kwargs: Any,
 ) -> ConcurrentPerPartitionCursor:
-    component_type = component_definition.get("type")
-    if component_definition.get("type") != model_type.__name__:
-        raise ValueError(
-            f"Expected manifest component of type {model_type.__name__}, but received {component_type} instead"
-        )
+    self._validate_component_type(component_definition, model_type)
     # ... rest of the method ...
     return ConcurrentPerPartitionCursor(
-        cursor_factory=cursor_factory,
-        partition_router=partition_router,
-        stream_name=stream_name,
-        stream_namespace=stream_namespace,
-        stream_state=stream_state,
-        message_repository=self._message_repository,  # type: ignore
-        connector_state_manager=state_manager,
-        cursor_field=cursor_field,
-    )
+        cursor_factory=cursor_factory,
+        partition_router=partition_router,
+        stream_name=stream_name,
+        stream_namespace=stream_namespace,
+        stream_state=stream_state,
+        message_repository=self._message_repository,
+        connector_state_manager=state_manager,
+        cursor_field=cursor_field,
+    )
```

Helper method to add:

```python
def _validate_component_type(
    self, component_definition: ComponentDefinition, expected_type: Type[BaseModel]
) -> None:
    component_type = component_definition.get("type")
    if component_definition.get("type") != expected_type.__name__:
        raise ValueError(
            f"Expected manifest component of type {expected_type.__name__}, but received {component_type} instead"
        )
```
unit_tests/sources/declarative/incremental/test_concurrent_perpartitioncursor.py (3)
276-276: Should the test case name match the function name for consistency?
In the parameterization at line 276, the test name is "test_incremental_parent_state", but the test function is named test_incremental_parent_state_no_incremental_dependency. Aligning the test case name with the function name could enhance clarity. Wdyt?
483-495: Could we improve the docstring formatting for better readability?
The docstring contains a detailed explanation of the test, but it's indented inconsistently. Adjusting the indentation could enhance readability and maintain consistency with PEP 257 guidelines. Wdyt?
519-523: Is comparing only the last state sufficient to validate the final state?
In the assertion at lines 519-523, we compare final_state[-1] with expected_state. Should we consider comparing all elements of final_state to ensure that all state messages throughout the sync match the expectations? Wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (7)
- airbyte_cdk/sources/declarative/concurrent_declarative_source.py (2 hunks)
- airbyte_cdk/sources/declarative/extractors/record_filter.py (2 hunks)
- airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (1 hunks)
- airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py (1 hunks)
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4 hunks)
- airbyte_cdk/sources/streams/concurrent/cursor.py (1 hunks)
- unit_tests/sources/declarative/incremental/test_concurrent_perpartitioncursor.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/declarative/concurrent_declarative_source.py
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/extractors/record_filter.py (1)
78-81: Is it safe to use an empty stream_name when creating Record instances?
In the filter_records method, we're creating Record instances with an empty stream_name. Could this lead to issues if other parts of the codebase expect stream_name to be non-empty? Should we consider passing the actual stream name to prevent potential problems? Wdyt?
airbyte_cdk/sources/streams/concurrent/cursor.py (1)
243-251: Could we clarify the behavior of the close_partition_without_emit method?
In the close_partition_without_emit method, we update _has_closed_at_least_one_slice to True, but we don't emit a state message. Is this intended? Should we ensure that the state is eventually emitted elsewhere to keep the state consistent? Wdyt?
/autofix
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
unit_tests/sources/declarative/extractors/test_record_filter.py (1)
Line range hint 341-432: Comprehensive test coverage for cursor types! 👍
The parameterized tests cover all important cursor scenarios:
- Basic datetime cursor
- Global substream cursor (with and without state)
- Per-partition cursor with various state configurations
One thought: would it be valuable to add a test case for error handling when an invalid cursor type is passed? wdyt?
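Purely as an illustration of the shape such a test could take (the filter class here is a stand-in, not the real RecordFilter API):

```python
import pytest


class _FilterStandIn:
    """Minimal stand-in that only validates the cursor type, mimicking the suggested check."""

    _ALLOWED_CURSOR_TYPES = (dict,)  # placeholder for the real cursor classes

    def __init__(self, cursor):
        if not isinstance(cursor, self._ALLOWED_CURSOR_TYPES):
            raise ValueError(f"Unsupported cursor type: {type(cursor).__name__}")
        self._cursor = cursor


def test_invalid_cursor_type_raises():
    with pytest.raises(ValueError, match="Unsupported cursor type"):
        _FilterStandIn(cursor=object())
```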
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- unit_tests/sources/declarative/extractors/test_record_filter.py (2 hunks)
🔇 Additional comments (1)
unit_tests/sources/declarative/extractors/test_record_filter.py (1)
293-293: Nice API simplification! 🎉
I see you've consolidated the cursor parameters (date_time_based_cursor and substream_cursor) into a single cursor parameter. This makes the API cleaner and more flexible. The tests demonstrate it works well with all cursor types (DatetimeBasedCursor, GlobalSubstreamCursor, and PerPartitionWithGlobalCursor).
Also applies to: 431-431
/autofix
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1)
305-316: Consider extracting condition to a descriptive method
The nested conditions make the code harder to read and maintain. Would you consider extracting these conditions into a separate method with a descriptive name? Wdyt?
Something like:

```diff
+def _supports_concurrent_perpartition_cursor(
+    self,
+    declarative_stream: DeclarativeStream,
+    incremental_sync_component_definition: Mapping[str, Any],
+) -> bool:
+    return (
+        incremental_sync_component_definition
+        and incremental_sync_component_definition.get("type", "") == DatetimeBasedCursorModel.__name__
+        and self._stream_supports_concurrent_partition_processing(declarative_stream=declarative_stream)
+        and hasattr(declarative_stream.retriever, "stream_slicer")
+        and isinstance(declarative_stream.retriever.stream_slicer, PerPartitionWithGlobalCursor)
+    )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
- poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
- airbyte_cdk/sources/declarative/concurrent_declarative_source.py (2 hunks)
- airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (1 hunks)
- airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py (1 hunks)
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4 hunks)
- unit_tests/sources/declarative/incremental/test_concurrent_perpartitioncursor.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- unit_tests/sources/declarative/incremental/test_concurrent_perpartitioncursor.py
- airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py
🔇 Additional comments (4)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
84-85: Add missing comma between imports
There's a missing comma between ConcurrentCursorFactory and ConcurrentPerPartitionCursor in the import statement.
Apply this diff:

```diff
-    ConcurrentCursorFactory
-    ConcurrentPerPartitionCursor,
+    ConcurrentCursorFactory,
+    ConcurrentPerPartitionCursor,
```
airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py (2)
Line range hint 320-358: LGTM! Well-structured comparison logic
The implementation includes proper null checks and validation for partition matching. The error messages are clear and descriptive.
306-319: Add null check for record.associated_slice
The code accesses record.associated_slice.partition without verifying that record.associated_slice is not None. This could raise an AttributeError.
Consider adding a null check like this:

```diff
 def should_be_synced(self, record: Record) -> bool:
+    if not record.associated_slice:
+        raise ValueError("Record must have an associated slice")
     if (
         self._to_partition_key(record.associated_slice.partition)
         not in self._cursor_per_partition
     ):
```
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1)
317-331: Consider handling potential None stream_state
When retrieving stream_state, we might want to handle the case where it could be None to prevent unexpected errors during cursor creation. Wdyt?
Let's verify the current handling of stream_state:

```bash
#!/bin/bash
# Check how stream_state is handled in other parts of the codebase
rg -A 3 "get_stream_state" | grep -v "test"
```
I'm very happy with the progress on this! I've added a couple more comments, mostly about code maintenance and structure, but the functional part seems fine. I want to check the tests eventually (tomorrow, hopefully), but I can still leave a couple of comments here.
a6b5daa to 79ffb77
/autofix
Actionable comments posted: 2
♻️ Duplicate comments (2)
airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py (1)
306-319: ⚠️ Potential issue — Should we add a null check for record.associated_slice?
The code accesses record.associated_slice.partition without verifying if record.associated_slice is not None. This could raise an AttributeError. Consider adding a null check, wdyt?

```diff
 def should_be_synced(self, record: Record) -> bool:
+    if not record.associated_slice:
+        raise ValueError("Record must have an associated slice")
     if (
         self._to_partition_key(record.associated_slice.partition)
         not in self._cursor_per_partition
     ):
```
unit_tests/sources/declarative/incremental/test_concurrent_perpartitioncursor.py (1)
244-268: ⚠️ Potential issue — Should we pass the state parameter to the read method?
The state parameter is passed to ConcurrentDeclarativeSource initialization but not to the read method (an empty list is passed instead). This might affect state handling. Consider using the state parameter in both places, wdyt?

```diff
- messages = list(source.read(logger=source.logger, config=config, catalog=catalog, state=[]))
+ messages = list(source.read(logger=source.logger, config=config, catalog=catalog, state=state))
```
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
962-1018: Consider adding error handling for cursor creation
The cursor creation logic could benefit from additional error handling similar to create_concurrent_cursor_from_datetime_based_cursor. For instance, what happens if the cursor factory creation fails? Wdyt?

```diff
 def create_concurrent_cursor_from_perpartition_cursor(
     self,
     state_manager: ConnectorStateManager,
     model_type: Type[BaseModel],
     component_definition: ComponentDefinition,
     stream_name: str,
     stream_namespace: Optional[str],
     config: Config,
     stream_state: MutableMapping[str, Any],
     partition_router,
     **kwargs: Any,
 ) -> ConcurrentPerPartitionCursor:
+    try:
         component_type = component_definition.get("type")
         if component_definition.get("type") != model_type.__name__:
             raise ValueError(
                 f"Expected manifest component of type {model_type.__name__}, but received {component_type} instead"
             )
         # Rest of the implementation...
         return ConcurrentPerPartitionCursor(
             cursor_factory=cursor_factory,
             partition_router=partition_router,
             stream_name=stream_name,
             stream_namespace=stream_namespace,
             stream_state=stream_state,
             message_repository=self._message_repository,
             connector_state_manager=state_manager,
             cursor_field=cursor_field,
         )
+    except Exception as e:
+        raise ValueError(
+            f"Failed to create concurrent cursor for stream '{stream_name}': {str(e)}"
+        ) from e
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- airbyte_cdk/sources/declarative/concurrent_declarative_source.py (2 hunks)
- airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (1 hunks)
- airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py (1 hunks)
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (7 hunks)
- airbyte_cdk/sources/streams/concurrent/cursor.py (1 hunks)
- unit_tests/sources/declarative/incremental/test_concurrent_perpartitioncursor.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- airbyte_cdk/sources/streams/concurrent/cursor.py
- airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py
🔇 Additional comments (5)
unit_tests/sources/declarative/incremental/test_concurrent_perpartitioncursor.py (3)
482-528: LGTM! Well-structured test with clear documentation.
The test case is comprehensive and effectively validates the parent state behavior when incremental dependency is disabled.
530-630: LGTM! Excellent test helper implementation.
The helper function is well-documented and implements thorough validation steps for testing incremental parent state behavior.
1124-1556: LGTM! Comprehensive test coverage.
The test functions thoroughly cover different scenarios, including state migration and empty record sets. The test cases are well-structured and documented.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
Line range hint 814-961: LGTM! Well-structured implementation with proper error handling.
The implementation of create_concurrent_cursor_from_datetime_based_cursor is thorough and handles all edge cases appropriately.
84-85: Add missing comma between imports
There seems to be a missing comma between ConcurrentCursorFactory and ConcurrentPerPartitionCursor. Wdyt?

```diff
-    ConcurrentCursorFactory
-    ConcurrentPerPartitionCursor,
+    ConcurrentCursorFactory,
+    ConcurrentPerPartitionCursor,
```
Likely invalid or redundant comment.
Actionable comments posted: 0
🧹 Nitpick comments (5)
airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (5)
31-32: Consider adding more specific type hints for create_function.
The create_function parameter could benefit from more specific type hints. Would you consider updating it to explicitly show the expected parameter types? Wdyt?

```diff
- def __init__(self, create_function: Callable[..., ConcurrentCursor]):
+ def __init__(
+     self,
+     create_function: Callable[[Mapping[str, Any], Optional[Any]], ConcurrentCursor]
+ ):
```
42-54: Consider enhancing documentation with thread safety details.
The class uses locks and semaphores, but the documentation doesn't mention thread safety considerations. Would you like to add a section about thread safety and concurrent access patterns? Wdyt?
57-58: Consider using frozen dictionaries for constants.
The _NO_STATE and _NO_CURSOR_STATE are mutable mappings. To prevent accidental modifications, would you consider using MappingProxyType or frozendict? Wdyt?

```diff
- _NO_STATE: Mapping[str, Any] = {}
- _NO_CURSOR_STATE: Mapping[str, Any] = {}
+ _NO_STATE: Mapping[str, Any] = MappingProxyType({})
+ _NO_CURSOR_STATE: Mapping[str, Any] = MappingProxyType({})
```
102-121: Consider optimizing state copying strategy.
The state property performs deep copies of cursor states for each partition. For better performance with large states, would you consider:
- Using shallow copies where safe
- Implementing copy-on-write
- Caching the state if it hasn't changed
Wdyt? A rough sketch of the caching option is below.
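Something along these lines could work — a sketch only, assuming the per-partition cursor states are exposed as plain mappings (the real class stores ConcurrentCursor objects):

```python
import copy
import threading
from typing import Any, Mapping, MutableMapping


class CachedStateSketch:
    """Recompute the aggregated state only after a partition cursor actually changed."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._cursor_state_per_partition: MutableMapping[str, Mapping[str, Any]] = {}
        self._state_dirty = True
        self._state_cache: Mapping[str, Any] = {"states": []}

    def update_partition(self, partition_key: str, cursor_state: Mapping[str, Any]) -> None:
        with self._lock:
            self._cursor_state_per_partition[partition_key] = cursor_state
            self._state_dirty = True

    @property
    def state(self) -> Mapping[str, Any]:
        with self._lock:
            if self._state_dirty:
                # pay for the deep copies only when something changed since the last read
                self._state_cache = {
                    "states": [
                        {"partition": key, "cursor": copy.deepcopy(value)}
                        for key, value in self._cursor_state_per_partition.items()
                    ]
                }
                self._state_dirty = False
            return self._state_cache
```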
306-314: Enhance error messages for better debugging.
The error messages could be more specific about what caused the invalid state. Would you consider including more context in the error messages? Wdyt?

```diff
- "Invalid state as stream slices that are emitted should refer to an existing cursor"
+ f"Invalid state: Partition key '{partition_key}' not found in cursor_per_partition. This might indicate a race condition or premature record emission."
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (1)
216-218: ⚠️ Potential issue — Review partition limit logic.
The limit_reached method returns true when _over_limit > DEFAULT_MAX_PARTITIONS_NUMBER, but _over_limit is incremented each time we remove a partition. This means we'll only return true after exceeding the limit by the full amount. Should this be >= instead? Wdyt?

```diff
- return self._over_limit > self.DEFAULT_MAX_PARTITIONS_NUMBER
+ return self._over_limit >= self.DEFAULT_MAX_PARTITIONS_NUMBER
```
Likely invalid or redundant comment.
Actionable comments posted: 0
🧹 Nitpick comments (4)
airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (3)
64-98: Consider adding type hints for class variables.
The initialization is well-structured, but adding type hints for class variables would improve code clarity and help catch potential type-related issues early. Wdyt about adding them like this?

```diff
- _cursor_per_partition: OrderedDict[str, ConcurrentCursor] = OrderedDict()
- _semaphore_per_partition: OrderedDict[str, threading.Semaphore] = OrderedDict()
- _finished_partitions: set[str] = set()
- _global_cursor: Optional[StreamState] = {}
- _new_global_cursor: Optional[StreamState] = None
- _lookback_window: int = 0
- _parent_state: Optional[StreamState] = None
- _over_limit: int = 0
```
104-123: Consider optimizing state generation for a large number of partitions.
The state property performs deep copies of cursor states for each partition. This could become a performance bottleneck with many partitions. Would it make sense to optimize this by:
- Processing states incrementally?
- Using shallow copies where possible?
- Implementing a caching mechanism?
Wdyt?
224-291: Consider adding error handling for invalid state format.
The _set_initial_state method has good documentation but could benefit from explicit error handling for invalid state formats. For example:

```diff
 if not stream_state:
     return
+if not isinstance(stream_state, dict):
+    raise ValueError(f"Invalid state format. Expected dict, got {type(stream_state)}")
 if self._PERPARTITION_STATE_KEY not in stream_state:
```
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1)
296-349: Consider extracting complex condition to a method.
The condition for checking if a stream supports concurrent partition processing with a global cursor is quite complex. Would it make sense to extract it to a separate method for better readability? Something like:

```diff
+    def _supports_concurrent_partition_with_global_cursor(
+        self,
+        declarative_stream: DeclarativeStream,
+        incremental_sync_component_definition: Mapping[str, Any],
+    ) -> bool:
+        return (
+            incremental_sync_component_definition
+            and incremental_sync_component_definition.get("type", "")
+            == DatetimeBasedCursorModel.__name__
+            and self._stream_supports_concurrent_partition_processing(
+                declarative_stream=declarative_stream
+            )
+            and hasattr(declarative_stream.retriever, "stream_slicer")
+            and isinstance(
+                declarative_stream.retriever.stream_slicer, PerPartitionWithGlobalCursor
+            )
+        )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- airbyte_cdk/sources/declarative/concurrent_declarative_source.py (5 hunks)
- airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (1 hunks)
- airbyte_cdk/sources/declarative/incremental/global_substream_cursor.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (5)
airbyte_cdk/sources/declarative/incremental/global_substream_cursor.py (1)
67-69: LGTM! Simple and effective implementation.
The is_running method provides a clean way to check the timer's state. The implementation is thread-safe and follows the single responsibility principle.
airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (3)
30-39: LGTM! Clean factory implementation.
The factory pattern is well implemented with a clear separation of concerns. The create method signature is explicit about its parameters and return type.
208-220: Verify partition limit logic.
In _ensure_partition_limit, we increment _over_limit every time we remove a partition when the limit is reached. However, this could lead to edge cases where partitions are unnecessarily dropped. Should we consider:
- Resetting _over_limit when partitions are successfully processed?
- Using a different metric for tracking over-limit conditions?
Wdyt?
221-223: Potential off-by-one error in limit check.
The limit_reached method returns True when _over_limit > DEFAULT_MAX_PARTITIONS_NUMBER. Should this be >= instead? The current logic means we need to exceed the limit by more than one partition before considering it reached. Wdyt?
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1)
440-459: LGTM! Well-documented optimization.
The _get_retriever method contains important optimizations for cursor and state management. The comments clearly explain the rationale behind the changes.
Regression tests for Jira: GitHub Action. Failure: Issues:
Second Jira run - no errors.
Regression tests for Snapchat-marketing - 1, 2.
Regression tests for harvest - 1, 2.
Regression tests for Klaviyo: Github Action.
Regression tests for Chargebee (no incremental substreams): GitHub Action. Failure: The Catalog Changes:
When you say "It no longer uses resumable full refresh state, which resulted in an increased number of records." for the source-chargebee contact stream, that is when it is executed with a state value as an input, correct?
If the above is true, I'm all good with this change. Thanks a lot for the due diligence on this one.
Actionable comments posted: 0
♻️ Duplicate comments (2)
airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (2)
139-141: 🛠️ Refactor suggestion — Avoid accessing private attributes of Semaphore
In this section, the code accesses the private attribute _value of threading.Semaphore, which is intended to be internal to the class. Direct access to private attributes may lead to maintenance issues and unexpected behavior. Could we use an alternative approach to check the semaphore's state without accessing _value? For instance, we might track the semaphore count separately or restructure the logic to avoid the need for this check. Wdyt?
154-156: 🛠️ Refactor suggestion — Avoid accessing private attributes of Semaphore
Similar to a previous comment, accessing the _value attribute of threading.Semaphore directly is not recommended, as it is meant to be private. Perhaps we could find another way to determine if all semaphores have a value of zero without accessing private attributes. This could involve maintaining a separate counter or using higher-level methods provided by the semaphore. Wdyt?
🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (2)
93-94: Consider typing _lookback_window as Optional[int]
Currently, _lookback_window is typed as int and initialized to 0, but elsewhere in the code it is compared against None, implying that it can be None. Perhaps updating the type annotation to Optional[int] and initializing it to None would improve clarity and prevent potential type inconsistencies. Wdyt?
213-220: Reevaluate the necessity of the _over_limit variable
The _over_limit variable is incremented whenever the partition limit is exceeded but appears to be used only in the log message. If it's not contributing to the functionality beyond logging, perhaps we could remove it or consider logging the count of dropped partitions directly. This might simplify the code and reduce unnecessary state management. Wdyt?
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
944-955: Consider copying stream_state before modification
In this segment, stream_state is modified in place when applying the runtime_lookback_window. Modifying input parameters directly can lead to unintended side effects elsewhere in the code if stream_state is used after this function call. Perhaps creating a copy of stream_state before applying the modification would prevent potential issues. Wdyt?
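A defensive copy keeps the caller's mapping untouched — a sketch only, with the lookback application reduced to an illustrative placeholder key:

```python
import copy
from typing import Any, Mapping, MutableMapping


def apply_runtime_lookback(
    stream_state: Mapping[str, Any], runtime_lookback_window: int
) -> MutableMapping[str, Any]:
    # work on a copy so the caller's stream_state is not mutated in place
    adjusted_state: MutableMapping[str, Any] = copy.deepcopy(dict(stream_state))
    adjusted_state["lookback_window"] = runtime_lookback_window  # illustrative field name
    return adjusted_state


original = {"updated_at": "2024-01-01T00:00:00Z"}
adjusted = apply_runtime_lookback(original, runtime_lookback_window=86400)
assert "lookback_window" not in original  # the input is untouched
```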
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (1 hunks)
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (8 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
1026-1026: Verify the fallback for message_repository
Here, message_repository defaults to self._message_repository if not provided. Could there be cases where self._message_repository is not initialized, leading to potential AttributeErrors? Perhaps adding a check to ensure self._message_repository is available or providing a sensible default would enhance robustness. Wdyt?
2201-2204: Simplify the type checking condition
The current condition for checking the type of stream_slicer is a bit complex and might not handle subclasses correctly:

```python
if (
    not isinstance(stream_slicer, DatetimeBasedCursor)
    or type(stream_slicer) is not DatetimeBasedCursor
) and not isinstance(stream_slicer, PerPartitionWithGlobalCursor):
```

Would it be clearer and more robust to simplify the condition using isinstance with a tuple of allowed types?

```python
if not isinstance(stream_slicer, (DatetimeBasedCursor, PerPartitionWithGlobalCursor)):
```

This approach ensures that any subclass of the allowed types is also accepted. Wdyt?
Actionable comments posted: 0
🧹 Nitpick comments (5)
airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (1)
66-100: Consider documenting thread safety expectations.
The class uses threading primitives like Lock and Semaphore, but the thread safety expectations and guarantees aren't documented. Would it be helpful to add a section in the docstring about thread safety considerations and any assumptions made? This could help future maintainers understand the concurrency model better.
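Even a short docstring section would go a long way — the wording below is only a suggestion, and the class name here is a stand-in:

```python
class ConcurrentPerPartitionCursorDocSketch:
    """
    ...existing class description...

    Thread safety:
        - The per-partition bookkeeping and the ``state`` property are guarded by an internal lock.
        - One semaphore per partition tracks in-flight slices, so ``close_partition`` may be
          called concurrently from multiple worker threads.
        - Callers must not mutate the mappings returned by ``state``.
    """
```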
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (2)
441-460: Consider enhancing method documentation.
The _get_retriever method contains important optimizations for cursor and state management. Would it be helpful to add a more detailed docstring explaining:
- The purpose of zeroing out the cursor
- Why state management is handled through ConcurrentCursor
- The implications for components like StopConditionPaginationStrategyDecorator
Wdyt?
297-350: Consider extracting cursor creation logic.
The stream grouping logic contains complex nested conditions for cursor creation. Would it make sense to extract this into a separate method like _create_perpartition_cursor to improve readability and maintainability? This would also make the logic easier to test. Wdyt?
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
990-1000: Consider adding error handling for state parsing.
When adjusting the stream state with the runtime lookback window, we parse and format timestamps without explicit error handling. Should we add try-catch blocks to handle potential parsing errors and provide meaningful error messages? Wdyt?
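For illustration, wrapping the parse with a pointed error message could look like this — datetime.fromisoformat stands in for whatever parser the factory actually uses:

```python
from datetime import datetime
from typing import Any, Mapping


def parse_cursor_timestamp(stream_state: Mapping[str, Any], cursor_field: str) -> datetime:
    """Fail with a descriptive error instead of a bare traceback when the stored cursor is malformed."""
    raw_value = stream_state.get(cursor_field)
    try:
        return datetime.fromisoformat(str(raw_value))
    except (TypeError, ValueError) as exc:
        raise ValueError(
            f"Could not parse cursor value {raw_value!r} for field '{cursor_field}' "
            "while applying the runtime lookback window"
        ) from exc
```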
1084-1139: Document NoopMessageRepository usage.
The method uses NoopMessageRepository when creating the cursor factory. Would it be helpful to add a comment explaining why we use a no-op repository here instead of passing through the original message repository? This design choice might not be immediately obvious to other developers. Wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- airbyte_cdk/sources/declarative/concurrent_declarative_source.py (5 hunks)
- airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (1 hunks)
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (8 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (6)
airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (6)
30-40: LGTM! Clean factory implementation.
The factory pattern is well implemented with clear separation of concerns.
106-125: Potential performance issue when generating state.
The state property method iterates over self._cursor_per_partition.items() and performs deep copies of cursor states. This could become a performance bottleneck with a large number of partitions. Should we consider optimizing this by avoiding deep copies or processing states incrementally? Wdyt?
127-150: Avoid accessing private attributes like _value of Semaphore.
In the check if self._semaphore_per_partition[partition_key]._value == 0, we're accessing the internal _value attribute of threading.Semaphore, which is intended to be private. Should we consider an alternative approach to track the semaphore state without accessing private attributes? Wdyt?
175-182: LGTM! Good error handling for timer state.
The check for an already running timer prevents potential issues with multiple executions.
210-222: Clarification on partition limit logic.
In _ensure_partition_limit, we increment _over_limit every time we remove a partition when the limit is reached. However, in limit_reached, we check if _over_limit > DEFAULT_MAX_PARTITIONS_NUMBER. Is this the intended behavior? Should the condition be adjusted to properly reflect when the limit is truly exceeded? Wdyt?
295-303: Should we check if record.associated_slice is None before accessing partition?
In the observe method, we access record.associated_slice.partition, but if record.associated_slice is None, this could raise an error. Would it make sense to include a check to handle cases where associated_slice might be None? Wdyt?
Actionable comments posted: 0
🧹 Nitpick comments (5)
airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (3)
89-91: Should we add type annotations to these instance variables?
Adding type hints would improve code clarity and help catch potential type-related issues early:

```diff
- self._cursor_per_partition: OrderedDict[str, ConcurrentCursor] = OrderedDict()
- self._semaphore_per_partition: OrderedDict[str, threading.Semaphore] = OrderedDict()
- self._finished_partitions: set[str] = set()
+ self._cursor_per_partition: OrderedDict[str, ConcurrentCursor] = OrderedDict()
+ self._semaphore_per_partition: OrderedDict[str, threading.Semaphore] = OrderedDict()
+ self._finished_partitions: set[str] = set()
```
104-123: Consider optimizing state generation to avoid deep copies.
The state property performs deep copies of cursor states for each partition. For large numbers of partitions, this could become a performance bottleneck. Should we consider:
- Using shallow copies where possible?
- Implementing incremental state updates?
- Caching the state if it hasn't changed?
Wdyt?
211-223: Should we clarify the partition limit logic?
In _ensure_partition_limit, we increment _over_limit when removing partitions that exceed the limit. However, this counter seems to be used differently in other parts of the code. Would it be helpful to:
- Add a comment explaining the purpose of _over_limit?
- Consider renaming it to better reflect its role (e.g., _dropped_partitions_count)?
Wdyt?
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
999-1010: Consider adding error handling for state adjustment.
When adjusting the stream state with the runtime lookback window, we should consider:
- Adding validation for negative lookback windows
- Handling potential parsing errors from parse_timestamp
- Adding logging for state adjustments
Wdyt? A small sketch of the validation side is below.
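Something small like this might do — the logger name and message wording are made up for illustration:

```python
import logging
from datetime import timedelta

logger = logging.getLogger("airbyte")


def validate_runtime_lookback_window(lookback_in_seconds: int) -> timedelta:
    """Reject negative lookback windows and log the adjustment that will be applied."""
    if lookback_in_seconds < 0:
        raise ValueError(f"runtime_lookback_window must be non-negative, got {lookback_in_seconds}")
    if lookback_in_seconds:
        logger.info("Applying runtime lookback window of %s seconds", lookback_in_seconds)
    return timedelta(seconds=lookback_in_seconds)
```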
2377-2377: Should we simplify the type checking condition?
The current condition:

```python
not isinstance(stream_slicer, DatetimeBasedCursor) or type(stream_slicer) is not DatetimeBasedCursor
```

could be simplified to:

```python
not isinstance(stream_slicer, (DatetimeBasedCursor, PerPartitionWithGlobalCursor))
```

This would be more maintainable and handle subclasses correctly. Wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (1 hunks)
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (8 hunks)
- airbyte_cdk/sources/streams/concurrent/cursor.py (7 hunks)
- unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- airbyte_cdk/sources/streams/concurrent/cursor.py
- unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (1)
42-56: LGTM! Well-documented class with clear docstrings.
The class documentation clearly explains the purpose, attributes, and key concepts of partition limitation logic and global cursor fallback. The docstring follows best practices with a clear structure.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
932-933: LGTM! Good addition of optional parameters.
The addition of message_repository and runtime_lookback_window parameters enhances flexibility in cursor creation.
* remotes/airbyte/main:
  - fix(airbyte-cdk): Fix RequestOptionsProvider for PerPartitionWithGlobalCursor (airbytehq#254)
  - feat(low-code): add profile assertion flow to oauth authenticator component (airbytehq#236)
  - feat(Low-Code Concurrent CDK): Add ConcurrentPerPartitionCursor (airbytehq#111)
  - fix: don't mypy unit_tests (airbytehq#241)
  - fix: handle backoff_strategies in CompositeErrorHandler (airbytehq#225)
  - feat(concurrent cursor): attempt at clamping datetime (airbytehq#234)
  - ci: use `ubuntu-24.04` explicitly (resolves CI warnings) (airbytehq#244)
  - Fix(sdm): module ref issue in python components import (airbytehq#243)
  - feat(source-declarative-manifest): add support for custom Python components from dynamic text input (airbytehq#174)
  - chore(deps): bump avro from 1.11.3 to 1.12.0 (airbytehq#133)
  - docs: comments on what the `Dockerfile` is for (airbytehq#240)
  - chore: move ruff configuration to dedicated ruff.toml file (airbytehq#237)
Summary by CodeRabbit
New Features
Improvements
Technical Updates
ConcurrentDeclarativeSource and improved error handling in tests.