Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Lint to CI (Ruff & Black) #63

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Add Lint to CI (Ruff & Black) #63

wants to merge 6 commits into from

Conversation

gkorland
Copy link
Contributor

@gkorland gkorland commented Jan 11, 2025

Summary by CodeRabbit

  • Chores
    • Enhanced GitHub Actions workflow with static code analysis using Ruff and code formatting checks using Black.
    • Added code quality checks for SDK, examples, and test directories.
  • Bug Fixes
    • Improved error handling and logging for JSON extraction in ontology creation and data extraction steps.
  • Updates
    • Updated constructors of CSV and JSONL classes to include a new parameter for rows per document.
    • Introduced new static methods for JSON serialization and deserialization in KnowledgeGraphModelConfig.
    • Added a new test class for Knowledge Graph functionality.

Copy link

coderabbitai bot commented Jan 11, 2025

Warning

Rate limit exceeded

@gkorland has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 13 minutes and 43 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 06d27bf and 7b90b9e.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (32)
  • .github/workflows/test.yml (2 hunks)
  • examples/movies/demo-movies.ipynb (5 hunks)
  • examples/trip/demo_orchestrator_trip.ipynb (5 hunks)
  • graphrag_sdk/attribute.py (2 hunks)
  • graphrag_sdk/document_loaders/pdf.py (3 hunks)
  • graphrag_sdk/helpers.py (5 hunks)
  • graphrag_sdk/kg.py (9 hunks)
  • graphrag_sdk/models/gemini.py (3 hunks)
  • graphrag_sdk/models/model.py (2 hunks)
  • graphrag_sdk/models/openai.py (2 hunks)
  • graphrag_sdk/ontology.py (1 hunks)
  • graphrag_sdk/orchestrator/execution_plan.py (1 hunks)
  • graphrag_sdk/orchestrator/orchestrator.py (1 hunks)
  • graphrag_sdk/orchestrator/orchestrator_runner.py (0 hunks)
  • graphrag_sdk/orchestrator/step.py (2 hunks)
  • graphrag_sdk/orchestrator/step_result.py (1 hunks)
  • graphrag_sdk/orchestrator/steps/agent.py (1 hunks)
  • graphrag_sdk/orchestrator/steps/parallel.py (3 hunks)
  • graphrag_sdk/orchestrator/steps/summary.py (0 hunks)
  • graphrag_sdk/orchestrator/steps/user_input.py (1 hunks)
  • graphrag_sdk/steps/Step.py (0 hunks)
  • graphrag_sdk/steps/create_ontology_step.py (9 hunks)
  • graphrag_sdk/steps/extract_data_step.py (8 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_auto_create_ontology.py (1 hunks)
  • tests/test_helper_validate_cypher.py (1 hunks)
  • tests/test_kg.py (0 hunks)
  • tests/test_kg_gemini.py (1 hunks)
  • tests/test_kg_litellm_openai.py (2 hunks)
  • tests/test_kg_ollama.py (3 hunks)
  • tests/test_kg_openai.py (2 hunks)
  • tests/test_multi_agent.py (0 hunks)

Walkthrough

The pull request introduces two new steps in the GitHub Actions workflow for running tests. The first addition is a step that utilizes the datadog/[email protected] action for code formatting checks. The second addition employs the astral-sh/ruff-action@v3 action to analyze the source directory ./graphrag_sdk. Additionally, several import statements have been removed across various files, indicating a reduction in dependencies, while some logging statements have been simplified for clarity. The changes aim to streamline the codebase without altering existing functionalities.

Changes

File Change Summary
.github/workflows/test.yml Added Black formatter step and Ruff static code analysis step for the ./graphrag_sdk directory.
graphrag_sdk/agents/agent.py, graphrag_sdk/agents/kg_agent.py Removed import of GenerativeModelChatSession.
graphrag_sdk/attribute.py Removed import of regex fixtures.
graphrag_sdk/document_loaders/pdf.py Changed import handling for pypdf to use __import__ and updated exception type to ModuleNotFoundError.
graphrag_sdk/helpers.py Updated variable names in stringify_falkordb_response function for clarity.
graphrag_sdk/models/litellm.py Added comment to exception handling in check_valid_key method.
graphrag_sdk/orchestrator/__init__.py Removed StepResult, PlanStep, and StepBlockType from __all__.
graphrag_sdk/orchestrator/execution_plan.py Removed import of StepBlockType.
graphrag_sdk/orchestrator/orchestrator.py Removed import of OutputMethod.
graphrag_sdk/orchestrator/steps/parallel.py Updated type hints and constructor for ParallelProperties and changed inheritance for ParallelStep.
graphrag_sdk/orchestrator/steps/summary.py Removed import of ThreadPoolExecutor and wait.
graphrag_sdk/source.py Updated constructor signatures for CSV and JSONL classes to include rows_per_document.
graphrag_sdk/steps/create_ontology_step.py Simplified logging statements and adjusted error handling for JSON extraction.
graphrag_sdk/steps/extract_data_step.py Modified logging and error handling in _process_source method.
graphrag_sdk/model_config.py Added from_json and to_json methods to KnowledgeGraphModelConfig class.
tests/test_kg_litellm_openai.py Introduced new test class TestKGLiteLLM.
tests/test_multi_agent.py Removed assertion checking for "itinerary" or "day" in test_multi_agent method.

Possibly related PRs

  • Progress bar to the KG creation step #54: The changes in this PR enhance the CreateOntologyStep class with a progress bar and concurrency improvements, which may relate to the overall workflow and testing enhancements introduced in the main PR's GitHub Actions workflow.

Suggested reviewers

  • swilly22

Poem

🐰 A Ruff Rabbit's Code Check Delight
In workflows where quality takes flight,
Ruff hops in with a linting might,
Scanning code with rabbit-keen sight,
Catching bugs before they can bite!
🔍✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
.github/workflows/test.yml (3)

36-37: Consider additional Ruff configuration options

While the basic setup works, consider enhancing the linting configuration:

  • Pin to a specific version (e.g., v3.0.0) for reproducibility
  • Add rules parameter to specify which checks to enable/disable
  • Configure format parameter for better CI output

Example configuration:

       - uses: astral-sh/ruff-action@v3
         with:
+          rules: "E,F,B,I"  # Example: Enable error, flake8, bugbear, and isort rules
+          format: "github"  # Format output for better GitHub integration
           src: >-

38-41: Simplify path specifications and consider adding pyproject.toml

The source paths can be simplified, and Ruff configuration could be moved to a configuration file:

  1. Simplify paths:
         with:
           src: >-
-            "./graphrag_sdk"
-            "./examples"
-            "./tests"
+            graphrag_sdk
+            examples
+            tests
  1. Consider adding a pyproject.toml file with Ruff configuration:
[tool.ruff]
target-version = "py3"
select = ["E", "F", "B", "I"]  # Example rules
ignore = []  # Add rules to ignore

42-42: Remove trailing spaces

There are trailing spaces on this line that should be removed.

-            
+
🧰 Tools
🪛 yamllint (1.35.1)

[error] 42-42: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 58e02db and 45f7933.

📒 Files selected for processing (1)
  • .github/workflows/test.yml (1 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
.github/workflows/test.yml

[error] 42-42: trailing spaces

(trailing-spaces)

🔇 Additional comments (1)
.github/workflows/test.yml (1)

36-36: Well-placed linting step!

Good placement of the Ruff linting step early in the workflow, right after Python setup. This ensures code quality issues are caught before proceeding with dependency installation and testing.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (6)
graphrag_sdk/orchestrator/steps/parallel.py (2)

5-5: Optimize imports to avoid redundancy

Consider importing StepBlockType directly and removing the module-level import since PlanStep is now directly imported. This would make the dependencies clearer and reduce namespace pollution.

-import graphrag_sdk.orchestrator.step
+from graphrag_sdk.orchestrator.step import PlanStep, StepBlockType
-from graphrag_sdk.orchestrator.step import PlanStep

62-62: Simplify StepBlockType usage for consistency

The class inheritance has been simplified to use the direct PlanStep reference, but StepBlockType is still being accessed through the full module path. For consistency, consider using a direct import for StepBlockType as suggested in the import optimization above.

class ParallelStep(PlanStep):
    # ...
    @property
    def block(self) -> graphrag_sdk.orchestrator.step.StepBlockType:
-        return graphrag_sdk.orchestrator.step.StepBlockType.PARALLEL
+        return StepBlockType.PARALLEL

Also applies to: 74-75

graphrag_sdk/source.py (1)

2-2: Consider enforcing abstract methods

The AbstractSource class doesn't enforce any abstract methods despite being an abstract base class. Consider marking load() as abstract to ensure all subclasses implement it.

 from abc import ABC
+from abc import abstractmethod

 class AbstractSource(ABC):
     def __init__(self, path: str):
         self.path = path
         self.loader = None
         self.instruction = ""

+    @abstractmethod
     def load(self) -> Iterator[Document]:
-        return self.loader.load()
+        pass
graphrag_sdk/document_loaders/pdf.py (1)

18-22: Consider using a more readable import pattern

While using ModuleNotFoundError is more specific and appropriate, the __import__ approach is less readable than the standard import pattern.

 try:
-    __import__('pypdf')
+    import pypdf  # noqa: F401
 except ModuleNotFoundError:
     raise ModuleNotFoundError(
         "pypdf package not found, please install it with " "`pip install pypdf`"
     )
🧰 Tools
🪛 Ruff (0.8.2)

20-22: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

.github/workflows/test.yml (1)

36-40: Remove trailing spaces and consider adding ruff configuration

  1. Remove trailing spaces on line 40.
  2. Consider adding a pyproject.toml or ruff.toml to configure ruff rules for the project.
       - uses: astral-sh/ruff-action@v3
         with:
           src: >-
             "./graphrag_sdk"
-            
+

Would you like me to help you create a basic ruff configuration file with recommended settings for your project?

🧰 Tools
🪛 yamllint (1.35.1)

[error] 40-40: trailing spaces

(trailing-spaces)

graphrag_sdk/steps/extract_data_step.py (1)

Line range hint 1-20: Consider additional code quality improvements.

While the current changes improve code quality, here are some additional suggestions that align with the PR's objective:

  1. Consider organizing imports into standard library, third-party, and local imports
  2. Replace print statements in _create_entity and _create_relation with proper logging
  3. Consider adding more context to error messages, such as including the actual values that caused the error

Example of organized imports:

# Standard library
import os
import time
import json
import logging
from threading import Lock
from uuid import uuid4
from concurrent.futures import Future, ThreadPoolExecutor

# Third-party
from tqdm import tqdm
from falkordb import Graph
from ratelimit import limits, sleep_and_retry

# Local
from graphrag_sdk.steps.Step import Step
from graphrag_sdk.source import AbstractSource
from graphrag_sdk.models.model import OutputMethod
from graphrag_sdk.helpers import extract_json, map_dict_to_cypher_properties
from graphrag_sdk.ontology import Ontology
from graphrag_sdk.models import (
    GenerativeModel,
    GenerativeModelChatSession,
    GenerationResponse,
    FinishReason,
)
from graphrag_sdk.fixtures.prompts import (
    EXTRACT_DATA_SYSTEM,
    EXTRACT_DATA_PROMPT,
    FIX_JSON_PROMPT,
    COMPLETE_DATA_EXTRACTION,
)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 45f7933 and b510f38.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
  • .github/workflows/test.yml (1 hunks)
  • graphrag_sdk/agents/agent.py (0 hunks)
  • graphrag_sdk/agents/kg_agent.py (0 hunks)
  • graphrag_sdk/attribute.py (0 hunks)
  • graphrag_sdk/document_loaders/pdf.py (1 hunks)
  • graphrag_sdk/helpers.py (1 hunks)
  • graphrag_sdk/models/litellm.py (1 hunks)
  • graphrag_sdk/orchestrator/__init__.py (0 hunks)
  • graphrag_sdk/orchestrator/execution_plan.py (1 hunks)
  • graphrag_sdk/orchestrator/orchestrator.py (0 hunks)
  • graphrag_sdk/orchestrator/steps/parallel.py (3 hunks)
  • graphrag_sdk/orchestrator/steps/summary.py (0 hunks)
  • graphrag_sdk/source.py (1 hunks)
  • graphrag_sdk/steps/create_ontology_step.py (3 hunks)
  • graphrag_sdk/steps/extract_data_step.py (2 hunks)
💤 Files with no reviewable changes (6)
  • graphrag_sdk/agents/kg_agent.py
  • graphrag_sdk/orchestrator/orchestrator.py
  • graphrag_sdk/orchestrator/steps/summary.py
  • graphrag_sdk/agents/agent.py
  • graphrag_sdk/attribute.py
  • graphrag_sdk/orchestrator/init.py
✅ Files skipped from review due to trivial changes (4)
  • graphrag_sdk/models/litellm.py
  • graphrag_sdk/helpers.py
  • graphrag_sdk/steps/create_ontology_step.py
  • graphrag_sdk/orchestrator/execution_plan.py
🧰 Additional context used
🪛 yamllint (1.35.1)
.github/workflows/test.yml

[error] 40-40: trailing spaces

(trailing-spaces)

🔇 Additional comments (5)
graphrag_sdk/orchestrator/steps/parallel.py (2)

38-38: LGTM! Clean type annotation update

Good improvement replacing the string literal type with a direct reference, which is cleaner and more maintainable.


40-40: LGTM! Consistent type annotations

Good consistency maintaining the same type annotation style between the property and constructor parameter.

graphrag_sdk/source.py (1)

Line range hint 137-156: LGTM! Type hints and parameter additions look good.

The addition of type hints for path parameters and the new rows_per_document parameter in CSV and JSONL classes improves code quality and consistency.

graphrag_sdk/steps/extract_data_step.py (2)

176-176: LGTM! Simplified logging statement.

The logging statement has been simplified while maintaining its functionality, which aligns with good logging practices.


190-190: LGTM! Improved error message clarity.

The error message has been refined to be more concise while still providing the essential information about missing required fields.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🔭 Outside diff range comments (1)
graphrag_sdk/models/azure_openai.py (1)

Client initialization check required in ask() method

The review comment is correct. The self.client is only initialized when with_system_instruction() is called, but ask() can be invoked without prior initialization, leading to a NoneType error. Consider either:

  • Initializing the client in __init__
  • Adding a client initialization check in ask()
🔗 Analysis chain

Line range hint 95-105: Ensure client is connected before making requests

In the ask method, self.client may not be initialized if _connect_to_model hasn't been called. This can lead to runtime errors when attempting to create chat completions. Verify that the client is connected before making API calls.

Run the following script to check where _connect_to_model is called:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all calls to '_connect_to_model' in the codebase.

rg '_connect_to_model\(' --type py

Length of output: 182


Script:

#!/bin/bash
# Get the class definition and context around _connect_to_model usage
ast-grep --pattern 'class AzureOpenAI {
  $$$
}'  graphrag_sdk/models/azure_openai.py

echo "=== Context around _connect_to_model call ==="
rg '_connect_to_model\(' -A 5 -B 5 graphrag_sdk/models/azure_openai.py

Length of output: 1043


Script:

#!/bin/bash
# Find the ask method implementation and its context
ast-grep --pattern 'def ask($$$) {
  $$$
}' graphrag_sdk/models/azure_openai.py

echo "=== Class structure ==="
ast-grep --pattern 'class AzureOpenAI {
  def __init__($$$) {
    $$$
  }
  $$$
}' graphrag_sdk/models/azure_openai.py

Length of output: 259


Script:

#!/bin/bash
# Get the complete file content
echo "=== Complete file content ==="
cat graphrag_sdk/models/azure_openai.py

Length of output: 9160

🧹 Nitpick comments (12)
graphrag_sdk/orchestrator/step_result.py (1)

Line range hint 3-22: Well-structured abstract base class with comprehensive interface.

The StepResult ABC provides a robust contract for step results with proper type hints and essential methods. The design follows good practices by:

  • Using property decorator for output
  • Including serialization methods (to_json/from_json)
  • Providing string representation methods

Consider adding docstrings to document the expected behavior of each abstract method, especially the format expected by to_json and from_json.

 class StepResult(ABC):
+    """Abstract base class defining the interface for step execution results."""
 
     @property
     @abstractmethod
     def output(self) -> str:
+        """Returns the output string representation of the step result."""
         pass
 
     @abstractmethod
     def to_json(self) -> dict:
+        """Serializes the step result to a JSON-compatible dictionary."""
         pass
graphrag_sdk/orchestrator/step.py (2)

Line range hint 5-23: Consider simplifying the StepBlockType.from_str method.

The current implementation using multiple if-elif statements could be simplified using a mapping or match statement.

     @staticmethod
     def from_str(text: str) -> "StepBlockType":
-        if text == StepBlockType.PARALLEL:
-            return StepBlockType.PARALLEL
-        elif text == StepBlockType.AGENT:
-            return StepBlockType.AGENT
-        elif text == StepBlockType.SUMMARY:
-            return StepBlockType.SUMMARY
-        elif text == StepBlockType.USER_INPUT:
-            return StepBlockType.USER_INPUT
-        else:
+        valid_types = {
+            StepBlockType.PARALLEL,
+            StepBlockType.AGENT,
+            StepBlockType.SUMMARY,
+            StepBlockType.USER_INPUT
+        }
+        if text not in valid_types:
             raise ValueError(f"Unknown step block type: {text}")
+        return text

44-53: Enhance error handling in from_json method.

The method could benefit from more robust error handling:

  1. Validate required fields in JSON
  2. Provide more descriptive error messages
     def from_json(json: dict | str) -> "PlanStep":
         json = json if isinstance(json, dict) else loads(json)
         from graphrag_sdk.orchestrator.steps import PLAN_STEP_TYPE_MAP
 
+        if "block" not in json:
+            raise ValueError("Missing required field 'block' in JSON")
+
         block = StepBlockType.from_str(json["block"])
         step_type = PLAN_STEP_TYPE_MAP[block]
 
         if step_type is None:
-            raise ValueError(f"Unknown step block type: {block}")
+            raise ValueError(f"No implementation found for step block type: {block}")
graphrag_sdk/models/model.py (1)

10-14: LGTM! Consider adding docstring documentation.

The OutputMethod enum is well-structured and follows Python best practices. Consider adding a docstring to document the purpose and usage of each enum value.

 class OutputMethod(Enum):
+    """Defines the available output methods for model responses.
+
+    Attributes:
+        JSON: Format the response as JSON
+        DEFAULT: Use the default response format
+    """
     JSON = "json"
     DEFAULT = "default"
graphrag_sdk/models/azure_openai.py (1)

Line range hint 190-197: Use a constant for maximum message length

In the send_message method, the message is sliced to message[:14385]. Hardcoding this value can lead to maintenance challenges. Consider defining a constant or using the model's maximum token limit to make the code more flexible and maintainable.

Apply this diff to introduce a constant:

+MAX_MESSAGE_LENGTH = 14385  # Adjust based on model's token limit

     def send_message(
         self, message: str, output_method: OutputMethod = OutputMethod.DEFAULT
     ) -> GenerationResponse:
         """
         Send a message in the chat session and receive the model's response.
         """
-        prompt.append({"role": "user", "content": message[:14385]})
+        prompt.append({"role": "user", "content": message[:MAX_MESSAGE_LENGTH]})
tests/test_kg_litellm_openai.py (2)

89-89: Consider enhancing test assertions

The test only checks the number of actors but could be more comprehensive. Consider adding assertions for:

  • The correctness of the answer format
  • The presence of specific expected information
  • Edge cases with different input data

Also applies to: 93-96


89-89: Use raw string for regex pattern

When using regex patterns, it's recommended to use raw strings (r"pattern") to avoid any potential issues with escape sequences.

-        actors_count = re.findall(r"\d+", answer)
+        actors_count = re.findall(r"\d+", answer)
.github/workflows/test.yml (1)

36-44: Remove trailing whitespace in YAML

There's a trailing whitespace on line 44 that should be removed for consistency.

          src: >-
            "./graphrag_sdk"
-            
+
🧰 Tools
🪛 yamllint (1.35.1)

[error] 44-44: trailing spaces

(trailing-spaces)

graphrag_sdk/steps/create_ontology_step.py (2)

65-69: Simplify nested context managers

Use a single with statement with multiple contexts for better readability.

-        with tqdm(
-            total=len(self.sources) + 1,
-            desc="Process Documents",
-            disable=self.hide_progress,
-        ) as pbar:
-            with ThreadPoolExecutor(max_workers=self.config["max_workers"]) as executor:
+        with tqdm(
+            total=len(self.sources) + 1,
+            desc="Process Documents",
+            disable=self.hide_progress,
+        ) as pbar, ThreadPoolExecutor(
+            max_workers=self.config["max_workers"]
+        ) as executor:
🧰 Tools
🪛 Ruff (0.8.2)

65-70: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


136-139: Consider adding a maximum retry limit constant

The retry logic uses a hardcoded value. Consider extracting this to a named constant for better maintainability.

+MAX_RESPONSE_RETRIES = 3  # Add at the top of the file with other constants
+
 while (
     responses[response_idx].finish_reason == FinishReason.MAX_TOKENS
-    and response_idx < retries
+    and response_idx < MAX_RESPONSE_RETRIES
 ):
graphrag_sdk/kg.py (2)

92-95: Add stacklevel to warnings for better traceability.

When using warnings, specify the stacklevel to ensure warnings point to the correct source location.

 warnings.warn(
     "Cypher system instruction should contain {ontology}",
     category=UserWarning,
+    stacklevel=2
 )
🧰 Tools
🪛 Ruff (0.8.2)

92-92: No explicit stacklevel keyword argument found

(B028)


122-125: Add stacklevel to warnings for better traceability.

When using warnings, specify the stacklevel to ensure warnings point to the correct source location.

 warnings.warn(
     "Cypher generation prompt with history should contain {last_answer}",
     category=UserWarning,
+    stacklevel=2
 )
🧰 Tools
🪛 Ruff (0.8.2)

122-122: No explicit stacklevel keyword argument found

(B028)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b510f38 and 06d27bf.

📒 Files selected for processing (42)
  • .github/workflows/test.yml (1 hunks)
  • graphrag_sdk/__init__.py (1 hunks)
  • graphrag_sdk/agents/__init__.py (1 hunks)
  • graphrag_sdk/agents/agent.py (0 hunks)
  • graphrag_sdk/agents/kg_agent.py (1 hunks)
  • graphrag_sdk/attribute.py (1 hunks)
  • graphrag_sdk/chat_session.py (6 hunks)
  • graphrag_sdk/document.py (1 hunks)
  • graphrag_sdk/document_loaders/jsonl.py (1 hunks)
  • graphrag_sdk/document_loaders/pdf.py (3 hunks)
  • graphrag_sdk/document_loaders/text.py (2 hunks)
  • graphrag_sdk/document_loaders/url.py (3 hunks)
  • graphrag_sdk/fixtures/regex.py (1 hunks)
  • graphrag_sdk/helpers.py (5 hunks)
  • graphrag_sdk/kg.py (9 hunks)
  • graphrag_sdk/model_config.py (2 hunks)
  • graphrag_sdk/models/__init__.py (1 hunks)
  • graphrag_sdk/models/azure_openai.py (7 hunks)
  • graphrag_sdk/models/gemini.py (3 hunks)
  • graphrag_sdk/models/litellm.py (11 hunks)
  • graphrag_sdk/models/model.py (2 hunks)
  • graphrag_sdk/models/ollama.py (11 hunks)
  • graphrag_sdk/models/openai.py (2 hunks)
  • graphrag_sdk/orchestrator/__init__.py (1 hunks)
  • graphrag_sdk/orchestrator/orchestrator.py (1 hunks)
  • graphrag_sdk/orchestrator/step.py (2 hunks)
  • graphrag_sdk/orchestrator/step_result.py (1 hunks)
  • graphrag_sdk/orchestrator/steps/__init__.py (1 hunks)
  • graphrag_sdk/orchestrator/steps/agent.py (1 hunks)
  • graphrag_sdk/orchestrator/steps/parallel.py (3 hunks)
  • graphrag_sdk/orchestrator/steps/user_input.py (1 hunks)
  • graphrag_sdk/relation.py (1 hunks)
  • graphrag_sdk/steps/create_ontology_step.py (9 hunks)
  • graphrag_sdk/steps/extract_data_step.py (8 hunks)
  • graphrag_sdk/steps/graph_query_step.py (1 hunks)
  • tests/test_auto_create_ontology.py (1 hunks)
  • tests/test_helper_validate_cypher.py (1 hunks)
  • tests/test_kg_gemini.py (1 hunks)
  • tests/test_kg_litellm_openai.py (2 hunks)
  • tests/test_kg_ollama.py (3 hunks)
  • tests/test_kg_openai.py (2 hunks)
  • tests/test_multi_agent.py (0 hunks)
💤 Files with no reviewable changes (2)
  • graphrag_sdk/agents/agent.py
  • tests/test_multi_agent.py
✅ Files skipped from review due to trivial changes (19)
  • graphrag_sdk/models/init.py
  • graphrag_sdk/init.py
  • graphrag_sdk/document.py
  • tests/test_helper_validate_cypher.py
  • graphrag_sdk/orchestrator/steps/agent.py
  • graphrag_sdk/document_loaders/jsonl.py
  • tests/test_kg_openai.py
  • tests/test_kg_gemini.py
  • graphrag_sdk/models/ollama.py
  • graphrag_sdk/fixtures/regex.py
  • graphrag_sdk/document_loaders/text.py
  • graphrag_sdk/steps/graph_query_step.py
  • graphrag_sdk/agents/init.py
  • graphrag_sdk/chat_session.py
  • graphrag_sdk/models/gemini.py
  • graphrag_sdk/models/openai.py
  • graphrag_sdk/relation.py
  • tests/test_kg_ollama.py
  • graphrag_sdk/document_loaders/url.py
🚧 Files skipped from review as they are similar to previous changes (7)
  • graphrag_sdk/orchestrator/init.py
  • graphrag_sdk/attribute.py
  • graphrag_sdk/orchestrator/orchestrator.py
  • graphrag_sdk/orchestrator/steps/parallel.py
  • graphrag_sdk/steps/extract_data_step.py
  • graphrag_sdk/helpers.py
  • graphrag_sdk/agents/kg_agent.py
🧰 Additional context used
🪛 yamllint (1.35.1)
.github/workflows/test.yml

[error] 44-44: trailing spaces

(trailing-spaces)

🪛 Ruff (0.8.2)
graphrag_sdk/document_loaders/pdf.py

21-23: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

graphrag_sdk/kg.py

92-92: No explicit stacklevel keyword argument found

(B028)


122-122: No explicit stacklevel keyword argument found

(B028)

graphrag_sdk/models/litellm.py

239-241: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

graphrag_sdk/steps/create_ontology_step.py

65-70: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: test
🔇 Additional comments (11)
graphrag_sdk/orchestrator/steps/__init__.py (1)

20-20: LGTM! Appropriate addition to public API.

Making PLAN_STEP_TYPE_MAP public is correct as it's a key part of the step type system.

graphrag_sdk/orchestrator/steps/user_input.py (1)

51-56: LGTM! Well-implemented string representations.

The implementation follows Python's best practices for __str__ and __repr__ methods.

graphrag_sdk/models/model.py (2)

94-96: LGTM! Clean signature formatting with proper type annotations.

The method signature is well-formatted with clear type annotations and appropriate default value using the new OutputMethod enum.


10-14: Verify implementations and usage of OutputMethod enum.

Since we've added a new parameter to the abstract send_message method, we should ensure all implementations and callers are updated accordingly.

Let's verify the implementations and usage:

Also applies to: 94-96

✅ Verification successful

OutputMethod enum implementation verified successfully

All implementations of GenerativeModelChatSession correctly handle the output_method parameter with proper default values, ensuring backward compatibility. The parameter is properly used to adjust generation configuration in each implementation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Find all implementations of GenerativeModelChatSession
echo "=== Checking implementations of GenerativeModelChatSession ==="
ast-grep --pattern 'class $NAME($$$GenerativeModelChatSession$$$):
  $$$
  def send_message($$$):
    $$$'

# Find all calls to send_message to ensure they use the enum correctly
echo "=== Checking calls to send_message ==="
rg "\.send_message\(" -A 2

Length of output: 5333


Script:

#!/bin/bash
# Find the base class definition
echo "=== Checking GenerativeModelChatSession base class ==="
ast-grep --pattern 'class GenerativeModelChatSession:
  $$$'

# Find implementations with a more flexible pattern
echo "=== Checking implementations with adjusted pattern ==="
ast-grep --pattern 'class $NAME($$$):
  $$$
  async def send_message($$$)'

# Find the specific file content
echo "=== Checking specific file content ==="
cat graphrag_sdk/models/model.py

Length of output: 3915


Script:

#!/bin/bash
# Find implementations with corrected pattern
echo "=== Checking implementations with corrected pattern ==="
ast-grep --pattern 'class $NAME($$$):
  $$$
  def send_message($$$):
    $$$'

# Find all files that might contain implementations
echo "=== Checking files that might contain implementations ==="
fd "model.*\.py" --type f --exec cat {}

Length of output: 6348


Script:

#!/bin/bash
# Find files containing GenerativeModelChatSession
echo "=== Finding files with GenerativeModelChatSession ==="
rg -l "GenerativeModelChatSession"

# Check content of potential implementation files
echo "=== Checking implementation files ==="
fd -e py -p "gemini|openai|ollama" --type f --exec cat {}

Length of output: 40363

graphrag_sdk/models/azure_openai.py (1)

Line range hint 3-3: Verify the existence of AzureOpenAI in the OpenAI library

The import statement from openai import AzureOpenAI assumes that the AzureOpenAI class is available in the openai package. Please verify that AzureOpenAI is a valid class in the OpenAI library or ensure that it is defined elsewhere in the codebase to prevent import errors.

Run the following script to check if AzureOpenAI is defined in the codebase:

tests/test_auto_create_ontology.py (1)

2-2: Good addition of load_dotenv() to load environment variables

Calling load_dotenv() ensures that environment variables from the .env file are loaded, providing necessary configurations for the test. This improves the reliability of the test setup.

graphrag_sdk/model_config.py (1)

Line range hint 50-68: Addition of JSON serialization methods enhances configurability

The from_json and to_json methods enable serialization and deserialization of KnowledgeGraphModelConfig instances. This enhances the flexibility of the configuration management and is implemented correctly.

graphrag_sdk/steps/create_ontology_step.py (1)

95-98: Consider adding error handling for the final task

The final task submission should include error handling to ensure the ontology is properly fixed.

graphrag_sdk/models/litellm.py (1)

258-260: LGTM! Good practice for JSON output.

Setting temperature to 0 for JSON output ensures deterministic and well-structured responses.

graphrag_sdk/kg.py (2)

118-120: LGTM! Clear error message formatting.

The multi-line formatting improves readability while maintaining a clear and descriptive error message.


229-238: LGTM! Well-formatted constructor call.

The multi-line formatting improves readability and makes the parameter passing clear.

Comment on lines +226 to +228
config["temperature"] = 0
config["response_format"] = {"type": "json_object"}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Invalid parameter response_format in generation config

In the _get_generation_config method, setting config["response_format"] = {"type": "json_object"} may not be supported by the Azure OpenAI API. The chat.completions.create method does not recognize a response_format parameter, which could result in errors. Please verify the API documentation and remove or replace the invalid parameter.

Apply this diff to remove the unsupported parameter:

             config["temperature"] = 0
-            config["response_format"] = {"type": "json_object"}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
config["temperature"] = 0
config["response_format"] = {"type": "json_object"}
config["temperature"] = 0

Comment on lines +38 to +43

# Credentials
self.api_key = os.getenv("AZURE_OPENAI_API_KEY")
self.azure_endpoint = os.getenv("AZURE_ENDPOINT")
self.api_version = os.getenv("AZURE_API_VERSION")

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Initialize the client in the constructor to prevent NoneType errors

The self.client attribute is not initialized in the __init__ method, but methods like ask rely on it being initialized. This could lead to a NoneType error if ask is called before with_system_instruction. To prevent runtime errors, initialize self.client in the constructor.

Apply this diff to initialize self.client in the __init__ method:

     if not self.api_key or not self.azure_endpoint or not self.api_version:
         raise ValueError(
             "Missing credentials in the environment: AZURE_OPENAI_API_KEY, AZURE_ENDPOINT, or AZURE_API_VERSION."
         )

+    self._connect_to_model()

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 19 to 23
__import__("pypdf")
except ModuleNotFoundError:
raise ModuleNotFoundError(
"pypdf package not found, please install it with " "`pip install pypdf`"
)
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve exception chaining in ModuleNotFoundError handling

Use exception chaining to preserve the original error context. This helps with debugging by maintaining the full error traceback.

         try:
             __import__("pypdf")
-        except ModuleNotFoundError:
+        except ModuleNotFoundError as err:
             raise ModuleNotFoundError(
                 "pypdf package not found, please install it with " "`pip install pypdf`"
-            )
+            ) from err
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
__import__("pypdf")
except ModuleNotFoundError:
raise ModuleNotFoundError(
"pypdf package not found, please install it with " "`pip install pypdf`"
)
)
__import__("pypdf")
except ModuleNotFoundError as err:
raise ModuleNotFoundError(
"pypdf package not found, please install it with " "`pip install pypdf`"
) from err
🧰 Tools
🪛 Ruff (0.8.2)

21-23: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

Comment on lines +239 to +241
raise ValueError(
f"Error during completion request, please check the credentials - {e}"
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Preserve exception context when re-raising.

When re-raising exceptions, it's important to maintain the original error context using exception chaining.

 except Exception as e:
-    raise ValueError(
-        f"Error during completion request, please check the credentials - {e}"
-    )
+    raise ValueError(
+        f"Error during completion request, please check the credentials - {e}"
+    ) from e
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
raise ValueError(
f"Error during completion request, please check the credentials - {e}"
)
raise ValueError(
f"Error during completion request, please check the credentials - {e}"
) from e
🧰 Tools
🪛 Ruff (0.8.2)

239-241: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

Comment on lines +72 to +76
completion(model=model, messages=messages, max_tokens=10)
return True
except:
except: # noqa: E722
return False

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve exception handling by preserving the error context.

The bare except clause silently swallows all exceptions. While the intention might be to catch all errors, it's better to preserve the original error context for debugging purposes.

 try:
     completion(model=model, messages=messages, max_tokens=10)
     return True
-except:  # noqa: E722
+except Exception as e:
+    logger.debug(f"Key validation failed: {e}")
     return False

Committable suggestion skipped: line range outside the PR's diff.

@gkorland gkorland changed the title Add ruff lint Add Lint to CI (Ruff & Black) Jan 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant