forked from cline/cline
Semantic Search and Codebase Indexing Service Implementation #609
Draft: daniel-lxs wants to merge 233 commits into RooVetGit:main from daniel-lxs:semantic_search
+14,653
−2,620
…onents, and modify extension configuration
…, add advanced result deduplication, improve .gitignore file indexing, update semantic search plan, refactor code parsing and indexing
…mproved result formatting, deduplication, and error handling. Update types for search results to include vector data and refine logging for better debugging.
…e-specific caching. Update cache key generation to include workspace ID, enhancing data isolation for semantic search operations.
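The workspace-scoped cache key described in this commit could look roughly like the sketch below. The function name `makeCacheKey` and the key layout are assumptions made for illustration; the PR's actual key format is not shown in the commit message.

```typescript
// Hypothetical helper (not taken from the PR): builds a cache key that
// includes the workspace ID, so identical queries issued from different
// workspaces never share a cache entry.
function makeCacheKey(workspaceId: string, query: string): string {
  // JSON-encoding the pair keeps the key unambiguous even when either
  // part contains separator characters such as ":".
  return JSON.stringify([workspaceId, query]);
}

// Minimal usage: one shared cache, isolated per workspace by the key.
const searchCache = new Map<string, string[]>();
searchCache.set(makeCacheKey("workspace-a", "parse config"), ["src/config.ts"]);
searchCache.set(makeCacheKey("workspace-b", "parse config"), ["lib/settings.ts"]);
```

Keeping the workspace ID inside the key, rather than holding one cache per workspace, is only one possible design; it keeps a single eviction policy while still isolating data.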
…o SemanticSearchService
- Introduce relationships in CodeDefinition to enhance context for embedding generation.
- Refactor embedding methods to leverage contextual information.
- Update TreeSitterParser to extract relationship data during code parsing.

- Introduce new commands for managing the semantic search index: reindexing and deletion.
- Implement progress tracking for indexing operations, providing real-time feedback in the UI.
- Add support for filtering indexed files based on supported extensions.
- Update the SettingsView component to include controls for semantic search settings and display indexing progress.
- Refactor SemanticSearchService to handle indexing and clearing of the semantic search index more effectively.

…ne class
- Remove semantic search initialization from extension.ts
- Integrate semantic search initialization directly into Cline class
- Simplify initialization process and error handling
- Improve background initialization and error logging
- Maintain existing semantic search functionality with more cohesive implementation

- Add @lancedb/lancedb package to project dependencies
- Update esbuild configuration to include LanceDB Linux x64 GNU external dependency
- Prepare for enhanced vector storage and retrieval capabilities

…ration
- Modify semantic search indexing to index all files without extension filtering
- Update storage directory configuration to use cache directory directly
- Simplify logging for file indexing process
- Remove unnecessary file extension filtering during indexing

…nd directory handling

… and text file support
- Replace existing vector store with LanceDB implementation
- Add robust text file detection and indexing capabilities
- Implement content hash-based file change tracking
- Enhance file indexing with improved memory management and file type handling
- Add support for indexing non-code text files with semantic search
- Improve search result ranking and filtering logic

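Content hash-based change tracking, as named in this commit, can be sketched as follows. Everything here is an assumption for illustration: the class `FileChangeTracker`, the method `needsReindex`, and the FNV-1a hash (applied to UTF-16 code units) are stand-ins, and a real implementation would more likely use a cryptographic digest such as SHA-256.

```typescript
// Illustrative 32-bit FNV-1a hash over UTF-16 code units.
// This is a stand-in chosen to keep the sketch dependency-free.
function contentHash(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Hypothetical tracker: remembers the last indexed hash per file path and
// reports whether the file has to be parsed and embedded again.
class FileChangeTracker {
  private readonly hashes = new Map<string, string>();

  needsReindex(filePath: string, content: string): boolean {
    const next = contentHash(content);
    if (this.hashes.get(filePath) === next) {
      return false; // unchanged since the last index run
    }
    this.hashes.set(filePath, next);
    return true;
  }
}
```

With this shape, an indexing pass can skip any file whose content hash matches the stored one, which is the effect the commit describes.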
- Update MiniLM model from L6 to L12 version for improved embedding quality
…t extraction
- Refactor TreeSitterParser to simplify code segment extraction logic
- Introduce a new TypeScript-specific query for more precise code parsing
- Update CodeSegment type to support more flexible segment types
- Add file hash verification to prevent unnecessary parsing
- Enhance language parser to support custom queries for different languages

…and improve error handling
- Move semantic search initialization logic to ClineProvider
- Simplify semantic search service creation and workspace indexing
- Add retry mechanism for semantic search initialization
- Improve progress reporting and error handling during indexing
- Pass semantic search service as a parameter to Cline constructor

…code parsing
- Update TypeScript tree-sitter query to capture adjacent comments for code segments
- Add support for stripping comment formatting and selecting adjacent documentation
- Improve parsing of method, class, function, and variable declarations
- Refine comment extraction and association with code elements

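Stripping comment formatting before attaching documentation to a code segment might work along these lines. The function name `stripCommentFormatting` and the exact rules (which markers are removed, dropping empty lines) are assumptions; the PR's parser may handle more cases.

```typescript
// Hypothetical helper: removes comment delimiters so only the documentation
// text remains for embedding alongside the code segment.
function stripCommentFormatting(comment: string): string {
  return comment
    .split("\n")
    .map((line) =>
      line
        .trim()
        .replace(/^\/\*+/, "") // opening "/*" or "/**"
        .replace(/\*+\/$/, "") // closing "*/"
        .replace(/^\/\/+/, "") // line comment "//"
        .replace(/^\*+/, "") // leading "*" inside a block comment
        .trim(),
    )
    .filter((line) => line.length > 0)
    .join("\n");
}

// Example input: a JSDoc block captured next to a function declaration.
const doc = stripCommentFormatting("/**\n * Adds two numbers.\n * @param a first operand\n */");
```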
…nnecessary metadata
- Remove detailed function and method metadata from CodeSegment type
- Streamline TreeSitterParser to focus on core code segment extraction
- Introduce CodeSegmentType enum for more type-safe segment classification
- Improve context extraction with hierarchical parent tracking
- Simplify import graph retrieval and parsing logic

…iable name extraction
- Introduce JavaScript tree-sitter query for code segment parsing
- Enhance variable name extraction with fallback to identifier nodes
- Update WASM directory resolution to use current module directory
- Implement import extraction for JavaScript files
- Simplify import and export segment collection

…for code parsing
- Create test cases for parsing TypeScript and JavaScript code segments
- Cover parsing of classes, functions, imports, and variables
- Implement dynamic test file generation and cleanup
- Verify code segment extraction for different language constructs

- Create index file for semantic search language queries
- Implement JavaScript tree-sitter query for parsing code segments
- Support extraction of imports, classes, methods, functions, and variables
- Align JavaScript parsing with existing TypeScript query structure

… directory support
- Add optional `wasmDir` parameter to `loadRequiredLanguageParsers` function
- Update language loading to use provided or default WASM directory
- Modify `loadLanguage` function to accept custom WASM directory path
- Improve parser initialization with dynamic file location configuration
- Update return type to provide more detailed parser and query information

…rove initialization robustness
- Introduce WorkspaceIndexStatus enum to track indexing progress
- Add methods to update and retrieve workspace indexing status
- Enhance initialization error handling and status management
- Remove deprecated memory monitoring and initialization tracking code
- Simplify initialization process with more focused error reporting

…test infrastructure
- Delete memory monitoring classes and associated test files
- Remove in-memory and persistent vector store implementations
- Clean up deprecated memory tracking and vector storage code
- Eliminate test infrastructure for memory and vector store components

- Delete global state keys for semantic search memory and score settings
- Remove configuration handling for max memory and minimum score
- Simplify semantic search initialization with default parameters
- Add semantic search status tracking to global state

…ogic
- Modify result processing to prioritize code results while maintaining original order
- Simplify result formatting with more consistent type handling
- Remove hardcoded score thresholds and filtering logic
- Improve result deduplication and trimming to max results
- Update result type conversion to use SearchResultType enum

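The result processing described in this commit (deduplicate, put code results first while keeping their original relative order, then trim) can be sketched as below. `SearchResultType` is named in the commit, but its members, the `SketchResult` shape, the deduplication key, and the function `processResults` are assumptions for illustration.

```typescript
// Assumed enum members; the commit names the enum but not its values.
enum SearchResultType {
  Code = "code",
  Text = "text",
}

// Hypothetical result shape used only for this sketch.
interface SketchResult {
  filePath: string;
  startLine: number;
  type: SearchResultType;
}

function processResults(results: SketchResult[], maxResults: number): SketchResult[] {
  // Deduplicate on file path + start line, keeping the first occurrence.
  const seen = new Set<string>();
  const unique = results.filter((r) => {
    const key = JSON.stringify([r.filePath, r.startLine]);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });

  // Stable partition: code results first, each group in its original order.
  const code = unique.filter((r) => r.type === SearchResultType.Code);
  const other = unique.filter((r) => r.type !== SearchResultType.Code);

  // Trim to the requested maximum.
  return code.concat(other).slice(0, maxResults);
}
```

Because no score threshold is applied, ordering within each group depends entirely on the order in which the vector store returned the results.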
…ypes
- Remove detailed metadata fields from CodeDefinition
- Refactor SearchResult types to use more concise structure
- Update SearchResultType to use enum instead of string literals
- Remove vector and score properties from search result interfaces
- Streamline embedding generation by reducing metadata complexity

…pping
- Update search method to return more concise VectorSearchResult type
- Simplify result mapping by removing detailed CodeSearchResult structure
- Reduce logging and console output in search method
- Improve vector dimension calculation using vector length
- Align vector search result with recent type refactoring

…for clarity
- Update interface name from SearchResult to VectorSearchResult
- Maintain existing type structure and method signatures
- Improve type naming to better reflect vector search semantics

…n UI
- Remove memory and score configuration sliders from settings view
- Update ExtensionState and WebviewMessage to track semantic search status
- Add workspace status display in settings with color-coded status indicator

…s tracking
- Modify ClineProvider to accept semantic search service as a promise
- Add dynamic status tracking for semantic search initialization
- Implement WebView messaging for indexing progress and status updates
- Update SettingsView to request and display semantic search status
- Enhance ExtensionStateContext to manage semantic search status

…t normal(20-25%) window size
… in openai compatible section
Semantic Search Service Implementation [⚠️ Work In Progress ⚠️ ]
Description
This PR introduces a comprehensive semantic search service for code and text files within a workspace. The service provides:
File Indexing:
Search Capabilities:
Infrastructure:
Configuration:
Type of change
How Has This Been Tested?
The service has been tested with:
Checklist:
Additional context
The service is designed to be extensible, with clear interfaces for: