`@ubiquity-os/issue-comment-embeddings`

This is a plugin for Ubiquibot. It listens for issue comments and adds them to a vector store, and it keeps the store in sync when comments are edited or deleted.

Configuration

Host the plugin on a server that Ubiquibot can access. Then create a `.dev.vars` file that sets the following variables:
SUPABASE_URL: The URL of your Supabase instance.
SUPABASE_KEY: The key for your Supabase instance.
VOYAGEAI_API_KEY: The API key for Voyage AI.

Usage

Add the following to your .ubiquity-os.config.yml file with the appropriate URL:

- plugin: https://ubiquity-os-comment-vector-embeddings-main.ubiquity.workers.dev
  with:
    matchThreshold: 0.95
    warningThreshold: 0.75
    jobMatchingThreshold: 0.75
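
The README does not spell out how the thresholds are applied. The sketch below shows one plausible reading, in which a similarity score at or above `matchThreshold` is treated as a duplicate and a score at or above `warningThreshold` produces a warning. The `PluginSettings` type, the `classifySimilarity` function, and these semantics are assumptions made for illustration, and `jobMatchingThreshold` is carried in the type but not modelled.

```typescript
// Hypothetical settings shape; the field names mirror the YAML keys above.
interface PluginSettings {
  matchThreshold: number;
  warningThreshold: number;
  jobMatchingThreshold: number;
}

type Verdict = "duplicate" | "warning" | "distinct";

// Assumed semantics (not confirmed by the plugin source):
// a score at or above matchThreshold counts as a duplicate,
// a score at or above warningThreshold triggers a warning,
// anything lower is treated as a distinct issue.
function classifySimilarity(score: number, settings: PluginSettings): Verdict {
  if (score >= settings.matchThreshold) return "duplicate";
  if (score >= settings.warningThreshold) return "warning";
  return "distinct";
}

// The values from the configuration example.
const exampleSettings: PluginSettings = {
  matchThreshold: 0.95,
  warningThreshold: 0.75,
  jobMatchingThreshold: 0.75,
};
```

Under this reading, raising `matchThreshold` makes duplicate detection stricter, while lowering `warningThreshold` surfaces more loosely related issues.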

Testing Locally

Run `bun install` to install the dependencies.
Run `bun worker` to start the server.
Make HTTP requests to the server with the content type `application/json` to test the plugin. For example:

{
    "stateId": "",
    "eventName": "issue_comment.created",
    "eventPayload": {
        "comment": {
            "user": {
                "login" : "COMMENTER"
            },
            "body": "<COMMENT_BODY>" ,
            "id": <UNIQUE_COMMENT_ID>
        },
        "repository" : {
            "name" : "REPONAME",
            "owner":{
                "login" : "USERNAME"
            }
        },
        "issue": {
            "number": <ISSUE_NUMBER>,
            "body": "<ISSUE_TEXT>"
        }
    },
    "env": {},
    "settings": {},
    "ref": "",
    "authToken": ""
}

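Replace the placeholders with the appropriate values. The unquoted placeholders (<UNIQUE_COMMENT_ID> and <ISSUE_NUMBER>) must be replaced with numbers for the body to be valid JSON.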
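
Building the request body in code avoids hand-editing the JSON. The following is a minimal sketch: the `CommentTestInput` type, the `buildCommentCreatedRequest` function, and the sample values are hypothetical, and only the field layout comes from the request body shown above.

```typescript
// Hypothetical input type: one field per placeholder in the JSON above.
interface CommentTestInput {
  commenter: string;
  commentBody: string;
  commentId: number;
  repoName: string;
  repoOwner: string;
  issueNumber: number;
  issueBody: string;
}

// Builds an object with the same layout as the README's test request body.
function buildCommentCreatedRequest(input: CommentTestInput) {
  return {
    stateId: "",
    eventName: "issue_comment.created",
    eventPayload: {
      comment: {
        user: { login: input.commenter },
        body: input.commentBody,
        id: input.commentId,
      },
      repository: {
        name: input.repoName,
        owner: { login: input.repoOwner },
      },
      issue: { number: input.issueNumber, body: input.issueBody },
    },
    env: {},
    settings: {},
    ref: "",
    authToken: "",
  };
}

// Sample values, invented for illustration only.
const requestBody: string = JSON.stringify(
  buildCommentCreatedRequest({
    commenter: "octocat",
    commentBody: "This looks related to the login bug.",
    commentId: 1001,
    repoName: "demo-repo",
    repoOwner: "demo-owner",
    issueNumber: 42,
    issueBody: "Login fails when the session expires.",
  }),
);
```

The resulting string can be sent as the body of a POST request with the header `Content-Type: application/json` to whatever address the local server reports when it starts.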

Testing

Run `bun run test` to run the tests.

Technical Implementation Details

The plugin uses vector embeddings to detect duplicate and related issues. Text is embedded with Voyage AI, and the resulting vectors are stored and queried in Supabase.

Architecture Overview

The system is built as a plugin that processes GitHub issues and comments through a series of specialized handlers. At its core, it uses two main services:

Voyage AI for generating text embeddings
Supabase for storing and querying vector embeddings

The plugin routes each supported GitHub event to a dedicated handler:

if (isIssueCommentEvent(context)) {
  switch (eventName) {
    case "issue_comment.created":
      return await addComments(context);
    case "issue_comment.deleted":
      return await deleteComment(context);
    case "issue_comment.edited":
      return await updateComment(context);
  }
} else if (isIssueEvent(context)) {
  switch (eventName) {
    case "issues.opened":
      await addIssue(context);
      await issueMatching(context);
      return await issueChecker(context);
    // ... other issue events
  }
}

Vector Embeddings: The Core Technology

The system represents issue and comment text as vector embeddings. The embeddings are generated by Voyage AI's embedding service with the `voyage-large-2-instruct` model:

async createEmbedding(text: string | null, inputType: EmbedRequestInputType = "document"): Promise<number[]> {
  if (text === null) {
    throw new Error("Text is null");
  } else {
    const response = await this.client.embed({
      input: text,
      model: "voyage-large-2-instruct",
      inputType,
    });
    return (response.data && response.data[0]?.embedding) || [];
  }
}

This converts text into high-dimensional vectors that capture semantic meaning, so that issues can be compared by how close their vectors are rather than by exact wording.
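
To make "comparing vectors" concrete, here is a minimal sketch of cosine similarity, a common measure for comparing embeddings. The README does not state which distance measure the plugin's database function uses, so treat the choice of cosine similarity and the `cosineSimilarity` helper as assumptions.

```typescript
// Cosine similarity of two equal-length vectors: 1 means the same direction,
// 0 means orthogonal. Returns 0 if either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because the measure depends only on direction, two vectors that point the same way score close to 1 regardless of their magnitudes.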

Intelligent Issue Management

The system builds three issue-management features on top of these embeddings:

1. Issue Deduplication

The plugin finds similar issues with a vector similarity search. It embeds the issue's markdown and passes the embedding to a Supabase RPC function, `find_similar_issues`, together with the current issue ID, a similarity threshold, and a `top_k` of 5:

async findSimilarIssues({ markdown, currentId, threshold }: FindSimilarIssuesParams): Promise<IssueSimilaritySearchResult[] | null> {
  const embedding = await this.context.adapters.voyage.embedding.createEmbedding(markdown);
  const { data, error } = await this.supabase.rpc("find_similar_issues", {
    query_embedding: embedding,
    current_id: currentId,
    threshold,
    top_k: 5,
  });
  // ... error handling
  return data;
}

This allows the system to:

Detect duplicate issues automatically
Find related issues based on content similarity
Maintain a clean issue tracker by preventing redundancy
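
The `find_similar_issues` function itself lives in the database and is not shown here. The in-memory sketch below illustrates what a search with these parameters might do: skip the current issue, keep matches at or above the threshold, and return the best `topK`. The types, the `findSimilarInMemory` function, and the use of cosine similarity are all assumptions for illustration.

```typescript
// Hypothetical in-memory stand-ins for stored rows and search results.
interface StoredIssue {
  id: string;
  embedding: number[];
}

interface SimilarIssue {
  id: string;
  similarity: number;
}

// Cosine similarity; assumed here, the actual database measure is not shown.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Returns up to topK issues, best match first, excluding the current issue
// and anything below the similarity threshold.
function findSimilarInMemory(
  queryEmbedding: number[],
  currentId: string,
  issues: StoredIssue[],
  threshold: number,
  topK: number = 5,
): SimilarIssue[] {
  return issues
    .filter((issue) => issue.id !== currentId)
    .map((issue) => ({
      id: issue.id,
      similarity: cosine(queryEmbedding, issue.embedding),
    }))
    .filter((result) => result.similarity >= threshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, topK);
}
```

A linear scan like this is only practical for small collections; delegating the search to the database, as the plugin does, avoids loading every embedding into the worker.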

2. Secure Storage for Private Issues

The system implements privacy-conscious storage of issue data:

if (isPrivate) {
  finalMarkdown = null;
  finalPayload = null;
  plaintext = null;
}

const { data, error } = await this.supabase
  .from("issues")
  .insert([{ 
    id: issueData.id, 
    plaintext, 
    embedding, 
    payload: finalPayload, 
    author_id: issueData.author_id, 
    markdown: finalMarkdown 
  }]);

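For private issues, the markdown, plaintext, and payload are stored as null, so only the ID, author ID, and embedding are persisted. Similarity search still works for these issues because it relies only on the embedding.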
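
The same redaction can be written as a pure function, which makes it easy to test in isolation. This is a sketch under assumptions: the `IssueInput` and `IssueRow` types and the `toIssueRow` function are hypothetical, and only the column names come from the insert call above.

```typescript
// Hypothetical input: the issue data before any redaction.
interface IssueInput {
  id: string;
  author_id: number;
  markdown: string;
  plaintext: string;
  payload: Record<string, unknown>;
}

// Row shape using the column names from the insert call above.
interface IssueRow {
  id: string;
  author_id: number;
  embedding: number[];
  markdown: string | null;
  plaintext: string | null;
  payload: Record<string, unknown> | null;
}

// For private issues, drop every text-bearing field and keep only the
// identifiers and the embedding.
function toIssueRow(issue: IssueInput, embedding: number[], isPrivate: boolean): IssueRow {
  return {
    id: issue.id,
    author_id: issue.author_id,
    embedding,
    markdown: isPrivate ? null : issue.markdown,
    plaintext: isPrivate ? null : issue.plaintext,
    payload: isPrivate ? null : issue.payload,
  };
}
```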

3. Real-time Updates

The system maintains consistency by updating embeddings whenever issues are modified:

async updateIssue(issueData: IssueData) {
  const embedding = Array.from(await this.context.adapters.voyage.embedding.createEmbedding(issueData.markdown));
  // ... privacy handling
  const { error } = await this.supabase
    .from("issues")
    .update({
      markdown: finalMarkdown,
      plaintext,
      embedding,
      payload: finalPayload,
      modified_at: new Date(),
    })
    .eq("id", issueData.id);
}

This keeps the stored embedding in step with the issue's current content, so similarity results reflect the latest edit.

Technical Implementation Benefits

Scalability: Vector storage and similarity search are delegated to Supabase, which is intended to keep lookups efficient as the number of issues grows.
Accuracy: Embeddings come from Voyage AI's `voyage-large-2-instruct` model, which is intended to capture the meaning of issue content rather than its exact wording.
Maintainability: Each event type has its own handler, so new events can be supported without changing the existing handlers.
Real-time Processing: Issues and comments are processed as their events arrive, so duplicates and similar issues can be flagged immediately.

In short, the plugin combines vector embeddings with Supabase storage and similarity search to detect duplicate and related issues as they are created and edited.
