Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial implementation of radix tree-based cache #678

Merged
merged 11 commits into from
Feb 20, 2025

Conversation

gangmuk
Copy link
Collaborator

@gangmuk gangmuk commented Feb 14, 2025

Pull Request Description

This PR includes Radix tree based prefix cache implementation, complying with the current prefixindexer interface.

Note that it does not have routing policy. It only has data structure that can execute interface.

Related Issues

Resolves: #680

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines (Expand for Details)

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

@gangmuk gangmuk changed the title Initial implementation of radix tree-based cache [WIP] Initial implementation of radix tree-based cache Feb 14, 2025
@gangmuk gangmuk marked this pull request as draft February 14, 2025 21:58
@gangmuk gangmuk changed the title [WIP] Initial implementation of radix tree-based cache Initial implementation of radix tree-based cache Feb 15, 2025
@gangmuk gangmuk marked this pull request as ready for review February 15, 2025 00:35
@kerthcet
Copy link
Collaborator

/cc

@Jeffwan
Copy link
Collaborator

Jeffwan commented Feb 17, 2025

/hold

Let's merge this after v0.2.0 fully released

@Jeffwan
Copy link
Collaborator

Jeffwan commented Feb 20, 2025

@gangmuk Please rebase the main changes and we can review and get this one merged

@gangmuk gangmuk force-pushed the gangmuk/tree_based_prefix_routing branch from 97847e3 to cdf0888 Compare February 20, 2025 08:29
Copy link
Collaborator

@Jeffwan Jeffwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please help on the review @varungup90 @DwyaneShi @kerthcet

id int
children map[int]*TreeNode
parent *TreeNode
value []int
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will be token-based, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah the tokenized input. For example, something like [1293, 273, 9684]. I used the same tokenizer Varun used. key in the tree node is tokenized input.

return n.contextLength
}

func (c *LPRadixCache) NewTreeNode(numPods int, parent *TreeNode, key []int, value []int) *TreeNode {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move struct LPRadixCache earlier before the method definition

startTime time.Time
}

func NewLPRadixCache(numPods int) *LPRadixCache {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get the part why numPods is an argument for initialization? this would be changed dynamically in production cases

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We supposed to watch all the pods belonging to a model etc. a cache client could be embedded

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah you are right. I have updated that part in a new branch. Can we merge it and review this part in the next PR right away?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think by "a cache client could be embedded"?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. we can have that part in following PRs.

a cache client can be used to list all pods or get pods etc.

{ObjectMeta: metav1.ObjectMeta{Name: "p2"}},
}

inputText := "Hello World! What a Good Day! Good Morning! 你好世界!多么美好的一天啊!早上好!"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not directly related to your PR. just want you know some feedback from users. Did you check issue here? #673 could we make this part pluggable to support string (or chracter base) as well? We do not necessarily need to support this case in this PR here, just for discussion

Copy link
Collaborator Author

@gangmuk gangmuk Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I saw the issue that the guy created. we can make it pluggable. It should be system-wide variable which shouldn't be changed during the runtime. Otherwise, it will mess up all the cache.

I think TokenizeInputText shouldn't be in each routing algorithm implementation. Currently, it is done in each Route function. It can be decoupled and done in common execution path (somewhere in gateway.go) before the Route.

Tokenization itself has two minor issues

  1. overhead (not sure before testing)
  2. debugging with the raw text is easier than looking at token ids. (so I used Detokenization on my side when I debugged my routing implementation.

I am not sure these are critical enough to support different ways of input embedding (raw string, tokenization method 1, tokenization method 2, etc)

@Jeffwan
Copy link
Collaborator

Jeffwan commented Feb 20, 2025

/lgtm

Let's merge this change now to unblock future ones

@Jeffwan Jeffwan merged commit 728c7c4 into main Feb 20, 2025
10 of 11 checks passed
@Jeffwan Jeffwan deleted the gangmuk/tree_based_prefix_routing branch February 20, 2025 21:45
value []int
key []int
refCounter []int
load int
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the load means here? I didn't see any other place use this field right now. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Radix-tree based kv cache implementation
3 participants