-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial implementation of radix tree-based cache #678
Conversation
/cc |
/hold Let's merge this after v0.2.0 fully released |
@gangmuk Please rebase the main changes and we can review and get this one merged |
97847e3
to
cdf0888
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please help on the review @varungup90 @DwyaneShi @kerthcet
id int | ||
children map[int]*TreeNode | ||
parent *TreeNode | ||
value []int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will be token-based, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah the tokenized input. For example, something like [1293, 273, 9684]. I used the same tokenizer Varun used. key in the tree node is tokenized input.
return n.contextLength | ||
} | ||
|
||
func (c *LPRadixCache) NewTreeNode(numPods int, parent *TreeNode, key []int, value []int) *TreeNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move struct LPRadixCache
earlier before the method definition
startTime time.Time | ||
} | ||
|
||
func NewLPRadixCache(numPods int) *LPRadixCache { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get the part why numPods
is an argument for initialization? this would be changed dynamically in production cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We supposed to watch all the pods belonging to a model etc. a cache client could be embedded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah you are right. I have updated that part in a new branch. Can we merge it and review this part in the next PR right away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you think by "a cache client could be embedded"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. we can have that part in following PRs.
a cache client can be used to list all pods or get pods etc.
{ObjectMeta: metav1.ObjectMeta{Name: "p2"}}, | ||
} | ||
|
||
inputText := "Hello World! What a Good Day! Good Morning! 你好世界!多么美好的一天啊!早上好!" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not directly related to your PR. just want you know some feedback from users. Did you check issue here? #673 could we make this part pluggable to support string (or chracter base) as well? We do not necessarily need to support this case in this PR here, just for discussion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I saw the issue that the guy created. we can make it pluggable. It should be system-wide variable which shouldn't be changed during the runtime. Otherwise, it will mess up all the cache.
I think TokenizeInputText shouldn't be in each routing algorithm implementation. Currently, it is done in each Route function. It can be decoupled and done in common execution path (somewhere in gateway.go) before the Route.
Tokenization itself has two minor issues
- overhead (not sure before testing)
- debugging with the raw text is easier than looking at token ids. (so I used Detokenization on my side when I debugged my routing implementation.
I am not sure these are critical enough to support different ways of input embedding (raw string, tokenization method 1, tokenization method 2, etc)
/lgtm Let's merge this change now to unblock future ones |
value []int | ||
key []int | ||
refCounter []int | ||
load int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the load means here? I didn't see any other place use this field right now. Thanks.
Pull Request Description
This PR includes Radix tree based prefix cache implementation, complying with the current prefixindexer interface.
Note that it does not have routing policy. It only has data structure that can execute interface.
Related Issues
Resolves: #680
Important: Before submitting, please complete the description above and review the checklist below.
Contribution Guidelines (Expand for Details)
We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:
Pull Request Title Format
Your PR title should start with one of these prefixes to indicate the nature of the change:
[Bug]
: Corrections to existing functionality[CI]
: Changes to build process or CI pipeline[Docs]
: Updates or additions to documentation[API]
: Modifications to aibrix's API or interface[CLI]
: Changes or additions to the Command Line Interface[Misc]
: For changes not covered above (use sparingly)Note: For changes spanning multiple categories, use multiple prefixes in order of importance.
Submission Checklist
By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.