Slack App: Fragments of a document #321
20001LastOrder started this conversation in Ideas
Description:
When we store documents in the vector database, we break them into fragments so that each fragment fits within the LLM's prompt size limit. However, this fragmentation can also strip away the document's contextual information.
Steps to Reproduce:
For example: @sherpa what did ehsan kamalinejad present?
Answer: Ehsan Kamalinejad's presentation focused on machine learning developments in natural language processing (NLP) at Amazon. He has also worked on computer vision projects, including Photos Memories, during his time at Apple. Ehsan is currently a lead scientist at Amazon and an associate professor at California State University. You can find more information about his work in the LLM Foundations section of the LLM Live Book Link.
Observation: the assistant doesn't recognize that the page this information is found on is related to Ehsan's presentation.
Actual Results:
see above
Expected Results:
Let's assume the question contains the relevant context. The assistant should then be able to use that context to find the right chunks. The problem is that the necessary contextual keywords might not appear in the chunk that actually contains the answer.
Retrieval has to take into account the relationship between chunks and their higher-level abstractions (pages, sections, book, ...) and use that to narrow down to the right document and the right chunk.
Additional Information:
N/A
Reproducibility:
The actual answer might vary but the general pattern is reproducible.
Possible Solutions:
see expected results section
or alternatively: We could add some contextual information to each fragment of the document, such as the title or the first few sentences.
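One way this could look (a minimal sketch, not the project's actual splitter; the function name and chunking heuristic are hypothetical) is to prefix every fragment with the document title and its opening sentences before embedding, so a chunk about "machine learning at Amazon" still carries the presenter's name:

```python
def contextualize_fragments(title, text, chunk_size=200, n_context_sentences=2):
    """Split `text` into fragments and prefix each with the document
    title and its first few sentences, so contextual keywords (e.g. a
    speaker's name) survive in every chunk."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    context = ". ".join(sentences[:n_context_sentences])
    header = f"[{title}] {context}. "

    fragments, current = [], ""
    for sentence in sentences:
        # Flush the current fragment once it would exceed chunk_size.
        if current and len(current) + len(sentence) > chunk_size:
            fragments.append(header + current.strip())
            current = ""
        current += sentence + ". "
    if current.strip():
        fragments.append(header + current.strip())
    return fragments
```

Each fragment then embeds with the title attached, so a query mentioning the presenter can match chunks that never name them in the body text. The cost is a few extra tokens per chunk.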
A hierarchical index might be another approach.
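A hierarchical index could work roughly like this (a toy sketch with made-up names; `overlap_score` stands in for real embedding similarity): score sections first using their title plus contents, then search only within the best-matching section's chunks:

```python
def overlap_score(query, text):
    """Crude stand-in for embedding similarity: count shared words."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def hierarchical_search(query, index):
    """Two-stage retrieval over `index`, a dict mapping section titles
    to lists of chunks. Stage 1 picks the section whose title plus
    concatenated chunks best matches the query; stage 2 picks the best
    chunk within that section only."""
    best_section = max(
        index,
        key=lambda title: overlap_score(query, title + " " + " ".join(index[title])),
    )
    best_chunk = max(index[best_section], key=lambda c: overlap_score(query, c))
    return best_section, best_chunk
```

The point is that the section title carries the contextual keywords (the presenter's name), so stage 1 can route the query to the right document even when no individual chunk mentions those keywords.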
Related Issues:
N/A
Steps Taken So Far:
N/A
Environment:
N/A