diff --git a/MEMORY.md b/MEMORY.md
deleted file mode 100644
index 01128dd9..00000000
--- a/MEMORY.md
+++ /dev/null
@@ -1,40 +0,0 @@
-# Long-Term memory
-Memory is one part of a cognitive architecture.
-Just as with cognitive architectures, we've found in practice that more application specific forms of memory can go a long way in increasing the reliability and performance of your application.
-
-When we think of long term memory, the most general abstraction is:
-- There exists some state that is tracked over time
-- This state is updated at some period
-- This state is combined into the prompt in some way
-
-
-So when you're building your application, we would highly recommend asking the above questions:
-- What is the state that is tracked?
-- How is the state updated?
-- How is the state used?
-
-Of course, this is easier said than done.
-And then even if you are able to answer those questions, how can you actually build it?
-We've decided to give this a go within OpenGPTs and build a specific type of chatbot with a specific form of memory.
-
-We decided to build a chatbot that could reliably serve as a dungeon master for a game of dungeon and dragons.
-What is the specific type of memory we wanted for this?
-
-**What is the state that is tracked?**
-
-We wanted to first make sure to track the characters that we're involved in the game. Who they were, their descriptions, etc. This seems like something that should be known.
-We then also wanted to track the state of the game itself. What had happened up to that point, where they were, etc.
-We decided to split this into two distinct things - so we were actually tracking an updating two different states.
-
-**How is the state updated?**
-
-For the character description, we just wanted to update that once at beginning. So we wanted our chatbot to gather all relevant information, update that state, and then never update it again.
-Afterwards, we wanted our chatbot to attempt to update the state of the game every turn. If it decides that no update is necessary, then we won't update it. Otherwise, we will override the current state of the game with an LLM generated new state.
-
-**How is the state used?**
-
-We wanted both the character description and the state of the game to always be inserted into the prompt. This is pretty straightforward since they were both text, so it was just some prompt engineering with some placeholders for those variables.
-
-## Implementation
-You can see the implementation for this in [this file](backend/packages/agent-executor/agent_executor/dnd.py).
-This should be easily modifiable to track another state - to do so, you will want to update the prompts and maybe some of the channels that are written to.
\ No newline at end of file
diff --git a/README.md b/README.md
index 957a0566..88db65c6 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,8 @@
 # OpenGPTs
 This is an open source effort to create a similar experience to OpenAI's GPTs and Assistants API.
-It builds upon [LangChain](https://github.com/langchain-ai/langchain), [LangServe](https://github.com/langchain-ai/langserve) and [LangSmith](https://smith.langchain.com/).
+It is powered by [LangGraph](https://github.com/langchain-ai/langgraph) - a framework for creating agent runtimes.
+It also builds upon [LangChain](https://github.com/langchain-ai/langchain), [LangServe](https://github.com/langchain-ai/langserve) and [LangSmith](https://smith.langchain.com/).
 OpenGPTs gives you more control, allowing you to configure:
 - The LLM you use (choose between the 60+ that LangChain offers)
@@ -11,6 +12,16 @@ OpenGPTs gives you more control, allowing you to configure:
 - The retrieval algorithm you use
 - The chat history database you use
+Most importantly, it gives you full control over the **cognitive architecture** of your application.
+Currently, there are three different architectures implemented:
+
+- Assistant
+- RAG
+- Chatbot
+
+See below for more details on those.
+Because this is open source, if you do not like those architectures or want to modify them, you can easily do that!
+
@@ -20,7 +31,6 @@ OpenGPTs gives you more control, allowing you to configure:
- [GPTs: a simple hosted version](https://opengpts-example-vz4y4ooboq-uc.a.run.app/)
- [Assistants API: a getting started guide](API.md)
-- [Memory: how to use long-term memory](MEMORY.md)
## Quickstart
@@ -176,6 +186,47 @@ The big appeal of OpenGPTs as compared to using OpenAI directly is that it is mo
Specifically, you can choose which language models to use as well as more easily add custom tools.
You can also use the underlying APIs directly and build a custom UI yourself should you choose.
+### Cognitive Architecture
+
+This refers to the logic of how the GPT works.
+There are currently three different architectures supported, but because they are all written in LangGraph, it is very easy to modify them or add your own.
+
+The three architectures are assistants, RAGBots, and chatbots.
+
+**Assistants**
+
+Assistants can be equipped with an arbitrary number of tools and use an LLM to decide when to use them. This makes them the most flexible choice, but they only work well with a handful of models and can be less reliable.
+
+When creating an assistant, you specify a few things.
+
+First, you choose the language model to use. Only a few language models are reliable enough for this: GPT-3.5, GPT-4, Claude, and Gemini.
+
+Second, you choose the tools to use. These can be predefined tools OR a retriever constructed from uploaded files. You can choose as many as you want.
+
+The cognitive architecture can then be thought of as a loop. First, the LLM is called to determine what (if any) actions to take. If it decides to take actions, then those actions are executed and it loops back. If it decides that no actions are needed, then the LLM's response is the final response and the loop finishes.
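+In pseudo-code, that loop looks roughly like the sketch below. This is only an illustration: `call_llm` and `TOOLS` are hypothetical stand-ins for the configured model and tools, not the actual OpenGPTs implementation (which is written with LangGraph).
+
+```python
+# Minimal sketch of the assistant loop, with placeholder model and tools.
+from typing import Callable
+
+TOOLS: dict[str, Callable[[str], str]] = {
+    "search": lambda query: f"results for {query!r}",  # placeholder tool
+}
+
+def call_llm(messages: list[dict]) -> dict:
+    """Placeholder for the language model: returns either a tool call or a final answer."""
+    return {"type": "final", "content": "Hello!"}
+
+def run_assistant(user_input: str) -> str:
+    messages = [{"role": "user", "content": user_input}]
+    while True:
+        decision = call_llm(messages)             # 1. ask the LLM what (if anything) to do
+        if decision["type"] == "final":           # 2a. no action -> this is the final response
+            return decision["content"]
+        observation = TOOLS[decision["tool"]](decision["input"])   # 2b. execute the action...
+        messages.append({"role": "tool", "content": observation})  # ...and loop back
+```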
+
+![](_static/agent.png)
+
+This can be a really powerful and flexible architecture. It is probably the closest to how we humans operate. However, these agents can also be less reliable, and they generally only work with the more performant models (and even then they can mess up). Therefore, we introduced a few simpler architectures.
+
+**RAGBot**
+
+One of the big use cases of the GPT store is uploading files and giving the bot knowledge of those files. What would it mean to make an architecture more focused on that use case?
+
+We added RAGBot - a retrieval-focused GPT with a straightforward architecture. First, a set of documents is retrieved. Then, those documents are passed into the system message of a separate call to the language model, which generates the response.
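+As a sketch (again illustrative only, with a hypothetical `retriever` and `call_llm` standing in for the real components):
+
+```python
+# Minimal sketch of the RAGBot flow: retrieve once, then answer with the docs in the system message.
+def retriever(query: str) -> list[str]:
+    return ["doc 1", "doc 2"]  # placeholder for the retriever built from uploaded files
+
+def call_llm(system_message: str, user_message: str) -> str:
+    return "answer"  # placeholder for the configured language model
+
+def run_ragbot(question: str) -> str:
+    docs = retriever(question)                                             # 1. always retrieve first
+    system_message = "Respond using these documents:\n" + "\n".join(docs)  # 2. put the docs in the system message
+    return call_llm(system_message, question)                              # 3. a single LLM call - no loop
+```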
+
+Compared to assistants, it is more structured (but less powerful). It ALWAYS looks up something - which is good if you know you want to look things up, but potentially wasteful if the user is just trying to have a normal conversation. Also importantly, this only looks up things once - so if it doesn’t find the right results then it will yield a bad result (compared to an assistant, which could decide to look things up again).
+
+![](_static/rag.png)
+
+Despite being a simpler architecture, it is good for a few reasons. First, because it is simpler, it works pretty well with a wider variety of models (including lots of open source models). Second, if you have a use case where you don't NEED the flexibility of an assistant (e.g. you know users will be looking up information every time), then it can be more focused. And third, compared to the final architecture below, it can use external knowledge.
+
+**ChatBot**
+
+The final architecture is dead simple - just a call to a language model, parameterized by a system message. This allows the GPT to take on different personas and characters. This is clearly far less powerful than Assistants or RAGBots (which have access to external sources of data/computation) - but it’s still valuable! A lot of popular GPTs are just system messages at the end of the day, and CharacterAI is crushing it despite largely just being system messages as well.
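+For completeness, a sketch of this one too (with the same hypothetical `call_llm` placeholder):
+
+```python
+# Minimal sketch of the ChatBot architecture: a single LLM call parameterized by a system message.
+def call_llm(system_message: str, user_message: str) -> str:
+    return "response"  # placeholder for the configured language model
+
+def run_chatbot(system_message: str, user_input: str) -> str:
+    return call_llm(system_message, user_input)
+```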
+
+![](_static/chatbot.png)
+
### LLMs
You can choose between different LLMs to use.
diff --git a/_static/agent.png b/_static/agent.png
new file mode 100644
index 00000000..6394a3b7
Binary files /dev/null and b/_static/agent.png differ
diff --git a/_static/chatbot.png b/_static/chatbot.png
new file mode 100644
index 00000000..43537b40
Binary files /dev/null and b/_static/chatbot.png differ
diff --git a/_static/rag.png b/_static/rag.png
new file mode 100644
index 00000000..a388e64a
Binary files /dev/null and b/_static/rag.png differ