
Adding Long-Term Memory to a GPT-3 Based Chatbot

Indexing conversations between users and bots can drastically improve the user experience and the quality of chatbot responses.


It's an exciting time in the world of natural language processing, and we're seeing a tremendous amount of innovation within the chatbot and personal assistant space, driven largely by recent developments in large language models (LLMs). Language models such as OpenAI's GPT-3 are extremely flexible, powerful tools for building these sorts of systems. In this blog post, we'll go in depth on how to build powerful, high-quality chatbots using GPT-3, and how you can use Operand to further enhance the quality of these systems by adding long-term memory.

For those who want to skip right to the code, here's all of it.

Let's start with a basic refresher on how to build chatbots with a modern language model like GPT-3. We can use one of the basic prompt templates that OpenAI provides to get started:

The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.

Human: Hello, who are you?
AI: I am an AI created by OpenAI. How can I help you today?
Human: What's your name?
AI:

As you can see, we're essentially asking GPT-3 to predict what the AI would say given a) a description of the AI assistant itself, and b) the conversation history between the AI and the user. On its own, this prompt is incredibly powerful (and can arguably pass some versions of the Turing Test).
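
To make this concrete, here's a minimal sketch of sending a prompt like this to OpenAI's completions endpoint over plain HTTP. The model name and parameters here are illustrative rather than exactly what the demo repository uses, but the overall shape of the request is the same:

// completionRequest mirrors the fields we need from OpenAI's /v1/completions endpoint.
type completionRequest struct {
	Model       string   `json:"model"`
	Prompt      string   `json:"prompt"`
	MaxTokens   int      `json:"max_tokens"`
	Temperature float64  `json:"temperature"`
	Stop        []string `json:"stop"`
}

type completionResponse struct {
	Choices []struct {
		Text string `json:"text"`
	} `json:"choices"`
}

func complete(ctx context.Context, prompt string) (string, error) {
	body, err := json.Marshal(completionRequest{
		Model:       "text-davinci-002",
		Prompt:      prompt,
		MaxTokens:   150,
		Temperature: 0.9,
		// Stop once the model starts writing the human's next message.
		Stop: []string{"Human:"},
	})
	if err != nil {
		return "", err
	}
	req, err := http.NewRequestWithContext(ctx, "POST", "https://api.openai.com/v1/completions", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENAI_API_KEY"))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	var out completionResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	if len(out.Choices) == 0 {
		return "", fmt.Errorf("no completion returned")
	}
	// Completions typically start with a leading space; trim it before replying.
	return strings.TrimSpace(out.Choices[0].Text), nil
}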

Interestingly, as the conversation goes on, GPT-3 is able to "remember" what the user said previously because we pass this data into the prompt itself, essentially allowing the language model to simply continue the existing conversation. However, a problem emerges: we cannot simply pass the entire message history between the AI assistant and the user into the GPT-3 prompt. Doing so would be a) too expensive, b) too slow, and c) would eventually result in an API error, since GPT-3 prompts can only be, at most, 2048 tokens (~8192 characters) long. Therefore, we need to be a bit more clever about how we pass context into the prompt.

To get around the "context window" limitation, we can limit our immediate context to the last few messages exchanged between the user and the AI assistant (code).

// Fetch the last few messages from the conversation. This gives the bot some immediate
// context to use when generating a response. Experimentally, usually keeping this to a moderate
// size is best (we also don't want the prompts getting too big).
previous := history.LastN(5)
for _, m := range previous {
	if m.Direction == MessageDirectionInbound {
		builder.WriteString("Human: ")
	} else {
		builder.WriteString("AI: ")
	}
	builder.WriteString(m.Text)
	builder.WriteString("\n")
}
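
The history.LastN helper isn't shown in the snippet above; assuming history wraps a slice of messages in chronological order, a minimal version might look like this:

// LastN returns up to the n most recent messages, oldest first, so the
// prompt reads in chronological order.
func (h *History) LastN(n int) []Message {
	if len(h.Messages) <= n {
		return h.Messages
	}
	return h.Messages[len(h.Messages)-n:]
}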

Essentially, we're taking a sliding window approach and only prompting the language model with the immediate context it needs in order to generate reasonable responses to the user's messages. This solves the context window limitation (i.e. we will always stay comfortably under the 2048-token limit), though it introduces a new problem: the lack of long-term memory. For example, consider the tail end of a longer conversation:

... omitted

Human: Hello, who are you?
AI: I am an AI created by OpenAI. How can I help you today?
Human: What's your name?
AI: My name is "AI". What's yours?
Human: My name is Morgan.
AI: Cool, hey Morgan!

You can think of each GPT-3 invocation as a separate, independent function call. It can only use the data you pass in, meaning that in the case above, previous conversations between the user and the assistant (i.e. those before "Hello, who are you?") are ignored (or "forgotten", in a sense). We can use a longer context window by passing in more previous messages, but this comes with the drawback of higher latency and a more expensive bill at the end of the month.

We can get around this limitation by using search, or more specifically, semantic search, to find the most relevant messages from the user's entire conversation history and pass them into the prompt. This combines the best of both worlds: you get the immediate context of the conversation (by passing in the immediately prior messages) as well as relevant context from previous conversations the user has had with the chatbot. For example, if the user tells the AI assistant their birthday, or what their favorite coffee drink is, the assistant should be able to "remember" this and bring it up in conversation.

An example conversation using semantic search to include longer-term context in a GPT-3 prompt:

You: How are you?
Bot: I'm doing well today. How can I help you?
You: You can't! Did you know that my favorite coffee beverage these days is oat milk lattes?
Bot: I did not know that. Thank you for sharing.
You: Busy day?
Bot: Yes, I've been quite busy today. Thanks for asking!
You: What've you been up to?
Bot: I've been helping people with their inquiries and doing my best to be helpful.
You: Is that what you do?
Bot: Yes, I enjoy helping people.
You: Why?
Bot: Because it's rewarding to know that I can be of assistance.
You: That's pretty admirable!
Bot: Thank you, I appreciate your feedback.
You: Btw, do you remember what my favorite coffee beverage was?
Bot: Yes, you said that your favorite coffee beverage is oat milk lattes.

In this example, we're using an immediate context window of 5 messages (meaning the last 5 messages in the conversation are always passed in) in addition to including semantic search results in the prompt itself, using the incoming message as the query.

As a bit of background, semantic search is a mechanism for content-based search, i.e. for finding text which is "semantically related" to another piece of text. Specifically, we're looking for messages from the conversation history that share ideas or content with the incoming message, not just the same keywords (which is why full-text search wouldn't perform particularly well for this use case).
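
Under the hood, semantic search systems typically embed each piece of text as a vector and rank candidates by how close their vectors are to the query's vector, often using cosine similarity. Operand handles all of this for you, but here's a rough, self-contained sketch of the ranking step using toy 3-dimensional "embeddings" (real embeddings come from a model and have hundreds or thousands of dimensions):

package main

import (
	"fmt"
	"math"
	"sort"
)

// cosineSimilarity measures how closely two embedding vectors point in the
// same direction; values near 1 mean "semantically similar".
func cosineSimilarity(a, b []float64) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// rankBySimilarity orders candidate messages by similarity to the query, most similar first.
func rankBySimilarity(query []float64, candidates map[string][]float64) []string {
	type scored struct {
		text  string
		score float64
	}
	scores := make([]scored, 0, len(candidates))
	for text, vec := range candidates {
		scores = append(scores, scored{text, cosineSimilarity(query, vec)})
	}
	sort.Slice(scores, func(i, j int) bool { return scores[i].score > scores[j].score })
	ranked := make([]string, 0, len(scores))
	for _, s := range scores {
		ranked = append(ranked, s.text)
	}
	return ranked
}

func main() {
	query := []float64{0.9, 0.1, 0.0} // e.g. "do you remember my favorite coffee beverage?"
	candidates := map[string][]float64{
		"my favorite coffee beverage these days is oat milk lattes": {0.8, 0.2, 0.1},
		"busy day?": {0.1, 0.9, 0.3},
	}
	fmt.Println(rankBySimilarity(query, candidates))
}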

Realistically, a (slightly shortened) prompt including long-term memory would look like:

The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.

Relevant previous messages from Human:
- You can't! Did you know that my favorite coffee beverage these days is oat milk lattes?

The conversation goes as follows:
Human: Why?
AI: Because it's rewarding to know that I can be of assistance.
Human: That's pretty admirable!
AI: Thank you, I appreciate your feedback.
Human: Btw, do you remember what my favorite coffee beverage was?
AI:

Notice how we include the longer-term context (i.e. the top semantic search results) in addition to the immediate context of the conversation. This keeps the overall prompt short (which means lower latency and cost), while drastically improving the quality of the conversation.

We may be a little biased here, but we believe that Operand is the best way to implement semantic search in your application. Adding long-term memory this way requires only a few small code changes.

First, whenever a message is exchanged between the bot and the user, we add it to an Operand collection (code):

// Create a new object for the message.
obj, err := mh.Client.CreateObject(ctx, operand.CreateObjectArgs{
	ParentID: operand.AsRef(mh.CollectionID),
	Type:     operand.ObjectTypeText,
	Metadata: operand.TextMetadata{
		Text: message.Text,
	},
	// We add a "direction" property to this object, which allows us to properly
	// scope searches in the future to messages from either the user or the chatbot.
	Properties: map[string]any{
		"direction": string(message.Direction),
	},
})
if err != nil {
	return err
} else if err := obj.Wait(ctx, mh.Client); err != nil {
	return err
}

Next, when constructing our prompt, we do a semantic search to fetch the top N (5 in this case) pieces of content from the conversation to include in the prompt (code):

response, err := operandClient.SearchContents(ctx, operand.SearchContentsArgs{
	ParentIDs: []string{operandCollectionID},
	Query:     message,
	Max:       5,
	// Use a filter to scope the search to only messages sent by the user, i.e. inbound messages.
	Filter: map[string]any{
		"direction": string(MessageDirectionInbound),
	},
})
if err != nil {
	return "", err
}

// If we got any results, we include them in the prompt as additional context.
if len(response.Contents) > 0 {
	builder.WriteString("Relevant previous messages from Human:\n")
	for _, c := range response.Contents {
		builder.WriteString(fmt.Sprintf("- %s\n", c.Content))
	}
	builder.WriteString("\n")
}
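
From there, the rest of the prompt is assembled exactly as before: the immediate context from the sliding window, the incoming message, and a final "AI:" cue for the model to complete. Roughly (continuing the snippet above, with complete standing in for whatever GPT-3 client you're using):

// Append the immediate context (the sliding window from earlier), the incoming
// message, and finally cue the model to respond as the AI.
builder.WriteString("The conversation goes as follows:\n")
for _, m := range history.LastN(5) {
	if m.Direction == MessageDirectionInbound {
		builder.WriteString("Human: ")
	} else {
		builder.WriteString("AI: ")
	}
	builder.WriteString(m.Text)
	builder.WriteString("\n")
}
builder.WriteString("Human: " + message + "\n")
builder.WriteString("AI:")

// Send the finished prompt to GPT-3 and use the completion as the bot's reply.
reply, err := complete(ctx, builder.String())
if err != nil {
	return "", err
}
return reply, nil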

The idea of a virtual assistant, or chatbot, with long-term memory is frankly still largely unexplored. As a bit of background, Operand as a company spent the first ~8 months of its life focused on building next-generation personal assistants, and semantic search was a huge part of that (for long-term memory, changing settings/preferences, and more). We're really excited to see what the future holds for next-gen chatbots, and to be part of building that future ourselves.

If you want to try this out for yourself, or simply see the code that went into this demo, you can take a look at the GitHub repository. We've included all the code you need to get started. As always, if you have any questions, feedback, or are looking for a little more of a custom solution, let us know!