Discover how to create a persistent, context-aware AI chat interface using Python, OpenAI's GPT API, and a simple terminal-based user interaction. Learn the essentials of managing conversation history, controlling response creativity, and building an ongoing dialogue with AI.
Key Insights
- Create an ongoing conversation by repeatedly sending the entire chat history with each new question, ensuring the AI maintains context for accurate responses.
- Adjust the "temperature" parameter (ranging from 0 to 1) to control the AI's response randomness, with lower values (e.g., 0.01) producing factual and concise replies, and higher values (e.g., 0.9) generating more creative outputs.
- Set "max tokens" (for example, 4,000) to limit the length of the AI's response; this helps manage API costs and optimize performance, especially after the GPT-4.0 update in October 2024 reduced token pricing by approximately 33 to 50 percent.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
This is a lesson preview only. For the full lesson, purchase the course here.
We're going to replace the route with our own function for chatting with the AI, you'll see. We're going to call the function repeatedly on a loop, and the reason for that is the chat consists of a back-and-forth, right? You say something, AI answers. You say something back, AI answers back.
Now, to maintain context, the AI can't forget what it just got asked. So, every time you send a prompt, you have to send the entire conversation. You can't just send the new question with no context.
It's not going to know what you're talking about. So, you ask question one, and it answers with answer one. When you send question two, you might think you send it along with question one so that it gives you back answer two. Actually, you send question two along with question one and answer one. The AI's own answer is given back to it. So, you're constantly sending the entire conversation with every new question in the conversation, which is like talking with a human: you want to talk to someone who remembers what you said a minute ago.
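To picture that, here's roughly what gets sent by the time you ask question two (the contents here are hypothetical):

```python
# The full history rides along with every request
conversation = [
    {"role": "user", "content": "Question one"},
    {"role": "assistant", "content": "Answer one"},  # the AI's own answer, given back to it
    {"role": "user", "content": "Question two"},     # the new question, with all its context
]
```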
Append each user chat input to the chat list. We're going to have a UI, but the UI is just typing in the terminal; there's no HTML. So, you're going to type your questions in the terminal, and the AI is going to answer in the terminal. For this step, we just want the ongoing conversation; we're happy if we can get it going in the terminal.
After that, we'll turn our attention to getting the ongoing chat conversation running in the browser, where the user types a question into a box, hits enter, and the question gets sent. The answer comes back, and the Jinja template pops it out onto the webpage. It's starting to get a little more complex now, definitely.
We're going to call the function, boom, boom, boom, boom, boom. Okay. We're going to make server04.py. From server03.py, we're going to do a save as and call the new file server04.py. Close this stuff. File, save as, server04.py.
Replace the route with our own function. In other words, we're going to turn off this function declaration. Rather than declare a function inside the route, we're going to call a function from our own code. And the reason is that the route function only runs the one time, when you hit the route; we don't have a way to call it again and again without hitting the route again and again. We need a function that we can call repeatedly, independent of the route.
Replace the route and its function. Oh yeah, we're actually replacing the whole route, and its function, with our own custom function. We're not going to delete it; we'll just keep it for context, but comment it out. Then we're going to make our own custom function for chatting with the AI. Its parameter is going to be conversation_list. Every time the function is called, it takes the full conversation as a list, as its argument, and that way it has the context. That list gets sent as the prompt to the AI, and it gets appended to as the chat proceeds, so it grows longer. So, rather than delete the route code, let's just comment it out.
We have it as a study reference. Okay, we're getting deep into it now. Turn that off; let's just turn all this off for now, everything except that required last line. Okay. Now we're going to define our function with the def keyword. def model? I don't know if I like that name. How about chat_with_ai_model? Its parameter, conversation_list, takes as its argument the list of chat items, the individual pieces of the chat, which gets longer as the chat proceeds. Rather than delete the route code, comment it out to keep a study reference. Actually, let's just say chat_with_ai; we get it, it's a model. Keep the try block and the create method, but change the messages value to conversation_list and add two optional parameters: temperature and max_tokens.
These are explained below. Okay. We're going to keep the try block. Inside it, response still comes from the create method, but messages is now set equal to conversation_list. That's the big difference. max_tokens=4000 caps how many tokens can be consumed, so you're limiting the length of the answer. temperature=0.5 is a middle value between zero and one. So, here you go: these are new properties we haven't seen yet, explained next.
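As a sketch, assuming the openai SDK client shown earlier (the error handling here is illustrative, not necessarily the course file's exact code):

```python
def chat_with_ai(conversation_list):
    """Send the whole conversation to the AI and return its reply text."""
    try:
        response = client.chat.completions.create(
            model="gpt-4o",              # assumed model name
            messages=conversation_list,  # the entire chat so far, not just the new question
            max_tokens=4000,             # cap the length of the answer
            temperature=0.5,             # middle value: factual but not robotic
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {e}"
```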
Adjust max_tokens for larger responses. Each time you make a request to the AI model, it consumes tokens to process the input and output, as I mentioned at the very beginning of the course. In October 2024, a GPT-4o update made tokens 33 to 50 percent cheaper, which is very good news for AI developers, who must consider the cost of the tokens their users consume.
The optional temperature property in the OpenAI API controls the randomness of the model's output. It affects how the model decides what word or phrase to generate next. Here's how it works.
So, a low temperature such as 0.01 tells it to be extremely factual: just stick to the facts, do not attempt to get creative on us. It picks common completions and gives factual, concise responses.
With high temperatures such as 0.9 or 1 (the API actually accepts values up to 2), the output becomes more creative and diverse, but less predictable, as the model explores more varied completions. That's useful for creative writing or brainstorming.
If you don't set the temperature, the API uses the default, which is typically 1, and that's pretty creative. If you omit it, the model may produce outputs with more variety, but they might be less focused. For general chatbots or factual Q&A, setting the temperature lower is often preferred.
When to adjust: for deterministic output, use low temperatures (0 to 0.3); for creative tasks, use higher temperatures (0.7 to 1). We're splitting the difference; we want something in between.
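For instance (hypothetical prompts, same assumed client):

```python
# Low temperature: deterministic, factual answers
facts = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What year was Python first released?"}],
    temperature=0.1,
)

# High temperature: brainstorming and creative writing
ideas = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Brainstorm ten names for a chat app."}],
    temperature=0.9,
)
```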
So, the function is going to return the AI response text. The messages are the conversation_list. If you remember, messages consist of alternating role and content entries: the AI's role and content, then the user's role and content. When we build our conversation_list, we have to structure it like that; that's what the API expects. We can provide multiple rounds of that, but that's the format it wants.
Okay, so next we're going to set up a chat loop. At the end of the script, after the if __name__ == "__main__" line, we're going to announce that the chat is on. All of this gets printed in the terminal: no webpage, and no browser even, for that matter, just terminal printout. We're going to be chatting with the AI in the terminal.
You are chatting with OpenAI. We're going to declare a list called chat_list and provide its first item, a dictionary defining the AI's role and content: "You are a helpful Assistant with a broad range of expertise," so we can talk to it about anything.
And we're going to declare a Boolean set to True, which keeps a while loop repeating as long as it remains True. This is why it's kind of helpful to know some programming. Set up a while loop that runs as long as the Boolean is True.
We're going to say while chatting == True. So, as long as this Boolean is true, we're going to run this while loop. Each time the loop runs, prompt the user to type a message.
We're going to say user_chat_message = input("User: "), and that input function gives you a little prompt to type into right in the terminal. If the user typed "quit" or "exit", we're going to inform the AI that the chat is over, using the same dictionary format with role and content properties.
We're going to say: if user_chat_message.lower() is in this list, in other words, if it's the word "quit" or "exit", we're going to append one final message to the list and literally tell the AI it's over: "Thanks for chatting. That's all for now." And the AI will answer back with something like, "Okay, great. Thank you." Then we're going to stop the loop. The loop ends, and the AI answers that "thanks for chatting with me" message with something appropriate, like, "Anytime, happy to talk, and we can talk again anytime you like." It'll say something along those lines.
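In code, that check looks roughly like this:

```python
if user_chat_message.lower() in ["quit", "exit"]:
    # Tell the AI the chat is over so it can sign off gracefully
    chat_list.append({"role": "user", "content": "Thanks for chatting. That's all for now."})
    chatting = False  # the while loop stops after this final round
```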
Let's write some of this stuff. Let's turn the try block back on. Let's turn all this back on: chat_with_ai. Why is it indented like that? Oh, it's a function. Okay. We're declaring a function, and the function takes a parameter: conversation_list. So, our response format is not really JSON; we're just passing this big list back and forth. Just a list. And messages is going to be conversation_list. Then we have two other things we want: max_tokens, which we'll set to 4,000, and temperature, which we'll set to 0.5. Okay. Now, this next part is unusual, right? We've never done this.
We've never put code after this statement at the end; we just think of it as the end of everything. But you can actually run code here. We're going to say print. And the reason we're running code here is because we don't have a browser: we're not rendering to the browser, and we're not rendering a template. No webpage, no browser output. All of this runs in the terminal down here. Print: "You are chatting with the OpenAI GPT-4o model."
We're going to declare a list called chat_list. We can do it right here. This is the initialization of the chat, basically. We're going to say chat_list = [ {"role": "system", "content": "You are a helpful Assistant with a broad range of expertise."} ]
Next, we're going to declare a Boolean set to True, which is for repeating our while loop as long as the Boolean remains True. We're going to say chatting = True.
We're going to set up a while loop that runs as long as the Boolean is true. So, then we're going to say while chatting == True. Or, you can just say while chatting because chatting itself is either true or false—but it's more explicit and maybe easier to read if you check it with a double equal sign.
Each time the loop runs, we're going to prompt the user to type a message. That's the input("User: ") call. If you're in, say, Google Colab, input gives you an input box in the little output pane underneath your current code cell. Here, because the application executes on the command line, all of this happens in the terminal. While chatting == True, we provide an input and capture whatever the user types as user_chat_message.
We're going to say user_chat_message = input("User: "), maybe with a little space after the colon. Now, if the user typed "quit" or "exit": we're going to outdent a little bit just to be practical here. If user_chat_message.lower() is found inside the list ["quit", "exit"], so if it's either one of those two words, we're going to wrap things up. In other words, if the user types either word, we take our chat_list and append a finishing message of {"role": "user", "content": "Thanks for chatting with me. That's all for now."} Oh, you can't do the line break there. Okay. There: "Thanks for chatting with me."
Let's make it a little easier to read. Flip the Boolean at that point as well, because you're done, right? Flipping it just means setting it to False; you could flip it without even knowing what it currently is. That stops the while loop, because the loop only runs while chatting is True.
And you're only setting it to False if the condition is True, that is, if it's true that the user typed "quit" or "exit." If the user did not quit, the chat is ongoing, so append the user's latest input to the chat. So, let's say they didn't type "quit." This would be the else part, but you don't even need to say else; we'll just come out of the if, and the if will run if it runs. All right. We'll just say chat_list.append, and now we want to append the actual message. You know what? We could tighten this up.
It's getting long, and a little bit hard to read when it gets so long; you want to see a bit more code on the screen at a time. Sometimes all the line breaks, which are meant to make code easier to read, actually make it harder, because they prevent you from seeing enough code to have the full context. Okay. If the user did not type "quit" or "exit", add their message to chat_list.
Okay. Now we call our function and handle the AI response. Well, let's go back and look at the code. The function sends the conversation list to the AI; this is our request, and we've done this many times. It also handles the response. We don't need the JSON part here; we don't want that part for the ongoing chat. What we want is the response text from the AI. So, next we're going to call our function and send it our chat_list.
We're still in the while loop, by the way. We send our list of dictionaries to the AI. Setting the function call equal to a variable stores the return value of the function, which is the AI response text. So, we're going to say AI_response_text = chat_with_ai(chat_list). We're calling the function chat_with_ai and saving the result as AI_response_text. The function sends the conversation to the AI, and what we pass is our chat_list. So, the parameter is conversation_list, but the argument we pass in doesn't have to have the same name; in fact, we're calling it chat_list. We could make them the same, but I wanted to make a point.
I deliberately made them different so we don't think the parameter and the argument are the same thing. The parameter belongs to the function definition: where we define the function with the def keyword, conversation_list is the parameter. It gets its value when you call the function, and at that moment the value passed in is called the argument. So, the chat_list argument sets the value of conversation_list inside the function, and it's conversation_list, coming in as chat_list, that is the message list sent to the AI.
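A quick illustration of the distinction:

```python
def chat_with_ai(conversation_list):  # conversation_list is the parameter
    ...

AI_response_text = chat_with_ai(chat_list)  # chat_list is the argument
```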
And the AI will see this list of Q&A, all these role and content dictionaries, and it will have the context to understand that this is an ongoing conversation. It will look at the new question, the old questions, and its own responses along the way, and then the large language model will do its thing, which is to pick out the best word, one after another, with that level of context. All right.
Let me simplify this a little. I don't want you to think these curly braces are Jinja templating; they're for string formatting and concatenation. They're not double curly braces, and we're not in a webpage.
We're going to print the AI response. The user's input prints automatically, because every time you type into the input and hit Enter, it shows up. So, by printing the AI response, we get the full conversation in the terminal. The next thing we're going to do is print the AI response after each user chat message: something like print("AI Bot: " + AI_response_text). Let's do regular concatenation rather than wrap AI_response_text in curly braces. I don't want any confusion about the curly braces: these would be string concatenation or f-string curly braces, not templating curly braces.
All right. So, no f-string formatting. And do we still need this app.run thing? There's no route anymore, right? Look, we turned it off. We have no JSON; we can take that out. We have no render_template; leave that import as a reminder of what's up. We're not returning render_template or anything.
The function does need to return something, of course. So, we're returning here. I missed a move: chat_with_ai. We're still in a while loop, but the function needs to return a value, right? Of course it does. Functions like these need to return values. All right.
You can just consult the final file; take a peek, though try not to rely on that. But sure: it always returns the same thing, right? Bare minimum, the whole create call is set equal to response, and then you just return response.choices[0].message.content. Yep. This is what you get back every time, and we're just returning it.
Now, when you call the function, the function returns the AI's answer. We've seen this move many times. That means the call gives you back a return value; save it as AI_response_text. That's what you're getting back. So, as this while loop keeps running, all of this happens in the terminal. This chat takes place in the terminal, not on a webpage. We don't have an interface on a webpage to do that; we would need input boxes on a webpage, a button or something to submit the chat message, perhaps some JavaScript at that point to run a function that sends a fetch request to the server, to this route, and so on. We're not doing that yet. We're just running this here in the console.
I know that's kind of hard to fathom here. Let's see if it works. So, we're going to run server04.py. "You're chatting with the OpenAI GPT-4o model." Okay. "In *Pulp Fiction*, what is Brad eating for breakfast?" And the reply: "You're welcome. If you have a question and need assistance..." Ooh, ooh, ooh. It thinks I said we're done.
Okay, bad. If `user_chat_message.lower()` is in the quit list, we do that, but we don't want that here, and my message isn't in that list. `chatting = True`. Oh, yeah. We also have to remember: whenever the AI responds, add the AI's response to the chat list. It has to be a bilateral chat, right? It can't just be one party. So, that's going to be `chat_list.append(…)` with the AI response text, and we have to do it as `{"role": "assistant", "content": AI_response_text}`. We want to call the role assistant. And do we need this app debug? `app.run` is, yeah, okay.
There's no route, but we still want to run the app anyway. All right. Let's see if that fixed it.
Well, we're in the terminal anyway. Okay. Let's just start over.
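Before rerunning, here's roughly where the loop body stands; this sketch uses an explicit else for clarity, while the lesson simply falls out of the if:

```python
while chatting == True:
    user_chat_message = input("User: ")
    if user_chat_message.lower() in ["quit", "exit"]:
        chat_list.append({"role": "user", "content": "Thanks for chatting. That's all for now."})
        chatting = False  # final round: the AI still gets to say goodbye below
    else:
        chat_list.append({"role": "user", "content": user_chat_message})
    AI_response_text = chat_with_ai(chat_list)
    print("AI Bot: " + AI_response_text)
    chat_list.append({"role": "assistant", "content": AI_response_text})
```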
*Pulp Fiction*, what is Brad eating for breakfast? Oops. It still thinks I'm quitting. Why would it think my message is in that quit list? It's not. It shouldn't think that. It's basically assuming I said, "That's all for now," because it's saying, "Okay, great. You're welcome." All right.
But that's not what we're doing. Oh, it doesn't like the input? Why should it not like the input? What's wrong with the input? Nothing. `chatting = True`. While that's True, `user_chat_message = input(…)`. Why was it flagging that? I don't understand, but, you know, you've seen me in action: I always debug, and sometimes it just takes a little while. That's par for the course. Let me try another thing.
"What are two ingredients of tomato soup?" I don't know. It doesn't understand, but we're trying to talk. Why would this be true? It's only going to add that to the chat list if this statement is true.
And let's do something that's obviously false. `if 5 < 2 + 2` which is totally false, run that. Now, if that runs when that's obviously false, then I don't know.
We'll see. "Is a tomato a fruit or a vegetable?" Oh, boy. Weird, weird, weird.
So, this is always running. How about turn this off? Okay. How about this—no if at all.
Just... oh, oh, oh. I get it. I see what's wrong. See what's happening here? I'm saying "That's all" here as well. No wonder. I'm sending the "That's all for now" farewell both times. The other append is what's supposed to run if I didn't say "quit," but I copy-pasted the quit message into it. All right. I get it.
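Reconstructing the bug (not the exact file):

```python
# The bug: the farewell was copy-pasted into the non-quit append,
# so every single message told the AI the chat was over
chat_list.append({"role": "user", "content": "Thanks for chatting. That's all for now."})

# The fix: append what the user actually typed
chat_list.append({"role": "user", "content": user_chat_message})
```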
Fine. Okay. That fixed it. "What is Brad eating for breakfast?" In *Pulp Fiction*, Brad is eating a Big Kahuna Burger for breakfast when Jules and Vincent come to his apartment. Jules comments on it being a tasty burger and even takes a bite himself.
"Does Vincent also drink some of Brad's breakfast?" Yes, Vincent drinks some of Brad's Sprite. After Jules takes a bite of the Big Kahuna Burger, he asks Brad if he can have a sip of his beverage to wash it down—his tasty beverage.
Yep. "Does Vincent take a polite sip?" No, Vincent does not take a polite sip. He takes a long drink, finishing up most of the Sprite, which adds to the intimidation of the scene.
Yeah. Okay. "Quit."
"Have a great day." Look at that. We're chatting right on the command line about Pulp Fiction with the OpenAI API.
Pretty cool. Yep. I mean, you're officially a cool kid now if you can do that.
Absolutely. Let's make sure everything is in place. There: `user_chat_message`. Okay. And we have to add something here: the AI's response. Do I have that? Yeah, we do. The ongoing chat consists of user input followed by AI_response_text, so append the AI's response to chat_list. Yep. Instead of referring to the model as system, its replies use the assistant role, which fits, since the AI has been informed, "You are a helpful Assistant." There it is.
Print the response. Yep. Run the chat app. Yep. The user prompt should appear in the terminal. All right. We want this part, not that one. Okay.
Yep. Right here. Let's get a nice screenshot here. I'm going to capture just enough of the *Pulp Fiction* exchange; we don't need to see too much of the chat, but I would like to see the "Thanks for chatting" ending, so let's shorten this. Okay. A glorious screenshot is absolutely in order for this. Let's make the window smaller. Whoops. After a pause, and obviously there's a pause while the AI responds, there it is. Okay. All right.
That is pretty sweet. Now, the final code. Maybe that's fine. chat_with_ai, the OpenAI import; we don't need render_template. The conversation list, max_tokens and temperature (I have them in the other order, which doesn't matter). Extract the message content from the response and return it. Type "quit" or "exit" to end. Start the conversation by defining the AI's role: "You are a helpful Assistant." `chatting = True`. Repeat the loop as long as the `chatting` Boolean is True; that's also what ends the while chat loop. While `chatting == True`, prompt the user: "Enter message."
That prompt is you typing. Are we saying "you"? We don't want to say "user," right? Absolutely, it's "you." Let's just use regular concatenation for the printout; we don't need string interpolation when it's so short. Okay.
"You are a helpful Assistant." `chatting = True`. While `chatting == True`, prompt the user.
All right. This all looks good. I'm going to shrink.
For this kind of stuff, I'm going to go one font size smaller. Chat_with_ai(conversation_list). While it's true, if `user_chat_message` is here, we're going to set the Boolean to False to end the loop.
Inform the AI the chat is over, right? Tell it it's over: "That's all. Thanks for chatting." If the user did not type "quit" or "exit," append their message to the conversation list. All right. Call the function to send the updated chat to the AI, and store the return value, the AI's reply, in the `AI_response_text` variable.
Yep. chat_with_ai; it's not chat_with_ai_model. Add the AI Assistant's response to the conversation: chat_list.append({"role": "assistant", "content": AI_response_text}).
With the response, just do straight-up normal concatenation, no template literal again. I don't want to wrap the variable in curly braces if I don't need to, because double curly braces mean Jinja templating, and you already have curly braces for all these dictionaries. So, there you go. There is no HTML page, so we're good.
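Putting it all together, here's a minimal sketch of what server04.py ends up doing; the model name, client setup, and exact strings are assumptions based on the lesson, not the course's final file:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def chat_with_ai(conversation_list):
    """Send the entire conversation and return the AI's reply text."""
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=conversation_list,
            max_tokens=4000,
            temperature=0.5,
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {e}"

if __name__ == "__main__":
    print("You are chatting with the OpenAI GPT-4o model.")
    chat_list = [{"role": "system",
                  "content": "You are a helpful Assistant with a broad range of expertise."}]
    chatting = True
    while chatting == True:
        user_chat_message = input("User: ")
        if user_chat_message.lower() in ["quit", "exit"]:
            chat_list.append({"role": "user", "content": "Thanks for chatting. That's all for now."})
            chatting = False
        else:
            chat_list.append({"role": "user", "content": user_chat_message})
        AI_response_text = chat_with_ai(chat_list)
        print("AI Bot: " + AI_response_text)
        chat_list.append({"role": "assistant", "content": AI_response_text})
```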