By Philipp Zimmermann in Large Language Models — Apr 25, 2024

Build your own LLM Agent from Scratch: A Step-by-Step Guide


When I first experimented with large language models in 2022 (the year ChatGPT was released to the public), I wrote a personal assistant that could look up the current time or search for something online. The code was ugly, but it worked, and I didn't know that I had built something that today is called an agent.

In this blog post we will implement our own LLM Agent from scratch using Python and Ollama. At the end you will understand how agents work and why they are so powerful.

Digression: M.A.I.A.

Based on this blog post I decided to push a bit further and I built my own OpenInterpreter.

M.A.I.A. Showcase

Check the code out if you want:

Install Ollama

We will start by installing Ollama, a piece of software that lets us deploy and use large language models locally on our machine. Download it from the Ollama website: choose your OS, download the installer and run it.

Download Llama-3

With Ollama installed, we need to download the model we want to use. Ollama offers a list of supported models, but for this tutorial we will use llama3:instruct. Run the command below to download it. On first use the command fetches the model and then opens an interactive prompt, which you can leave with /bye.

# Download llama3:instruct (on first use) and start an interactive session
ollama run llama3:instruct

LLM Agents

As large language models are not updated on a daily basis, their knowledge only extends up to a certain point in time. That means they have no access to current events or any other information that was not included in their training data.

This is a huge problem, since for most use cases we want the LLM to access recent data such as stock prices or the latest political events. To solve it, we give an LLM the opportunity to use tools, like browsing the web or calling APIs, to obtain up-to-date information.

And that's what an agent is. An agent is able to use specific tools to obtain information it needs to fulfill user requests. Let's take a look at the following example:

Regular LLM - Imagine asking this model for the latest stock price of Google. Since it was trained on a fixed data set, it will most likely hallucinate an answer, simply because the data it would need is missing.
LLM Agent - Asked the same question, the agent uses the Web-Search tool to look up the current stock price of Google online. It then presents an answer based on the search results, just like a human would.

Now that we know what LLM agents are, let's build our own from scratch. For this we will use Python and Llama-3, the latest model from Meta AI at the time of writing.

Implementation

To run the agent we need to install some packages. We will first create a new virtual environment like this:

# Create working directory
mkdir llm-agent

# Change directory
cd llm-agent

# Create a virtual environment
virtualenv -p python3 env

# Activate environment
source env/bin/activate

Now we can install all requirements like this:

# Install python packages
pip install ollama google-api-python-client py-expression-eval

At this point you need to create a programmable search engine following these instructions. Once that is done, you will have a GOOGLE_CSE_ID and a GOOGLE_API_KEY.
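If you would rather not paste these two values into the source file, one option is to read them from environment variables. This is a minimal sketch and not part of the agent below: the helper name load_google_credentials is hypothetical, and the variable names simply reuse the placeholder names from this post.

```python
import os


def load_google_credentials(env=None):
    """Return (cse_id, api_key) read from environment variables.

    Assumption: the variables are called GOOGLE_CSE_ID and GOOGLE_API_KEY,
    mirroring the placeholder names used in this post.
    """
    env = os.environ if env is None else env
    names = ("GOOGLE_CSE_ID", "GOOGLE_API_KEY")

    # Collect every missing name so the error message lists them all at once
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    return env["GOOGLE_CSE_ID"], env["GOOGLE_API_KEY"]
```

The two returned values could then be assigned to self.google_cse_id and self.google_api_key instead of the hard-coded strings.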

The following code is the complete agent. You can just save it into a file and run it.

#!/usr/bin/python3

import ollama
from googleapiclient.discovery import build
from py_expression_eval import Parser
import re, time, os
import json


class Agent:

    def __init__(self):
        """
        Initialize Agent.
        """
        # Google Search
        self.google_cse_id = "GOOGLE_CSE_ID"
        self.google_api_key = "GOOGLE_API_KEY"

        # Calculator
        self.parser = Parser()

        # System Prompt
        self.agent_system_prompt = """Answer the following questions and obey the following commands as best you can. You must always answer in JSON format.

You have access to the following tools:
Search: useful for when you need to answer questions about current events. You should ask targeted questions.
Response To Human: must be used if you do not want to use any tool.

You will receive a message from the human, then you should start a loop and do one of two things

Option 1: You use a tool to answer the question.
For this, you should use the following format:
{
    "thought": "you should always think about what to do",
    "action": "the action to take, should be one [Search, Calculator]",
    "action-input": "the input to the action, to be sent to the tool"
}
After this, the human will respond with an observation, and you will continue the loop.

Option 2: You respond to the human.
For this, you should use the following format:
{
    "action": "Response To Human",
    "action-input": "your response to the human, summarizing what you did and what you learned"
}

You may use tools a maximum of two times in a row. If you used a tool the second time in your loop, the next step must be answering the human.

Option 2 ends your loop, so if you have no tool to use you still must awnser in JSON format like stated in Option 2. This is mandatory.

Begin!
"""

    def extract_json(self, message):
        """
        This function extracts the JSON content from a given message.
        """
        match = re.search(r'{.*?}', message, re.DOTALL)
        
        if match:
            return match.group(0)
        else:
            return '{"action": "Response To Human", "action-input": "Internal Error occurred"}'
    
    def search(self, search_term):
        """
        This function searches the internet for the given search term.
        """
        search_result = ""
        service = build("customsearch", "v1", developerKey=self.google_api_key)
        res = service.cse().list(q=search_term, cx=self.google_cse_id, num = 10).execute()
        for result in res.get('items', []):  # 'items' can be missing when there are no results
            search_result = search_result + result['snippet']
        return search_result
    
    def no_tool(self, nothing):
        """
        This function is the no tool selection function.
        """
        return ""
    
    def query_agent(self, message):
        """
        This function queries the agent with a given message.
        """
        # Add user message to conversation
        self.messages.append({"role": "user", "content": message})

        # Ollama options
        options = {
            "temperature": 0,
            "num_predict": 4000
        }

        # Agent loop
        while True:

            # Query agent
            raw_response = ollama.chat(model="llama3:instruct", messages=self.messages, options=options)
            response_text = raw_response["message"]["content"]
            # Parse the reply; fall back to a no-op action if it is not valid JSON
            # or lacks the expected keys
            try:
                response_json = json.loads(self.extract_json(response_text))
                action, action_input = response_json["action"], response_json["action-input"]
            except (json.JSONDecodeError, KeyError):
                action, action_input = "Nothing", "Nothing"

            # Tool selection
            if action == "Search":
                print("[ Search ]")
                tool = self.search
            elif action == "Response To Human":
                self.messages.append({"role": "assistant", "content": action_input})
                return action_input
            else:
                tool = self.no_tool
            
            # Tool usage
            observation = tool(action_input)
            self.messages.extend([
                {"role": "assistant", "content": f"Please use the {action} tool to obtain more information."},
                {"role": "user", "content": f"You asked me to use the {action} tool to obtain more information. Here is my observation from the tool. Please use it and continue your loop.\n\nObservation: {observation}"}
            ])
    
    def run(self):
        """
        This function runs the agent.
        """
        self.messages = [
            {"role": "system", "content": self.agent_system_prompt}
        ]

        while True:

            # Get user input
            user_input = input("You: ")
            print()

            # Exit program
            if user_input == "exit":
                break

            # Query agent
            response = self.query_agent(user_input)

            # Print response
            print(f"\nAgent: {response}\n")

            # Save conversation
            with open("conversation.json", "w") as f:
                json.dump(self.messages, f, indent=4)

if __name__ == '__main__':

    # Run Ollama Agent
    agent = Agent()
    agent.run()

agent.py

So let's break this down step by step.

System Prompt

The system prompt is one of the two essentials in this code snippet. It starts by telling the LLM that it can use specific tools throughout its thinking process to obtain tool-specific information. After that it introduces a simple idea:

Let the LLM perform an internal loop of gathering as much information as needed to perform the given task. After that it will respond to the user.

This is done by giving the LLM two options for how to behave. Option 1 is recursive: after successfully using one tool, the LLM can evaluate again whether or not to use another tool to obtain even more information. Alternatively, it can go with Option 2, which means exiting the loop by answering the human.

We also instruct the LLM to answer in JSON format, which makes it easier to parse the responses later.
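To make the two reply formats concrete, here is a small sketch of how such a reply could be parsed with the standard library alone. It is an alternative to the regex in extract_json, not the code the agent uses: the function name parse_reply is hypothetical, and the sample replies are hand-written rather than real model output.

```python
import json


def parse_reply(text):
    """Return (action, action_input) from the first usable JSON object in text.

    Falls back to a "Response To Human" error reply when nothing usable is found.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning exactly at `start`
            # and ignores whatever text follows it
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and "action" in obj and "action-input" in obj:
            return obj["action"], obj["action-input"]
        # Not valid JSON here, or not the expected shape: try the next brace
        start = text.find("{", start + 1)
    return "Response To Human", "Internal error: no valid JSON found"


# Option 1: the model wants to use a tool (hand-written sample)
tool_reply = 'Sure!\n{"thought": "I need current data", "action": "Search", "action-input": "Google stock price"}'
# Option 2: the model answers the human (hand-written sample)
final_reply = '{"action": "Response To Human", "action-input": "Hello!"}'

tool_action = parse_reply(tool_reply)
final_action = parse_reply(final_reply)
```

Unlike a non-greedy regex, this approach also copes with replies that contain nested braces, at the cost of a few more lines.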

Search

The method search(self, search_term) is the only tool the agent can choose to use. It takes one argument, search_term, which is used to query the Google API. The result is the concatenated text snippets returned by the search engine, which the method returns as a single string.
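The snippet handling can be tried without an API key. The sketch below works on a hand-written dictionary that only imitates the shape of a search response (an "items" list whose entries carry a "snippet" field); it is not real API output, and the helper name format_snippets is hypothetical.

```python
def format_snippets(response, separator="\n"):
    """Join the 'snippet' fields of a search-style response dict into one string."""
    # Assumption: 'items' may be missing when the search returns no hits,
    # so default to an empty list instead of raising a KeyError
    items = response.get("items", [])
    return separator.join(item.get("snippet", "") for item in items)


# Hand-written stand-in for a search response, for illustration only
fake_response = {
    "items": [
        {"snippet": "First result text"},
        {"snippet": "Second result text"},
    ]
}

observation = format_snippets(fake_response)
```

Joining with a separator keeps the individual snippets apart, which may make the observation easier for the model to read than one run-on string.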

Main Loop

The conversation process is implemented in the run(self) method. It starts by initializing the conversation with the system prompt for the LLM. Then the user enters a message and the agent's response is printed to the terminal. This continues until the user types exit, which stops the program.
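The same loop can be written so that it runs without a terminal or a model, which makes it easy to experiment with. This is only a sketch: chat_loop and echo_answer are hypothetical names, and echo_answer is a trivial stand-in for query_agent.

```python
def chat_loop(system_prompt, read_input, answer, write_output):
    """Read messages until the user types "exit" and return the full history."""
    messages = [{"role": "system", "content": system_prompt}]
    while True:
        user_input = read_input()
        if user_input == "exit":
            break
        write_output(answer(messages, user_input))
    return messages


def echo_answer(messages, text):
    """Stand-in for the real agent: records the turn and echoes the input."""
    messages.append({"role": "user", "content": text})
    reply = f"You said: {text}"
    messages.append({"role": "assistant", "content": reply})
    return reply


# Scripted inputs replace input(), and a list collects what would be printed
inputs = iter(["Hello", "exit"])
outputs = []
history = chat_loop("You are a test assistant.", lambda: next(inputs), echo_answer, outputs.append)
```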

Agent Logic

Now let's discuss the second essential besides the LLM system prompt - the agent logic in query_agent(self, message). In this method we implement the inner loop that we talked about in the system prompt section. We first add the user message to the conversation and then start the loop.

When it is queried for the first time, the model is prompted with the original user message, e.g. "Does Barack Obama have kids?".
The agent analyzes this query and decides to use the Search tool to answer the question.
At this point we add an assistant message to the conversation which says "Please use the Search tool to obtain more information". This step is crucial because it keeps the alternating flow between user and assistant intact. However, it is not the human who answers this request but the Search tool: we append a user message to the conversation that contains the observation from the tool that was used.
Looking at the conversation, it appears as if the user asked a question, the assistant asked the user to provide more information using a specific tool, and the user provided the requested information. However, all of this happens internally, without the human noticing.
Now the agent can decide again whether to use a tool or to stop the loop and respond to the human.
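The steps above can be replayed without Ollama or Google by swapping the model for a scripted stand-in. In this sketch, run_agent_loop, ask_model and the scripted replies are all hypothetical. The max_tool_calls guard is also an addition of mine: the system prompt only asks the model to limit itself to two tool calls, while this version enforces the limit in code.

```python
import json


def run_agent_loop(messages, ask_model, tools, max_tool_calls=2):
    """Loop until the model chooses "Response To Human" or the tool limit is hit.

    ask_model(messages) -> str stands in for the real LLM call.
    tools maps an action name to a function that takes the action input.
    """
    tool_calls = 0
    while True:
        reply = json.loads(ask_model(messages))
        action, action_input = reply["action"], reply["action-input"]

        if action == "Response To Human":
            messages.append({"role": "assistant", "content": action_input})
            return action_input

        if tool_calls >= max_tool_calls:
            # The model ignored the limit, so stop instead of looping forever
            fallback = "I could not find an answer within the tool limit."
            messages.append({"role": "assistant", "content": fallback})
            return fallback

        # Unknown actions fall back to a tool that returns an empty observation
        tool = tools.get(action, lambda _unused: "")
        observation = tool(action_input)
        tool_calls += 1

        # Keep the user/assistant alternation intact, as described above
        messages.append({"role": "assistant", "content": f"Please use the {action} tool to obtain more information."})
        messages.append({"role": "user", "content": f"Observation: {observation}"})


# Scripted model: first asks for a search, then answers the human
scripted = iter([
    '{"thought": "I should look this up", "action": "Search", "action-input": "Barack Obama children"}',
    '{"action": "Response To Human", "action-input": "He has two daughters."}',
])
messages = [{"role": "user", "content": "Does Barack Obama have kids?"}]
answer = run_agent_loop(
    messages,
    ask_model=lambda _messages: next(scripted),
    tools={"Search": lambda query: "Malia and Sasha"},
)
```

After the call, messages holds four entries: the question, the injected assistant request, the observation disguised as a user message, and the final answer.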

This process is hard to understand the first time, so take your time and think it through. To make it easier to follow, let us execute the script and have the following conversation.

# Run the script
python3 agent.py

You: Does Barack Obama have kids?

[ Search ]

Agent: Barack Obama has two daughters, Malia and Sasha, with his wife Michelle Obama.

You: When was Malia Obama born?

[ Search ]

Agent: According to my observation from the Search tool, Malia Obama was born on July 4, 1998.

What we can see is that in both cases the agent chooses to search the web to obtain the information. It then uses that information to build its answer.

Now let us take a look under the hood:

[
    {
        "role": "system",
        "content": "Answer the following questions and obey the following commands as best you can. You must always answer in JSON format.\n\nYou have access to the following tools:\nSearch: useful for when you need to answer questions about current events. You should ask targeted questions.\nResponse To Human: must be used if you do not want to use any tool.\n\nYou will receive a message from the human, then you should start a loop and do one of two things\n\nOption 1: You use a tool to answer the question.\nFor this, you should use the following format:\n{\n    \"thought\": \"you should always think about what to do\",\n    \"action\": \"the action to take, should be one [Search, Calculator]\",\n    \"action-input\": \"the input to the action, to be sent to the tool\"\n}\nAfter this, the human will respond with an observation, and you will continue the loop.\n\nOption 2: You respond to the human.\nFor this, you should use the following format:\n{\n    \"action\": \"Response To Human\",\n    \"action-input\": \"your response to the human, summarizing what you did and what you learned\"\n}\n\nYou may use tools a maximum of two times in a row. If you used a tool the second time in your loop, the next step must be answering the human.\n\nOption 2 ends your loop, so if you have no tool to use you still must awnser in JSON format like stated in Option 2. This is mandatory.\n\nBegin!\n"
    },
    {
        "role": "user",
        "content": "Does Barack Obama have kids?"
    },
    {
        "role": "assistant",
        "content": "Please use the Search tool to obtain more information."
    },
    {
        "role": "user",
        "content": "You asked me to use the Search tool to obtain more information. Here is my observation from the tool. Please use it and continue your loop.\n\nObservation: President Barack Obama, First Lady Michelle Obama, and daughters Malia and Sasha pose for a family portrait with Bo and Sunny in the Rose Garden of the White\u00a0...His immediate family includes his wife Michelle Obama and daughters Malia and Sasha. The Obama family. The Obama family on Easter Sunday, 2015. From left to\u00a0...Jan 19, 2024 ... Things did work out: Malia and Sasha were 8 and 10 at the time, respectively, when they moved into the White House. They lived there throughout\u00a0...Honolulu, Hawaii, U.S.. Political party, Democratic. Spouse. Michelle Robinson. \u200b. ( m. 1992)\u200b. Children.Jan 17, 2018 ... 4M likes, 102K comments - barackobamaJanuary 17, 2018 on : \"You're not only my wife and the mother of my children, you're my best friend.Feb 4, 2009 ... The decisions that no parent should ever have to make \u2013 how long to put off that doctor's appointment, whether to fill that prescription,\u00a0...Portrait of Barack Obama, the 44th President of the United States ... President Barack Obama and First Lady Michelle Obama pose with their daughters, Malia and\u00a0...Nov 20, 2014 ... President Barack Obama signs S. 1086, the ... Improve the quality of child care by ... Help working parents that receive subsidies to pay for child\u00a0...The previous version of the law, the No Child Left Behind (NCLB) Act, was enacted in 2002. NCLB represented a significant step forward for our nation's children\u00a0...May 25, 2022 ... As we grieve the children of Uvalde today, we should take time to recognize that two years have passed since the murder of George Floyd\u00a0..."
    },
    {
        "role": "assistant",
        "content": "Barack Obama has two daughters, Malia and Sasha, with his wife Michelle Obama."
    },
    {
        "role": "user",
        "content": "When was Malia Obama born?"
    },
    {
        "role": "assistant",
        "content": "Please use the Search tool to obtain more information."
    },
    {
        "role": "user",
        "content": "You asked me to use the Search tool to obtain more information. Here is my observation from the tool. Please use it and continue your loop.\n\nObservation: Malia Obama and Sasha Obama; 1.3 Marian Robinson. 2 ... Paternal grandfather to Barack Obama, he was born Onyango Obama. ... Fraser Robinson Jr. George Obama\u00a0...Jul 4, 2021 ... Happy Fourth of July, and Happy Birthday, Malia! It's been such a joy to watch you begin to make your own way in the world with poise,\u00a0...Jul 4, 2022 ... Malia Obama, the elder of two daughters, turned 24 on Monday \u2014 and as usual, her parents didn't skip a beat to celebrate \"one of the two most\u00a0...Jul 5, 2016 ... President Barack Obama hugs his daughter Malia on her birthday during an Independence Day Celebration for. Video Ad Feedback.Jul 4, 2023 ... Barack and Michelle Obama wished their eldest daughter Malia a happy 25th birthday on social media, sharing throwback photos with her as\u00a0...Jun 15, 2021 ... Born in Chicago in 1998, Malia Obama moved to the White House after her father, Barack Obama, was elected U.S. president in 2008.Malia Obama. Writer: The Heart. Malia Obama was born on 4 July 1998 in Chicago, Illinois, USA. She is a writer and director, known for The Heart (2023),\u00a0...Jul 4, 2022 ... Michelle and Barack Obama celebrate Malia Obama's 24th birthday ... It's Malia Obama's birthday! Former first lady Michelle Obama took to\u00a0...Jul 4, 2022 ... Michelle and Barack Obama celebrated their daughter Malia Obama's 24th birthday.Their daughters, Malia and Sasha, were born in 1998 and 2001, respectively. Obama was elected to the Illinois Senate in 1996, and then to the U.S. Senate in\u00a0..."
    },
    {
        "role": "assistant",
        "content": "According to my observation from the Search tool, Malia Obama was born on July 4, 1998."
    }
]

conversation.json

This JSON file shows the conversation between me and the agent. But it also includes the internal steps the agent performed to obtain the information needed to answer my questions.

With these insights we can now see what's happening behind the scenes. As I said earlier, it is important to maintain the structure of the conversation, which is:

System
User
Assistant
User
Assistant
...

This is because these models have been finetuned on data that follows this pattern. So for our agents to work well, we need to follow this pattern as well.
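A quick way to check that a conversation keeps this pattern is a small helper like the one below. It is a sketch with a hypothetical name, follows_chat_pattern, and it assumes the message format used throughout this post (a list of dictionaries with a "role" key).

```python
def follows_chat_pattern(messages):
    """Return True for an optional system message followed by alternating user/assistant turns."""
    roles = [message["role"] for message in messages]

    # The system prompt, if present, must come first and is then ignored
    if roles and roles[0] == "system":
        roles = roles[1:]

    expected = ("user", "assistant")
    return all(role == expected[index % 2] for index, role in enumerate(roles))


good = [
    {"role": "system", "content": "prompt"},
    {"role": "user", "content": "question"},
    {"role": "assistant", "content": "Please use the Search tool to obtain more information."},
    {"role": "user", "content": "Observation: ..."},
    {"role": "assistant", "content": "answer"},
]
bad = [
    {"role": "user", "content": "question"},
    {"role": "user", "content": "Observation: ..."},
]

good_ok = follows_chat_pattern(good)
bad_ok = follows_chat_pattern(bad)
```

Running such a check on the saved conversation.json would flag a history in which, for example, a tool observation was appended without the preceding assistant message.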

Conclusion

Thank you for reading this article. I hope you had fun doing so. If you are interested in more topics like this, e.g. the security of Large Language Models, feel free to check out this blog post.