[NOT READY TO BE REVIEWED] Unit 1, Part 1, big updates #27

Merged
merged 8 commits into from
Feb 5, 2025
File renamed without changes.
File renamed without changes.
109 changes: 0 additions & 109 deletions units/en/unit1/1_definition_of_an_agent.md

This file was deleted.

109 changes: 109 additions & 0 deletions units/en/unit1/1_definition_of_an_agent.mdx
@@ -0,0 +1,109 @@
# What is an Agent?

Since you are interested in learning more about **Agents**, now is the time to address the fundamental question: **what is an Agent?**

To explain this concept, let's start with an analogy.

## The Big Picture: Alfred The Agent

Meet Alfred. Alfred is an **Agent**.

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/this-is-alfred.jpg" alt="This is Alfred"/>

Imagine Alfred receives a command, such as: "Alfred, I would like a coffee please."

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/coffee-please.jpg" alt="I would like a coffee"/>

Because Alfred **understands natural language**, he quickly grasps our request.

Before fulfilling the order, Alfred engages in **reasoning and planning** to work out the sequence of actions he needs to take and which tools to use.

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/reason-and-plan.jpg" alt="Reason and plan"/>

Once he has a plan in mind (go to the kitchen -> select the tool to make the coffee -> bring the coffee back to us), **Alfred must act**.

To execute his plan, **he can use a tool from the list of tools he has at his disposal**. In this case, to make a coffee, he uses a coffee machine: he activates it to brew the coffee.

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/make-coffee.jpg" alt="Make coffee"/>

Finally, Alfred brings the freshly made coffee to us.

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/bring-coffee.jpg" alt="Bring coffee"/>

And this is what an Agent is: an AI model capable of reasoning, planning, and interacting with its environment. We call it an Agent because it has *agency*, that is, the ability to interact with its environment.


<!-- Image that Summarize the process -->


## Let's get more formal

Now that we understand the big picture of what an Agent is, let's define it formally:

> An Agent is a type of system that gives an AI model the ability to interact with its environment to fulfill a user-defined objective.

In essence, Agents are designed to enhance the capabilities of AI models by embedding them in a framework that manages things like **Planning**, **Memory**, and **Actions**.

To make this more visual, the AI model can be seen as the **brain of the agent** and the framework as the **rest of the body**. The AI model does the reasoning and then sends an **Action** to be executed. The scope of what is possible depends on what the model has been equipped with by its creator. A human, lacking wings, can't execute the **Action** "fly", but can execute the **Actions** "walk", "run", "jump", "grab", and so on.
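
One way to picture this split is a small sketch in which the framework only executes the Actions it has been equipped with. The `Body` class and its `equip` and `execute` methods are hypothetical names chosen for this illustration, not the API of any particular agent library.

```python
class Body:
    """Hypothetical framework: holds the Actions the agent has been equipped with."""

    def __init__(self):
        self._actions = {}

    def equip(self, name, function):
        # Register a callable under an Action name.
        self._actions[name] = function

    def execute(self, name):
        # The "brain" may ask for any Action, but only equipped ones can run.
        if name not in self._actions:
            return f"Cannot execute '{name}': not equipped"
        return self._actions[name]()


human = Body()
human.equip("walk", lambda: "walking")
human.equip("jump", lambda: "jumping")

print(human.execute("walk"))  # walking
print(human.execute("fly"))   # Cannot execute 'fly': not equipped
```

Whatever the brain decides, the set of registered Actions is what bounds the agent's behavior.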


## What type of AI Models do we use for Agents?

The AI model most commonly found at the core of an Agent is an LLM (Large Language Model): a kind of AI model that takes **Text** as input and also outputs **Text**.

Its best-known representatives are **GPT-4** from **OpenAI**, **Llama** from **Meta**, **Gemini** from **Google**, and others. These models have been trained on a vast amount of text and generalize well. We will learn more about LLMs in the next section.

Because LLMs only handle **Text**, if your use case requires other modalities (Images, Audio, Video, ...), you will have to use different AI models. For instance, to browse the web, you could use a Vision Language Model (VLM), which understands both Images and Text, as the Agent's core to navigate a web page.

A second example would be to use **Whisper**, a well-known **Audio**-to-**Text** model, as a "Tool" that lets your LLM agent transcribe audio into text it can then understand.

## How does an AI take action on its environment?

The general term for something an AI model can use to take an action is a "Tool". For instance, by default your LLM can't generate images. But if you ask a well-known chat application like HuggingChat, ChatGPT, or Le Chat to generate an image, it can do it!

The model at the core of those applications does not natively have the capacity to generate an image. But the developers of those applications wrote some code (Tools) that the LLM can call and execute to create one.
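
As a rough sketch of that idea, the snippet below shows a text-only model asking for an image through a Tool. Everything here is an assumption made for illustration: `fake_llm` stands in for a real LLM, `generate_image` stands in for a real image generator, and the JSON request format is one possible convention, not the one any specific application uses.

```python
import json


def generate_image(prompt):
    """Hypothetical Tool: stands in for a real image generator, returns a fake file name."""
    return f"image_of_{prompt.replace(' ', '_')}.png"


# Tools the application makes available to the model.
TOOLS = {"generate_image": generate_image}


def fake_llm(user_message):
    """Stand-in for an LLM: it can only output text, here a JSON tool request."""
    return json.dumps({"tool": "generate_image", "arguments": {"prompt": "a red bicycle"}})


def run_tool_call(llm_output):
    """Parse the model's text output and run the Tool it asked for."""
    request = json.loads(llm_output)
    tool = TOOLS[request["tool"]]
    return tool(**request["arguments"])


result = run_tool_call(fake_llm("Draw me a red bicycle"))
print(result)  # image_of_a_red_bicycle.png
```

The model itself never produces an image; it only produces text that the application interprets as a request to run code.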

<!-- Illustration -->

We will learn more about tools in the Tool section.

## What can an Agent do?

When an Agent has a task to perform, the LLM at its core should select the best course of **ACTIONS** to fulfill it.

Example: if I ask my personal assistant (like Siri) on my computer to send an email to my manager asking to delay today's meeting, I will need to give it a Tool, written as code (in this case a Python function), to do so:

```python
def Send_message_to(recipient, message):
    """Useful to send an e-mail message to someone"""
    # The actual e-mail sending logic is left out of this example.
    ...
```

And the AI model will need to run that code somehow to fulfill the predefined task:
```python
Send_message_to("Manager", "Can we postpone today's meeting?")
```
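
For the model to know that such a Tool exists, one common approach is to place a short text description of it in the prompt. The sketch below builds that description from the function's signature and docstring using Python's standard `inspect` module. The `describe_tool` helper and the body of `Send_message_to` are assumptions added for illustration.

```python
import inspect


def Send_message_to(recipient, message):
    """Useful to send an e-mail message to someone"""
    # Stand-in body (assumption): a real Tool would actually send the e-mail.
    return f"Sent to {recipient}: {message}"


def describe_tool(tool):
    """Build a one-line text description of a Tool from its signature and docstring."""
    return f"{tool.__name__}{inspect.signature(tool)}: {inspect.getdoc(tool)}"


tool_description = describe_tool(Send_message_to)
print(tool_description)
# Send_message_to(recipient, message): Useful to send an e-mail message to someone
```

This is why a clear name and docstring matter: under this approach they are the only things the model sees when deciding whether to use the Tool.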

In Agents, the design of the Tools is very important and greatly impacts the quality of your agent. Some tasks will require very specific Tools to be crafted, while others may be solved with a general-purpose Tool like "web_search".

Here we make a distinction between an Action and a Tool because, in some Agent implementations, one Action can involve multiple Tool uses.
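
To illustrate that distinction, here is a minimal sketch in which a single Action chains two Tool uses. The `lookup_email` Tool, the `CONTACTS` address book, the `ask_to_postpone_meeting` Action, and the stand-in body of `Send_message_to` are all hypothetical and exist only for this example.

```python
# Hypothetical address book used by the lookup Tool.
CONTACTS = {"Manager": "manager@example.com"}


def lookup_email(name):
    """Tool 1 (hypothetical): find someone's e-mail address."""
    return CONTACTS[name]


def Send_message_to(recipient, message):
    """Tool 2: stand-in for the e-mail Tool from the example above."""
    return f"Sent to {recipient}: {message}"


def ask_to_postpone_meeting(name):
    """One Action that chains two Tool uses."""
    address = lookup_email(name)
    return Send_message_to(address, "Can we postpone today's meeting?")


print(ask_to_postpone_meeting("Manager"))
# Sent to manager@example.com: Can we postpone today's meeting?
```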

Having an AI interact with its environment opens up many real-life use cases for companies and individuals.

### Examples
Personal Virtual Assistants:
Virtual assistants like Siri, Alexa, or Google Assistant function as agents when they interact with users and their digital environments. They take user queries, analyze context, retrieve information from databases, and provide responses or initiate actions (like setting reminders, sending messages, or controlling smart devices).

Customer Service Chatbots:
Many companies deploy chatbots as agents that interact with customers in natural language. These agents can answer questions, guide users through troubleshooting steps, or even complete transactions. Their predefined objectives might include improving user satisfaction, reducing wait times, or increasing sales conversion rates. By interacting directly with customers, learning from the dialogues, and adapting their responses over time, they demonstrate the core principles of an agent in action.

AI NPC (Non-Playable Character) in a video game:
AI agents powered by large language models (LLMs) can make NPCs more dynamic and unpredictable. Instead of following rigid behavior trees, they can respond contextually, adapt to player interactions, and generate more nuanced dialogue. This flexibility helps create more lifelike, engaging characters that evolve alongside the player’s actions.


To summarize, an Agent is a system that uses an AI model (most often an LLM) as its core reasoning engine to:

* **Understand natural language:** Interpret and respond to human instructions in a meaningful way.
* **Reason and plan:** Analyze information, make decisions, and devise strategies to solve problems.
* **Interact with its environment:** Gather information, take actions, and observe the results of those actions.
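
These three capabilities can be sketched as a simple loop. This is a toy under stated assumptions, not a real agent: `scripted_model` replaces the LLM with hard-coded decisions, `web_search` returns a canned result instead of searching the web, and `run_agent` with its `max_steps` limit is a hypothetical name for the loop.

```python
def web_search(query):
    """Hypothetical Tool: returns a canned result instead of searching the web."""
    return f"Top result for '{query}': sunny"


TOOLS = {"web_search": web_search}


def scripted_model(task, observations):
    """Stand-in for the LLM: picks the next step from what it has observed so far."""
    if not observations:
        # Reason and plan: nothing is known yet, so gather information first.
        return {"tool": "web_search", "input": task}
    # Enough information was observed: produce the final answer.
    return {"answer": f"Here is what I found. {observations[-1]}"}


def run_agent(task, max_steps=3):
    """Loop: ask the model for a step, act with a Tool, observe, repeat."""
    observations = []
    for _ in range(max_steps):
        step = scripted_model(task, observations)
        if "answer" in step:
            return step["answer"]
        # Act with the chosen Tool, then record the observation.
        observations.append(TOOLS[step["tool"]](step["input"]))
    return "Stopped: step limit reached"


print(run_agent("weather in Paris"))
# Here is what I found. Top result for 'weather in Paris': sunny
```

The step limit is a design choice that keeps the loop from running forever if the model never produces a final answer.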

Now that we understand the big picture of what Agents are, we need to understand how LLMs work.
48 changes: 0 additions & 48 deletions units/en/unit1/2_explain_llms.md

This file was deleted.
