
Commit

Merge branch 'huggingface:main' into main
gabyorel authored Feb 18, 2025
2 parents 563528d + 0bd61c7 commit fe95a10
Showing 11 changed files with 6,346 additions and 11 deletions.
6,128 changes: 6,128 additions & 0 deletions notebooks/bonus-unit1/gemma-SFT-thinking-function_call.ipynb

Large diffs are not rendered by default.

12 changes: 11 additions & 1 deletion units/en/_toctree.yml
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,17 @@
title: Get Your Certificate
- local: unit1/conclusion
title: Conclusion
- title: Bonus Unit 1. Fine-tuning an LLM for Function-calling
sections:
- local: bonus-unit1/introduction
title: Introduction
- local: bonus-unit1/what-is-function-calling
title: What is Function Calling?
- local: bonus-unit1/fine-tuning
title: Let's Fine-Tune your model for Function-calling
- local: bonus-unit1/conclusion
title: Conclusion
- title: When will the next steps be published?
sections:
- local: communication/next-units
title: Next Units
13 changes: 13 additions & 0 deletions units/en/bonus-unit1/conclusion.mdx
@@ -0,0 +1,13 @@
# Conclusion [[conclusion]]

Congratulations on finishing this first Bonus Unit 🥳

You've just **learned what function-calling is and how to fine-tune your model to do function-calling**!

If we have one piece of advice now, it’s to try to **fine-tune different models**. The **best way to learn is by trying.**

In the next Unit, you're going to learn how to use **state-of-the-art frameworks such as `smolagents`, `LlamaIndex` and `LangGraph`**.

Finally, we would love **to hear what you think of the course and how we can improve it**. If you have some feedback then, please 👉 [fill this form](https://docs.google.com/forms/d/e/1FAIpQLSe9VaONn0eglax0uTwi29rIn4tM7H2sYmmybmG5jJNlE5v0xA/viewform?usp=dialog)

### Keep Learning, Stay Awesome 🤗
51 changes: 51 additions & 0 deletions units/en/bonus-unit1/fine-tuning.mdx
@@ -0,0 +1,51 @@
# Let's Fine-Tune your model for function-calling

We're now ready to fine-tune our first model for function-calling 🔥.

## How do we train our model for function-calling?

> Answer: We need **data**.

Training a model can be divided into 3 steps:

1. **The model is pretrained on a large quantity of data**. The output of that step is a **pre-trained model**. For instance, [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b). It's a base model and only knows how **to predict the next token, without strong instruction-following capabilities**.

2. To be useful in a chat context, the model then needs to be **fine-tuned** to follow instructions. In this step, it can be trained by the model creators, the open-source community, you, or anyone. For instance, [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) is an instruction-tuned model by the Google team behind the Gemma project.

3. The model can then be **aligned** to the creator's preference. For instance, a customer service chat model that must never be impolite to customers.

Usually a complete product like Gemini or Mistral **will go through all 3 steps**, while the models you can find on Hugging Face have gone through one or more of these steps.

In this tutorial, we will build a function-calling model based on [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it). The base model is [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b), and the Google team fine-tuned it on instruction following, resulting in **"google/gemma-2-2b-it"**.

In this case we will take **"google/gemma-2-2b-it"** as our starting point **rather than the base model, because the prior fine-tuning it has been through is important for our use case**.

Since we want to interact with our model through conversations in messages, starting from the base model **would require more training in order to learn instruction following, chat AND function-calling**.

By starting from the instruct-tuned model, **we minimize the amount of information that our model needs to learn**.

## LoRA (Low-Rank Adaptation of Large Language Models)

LoRA (Low-Rank Adaptation of Large Language Models) is a popular and lightweight training technique that significantly **reduces the number of trainable parameters**.

It works by **inserting a smaller number of new weights as an adapter into the model to train**. This makes training with LoRA much faster, memory-efficient, and produces smaller model weights (a few hundred MBs), which are easier to store and share.

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/blog_multi-lora-serving_LoRA.gif" alt="LoRA inference" width="50%"/>

LoRA works by adding pairs of rank decomposition matrices to Transformer layers, typically focusing on linear layers. During training, we will "freeze" the rest of the model and will only update the weights of those newly added adapters.

By doing so, the number of parameters that we need to train drops considerably, as we only need to update the adapter's weights.
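To get a feel for the savings, here is a quick back-of-the-envelope sketch for a single hypothetical linear layer (the sizes below are illustrative, not Gemma's actual dimensions):

```python
# Trainable-parameter count: full fine-tuning vs. a LoRA adapter,
# for one hypothetical d x d linear layer.

def full_params(d_in: int, d_out: int) -> int:
    # A dense weight matrix has d_out * d_in trainable parameters.
    return d_out * d_in

def lora_params(d_in: int, d_out: int, r: int) -> int:
    # LoRA trains two rank-r matrices instead: B (d_out x r) and A (r x d_in).
    return d_out * r + r * d_in

d, r = 4096, 16  # illustrative hidden size and LoRA rank
print(full_params(d, d))     # 16777216
print(lora_params(d, d, r))  # 131072, under 1% of the full layer
```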

During inference, the input is passed through both the adapter and the base model; alternatively, the adapter weights can be merged with the base model, resulting in no additional latency overhead.
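The equivalence behind that merge, Wx + B(Ax) = (W + BA)x, can be checked with a toy example in plain Python (tiny made-up matrices, rank 1):

```python
# Toy illustration of why merging the LoRA update adds no inference latency:
# W x + B(A x) gives the same output as a single matmul with (W + BA).

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(len(X[0]))] for i in range(len(X))]

W = [[1.0, 2.0], [3.0, 4.0]]  # frozen base weight (2x2)
B = [[0.5], [0.1]]            # LoRA "up" matrix (2x1), rank r = 1
A = [[0.2, 0.3]]              # LoRA "down" matrix (1x2)
x = [1.0, 2.0]                # input vector

# Separate path: base output plus adapter output.
separate = [w + b for w, b in zip(matvec(W, x), matvec(B, matvec(A, x)))]

# Merged path: fold BA into W once, then a single matmul at inference time.
merged = matvec(add(W, matmul(B, A)), x)

print(separate, merged)  # identical up to floating-point rounding
```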

LoRA is particularly useful for adapting **large** language models to specific tasks or domains while keeping resource requirements manageable. This helps reduce the memory required to train a model.

If you want to learn more about how LoRA works, you should check this [tutorial](https://huggingface.co/learn/nlp-course/chapter11/4?fw=pt).
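As a minimal configuration sketch of what this looks like in practice, assuming the `peft` library; the hyperparameters and target modules below are illustrative assumptions, and the tutorial notebook may differ:

```python
# A minimal LoRA setup sketch with the `peft` library (hyperparameters and
# target modules are illustrative; the tutorial notebook may differ).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")

lora_config = LoraConfig(
    r=16,                                 # rank of the decomposition matrices
    lora_alpha=32,                        # scaling applied to the adapter output
    target_modules=["q_proj", "v_proj"],  # which linear layers receive adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)  # freezes the base weights
model.print_trainable_parameters()          # only the adapters are trainable
```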

## Fine-Tuning a model for Function-calling

You can access the tutorial notebook 👉 [here](https://huggingface.co/agents-course/notebooks/blob/main/bonus-unit1/bonus-unit1.ipynb).

Then, click on the Open in Colab button to run it in a Colab notebook.


53 changes: 53 additions & 0 deletions units/en/bonus-unit1/introduction.mdx
@@ -0,0 +1,53 @@
# Introduction

![Bonus Unit 1 Thumbnail](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/bonus-unit1/thumbnail.jpg)

Welcome to this first **Bonus Unit**, where you'll learn to **fine-tune a Large Language Model (LLM) for function calling**.

When it comes to LLMs, function calling is quickly becoming a *must-know* technique.

The idea is that, rather than relying only on prompt-based approaches as we did in Unit 1, function calling trains your model to **take actions and interpret observations during the training phase**, making your AI more robust.

> **When should I do this Bonus Unit?**
>
> This section is **optional** and is more advanced than Unit 1, so don't hesitate to either do this unit now or revisit it when your knowledge has improved thanks to this course.
>
> But don't worry, this Bonus Unit is designed to have all the information you need, so we'll walk you through every core concept of fine-tuning a model for function-calling even if you haven't yet learned the inner workings of fine-tuning.

The best way for you to be able to follow this Bonus Unit is:

1. Know how to fine-tune an LLM with Transformers; if that's not the case, [check this tutorial](https://huggingface.co/learn/nlp-course/chapter3/1?fw=pt)

2. Know how to use `SFTTrainer` to fine-tune your model; to learn more about it, [check this documentation](https://huggingface.co/learn/nlp-course/en/chapter11/1)

---

## What You’ll Learn

1. **Function Calling**
How modern LLMs structure their conversations, effectively letting them trigger **Tools**.

2. **LoRA (Low-Rank Adaptation)**
A **lightweight and efficient** fine-tuning method that cuts down on computational and storage overhead. LoRA makes training large models *faster, cheaper, and easier* to deploy.

3. **The Thought → Act → Observe Cycle** in Function Calling models
A simple but powerful approach for structuring how your model decides when (and how) to call functions, track intermediate steps, and interpret the results from external Tools or APIs.

4. **New Special Tokens**
We’ll introduce **special markers** that help the model distinguish between:
- Internal “chain-of-thought” reasoning
- Outgoing function calls
- Responses coming back from external tools

---

By the end of this bonus unit, you’ll be able to:

- **Understand** the inner workings of APIs when it comes to Tools.
- **Fine-tune** a model using LoRA techniques.
- **Implement** and **modify** the Thought → Act → Observe cycle to create robust and maintainable Function-calling workflows.
- **Design and utilize** special tokens to seamlessly separate the model’s internal reasoning from its external actions.

And you'll **have fine-tuned your own model to do function calling.** 🔥

Let’s dive into **function calling**!
77 changes: 77 additions & 0 deletions units/en/bonus-unit1/what-is-function-calling.mdx
@@ -0,0 +1,77 @@
# What is Function Calling?

Function-calling is a **way for an LLM to take actions on its environment**. It was first [introduced in GPT-4](https://openai.com/index/function-calling-and-other-api-updates/) and was then reproduced in other models.

Just like the tools of an Agent, function-calling gives the model the capacity to **take an action on its environment**. However, the function-calling capacity **is learned by the model**, and relies **less on prompting than other agent techniques**.

In Unit 1, the Agent **didn't learn to use the Tools**: we just provided the list and relied on the fact that the model **was able to generalize and define a plan using these Tools**.

Here, by contrast, **with function-calling, the Agent is fine-tuned (trained) to use Tools**.

## How does the model "learn" to take an action?

In Unit 1, we explored the general workflow of an agent. Once the user has given some tools to the agent and prompted it with a query, the model will cycle through:

1. *Think*: What action(s) do I need to take in order to fulfill the objective?
2. *Act*: Format the action with the correct parameters and stop the generation.
3. *Observe*: Get back the result from the execution.
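This cycle can be sketched in plain Python; the `fake_llm` function, the stub tool, and the message format below are illustrative stand-ins, not a real model or API:

```python
# Minimal Think -> Act -> Observe loop with a hard-coded stand-in for the LLM.

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub tool standing in for a real API call

TOOLS = {"get_weather": get_weather}

def fake_llm(history):
    # A real model would generate this; here we script one Think/Act turn.
    if not any(m["role"] == "tool" for m in history):
        return {"thought": "I need the weather to answer.",        # 1. Think
                "action": {"name": "get_weather",
                           "arguments": {"city": "Paris"}}}
    observation = history[-1]["content"]
    return {"final_answer": f"The forecast: {observation}"}

def run_agent(query: str) -> str:
    history = [{"role": "user", "content": query}]
    while True:
        step = fake_llm(history)
        if "final_answer" in step:
            return step["final_answer"]
        call = step["action"]                                      # 2. Act
        result = TOOLS[call["name"]](**call["arguments"])
        history.append({"role": "tool", "content": result})        # 3. Observe

print(run_agent("What's the weather in Paris?"))  # The forecast: Sunny in Paris
```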

In a "typical" conversation with a model through an API, the conversation will alternate between user and assistant messages like this:

```python
conversation = [
{"role": "user", "content": "I need help with my order"},
{"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"},
{"role": "user", "content": "It's ORDER-123"},
]
```

Function-calling brings **new roles to the conversation**!

1. One new role for an **Action**
2. One new role for an **Observation**

If we take the [Mistral API](https://docs.mistral.ai/capabilities/function_calling/) as an example, it would look like this:

```python
conversation = [
{
"role": "user",
"content": "What's the status of my transaction T1001?"
},
{
"role": "assistant",
"content": "",
"function_call": {
"name": "retrieve_payment_status",
"arguments": "{\"transaction_id\": \"T1001\"}"
}
},
{
"role": "tool",
"name": "retrieve_payment_status",
"content": "{\"status\": \"Paid\"}"
},
{
"role": "assistant",
"content": "Your transaction T1001 has been successfully paid."
}
]
```

> ... But you said there's a new role for function calls?

**Yes and no**: in this case, as in a lot of other APIs, the model formats the action to take as an "assistant" message. The chat template will then represent this as **special tokens** for function-calling:

- `[AVAILABLE_TOOLS]` – Start the list of available tools
- `[/AVAILABLE_TOOLS]` – End the list of available tools
- `[TOOL_CALLS]` – Make a call to a tool (i.e., take an "Action")
- `[TOOL_RESULTS]` – "Observe" the result of the action
- `[/TOOL_RESULTS]` – End of the observation (i.e., the model can decode again)
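To make this concrete, here is a rough sketch of how a chat template could flatten the conversation above into a single prompt string using such tokens. This is not the actual Mistral template; the `[INST]` markers and the exact token placement are simplified assumptions:

```python
import json

def render_prompt(tools, conversation):
    # Flatten a tool-using conversation into one prompt string (illustrative).
    parts = ["[AVAILABLE_TOOLS]" + json.dumps(tools) + "[/AVAILABLE_TOOLS]"]
    for msg in conversation:
        if msg["role"] == "user":
            parts.append("[INST]" + msg["content"] + "[/INST]")
        elif msg["role"] == "assistant" and "function_call" in msg:
            parts.append("[TOOL_CALLS]" + json.dumps(msg["function_call"]))
        elif msg["role"] == "tool":
            parts.append("[TOOL_RESULTS]" + msg["content"] + "[/TOOL_RESULTS]")
        else:
            parts.append(msg["content"])
    return "".join(parts)

prompt = render_prompt(
    tools=[{"name": "retrieve_payment_status"}],
    conversation=[
        {"role": "user", "content": "What's the status of my transaction T1001?"},
        {"role": "assistant", "content": "",
         "function_call": {"name": "retrieve_payment_status",
                           "arguments": "{\"transaction_id\": \"T1001\"}"}},
        {"role": "tool", "name": "retrieve_payment_status",
         "content": "{\"status\": \"Paid\"}"},
    ],
)
print(prompt)
```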

We'll talk again about function-calling in this course, but if you want to dive deeper you can check [this excellent documentation section](https://docs.mistral.ai/capabilities/function_calling/).

---
Now that we've learned what function-calling is and how it works, let's **add some function-calling capabilities to a model that does not have those capabilities yet**: **"google/gemma-2-2b-it"**, by appending some new special tokens to the model.

To be able to do that, **we first need to understand fine-tuning and LoRA**.
5 changes: 3 additions & 2 deletions units/en/unit0/introduction.mdx
Expand Up @@ -38,7 +38,7 @@ At the end of this course you'll understand **how Agents work and how to build y

Don't forget to **<a href="https://bit.ly/hf-learn-agents">sign up to the course!</a>**

(We are respectful of your privacy. We collect your email address to be able to **send you the links when each Unit is published and give you information about the challenges and updates**).

## What does the course look like? [[course-look-like]]

Expand All @@ -61,6 +61,7 @@ Here is the **general syllabus for the course**. A more detailed list of topics
| :---- | :---- | :---- |
| 0 | Onboarding | Set you up with the tools and platforms that you will use. |
| 1 | Agent Fundamentals | Explain Tools, Thoughts, Actions, Observations, and their formats. Explain LLMs, messages, special tokens and chat templates. Show a simple use case using python functions as tools. |
| 1.5 | Bonus: Fine-tuning an LLM for function calling | Let's use LoRA and fine-tune a model to perform function calling inside a notebook. |
| 2 | Frameworks | Understand how the fundamentals are implemented in popular libraries: smolagents, LangGraph, LlamaIndex |
| 3 | Use Cases | Let's build some real life use cases (open to PRs 🤗 from experienced Agent builders) |
| 4 | Final Assignment | Build an agent for a selected benchmark and prove your understanding of Agents on the student leaderboard 🚀 |
Expand Down Expand Up @@ -113,7 +114,7 @@ Since there's a deadline, we provide you a recommended pace:
To get the most out of the course, we have some advice:

1. <a href="https://discord.gg/UrrTSsSyjb">Join study groups in Discord</a>: studying in groups is always easier. To do that, you need to join our discord server and verify your Hugging Face account.
2. **Do the quizzes and assignments**: the best way to learn is through hands-on practice and self-assessment.
3. **Define a schedule to stay in sync**: you can use our recommended pace schedule below or create yours.

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit0/advice.jpg" alt="Course advice" width="100%"/>
2 changes: 1 addition & 1 deletion units/en/unit1/dummy-agent-library.mdx
Expand Up @@ -28,7 +28,7 @@ In the Hugging Face ecosystem, there is a convenient feature called Serverless A
import os
from huggingface_hub import InferenceClient

## You need a token from https://hf.co/settings/tokens, ensure that you select 'read' as the token type. If you run this on Google Colab, you can set it up in the "settings" tab under "secrets". Make sure to call it "HF_TOKEN"
os.environ["HF_TOKEN"]="hf_xxxxxxxxxxxxxx"

client = InferenceClient("meta-llama/Llama-3.2-3B-Instruct")
2 changes: 1 addition & 1 deletion units/en/unit1/messages-and-special-tokens.mdx
Expand Up @@ -6,7 +6,7 @@ Just like with ChatGPT, users typically interact with Agents through a chat inte

> **Q**: But ... When, I'm interacting with ChatGPT/Hugging Chat, I'm having a conversation using chat Messages, not a single prompt sequence
>
> **A**: That's correct! But this is in fact a UI abstraction. Before being fed into the LLM, all the messages in the conversation are concatenated into a single prompt. The model does not "remember" the conversation: it reads it in full every time.
Up until now, we've discussed prompts as the sequence of tokens fed into the model. But when you chat with systems like ChatGPT or HuggingChat, **you're actually exchanging messages**. Behind the scenes, these messages are **concatenated and formatted into a prompt that the model can understand**.

6 changes: 3 additions & 3 deletions units/en/unit1/tools.mdx
Expand Up @@ -82,7 +82,7 @@ The output of the tool is another integer number that we can describe like this:

All of these details are important. Let's put them together in a text string that describes our tool for the LLM to understand.

```text
Tool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int
```

Expand Down Expand Up @@ -124,7 +124,7 @@ Note the `@tool` decorator before the function definition.

With the implementation we'll see next, we will be able to retrieve the following text automatically from the source code via the `to_string()` function provided by the decorator:

```text
Tool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int
```

Expand Down Expand Up @@ -272,7 +272,7 @@ print(calculator.to_string())

And we can use the `Tool`'s `to_string` method to automatically retrieve a text suitable to be used as a tool description for an LLM:

```text
Tool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int
```

8 changes: 5 additions & 3 deletions units/en/unit1/what-are-llms.mdx
Expand Up @@ -39,7 +39,7 @@ There are 3 types of transformers :
3. **Seq2Seq (Encoder–Decoder)**
A sequence-to-sequence Transformer _combines_ an encoder and a decoder. The encoder first processes the input sequence into a context representation, then the decoder generates an output sequence.

- **Example**: T5, BART
- **Use Cases**: Translation, Summarization, Paraphrasing
- **Typical Size**: Millions of parameters

Expand Down Expand Up @@ -216,6 +216,8 @@ If you'd like to dive even deeper into the fascinating world of language models

Now that we understand how LLMs work, it's time to see **how LLMs structure their generations in a conversational context**.

To run <a href="https://huggingface.co/agents-course/notebooks/blob/main/dummy_agent_library.ipynb" target="_blank">this notebook</a>, **you need a Hugging Face token** that you can get from <a href="https://hf.co/settings/tokens" target="_blank">https://hf.co/settings/tokens</a>.

For more information on how to run Jupyter Notebooks, check out <a href="https://huggingface.co/docs/hub/notebooks">Jupyter Notebooks on the Hugging Face Hub</a>.

You also need to request access to <a href="https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct" target="_blank">the Meta Llama models</a>.
