Lately, I’ve been playing with smolagents, a very simple and very cool library for building “AI” agents.
Why smolagents?
There are, like, a lot of agent libraries around, and it feels like a new one pops up every two weeks, so why this one?
Well, one thing I like about smolagents is its approach to generating plans – it happily uses Python code for this 🙃. It will just go ahead and write a Python script, then run it either on your machine (more on this later) or on E2B. No word on Azure Container Apps dynamic sessions yet, but one can still hope.
Expressing an agent’s plan as a script plays to the strengths of good coding models (no need to coax them into generating XML/JSON/whatever – they just write code).
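To make the contrast concrete, here’s a rough, hypothetical sketch – the tool function and its output are mine for illustration, not smolagents’ actual API – of the same step expressed as a JSON tool call versus a code action:

import re

def web_search(query: str) -> str:
    # Hypothetical stand-in for a search tool that a code-agent framework
    # would expose to generated code as a plain Python function.
    return "The tower stands 55.86 m on the low side and 56.67 m on the high side."

# JSON-style tool calling would emit one structured call per round trip, e.g.
#   {"name": "web_search", "arguments": {"query": "Leaning Tower of Pisa height"}}
# and hand post-processing back to the model. A code action does it in one go:
results = web_search(query="Leaning Tower of Pisa height")
heights = [float(n) for n in re.findall(r"\d+\.\d+", results)]
print(max(heights))  # parse, compute, and decide without another round trip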
But. This also means you will need to run LLM-generated code on your own machine, with your own permissions, without being able to approve/reject anything. The library does filter the allowed imports to just a few (modules like `time` and `random`, but no `os`, for example), so it should be safe in theory. Keep this in mind when choosing the models powering your agents.
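That said, if your agent genuinely needs more than the defaults, you can widen the allowlist per agent with `additional_authorized_imports`. Every module you add becomes available to LLM-generated code, so treat each entry as a trust decision – a quick example (the `model` here is the one we’ll wire up to Azure OpenAI below):

from smolagents import CodeAgent

# Each extra module listed here becomes importable by the code the LLM writes,
# so only allow what you're comfortable handing to the agent.
agent = CodeAgent(
    tools=[],
    model=model,  # defined in the next section
    additional_authorized_imports=["requests", "bs4"],
)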
Using smolagents with Azure OpenAI
Anyway, back to our premise. The first thing I wanted to try was hooking up smolagents with Azure OpenAI – the current default, recommended way is to use LiteLLM as a wrapper on top of Azure. Which is reasonable and all – it doesn’t make sense to write and maintain 100 model implementations when all you want to do is build an agents library.
But, since I’m not a big fan of abstracting away simple things, and since smolagents v1.2 added native support for connecting to OpenAI instances (but not Azure ones), I figured: why don’t I just subclass this?
I came up with the class below. Notice I’m trying to preserve the built-in behavior and initialization as much as possible – the only thing I’m doing is overriding the base class’s OpenAI client with an Azure OpenAI-specific one.
from typing import Dict, Optional

from smolagents.models import OpenAIServerModel


class AzureOpenAIServerModel(OpenAIServerModel):
    """This model connects to an Azure OpenAI deployment.

    Parameters:
        model_id (`str`):
            The model identifier to use on the server (e.g. "gpt-3.5-turbo").
        azure_endpoint (`str`, *optional*):
            The Azure endpoint, including the resource, e.g. `https://example-resource.azure.openai.com/`
        api_key (`str`, *optional*):
            The API key to use for authentication.
        api_version (`str`, *optional*):
            The API version to use (e.g. "2024-10-21").
        custom_role_conversions (`Dict[str, str]`, *optional*):
            Custom role conversion mapping to convert message roles to others.
            Useful for specific models that do not support specific message roles like "system".
        **kwargs:
            Additional keyword arguments to pass to the Azure OpenAI API.
    """

    def __init__(
        self,
        model_id: str,
        azure_endpoint: Optional[str] = None,
        api_key: Optional[str] = None,
        api_version: Optional[str] = None,
        custom_role_conversions: Optional[Dict[str, str]] = None,
        **kwargs,
    ):
        super().__init__(
            model_id=model_id,
            api_key=api_key,
            custom_role_conversions=custom_role_conversions,
            **kwargs,
        )
        # If we've reached this point, the openai package is available
        # (the base class checks for it), so it's safe to import here.
        import openai

        # Swap the base class's OpenAI client for an Azure-specific one.
        self.client = openai.AzureOpenAI(
            api_key=api_key,
            api_version=api_version,
            azure_endpoint=azure_endpoint,
        )
You can instantiate it easily:
import os

model = AzureOpenAIServerModel(
    model_id=os.environ.get("AZURE_OPENAI_MODEL_LITE"),
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    api_version=os.environ.get("AZURE_OPENAI_API_VERSION"),
    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
)
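As a side note: since all the subclass really does is swap out `self.client`, you can go keyless too and authenticate with Entra ID instead of an API key – the openai package’s `AzureOpenAI` client accepts a token provider. A rough sketch (assumes the `azure-identity` package is installed; I’m simply overwriting the client on the model we just built, which leaves its `api_key` unused):

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
import openai

# Exchange the ambient Azure credential (az CLI login locally, managed
# identity in Azure, ...) for bearer tokens scoped to Cognitive Services.
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

model.client = openai.AzureOpenAI(
    azure_ad_token_provider=token_provider,
    api_version=os.environ.get("AZURE_OPENAI_API_VERSION"),
    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
)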
Cool 😊. Now we can use Azure OpenAI to power agents big and small.
Building a simple agent
First, make sure to install the library:
pip install "smolagents[openai]"
Then all you need to do is:
from smolagents import CodeAgent, DuckDuckGoSearchTool, VisitWebpageTool
agent = CodeAgent(tools=[DuckDuckGoSearchTool(), VisitWebpageTool()], model=model)
agent.run("How many ducks would it take to completely fill up the tower of Pisa?")
In my case, the agent started by going through steps such as:
- querying for “Leaning Tower of Pisa volume”
- visiting https://en.wikipedia.org/wiki/Leaning_Tower_of_Pisa (lots of tokens to process here, it could benefit from something like grabit)
- searching for “Leaning Tower of Pisa dimensions”
- searching for “average volume of a duck in liters”
It then came up with this bit of code:
import math
# Height of the Leaning Tower of Pisa in meters
tower_height = 56.67
# Diameter of the base of the Leaning Tower of Pisa in meters
base_diameter = 15
# Radius of the base
radius = base_diameter / 2
# Volume of the tower (cylinder)
tower_volume = math.pi * (radius ** 2) * tower_height
# Average volume of a duck in cubic meters (assuming 1 duck = 0.000105 m^3)
duck_volume = 0.000105
# Number of ducks that can fit in the tower
num_ducks = tower_volume / duck_volume
print(f"Volume of the Leaning Tower of Pisa: {tower_volume} m^3")
print(f"Number of ducks that can fit: {num_ducks}")
It’s…okay I guess? Is the tower of Pisa a perfect cylinder? Are those its actual measurements? What is the volume of a duck?
I honestly have no idea, I never check AI’s work 🚀🌝.
Conclusions
The answer is (presumably, hopefully, ideally) 95,375,386.97. Ducks. Ninety-five million ducks. To fill the Leaning Tower of Pisa. The more you know…
All it took to compute this was 39k input tokens and 800 output tokens. This is something to keep in mind when building agents in general – cost and speed are likely to be worse than with a dedicated component, LLM-powered or not. But when you need something flexible and don’t mind the cost, agents are awesome.
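For a rough sense of what that means in money terms, here’s the arithmetic with made-up but plausible per-token prices (not actual Azure rates – plug in your deployment’s own):

# Hypothetical prices, USD per 1M tokens -- substitute your real rates.
input_price_per_m = 0.15
output_price_per_m = 0.60

input_tokens = 39_000
output_tokens = 800

cost = input_tokens / 1e6 * input_price_per_m + output_tokens / 1e6 * output_price_per_m
print(f"~${cost:.4f} per duck-census run")  # roughly $0.006 at these rates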