Using Hermes Agent with Serverless Inference for AI Deployments
Most AI agents that are self-hosted face a common challenge: infrastructure becomes entangled with the model layer. Different tasks require different models—coding, summarization, vision—and managing multiple API keys, routing logic, and provider idiosyncrasies can complicate what was meant to be a straightforward agent.
The Hermes Agent concentrates this complexity: it combines coding, reasoning, memory, delegation, tool use, and shell access in a single loop. Rather than wiring up multiple providers manually, you can point Hermes at a single OpenAI-compatible endpoint via DigitalOcean Inference, which handles model selection and routing for you and keeps the setup lean.
In this guide, we'll set up the Hermes Agent with DigitalOcean Serverless Inference and learn how to use the Inference Router to optimize model handling for various tasks without needing custom routing infrastructure.
Key Takeaways
- Unified Access: With one API key and the static URL https://inference.do-ai.run/v1, Hermes Agent gains access to over 70 models, eliminating the need for multiple vendor-specific keys or code changes.
- Automatic Routing: The Inference Router handles routing logic automatically; set a task pool in the Control Panel and let it optimize for cost, speed, or custom ranking.
- Cost Efficiency: Auxiliary task overrides allow significant cost savings. Vision, compression, session search, and web extraction can each be pinned to cheaper models, reducing expenses without compromising quality.

Hermes and DigitalOcean: A Strategic Combination
Hermes is designed to be model-agnostic, supporting a wide array of providers and any custom OpenAI-compatible endpoint such as DigitalOcean's. This flexibility means that no code changes are necessary to switch models or providers. Hermes manages tool calling, skills, memory, and sub-agent delegation and works with over 18 messaging platforms.
DigitalOcean's Inference Engine consolidates four services under one endpoint:
- Serverless Inference: Offers pay-per-token access to over 70 open-source and cutting-edge models with scale-to-zero pricing.
- Inference Router: A middleware tool that classifies prompts and routes them to the optimal model, optimizing for cost or latency.
- Dedicated Inference: Provides reserved GPU endpoints for high-throughput workloads.
- Batch Inference: Allows asynchronous jobs at reduced pricing.
For Hermes, the Serverless Inference provides a direct pathway to specific models, while the Inference Router facilitates intelligent model selection per request without custom routing logic.
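Both pathways share the same endpoint and request shape; the only difference is the value of the model field, as the setup steps later in this guide show (the model slug and router name below are the examples used there):

# Serverless Inference (direct): "model": "llama3.3-70b-instruct"
# Inference Router:              "model": "router:my-hermes-router"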
Setup Requirements
To get started, you will need:
- A DigitalOcean account with access to the Inference product.
- A model access key in sk-do-... format.
- A functioning Hermes Agent installation on Linux, macOS, WSL2, or Android (Termux), which you can install with:
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
Hermes requires a model with a context window of at least 64K tokens, so keep that minimum in mind when choosing one.
Quick Setup: Serverless Inference as Hermes’ Main Provider
This setup involves pointing Hermes’ custom endpoint to DigitalOcean and selecting a single model.
Export the Key
# Persist the DigitalOcean model access key (zsh shown; for bash, append to ~/.bashrc instead)
echo 'export MODEL_ACCESS_KEY="sk-do-..."' >> ~/.zshrc
source ~/.zshrc
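To confirm the key is exported without printing the whole secret, a quick check:

echo "key set: ${MODEL_ACCESS_KEY:+yes} (${#MODEL_ACCESS_KEY} chars)"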
Find a Model ID
Use the following command to list available models:
curl -s -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
https://inference.do-ai.run/v1/models \
| jq '.data[].id'
Choose a suitable model with at least 64K context capacity.
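If the full list is long, you can narrow it with grep; for example, to show only Llama variants (jq's -r flag strips the quotes so grep matches cleanly):

curl -s -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
https://inference.do-ai.run/v1/models \
| jq -r '.data[].id' | grep -i llama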
Configure the Hermes Provider
hermes model
In the interactive setup:
- Select Custom endpoint.
- API base URL: https://inference.do-ai.run/v1
- API key: Insert your sk-do-… key.
- Model name: Choose the model slug, e.g., llama3.3-70b-instruct.
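Before launching Hermes, it's worth verifying the endpoint, key, and model with a direct request. A minimal sanity check, assuming llama3.3-70b-instruct as the chosen model:

curl -s https://inference.do-ai.run/v1/chat/completions \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.3-70b-instruct",
"messages": [{"role": "user", "content": "Reply with OK."}]
}' | jq -r '.choices[0].message.content'

If this prints a response, Hermes will be able to reach the same model with the same credentials.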
Advanced: Using the Inference Router
Instead of pinning Hermes to a single model, the Inference Router dynamically assigns the best model from a pool for each request.
Creating an Inference Router
In the Control Panel, select Inference → Inference Router and click Create Router. You can either choose from default routers or create a custom one, which is advisable for complex workloads like Hermes.
When defining a custom router:
- Name: Choose a stable reference name.
- Description: This acts as the routing prompt.
- Tasks: Define tasks with clear names and descriptions, assign a model pool, and set a selection policy (cost, speed, or manual ranking).
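For illustration only, a custom router for Hermes might be defined along these lines (the task names, descriptions, and pools here are hypothetical, and everything is entered through the Control Panel form fields above, not a config file):

# Name:        my-hermes-router
# Description: routing prompt describing Hermes' mix of coding, chat, and tool-use traffic
# Task "coding":  code generation and debugging requests → pool of code-capable models, policy: manual ranking
# Task "general": conversational and summarization requests → cheaper pool, policy: cost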
Directing Hermes to the Router
Update the model field to point to the router:
hermes model
# Custom endpoint
# API base URL: https://inference.do-ai.run/v1 (same as before)
# API key: sk-do-... (same as before)
# Model name: router:my-hermes-router ← the only change
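As a quick check that the router name resolves, the same chat-completions request from earlier works with the router as the model; in OpenAI-compatible responses, the model field of the reply usually reports which model the router actually selected:

curl -s https://inference.do-ai.run/v1/chat/completions \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "router:my-hermes-router",
"messages": [{"role": "user", "content": "Hello"}]
}' | jq -r '.model'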
Conclusion
Integrating Hermes Agent with serverless inference solutions simplifies the deployment of AI agents by minimizing the intricacies of model management. As you progress from basic setups to using the Inference Router, you can achieve more efficient and cost-effective AI operations.