How to Build Self-Improving AI Agents Locally with Hermes and NVIDIA Hardware
Overview
Agentic AI is transforming productivity by enabling autonomous task execution. Following the success of frameworks like OpenClaw, the open-source community has embraced Hermes Agent, a new framework that has garnered over 140,000 GitHub stars in under three months and, as of last week, is the most used agent on OpenRouter. Developed by Nous Research, Hermes is designed for reliability and self-improvement, two qualities that have historically been hard to achieve. It is provider- and model-agnostic and optimized for always-on local use, which makes NVIDIA RTX PCs, NVIDIA RTX PRO workstations, and NVIDIA DGX Spark ideal hardware for running it at full speed, 24/7.

This guide will walk you through setting up Hermes Agent locally using NVIDIA hardware and the Qwen 3.6 series models from Alibaba, which are high-performance, open-weight LLMs that outperform previous-generation larger models. By the end, you'll have a self-improving local AI agent that can run continuously, learn from tasks, and execute complex workflows.
Prerequisites
Before you begin, ensure you have the following:
- Hardware: An NVIDIA RTX PC, RTX PRO workstation, or DGX Spark. The Qwen 3.6 35B model needs at least ~20 GB of free GPU memory; the 27B model needs less.
- Software: Windows or Linux with NVIDIA drivers (version 545 or later recommended).
- Tools: Docker and NVIDIA Container Toolkit installed for GPU acceleration.
- Models: Access to Qwen 3.6 models (27B or 35B) via Hugging Face or NVIDIA NGC. You'll need a Hugging Face token for download.
- Knowledge: Basic familiarity with command line, Docker, and Python. No deep AI expertise required.
Step-by-Step Instructions
Step 1: Set Up Your NVIDIA Environment
First, verify your GPU is recognized. Open a terminal and run:
nvidia-smi
You should see your GPU model, driver version, and available memory. Next, install the NVIDIA Container Toolkit to enable GPU passthrough to Docker:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
For Windows, ensure you have WSL2 and Docker Desktop with WSL2 integration enabled.
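Before moving on, it can help to confirm how much GPU memory is actually free. The following is a minimal Python sketch that simply shells out to nvidia-smi (no extra dependencies assumed); the 20 GB threshold reflects the 35B model's footprint described in Step 2.
# check_vram.py - quick sanity check that enough GPU memory is free
# Assumes nvidia-smi is on PATH; the 20 GB threshold matches the ~20 GB
# the 35B model needs (see Step 2).
import subprocess

out = subprocess.run(
    ["nvidia-smi", "--query-gpu=name,memory.free", "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
).stdout

for line in out.strip().splitlines():
    name, free_mib = [field.strip() for field in line.split(",")]
    free_gb = int(free_mib) / 1024
    verdict = "OK for 35B" if free_gb >= 20 else "use 27B or quantize"
    print(f"{name}: {free_gb:.1f} GB free -> {verdict}")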
Step 2: Download Qwen 3.6 Model
Choose either the 27B or 35B parameter model. The 35B model runs in about 20 GB of GPU memory and outperforms 120B models (which require 70 GB+), while the 27B is a dense model that matches the accuracy of 400B models. Download with huggingface-cli:
pip install huggingface_hub
huggingface-cli download Qwen/Qwen3.6-35B-Instruct --local-dir ./qwen35b
Replace with the correct repository name if needed. Ensure you have a Hugging Face token set (huggingface-cli login).
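If you prefer a script over the CLI, the same download can be done with the huggingface_hub Python API. Here is a minimal sketch, reusing the repository id from the command above (adjust it if the published name differs) and assuming you have already run huggingface-cli login:
# download_qwen.py - fetch the model weights with the huggingface_hub API
# The repo id mirrors the CLI example above; adjust it if the published
# repository name differs.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Qwen/Qwen3.6-35B-Instruct",
    local_dir="./qwen35b",
)
print("Model downloaded to ./qwen35b")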
Step 3: Launch Hermes Agent with Docker
Pull the Hermes Agent Docker image optimized for NVIDIA GPUs:
docker pull nousresearch/hermes-agent:latest-cuda
Run the container with GPU access and mount the model directory:
docker run --gpus all -d --name hermes-agent \
-v $(pwd)/qwen35b:/models \
-e MODEL_PATH=/models \
-p 8080:8080 \
nousresearch/hermes-agent:latest-cuda
This launches a web interface at http://localhost:8080 and a REST API for integration.
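Once the container is up, you can also talk to the agent over HTTP. The exact endpoint paths and payload fields depend on the Hermes Agent build you pulled, so treat the /v1/chat path and JSON shape below as hypothetical placeholders and check the web UI or image documentation for the real API. A minimal sketch using Python's requests library:
# ask_hermes.py - send a task to the local agent over its REST API
# NOTE: the endpoint path and payload fields are placeholders; consult the
# Hermes Agent documentation for the actual API schema.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat",  # hypothetical endpoint
    json={"message": "Summarize the files in /data/files"},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())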
Step 4: Configure Self-Evolving Skills
Hermes automatically saves learnings from complex tasks as skills. To enable this, edit the configuration file (hermes_config.yaml inside the container or mount it):

skills:
  auto_learn: true
  max_skills: 50
  memory_dir: /data/skills
Restart the container to apply changes. Skills are stored as JSON files that can be reviewed and manually curated.
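Because skills are plain JSON files, you can review them from a script as well as from the web UI. A minimal sketch, assuming the skills directory is mounted to ./skills on the host and that each file carries a "name" field (the actual skill schema may differ):
# list_skills.py - print saved skills so they can be reviewed or curated
# Assumes /data/skills is mounted to ./skills on the host; the "name"
# field is an assumed key and may differ in the actual skill schema.
import json
from pathlib import Path

for path in sorted(Path("./skills").glob("*.json")):
    skill = json.loads(path.read_text())
    print(f"{path.name}: {skill.get('name', '<unnamed>')}")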
Step 5: Integrate with Messaging and Files
Hermes supports Slack, Discord, and file access. For Slack integration, set environment variables:
docker run --gpus all -d --name hermes-agent \
-e SLACK_BOT_TOKEN=xoxb-... \
-e SLACK_APP_TOKEN=xapp-... \
...
For local file access, mount directories:
-v /path/to/files:/data/files
Now your agent can read, write, and process files.
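To confirm the mount works, drop a file into the host directory and check that the agent can see it under /data/files inside the container. A minimal sketch, reusing the placeholder /path/to/files from the flag above:
# drop_test_file.py - place a test file where the agent can read it
# Assumes the host directory /path/to/files is mounted to /data/files
# in the container, as in the -v flag above.
from pathlib import Path

host_dir = Path("/path/to/files")
(host_dir / "hello.txt").write_text("The agent sees this at /data/files/hello.txt\n")
print("Wrote", host_dir / "hello.txt")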
Common Mistakes
Insufficient GPU Memory
The Qwen 3.6 35B model requires ~20 GB of GPU memory. If you have less, use the 27B model or enable 4-bit quantization. Watch nvidia-smi during startup; if the container crashes with an out-of-memory error, drop to the smaller model or a quantized variant.
Missing NVIDIA Container Toolkit
Without the toolkit, Docker cannot access the GPU, and inference falls back to the CPU, which is very slow. Verify with docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi.
Skill Overload
If auto-learning creates too many skills (max_skills too high), the agent may become slower. Set a reasonable limit and periodically review skills via the web UI. Remove duplicates or outdated ones.
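If the skill directory does grow past a useful size, the JSON files can be pruned directly. A minimal sketch, assuming the same ./skills mount as above, that archives everything beyond the 50 most recent skills instead of deleting them:
# prune_skills.py - keep only the 50 most recent skills, archive the rest
# Assumes skills live in ./skills on the host (see Step 4); adjust MAX_SKILLS
# to match max_skills in hermes_config.yaml.
import shutil
from pathlib import Path

MAX_SKILLS = 50
skills_dir = Path("./skills")
archive_dir = skills_dir / "archive"
archive_dir.mkdir(exist_ok=True)

# Newest first, by modification time
skills = sorted(skills_dir.glob("*.json"), key=lambda p: p.stat().st_mtime, reverse=True)
for path in skills[MAX_SKILLS:]:
    shutil.move(str(path), archive_dir / path.name)
    print(f"Archived {path.name}")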
Firewall Blocking Ports
If you cannot access the web interface, ensure port 8080 is open in your firewall. On Linux, use sudo ufw allow 8080.
Summary
Hermes Agent combined with Qwen 3.6 on NVIDIA RTX hardware delivers a powerful, self-improving AI that runs entirely locally. Key takeaways:
- Self-evolving skills: Agent learns from tasks and saves reusable skills.
- Contained sub-agents: Efficient task management with small context windows.
- Reliability by design: Pre-tested skills minimize debugging.
- Hardware matters: NVIDIA GPUs provide the performance needed for 24/7 operation.
By following this guide, you have set up a local agent that not only performs tasks but improves over time, making it ideal for power users who demand privacy, speed, and adaptability.