Building Agents without Harness-Engineering

Do not build your own agent. Host Hermes and give it tools, skills, and a system prompt. We're launching an API that makes this process easy.

For prismvideos.com, we shipped a media generation agent built on the Vercel AI SDK. Our agent understood which model to recommend to users, could generate images and videos, and could analyze videos and tell users how to recreate them. It was beautiful.

To my horror, days later, Higgsfield, a competitor of ours and a leader in the AI media generation space, launched an agent called Supercomputer. Supercomputer has observational memory (memory across sessions), skills, automations, a computer, and a filesystem. It would have taken us weeks to add all of these features. Supercomputer wasn't built with the Vercel AI SDK, the Claude Agent SDK, or the OpenAI Agents SDK; it is built on Hermes, the open-source personal agent with 185k+ GitHub stars (at the time of writing).

I thought Hermes was a fad for nerds (like myself). But I realized that if we used Hermes as a primitive for our agent, we would get session management (per-session memory and compaction), built-in tools (web search, browser, file system navigation), skills, self-learning, and automations for free. Customers could ask our agent, "Every week, look at our top-performing influencer video from last week and make five variations." That is a true magic moment.

We deleted our existing agent, and we launched an EC2 instance with a Hono server. The server created a Hermes agent in a Docker container for every customer. It also acted as a reverse proxy for passing messages between our app and the Hermes gateway. Now, we communicate with every user's Hermes agent over a WebSocket connection.
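
The container-per-customer routing described above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the Hono server itself: the image name `hermes-agent:latest`, the container port 8080, the host port numbering, and the `AgentRegistry` helper are all hypothetical placeholders. It only builds the `docker run` command rather than executing it.

```python
from dataclasses import dataclass, field

# Hypothetical values for illustration only; the post does not name the
# actual image or the port the Hermes gateway listens on.
HERMES_IMAGE = "hermes-agent:latest"
CONTAINER_PORT = 8080
BASE_HOST_PORT = 9000


@dataclass
class AgentRegistry:
    """Tracks one container (and one host port) per customer."""

    ports: dict[str, int] = field(default_factory=dict)

    def container_name(self, customer_id: str) -> str:
        return f"hermes-{customer_id}"

    def ensure_agent(self, customer_id: str) -> list[str]:
        """Return the docker command that would start this customer's agent.

        An empty list means the customer already has a container.
        """
        if customer_id in self.ports:
            return []
        port = BASE_HOST_PORT + len(self.ports)
        self.ports[customer_id] = port
        name = self.container_name(customer_id)
        return [
            "docker", "run", "--detach",
            "--name", name,
            # A named volume keeps the agent's filesystem across restarts.
            "--volume", f"{name}-data:/workspace",
            # Bind to localhost so only the reverse proxy can reach the agent.
            "--publish", f"127.0.0.1:{port}:{CONTAINER_PORT}",
            HERMES_IMAGE,
        ]

    def upstream_url(self, customer_id: str) -> str:
        """WebSocket URL the reverse proxy forwards this customer's messages to."""
        return f"ws://127.0.0.1:{self.ports[customer_id]}/"


registry = AgentRegistry()
command = registry.ensure_agent("cus_123")
print(" ".join(command))
print(registry.upstream_url("cus_123"))
```

The point of the registry is that the app never talks to a container directly: it sends a customer ID to the server, and the server resolves that ID to the right upstream connection.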

Rather than building observational memory, skills, self-learning, automations, and a persistent filesystem, we only needed to focus on the engineering relevant to prismvideos.com. We can give the agent our system prompt, our tools for creating media and determining which models to use via MCP, our skills files (how to create UGC videos, storyboards, visual effects), and our connectors (Meta Ads Manager, Google Drive, Resend).

As consumer-facing agents such as Claude, ChatGPT, and Manus get better, customer expectations rise (for B2B software too). The Claude app has memory, so now my CEO wants it. What about self-learning? Steering? Can we add the Ralph Wiggum loop?

Companies are pouring billions into research and development on agent harnesses. I have no doubt that there will be a new agent harness after Hermes with a new feature everyone wants (it appears the new thing right now is Hermes' built-in learning loop). It is highly unlikely that an AI agent startup becomes wealthy by creating the best harness for a particular use case. If anything, it only exposes itself to the risk that a competitor ships a more feature-complete agent when the next harness arrives. AI agent startups are most likely to create differentiated value by integrating with their customers' proprietary data and learning their preferences.

The agent is the new primitive. Existing agent frameworks require developers to set up:

1. session management (in some cases)
2. tools (in some cases)
3. memory
4. self-learning
5. automations
6. persistent filesystem
7. container or sandboxed deployment
8. skills
9. MCP servers

But items one through seven are part of any agent application; only skills and MCP servers are specific to your product.

By programmatically creating Hermes instances, developers get the agent and the infrastructure in a single API call:

POST /v1/deployments
Authorization: Bearer $PRISM_API_KEY
Content-Type: application/json

{
  "customer_id": "cus_123",
  "name": "Acme Creative Agent",
  "runtime": "hermes",
  "model": "anthropic/claude-sonnet-4.5",
  "system_prompt": "You are Acme's media generation agent. Help the user plan, create, and iterate on high-performing short-form videos.",
  "sandbox": {
    "enabled": true,
    "type": "docker",
    "persistent_filesystem": true
  },
  "mcp_servers": [
    {
      "name": "prism-media",
      "url": "https://api.prismvideos.com/mcp",
      "tools": [
        "search_models",
        "get_model_schema",
        "get_pricing",
        "generate_image",
        "generate_video",
        "generate_audio"
      ]
    }
  ],
  "skills": [
    {
      "name": "ugc-video-creation",
      "source": "file",
      "path": ".prism/skills/ugc-video-creation/SKILL.md"
    },
    {
      "name": "storyboarding",
      "source": "inline",
      "content": "---\nname: storyboarding\ndescription: Create shot-by-shot storyboards for short-form videos\n---\n# Storyboarding\n..."
    },
    {
      "name": "social-media-visual-effects",
      "source": "url",
      "url": "https://example.com/skills/social-media-visual-effects/SKILL.md"
    }
  ],
  "secrets": {
    "META_ADS_TOKEN": "sec_meta_ads_token",
    "GOOGLE_DRIVE_TOKEN": "sec_google_drive_token"
  },
  "features": {
    "memory": true,
    "dreaming": true,
    "automations": true,
    "steering": true,
    "filesystem_webhooks": true
  }
}

Response:

{
  "deployment_id": "dep_7xK9s2",
  "customer_id": "cus_123",
  "runtime": "hermes",
  "status": "ready",
  "model": "anthropic/claude-sonnet-4.5",
  "thread_id": "thr_default_8a1",
  "filesystem": {
    "workspace_path": "/workspace",
    "persistent": true
  },
  "events": {
    "transport": "sse",
    "url": "https://api.prismagents.com/v1/deployments/dep_7xK9s2/events"
  }
}
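
For readers who prefer code to raw HTTP, here is a minimal Python sketch of the same call using only the standard library. The base URL is an assumption inferred from the events URL in the response, and the payload is a trimmed version of the request above. The sketch builds the request but leaves the network call commented out.

```python
import json
import os
import urllib.request

# Assumed base URL, inferred from the events URL in the sample response.
API_BASE = "https://api.prismagents.com"


def build_deployment_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/deployments request."""
    return urllib.request.Request(
        url=f"{API_BASE}/v1/deployments",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# A trimmed version of the request body shown earlier.
payload = {
    "customer_id": "cus_123",
    "name": "Acme Creative Agent",
    "runtime": "hermes",
    "model": "anthropic/claude-sonnet-4.5",
    "system_prompt": "You are Acme's media generation agent.",
    "features": {"memory": True, "automations": True},
}

request = build_deployment_request(os.environ.get("PRISM_API_KEY", "test-key"), payload)

# To actually send it:
# with urllib.request.urlopen(request) as response:
#     deployment = json.load(response)
#     print(deployment["deployment_id"], deployment["events"]["url"])
```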

Bring a system prompt, skills, tools, and connectors and get an endpoint to chat with an agent over SSE.
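
On the client side, reading that SSE endpoint means parsing the standard `text/event-stream` framing. Below is a minimal Python sketch of such a parser. The event names (`message.delta`, `done`) and the JSON payload shape are hypothetical, since the event schema is not specified here, and the parser ignores the optional `id` and `retry` fields.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event_name, data) pairs from the lines of a text/event-stream body."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            name, _, value = line.partition(":")
            value = value.removeprefix(" ")
            if name == "event":
                event = value
            elif name == "data":
                data.append(value)


# Hypothetical stream contents, for illustration only.
sample = [
    "event: message.delta\n",
    'data: {"text": "Here is your storyboard"}\n',
    "\n",
    ": keep-alive\n",
    "event: done\n",
    "data: {}\n",
    "\n",
]

events = [(name, json.loads(data)) for name, data in parse_sse(sample)]
for name, body in events:
    print(name, body)
```

In practice the `lines` argument would be the decoded lines of a streaming HTTP response rather than a list.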

Building an agent that people actually use involves a number of schleps. Harness-engineering should not be one of them. The same insight that led us to create our API likely also prompted LangChain to launch Managed Deep Agents and Anthropic to launch Claude Managed Agents. LangChain Managed Deep Agents is a hosted runtime for deploying AI agents. Developers bring their system prompt, MCP tools, skills, and subagent definitions and get an agent they can chat with. Likewise, Claude Managed Agents gives developers the agent and the infrastructure in a single API call.

LangChain Managed Deep Agents is a powerful abstraction, but it doesn't expose automations, built-in self-learning, or persistent goals (the Ralph Wiggum loop).

Claude Managed Agents has self-learning in research preview, but likewise doesn't expose automations or persistent goals, and it doesn't accept video inputs via the API (a restriction of its models).

The table below compares our API with their offerings:

Capability	Managed Hermes Agents	LangChain Managed Deep Agents	Claude Managed Agents
No provider lock-in	✓	✓	✗
Session management	✓	✓	✓
Agent + infrastructure in one API call	✓	✓	✓
Observational memory	✓	✓	✓
Built-in tools: web search, browser, file search	✓	✓	✓
Persistent filesystem	✓	✓	✓
Image & video input	✓	✗	✗
Per-container isolation	✓	✓	✓
Credential management	✓	✓	✓
Automations	✓	✗	✗
Subagents	✓	✓	✓
Self-learning	✓	✗	✓
Ralph Wiggum loop	✓	✗	✗
Steering	✓	✗	✓

Fin

If you're a developer with a customer-facing chat product, ping me at rajit [at] prismvideos [dot] com. We are happy to build your agent for you :).

Thanks to Alex Liu, Land Tantichot, Mom, Dad, Vivek Hazari, Dan Gackle, Daniel DiPietro and Stepan Parunashvili for reading drafts of this post.