# 3 rules for getting AI agents to find, use—and not exploit—your devtool

> The agent-led growth playbook: how to make AI agents discover, use, and pay for your developer tool, and defend against the ones you didn't invite. LLM discoverability, agent-first onboarding, agent payments, AX security.

- Date: 2026-04-21T00:00:00.000Z
- Authors: Irina Nazarova, Travis Turner
- Categories: AI, Developer Community
- URL: https://evilmartians.com/chronicles/3-rules-for-getting-ai-agents-to-find-use-and-not-exploit-your-devtool

---

In early 2025, **Matt Biilmann**, CEO of Netlify, [coined the term](https://biilmann.blog/articles/introducing-ax/) "agent experience" or **AX**: how AI agents experience your product as a user. AX extends developer experience and drives **agent-led growth**: adoption that happens because agents recommend, set up, and integrate your tool without a human in the loop. Three things to get right: make agents **discover** your tool. Make agents **use** your tool. **Defend** against the ones you didn't invite.

Evil Martians help devtools startups build for agent-led growth, from LLM-ready docs and MCP servers to agent auth. We built bolt.new's Rails backend ([zero to $40M ARR in five months](/clients/stackblitz)) and maintain [PostCSS](https://github.com/postcss/postcss), [AnyCable](https://anycable.io), [imgproxy](https://imgproxy.net), and 120+ other open source tools with 177K+ GitHub stars.

**In this post:**

1. [Help agents discover and choose your tool](#help-agents-discover-and-choose-your-tool)
2. [Help agents use your tool autonomously](#help-agents-use-your-tool-autonomously)
3. [Protect yourself from bad agents](#protect-yourself-from-bad-agents)

## Help agents discover and choose your tool

Two years ago, a developer would Google "best WebSocket library for Node.js" and click through five posts. Today they just ask Claude or ChatGPT. One recommendation, one install, done.

Essentially, if your tool isn't in the model's answer, you don't exist. And this works nothing like SEO because the question isn't "how do I rank on page one?" but "_how do I get into the model's knowledge?_"

### Two knowledge systems, not one

LLM recommendations come from two systems, and each requires a different strategy.

**Baked-in knowledge** comes from training data. Major datasets like [The Pile](https://pile.eleuther.ai/), [RedPajama](https://www.together.ai/blog/redpajama-data-v2), [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb), and C4 are built from Common Crawl, GitHub, Stack Overflow, Wikipedia, and arXiv. A strong presence in these sources means you're wired into the model's defaults. This takes years, and the window for some sources (Stack Overflow's peak era) is already closed.

**Live retrieval** via RAG (retrieval-augmented generation) is what search-augmented models like Perplexity, ChatGPT with browsing, and Gemini fetch in real time. This is increasingly the default mode. Here's what most "LLM SEO" advice gets wrong: the sources that dominate live retrieval are not the same ones that dominate training data.

Baked-in knowledge determines what agents *default to*. Live retrieval determines what agents *discover today*. You need both, and optimizing for one doesn't help the other.

### What agents actually retrieve: a test

We ran 50+ developer queries across Claude, ChatGPT, and Gemini—all with search enabled—and recorded every source URL retrieved. We used our own domain, real-time infrastructure, as a test case.

We used queries like "best WebSocket solution for production Node.js," "how to stream LLM responses to multiple clients," "infrastructure for AI agents that communicate in real-time." These patterns apply to any developer tool category.

**Each LLM retrieves from fundamentally different sources.**

| Source type | Claude | ChatGPT | Gemini |
|---|---|---|---|
| Vendor comparison pages | **#1** (43%) | 41% | **#1** (60%) |
| Reddit threads | Not retrieved | **#1 (tie)** (47%) | 15% |
| GitHub repos and issues | 7% | 41% | 50% |
| Official framework docs | 14% | **#1 (tie)** (47%) | 25% |
| Developer platforms (Dev.to, Medium) | 36% | 29% | 20% |
| Stack Overflow | 0% | 24% | 15% |
| Academic papers | 0% | 0% | 10% |

Claude leans on SEO-optimized vendor content. ChatGPT retrieves broadly: Reddit, GitHub issues, official docs, Stack Overflow. Gemini has the strongest affinity for structured vendor content and uniquely surfaces academic benchmarks.

### Discovery is not enough: how LLMs choose the winner

Being found is just step one; getting recommended is a four-stage funnel:

**Discovery** → **Candidacy** → **Framing** → **Selection**

We watched tools get discovered and still lose the recommendation because stages two through four are where the actual decision happens.

**Candidacy: labels lock you in.** In our tests, a tool that supports Python didn't appear for "Python" queries because every comparison page about it leads with a different framework in the title. The LLM sees the ecosystem label and filters the tool out before evaluation starts. If your comparison pages always pair your tool with one ecosystem, LLMs will never recommend you for another.

**Framing: own your comparisons or someone else will set the criteria.** When an LLM retrieves a comparison page, it adopts that page's evaluation framework—dimensions, criteria, trade-offs. In our tests, one competitor owned every head-to-head comparison page in the category. Their pages evaluate on six dimensions. If your strengths are on dimensions they didn't include, you lose even if you're better on what actually matters. Search "[your tool] vs [competitor]" right now. If the top result is a page you didn't write, you've handed the decision criteria to someone else.

**Selection: specificity wins.** Concrete claims beat vague ones. "P95 latency of 62ms" gets recommended. "Fast and reliable" gets listed but not chosen. "$99/month flat" beats "affordable pricing." Tools with specific benchmarks, customer names, and pricing were recommended as the primary choice. Generic positioning got listed as alternatives.

We've also found content gaps that prevent selection entirely. A tool in our test had nine customers in a regulated industry and supported self-hosted deployment—exactly what compliance-driven teams need. But no retrievable content connects those keywords to that tool. The capability exists; the content doesn't.

### The discoverability playbook

Ranked by cross-LLM impact:

**Build comparison and topic landing pages.** Structured `/compare/your-tool-vs-competitor` and `/topic/problem-domain` pages dominated retrieval across all three LLMs. One company in our tests appeared in 48% of all queries by building these systematically for every adjacent query. These pages rank in traditional search, which is what RAG pipelines fetch from—and as the funnel above shows, owning them means owning how LLMs evaluate you.

**Get on awesome-lists.** The `awesome-*` repos are among the most-starred, most-scraped resources on GitHub. They appear in virtually every training dataset. If your category has one and you're not on it, submit a PR today.

**Be present on Reddit.** Reddit was ChatGPT's #1 retrieval source—appearing in 47% of queries. Of course, this isn't about marketing posts, but about giving genuine answers in subreddit threads where developers are asking for recommendations.

**Optimize your GitHub README.** Lead with a clear problem statement ("A drop-in replacement for X that does Y 10x faster"), an install command, a code example, and a comparison table. GitHub repos and issues were retrieved in 41-50% of ChatGPT and Gemini queries.

**Ship `llms.txt` and markdown docs—with eyes open.** [CDN log analyses](https://www.longato.ch/llms-recommendation-2025-august/) consistently show that LLM crawlers don't request `/llms.txt` unprompted. But the value isn't in crawlers—it's in what happens when a human pastes your URL into ChatGPT or when an agent follows links through your docs. [Stripe](https://docs.stripe.com/building-with-llms) added `/llms.txt` with an [instructions section](https://www.apideck.com/blog/stripe-llms-txt-instructions-section) that steers agents away from deprecated APIs—a corrective mechanism for model drift, not a sitemap for bots. 

Every Stripe doc page is available as `.md` by appending the extension. Markdown costs fewer tokens than HTML, surfaces content that interactive elements would otherwise hide, and preserves a heading hierarchy that LLMs parse well. Implementation cost is near zero.

We tested six techniques on our own site and wrote up what worked, what didn't, and the metrics to tell the difference.

**Get customers to write engineering blog posts.** Gemini retrieves from company engineering blogs for "how do companies like X solve Y" queries: [Figma's blog](https://www.figma.com/blog/), [Linear's blog](https://linear.app/blog), [Notion's blog](https://www.notion.com/blog) for architecture questions. A customer's "How we use [your tool] in production" post feeds this path directly.

**Publish on Dev.to and Medium.** These platforms appeared in 20-36% of queries across all LLMs. Their high domain authority means new content surfaces in RAG pipelines within days.

**Don't ignore Wikipedia.** ChatGPT referenced Wikipedia pages for tools that have one. Wikipedia carries disproportionate weight in training data and acts as a trust signal in live retrieval. If your tool meets notability criteria, create a page.

**Build for RAG.** Clear headings, defined terms, explicit claims ("supports 100K concurrent connections"), self-contained sections. Define acronyms—LLMs parse defined terms better.

### How to audit your LLM visibility

**Prompt testing across multiple LLMs.** Ask Claude, ChatGPT, Perplexity, and Gemini the same 15-20 queries a developer in your space would ask. Ask for sources, not just answers, by appending "List every source URL you consulted."

A well-positioned tool can still be invisible for 80%+ of relevant queries. If you only appear when someone names you directly, you have a _reach_ problem, not a positioning problem.

**Common Crawl CDX queries.** Check if your docs are in [Common Crawl](https://commoncrawl.org/) via the [CDX API](https://index.commoncrawl.org/) or [cdx-toolkit](https://pypi.org/project/cdx-toolkit/). Not crawled = not trained on.
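
A minimal sketch of that check, assuming the public CDX index endpoint and a placeholder crawl ID (the current crawl names are listed at index.commoncrawl.org):

```python
import json
import urllib.parse

# Placeholder crawl ID -- check index.commoncrawl.org for the current list
CRAWL = "CC-MAIN-2025-05"

def cdx_query_url(domain: str, crawl: str = CRAWL) -> str:
    """Build a CDX index query for every capture under a domain."""
    params = urllib.parse.urlencode({"url": f"{domain}/*", "output": "json"})
    return f"https://index.commoncrawl.org/{crawl}-index?{params}"

def parse_cdx(body: str) -> list[dict]:
    """The CDX API returns one JSON object per line; keep successful captures."""
    records = [json.loads(line) for line in body.splitlines() if line.strip()]
    return [r for r in records if r.get("status") == "200"]
```

Fetch the URL with any HTTP client; an empty result for your docs domain means the crawler never got there, so no training set built on that crawl contains your pages.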

**GEO tools.** [Generative Engine Optimization](https://arxiv.org/html/2311.09735v3) (GEO) now has dedicated tools. [GitBook's GEO guide](https://gitbook.com/docs/guides/seo-and-llm-optimization/geo-guide) is a good starting point.

## Help agents use your tool autonomously

An agent that recommends your tool but can't set it up without a human hasn't converted anyone. The goal: "I recommend X" → "I've already set up X for you" in one step.

Biilmann frames this as four pillars: 
- **Access** (can the agent get in?)
- **Context** (does it understand your product?)
- **Tools** (can it operate your product?)
- **Orchestration** (can it trigger workflows?)

### The agent-first onboarding pattern

Agent-friendly products invert the signup funnel:
- **Traditional**: sign up → verify email → create project → configure → use.
- **Agent-first**: use → create → claim.

**Let agents work first, humans claim later.** Every friction point that assumed a human user—gone.

[Netlify](https://www.netlify.com/agent-experience/) pioneered "deploy anonymously, then claim." Agents create and deploy projects before any human signs up. Humans claim ownership later through signed links. This powers [tens of thousands of agent-led deployments daily](https://www.netlify.com/blog/why-netlify-is-the-1-platform-for-ai-deployments/) and grew Netlify's addressable market from 17 million JavaScript developers to 3 billion people—anyone technical enough to use a spreadsheet.
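
The claim step doesn't require heavy infrastructure. Here's a hedged sketch of one way to mint signed claim links, with a hypothetical domain and an in-code secret standing in for a real secret store (this is our illustration, not Netlify's implementation):

```python
import hashlib
import hmac
import time

SECRET = b"server-side-secret"  # illustrative; load from your secret store

def claim_token(project_id: str, issued_at: int) -> str:
    """Sign project_id + timestamp so only our server can mint claim links."""
    msg = f"{project_id}:{issued_at}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def claim_url(project_id: str) -> str:
    """Link the agent hands to the human who will claim the project."""
    ts = int(time.time())
    return (f"https://yourapp.example/claim/{project_id}"
            f"?ts={ts}&sig={claim_token(project_id, ts)}")

def verify_claim(project_id: str, ts: int, sig: str, max_age: int = 86400) -> bool:
    """Valid only if the signature matches and the link is under a day old."""
    fresh = time.time() - ts < max_age
    return fresh and hmac.compare_digest(sig, claim_token(project_id, ts))
```

The agent creates the anonymous project, surfaces the URL, and the human who clicks it within the expiry window becomes the owner.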

[Clerk](https://clerk.com/changelog/2025-03-7-clerk-agent-toolkit) shipped `@clerk/agent-toolkit`—adapters for Vercel AI SDK, LangChain, and MCP that let agents set up auth without dashboard configuration.

[Neon](https://neon.com) and [Prisma](https://www.prisma.io/) built database claiming flows: agents provision free-tier databases with zero setup and users claim them later. The `@netlify/neon` import provisions a database automatically—no configuration.

For our own [AnyCable+](https://plus.anycable.io) (managed WebSocket infrastructure), we're working toward this pattern. The CLI already provisions production endpoints in one command:

```sh
anycable-plus cable create my-app --public --wait

# Cable my-app is being provisioned...
# WebSocket URL   wss://my-app-sv7m.fly.dev/cable
# Broadcast URL   https://my-app-sv7m.fly.dev/_broadcast
```

Today this still requires a human signup first. Moving to create-then-claim means solving agent identity and abuse prevention—the exact challenges we cover in Part 3. We're building it, and the security considerations are real.

### Make your tool agent-usable

The difference between an agent that hallucinates your API and one that uses it correctly is context, not prompting.

**API design.** Agents parse REST with predictable HTTP verbs better than SDKs with complex abstractions. An OpenAPI spec is an agent-readable contract. A fluent builder API is a puzzle. [Zeno Rocha](https://resend.com/blog/agent-experience) (Resend) calls this token-budget-aware design.

**MCP servers** go further: they're self-describing. An agent connects, discovers what tools exist, reads their schemas, calls them—all within the protocol, no separate docs required. Netlify's server exposes functions, blob storage, identity, and deploys. [Stripe's](https://docs.stripe.com/building-with-llms) exposes API operations plus documentation search. The agent knows what it can do before it tries.

**Standalone CLIs matter too.** Not every agent has an MCP client. A CLI gives agents a scriptable entry point with no extra setup. [GitButler](https://gitbutler.com/) and [Resend](https://resend.com/) ship both—covering both integration paths.

**SDK context files** tell external coding agents which patterns to use, which to avoid, which APIs are current. Unlike `.cursorrules` in your own repo, these ship with your SDK. Netlify's `netlify recipes ai-context` CLI generates them for its core primitives.

**Agent Skills** are task-specific instruction sets. [Stripe](https://docs.stripe.com/building-with-llms) ships them for integration patterns and API upgrades (`npx skills add`). [Resend](https://resend.com/blog/agent-experience) built Email Skills that give agents deliverability expertise without bloating the context window.

Here's your test: can an external agent ship a feature with your tool using only your docs and a prompt? [Netlify tests this literally.](https://biilmann.blog/articles/ax-in-practice/) If it can't, the docs aren't good enough.

### Let agents pay

Discovery → usage → claim opens the funnel. Usage → revenue still requires a human—the agent consumes your free tier, and nothing converts consumption into payment on its own.

Three patterns are emerging.

**Human-backed metering.** Bill the human for what the agent used. [Lago](/clients/lago) (an Evil Martians client) ships [agent-ready usage billing](https://getlago.com/blog/ai-agent-monetization) on their open source Rails platform—event-level metering, 15-decimal precision for sub-cent pricing, per-token billing for Mistral and Together AI. Humans control spending, agents operate within it.

**HTTP-native micropayments.** [x402](https://www.x402.org/) revives HTTP's Payment Required status code. Agent hits your endpoint, gets 402, pays in stablecoins, retries—one line of server middleware, no accounts, no API keys, no KYC. The [Linux Foundation adopted it](https://www.linuxfoundation.org/press/linux-foundation-is-launching-the-x402-foundation-and-welcoming-the-contribution-of-the-x402-protocol) in April 2026 with Coinbase, Stripe, and Cloudflare governing. 119M transactions on Base, 35M on Solana, ~$600M annualized.
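
Schematically, the server side is a single gate in front of the resource. This sketch shows the shape of the flow only; the header name, body fields, and settlement call are placeholders, not the x402 wire format:

```python
PRICE_USD = "0.001"  # illustrative per-request price

def settle(proof: str) -> bool:
    """Stub: in production, verify the payment payload with a facilitator."""
    return proof == "valid-proof"

def handle_request(headers: dict) -> tuple[int, dict]:
    """x402-style gate: no payment proof -> 402 with terms; else serve."""
    proof = headers.get("X-PAYMENT")
    if proof is None:
        return 402, {"error": "payment_required",
                     "price_usd": PRICE_USD,
                     "pay_to": "0xYourAddress"}  # placeholder address
    if not settle(proof):
        return 402, {"error": "invalid_payment"}
    return 200, {"data": "the paid resource"}
```

The agent's side is symmetric: request, read the 402 terms, pay, retry with proof attached. No account creation anywhere in the loop.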

**Agent wallets with guardrails.** [Stripe's Machine Payments Protocol](https://stripe.com/) uses Shared Payment Tokens—the agent charges the buyer's card without seeing credentials. Coinbase Agentic Wallets keep agent keys in a secure enclave, with per-transaction and per-session caps as primitives. Browserbase ships per-session pricing on MPP; Parallel Web Systems ships per-API-call.

Most devtools haven't closed this loop—even agent-forward products like Neon, Clerk, and Resend still route billing through human accounts. Agent-first growth without an agent-first payment path is a party where nobody pays for drinks.

## Protect yourself from bad agents

Everything above—the MCP servers, the anonymous deploy flows, the agent-accessible APIs, the payment paths—is your growth channel. It's also your attack surface.

### The numbers

[Wallarm](https://www.wallarm.com/) audited 501 MCP servers: [96% had exploitable vulnerabilities](https://mcpjail.com/). [30+ CVEs in two months](https://www.heyuan110.com/posts/ai/2026-03-10-mcp-security-2026/), including a CVSS 9.6 RCE in a package with half a million downloads. In the [OpenClaw marketplace](https://blog.cyberdesserts.com/ai-agent-security-risks/), malicious skills were caught exfiltrating user data across its 5,700+ community tools.

If you ship an MCP server for growth, you're almost certainly shipping vulnerabilities with it.

The problem is serious enough that the industry is publishing checklists: the [OWASP Top 10 for Agentic Applications](https://owasp.org/www-project-mcp-top-10/) catalogs attack vectors from goal hijacking to privilege escalation, and [Microsoft's Agent Governance Toolkit](https://github.com/microsoft/security/agent-governance-toolkit) provides policy enforcement against all ten.

### Agent identity is growth infrastructure

[45.6% of teams](https://www.gravitee.io/blog/state-of-ai-agent-security-2026-report-when-adoption-outpaces-control) use shared API keys for agent auth. That breaks agent-led growth: you can't tell a paying customer's agent from one draining your free tier. Can't rate-limit, can't attribute usage, can't revoke one without breaking all.

You don't need enterprise infrastructure to fix this.

**Detect agents.** MCP connections are inherently agent traffic. API requests with agent User-Agent strings (`Claude`, `GPT`, `Cursor`) self-identify. Requests that skip your dashboard and go straight to the API are likely automated. Add an `actor_type` field to your auth system—`human`, `agent`, `unknown`. Everything else builds on that.

**Issue agent-specific credentials.** Don't hand agents the same API keys humans use. Create a separate key type with an agent flag—scoped down by default, tied to the human account that created it. Resend's pattern: agents create derivative keys that inherit a subset of the parent's permissions. A column in your database, not a cryptographic certificate system.

**Build progressive trust.** [Descope](https://www.descope.com/blog/post/progressive-scoping) calls this progressive scoping. In practice, it's four tiers you can ship in a week:

- **Anonymous** (MCP connection, no auth) → read-only access, aggressive rate limits. This is your free-tier growth engine.
- **Keyed** (agent API key) → can create resources, limited operations. The agent is working; the human hasn't claimed yet.
- **Claimed** (human verified the account) → full access minus destructive operations. Your conversion event.
- **Trusted** (established usage history) → destructive operations allowed with confirmation. Your retained customer.
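
The four tiers above reduce to a permission map plus one rule for destructive operations. A sketch under our own naming, not Descope's API:

```python
# Illustrative permission map for the four trust tiers
TIER_PERMISSIONS = {
    "anonymous": {"read"},
    "keyed":     {"read", "create"},
    "claimed":   {"read", "create", "update"},
    "trusted":   {"read", "create", "update", "delete"},
}

def allowed(tier: str, operation: str, confirmed: bool = False) -> bool:
    """Destructive ops stay behind a confirmation even for trusted agents."""
    if operation == "delete":
        return tier == "trusted" and confirmed
    return operation in TIER_PERMISSIONS.get(tier, set())
```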

The anonymous → claimed transition is the onboarding funnel again—identity turns it into a *metered* funnel: you know how many agent sessions convert, which tools they use, where they drop off.

[Teleport](/clients/teleport) (another Evil Martians client) calls agents a [new identity category](https://goteleport.com/blog/agentic-identity/) and builds this with cryptographic SPIFFE certificates—the right approach at scale. Their data: [over-privileged AI systems experience 4.5x more security incidents](https://goteleport.com/resources/white-papers/agentic-ai-security/). The IETF [Web Bot Auth](https://datatracker.ietf.org/group/webbotauth/about/) group is standardizing cryptographic agent identity at the HTTP layer. That's where the industry is heading. But for a seed-stage team, `actor_type` + scoped keys + four trust tiers gets you 80% of the value.

### Tiered feedback: help good agents, starve bad ones

A good agent that hits a generic 403 retries in a loop, wastes your resources, and tells the developer "it didn't work." The developer blames your tool. You lose a conversion.

Use the trust tiers above to decide what to tell whom. In practical terms:

**Anonymous/unknown agents** get silent blocks or generic errors. Standard security—don't educate the attacker.

**Keyed agents** get [RFC 9457](https://www.rfc-editor.org/rfc/rfc9457) structured errors—[Cloudflare](https://blog.cloudflare.com/) already ships these. Return JSON instead of HTML:

```json
{
  "type": "rate_limit_exceeded",
  "retryable": true,
  "retry_after": 30,
  "detail": "Agent rate limit: 100 req/min per key"
}
```

For operations that need a human, return a redirect the agent can surface:

```json
{
  "type": "human_confirmation_required",
  "retryable": false,
  "approval_url": "https://yourapp.com/approve/abc123",
  "owner_action_required": "Confirm deletion in browser"
}
```

The agent shows the URL to the developer. The developer clicks, confirms, the agent continues. This is your human-in-the-loop—not a separate system, just a structured error response.

**Trusted agents** get rich feedback: "This endpoint is deprecated, use /v2/deploy instead." Retain and upgrade.

### Protect the growth channel

Three things the identity and feedback layers don't cover:

**Rate-limit by agent identity, not IP.** Your free tier is your growth engine. One compromised key behind a CDN generates requests from thousands of IPs—IP-based limiting misses it entirely. Rate-limit per `actor_type` + key, per operation class. Most API gateways (Kong, Gravitee, even Nginx with `limit_req_zone`) support key-based limiting out of the box.
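
If you'd rather do it in application code than at the gateway, a per-key token bucket is a few lines. A sketch, not production-hardened (no persistence, no locking):

```python
import time
from collections import defaultdict

class KeyRateLimiter:
    """Token bucket per (actor_type, key) instead of per IP."""

    def __init__(self, rate_per_min: int):
        self.capacity = rate_per_min
        self.rate = rate_per_min / 60.0  # tokens refilled per second
        self.buckets = defaultdict(lambda: (rate_per_min, time.monotonic()))

    def allow(self, actor_type: str, key: str) -> bool:
        tokens, last = self.buckets[(actor_type, key)]
        now = time.monotonic()
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens < 1:
            self.buckets[(actor_type, key)] = (tokens, now)
            return False
        self.buckets[(actor_type, key)] = (tokens - 1, now)
        return True
```

Because the bucket is keyed on identity, a compromised key gets throttled no matter how many IPs it fans out across, while every other key is untouched.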

**Monitor for cost inflation.** [Cost inflation attacks](https://mindgard.ai/blog/mcp-security-trends) steer agents into prolonged tool-calling chains—658x cost, 97% evasion rate. Track per-key: requests per session, average tool calls per task, total cost per key per day. Alert when any key exceeds 3x the median. You don't need an anomaly detection platform—a cron job comparing daily per-key costs against a rolling average catches the worst abuse.
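
The cron job's core is one comparison. A sketch of the 3x-median check described above, with a made-up input shape (key to list of daily costs, most recent last):

```python
from statistics import median

def flag_anomalies(daily_costs: dict[str, list[float]],
                   factor: float = 3.0) -> list[str]:
    """Flag keys whose latest daily cost exceeds `factor` x the fleet median."""
    latest = {key: costs[-1] for key, costs in daily_costs.items() if costs}
    if not latest:
        return []
    baseline = median(latest.values())
    return [key for key, cost in latest.items() if cost > factor * baseline]
```

Run it daily against your metering data and page on any non-empty result; the flagged keys are the ones to inspect for runaway tool-calling chains.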

**Audit your own MCP server.** You built it for growth. [96% of MCP servers have exploitable vulns.](https://mcpjail.com/) Before launch, run [MCPJail](https://mcpjail.com/) (Wallarm's open source Rust sandbox—it isolates and probes your server) or Cisco's [DefenseClaw Scanner](https://github.com/cisco/defenseclaw). Add this to CI. An MCP server is an API with a wider blast radius—treat it like one.

## AX is a discipline, not a feature

Make your product maximally accessible to agents. Assume every agent is hostile until proven otherwise. And do both, at the same time, forever. This is the same tension as "ship fast" and "don't break prod."

[AX is a discipline](https://biilmann.blog/articles/one-year-of-ax/): observation, iteration, empathy for a user that doesn't think like a human. 

UX matured from "make the button blue" to a rigorous practice. AX will mature from "add an MCP server" to a fundamental product concern. Companies that treat it as a one-time integration will be as surprised as those who treated security as a one-time audit.

[Bessemer Venture Partners](https://www.bvp.com/) (investors in Shopify, Twilio, GitLab) named AX "Law #1" in their [developer laws for the AI era](https://www.bvp.com/atlas/developer-laws-in-the-ai-era). [Sequoia](https://inferencebysequoia.substack.com/p/the-agent-economy-building-the-foundations) is betting on agents as the next distribution layer. [Daytona](https://www.daytona.io/dotfiles/agent-experience-is-the-only-experience-that-matters) (agent-native dev environments) argues AX is the *only* experience that matters.

By end of 2026, every devtools company above $5M ARR will have someone responsible for agent experience, whether they call it that or not. So, start now and compound advantages in discoverability, adoption, and trust. Wait and you'll be retrofitting for a user that's already moved on—or cleaning up after one that was never invited.
