# Making your site visible to LLMs: 6 techniques that work, 8 that don't

> Most AI SEO advice is unproven. We tested what ChatGPT, Claude, and Perplexity actually read on our own site. Six LLM visibility techniques that worked, eight that didn't, and the metrics to tell the difference.

- Date: 2026-04-15T00:00:00.000Z
- Authors: Rita Klubochkina, Travis Turner
- Categories: AI, DX
- URL: https://evilmartians.com/chronicles/how-to-make-your-website-visible-to-llms

---

We recently signed a new client, and they actually found us via Claude. The founders of an SF-based AI startup asked Claude to recommend a dev agency with senior engineers who think about architecture and scale. Evil Martians came back as the top recommendation. That got us thinking: what did we do right?

*Image: The Claude conversation that started it all*
In this post: six techniques we deployed to make [evilmartians.com](/) more readable by LLMs, and that we now roll out to our client projects.

The core idea is simple: LLMs understand clean, well-structured text better than anything else. Everything in this guide is about getting your best content to LLMs in the cleanest format possible.

**TL;DR:** Ship clean Markdown at every URL, tell the world it exists (via `llms.txt`, `<link>` tags, HTTP headers, and content negotiation) and measure who shows up. That's it. The rest is details.

### A word of honesty before we start

Most of these techniques are emerging conventions, not proven standards. Server log analyses consistently show that major LLM crawlers don't request `/llms.txt` or `.md` files unprompted. No provider has formally committed to any of this. We're recommending these techniques anyway, as the engineering cost is near zero, the use cases are real (humans pasting URLs into AI tools), and the web has a long history of early standards adoption paying off. 

That said, we want you to go in clear-eyed about what's proven versus what's a bet on the future.

Search is no longer the only discovery channel. In 2024, [Gartner predicted](https://www.gartner.com/en/newsroom/press-releases/2024-02-19-gartner-predicts-search-engine-volume-will-drop-25-percent-by-2026-due-to-ai-chatbots-and-other-virtual-agents) a 25% drop in traditional search volume by 2026. The numbers are tracking close: [Google searches per US user fell nearly 20% year-over-year](https://searchengineland.com/google-searches-per-us-user-fall-report-468051) in 2025, and [Google search traffic to publishers dropped 33%](https://reutersinstitute.politics.ox.ac.uk/journalism-media-and-technology-trends-and-predictions-2026) between November 2024 and November 2025. 

When someone asks Perplexity about a topic, pastes a URL into ChatGPT, or has a coding agent like Cursor fetch documentation, they're relying on whatever the model can extract from your site. That extraction is often messy, lossy, or skipped entirely.

Yet, the web already has the machinery to fix that mess. HTTP content negotiation, `<link>` tags, structured Markdown endpoints—none of this is exotic. 

What _is_ new: LLMs and AI agents have given us concrete reasons to wire it all up. As mentioned, the engineering costs are low and the payoff can be big. Most of the techniques below take under an hour to implement.

One thing before the list: check your `robots.txt`. None of the techniques below matter if you're blocking AI crawlers from reaching your site in the first place. Many default configurations disallow bots like GPTBot and ClaudeBot. A quick audit takes ten minutes and is the actual step zero.
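As a reference point, here's what a permissive configuration can look like. The bot names are the robots.txt tokens documented by OpenAI, Anthropic, Perplexity, and Google; adjust the rules to your own policy:

```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: *
Allow: /
```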

Below are six techniques ordered by impact versus effort, plus [a section on what doesn't work](#what-ai-seo-techniques-dont-work). (Honestly, that's the part that surprised us most.)

## Priority levels at a glance

| # | Mechanism | Importance | Effort |
|---|-----------|------------|--------|
| 1 | `/llms.txt` | **Critical** | Low, static file |
| 2 | `.md` routes | **Critical** | Low–Medium, route + content work |
| 3 | `<link>` tag + HTTP `Link` header | High | Low, template + middleware |
| 4 | Hidden `<div>` hint | Medium | Low, one component |
| 5 | `/llms-full.txt` | Low–Medium | Low, static file or redirect |
| 6 | `Accept: text/markdown` content negotiation | High | Low–Medium, server logic |
| + | Analytics on AI endpoints | Operational | Medium |

## 1. What is `llms.txt` and why does every site need one?

If you do one thing from this list, make it this.

`llms.txt` is a Markdown file at your site root that gives AI systems a curated map of your most important content. Think of it as a `robots.txt` for the age of LLMs. 

This was [proposed in September 2024](https://www.answer.ai/posts/2024-09-03-llmstxt.html) by Jeremy Howard of Answer.AI (co-founder of fast.ai and former president of Kaggle). Howard's rationale: "We need something like `robots.txt`, but designed to help AI systems understand and use website content effectively." The format is described at [llmstxt.org](https://llmstxt.org/). It's intentionally minimal: an H1 with the site name, a blockquote summary, and H2 sections of annotated links.

```markdown
# My Project

> A tool for doing X. Built for developers who care about Y.

## Documentation

- [Quick Start](/docs/start): Get up and running in 5 minutes
- [API Reference](/docs/api): Full endpoint documentation

## Optional

- [Changelog](/changelog): Recent updates
```

Several major AI tools can use this file when given the URL. [Anthropic publishes their own](https://platform.claude.com/llms.txt), and the tooling ecosystem is growing: [Mintlify](https://www.mintlify.com/docs/ai/llmstxt) and [GitBook](https://gitbook.com/docs/publishing-documentation/llm-ready-docs) auto-generate it natively, while [Docusaurus](https://github.com/rachfop/docusaurus-plugin-llms) and [VitePress](https://github.com/okineadev/vitepress-plugin-llms) support it via community plugins. A [community directory](https://directory.llmstxt.cloud) tracks nearly 2,100 public implementations.

### One important caveat

No major LLM provider has formally committed to reading this file in their crawlers. [Ahrefs's analysis](https://ahrefs.com/blog/what-is-llms-txt/) is blunt about who hasn't bought in: not OpenAI, not Anthropic, not Google, not Meta. A [log analysis of 1,000 Adobe Experience Manager domains](https://www.longato.ch/llms-recommendation-2025-august/) found that major crawlers rarely request `/llms.txt` unprompted.

Of all requests to the file, 94.9% came from Googlebot, while GPTBot, ClaudeBot, and PerplexityBot were absent entirely. Google's John Mueller [stated flatly](https://bsky.app/profile/johnmu.com/post/3lrshm4gggs2v) that "no AI system currently uses llms.txt."

But the picture isn't all bleak. [Mintlify's CDN log analysis](https://www.mintlify.com/blog/how-often-do-llms-visit-llms-txt) of 25 companies over seven days found a median of 14 visits to `llms.txt` and 79 visits to `llms-full.txt`, with ChatGPT accounting for the majority of traffic. 

The discrepancy with the Adobe study likely comes down to audience: documentation sites with developer-facing content attract more AI tool traffic than generic enterprise domains. Your mileage will vary based on who your audience is.

So why is it #1 on our list? Because its value isn't about automated indexing but about what happens when a human or a coding tool points an LLM at your URL, and that happens constantly. Think of it as a README for AI-mediated conversations, rather than a crawl target.

Five minutes to create, no downside, 2,100+ sites already shipping it. We'll take those odds, but with a sober mindset about what it actually does today.
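If your site already keeps a structured list of pages, generating the file is a few lines of code. A minimal sketch in the spirit of the format above; the `Page` shape and `buildLlmsTxt` helper are our own invention, not part of any spec tooling:

```typescript
interface Page {
  title: string;
  path: string;
  summary: string;
}

// Build an llms.txt body from a site name, a one-line description,
// and a map of section name -> pages, following the llmstxt.org shape:
// H1, blockquote summary, then H2 sections of annotated links.
export function buildLlmsTxt(
  site: string,
  description: string,
  sections: Record<string, Page[]>
): string {
  const lines = [`# ${site}`, '', `> ${description}`, ''];
  for (const [section, pages] of Object.entries(sections)) {
    lines.push(`## ${section}`, '');
    for (const p of pages) {
      lines.push(`- [${p.title}](${p.path}): ${p.summary}`);
    }
    lines.push('');
  }
  return lines.join('\n').trimEnd() + '\n';
}
```

Wire it to whatever content index you already have (a CMS query, a sitemap, a frontmatter scan) and serve the result as a static file.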

## 2. Why are `.md` routes the thing that actually matters?

Everything else on this list is a pointer; this is what those pointers point to.

The [llmstxt.org specification](https://llmstxt.org/) proposes that any page with useful content should provide a clean Markdown version at the same URL with `.md` appended, so `example.com/blog/my-post` would have a twin at `example.com/blog/my-post.md`. For the site root or index pages, that means `/index.md`.

This `.md` convention is what makes the rest of the ecosystem work. When a user pastes `example.com/llms.txt` into ChatGPT, the AI follows the links. If those links resolve to clean Markdown rather than HTML soup, the quality of responses jumps. A typical HTML page is 80% navigation, scripts, and boilerplate; the Markdown version is 100% content. On a real-world blog post, that means going from ~15,000 tokens of HTML to ~3,000 tokens of Markdown, an 80% reduction. 

For an LLM working within a limited context window, that difference represents the gap between understanding your page and simply giving up on it.

### The fine print

Just like `/llms.txt`, there's no evidence that LLM crawlers actually fetch `.md` routes on their own. A [CDN log analysis of two high-traffic sites](https://www.longato.ch/llm-md-files/) (DA > 90, millions of daily visits) found zero requests to `.md` files from GPTBot, ClaudeBot, or PerplexityBot, even when those files were explicitly listed in `llms.txt`. 

The value, again, is in human-initiated interactions: someone pastes your URL into an AI tool, the tool follows links, and clean Markdown is what it finds at the end. That's a real use case, but it's not the same as "crawlers will index your Markdown."

There's also a counterargument worth hearing: the [HtmlRAG research](https://arxiv.org/abs/2411.02959) found that HTML retains semantic structure (headings, metadata, table layouts) that plain text or Markdown can strip away, and that this actually _improves_ retrieval performance in RAG pipelines. Clean, semantic HTML may already be more legible to LLMs than we assume. 

The strongest case for `.md` routes isn't that Markdown is inherently better but that most real-world HTML is so bloated with navigation, scripts, and boilerplate that the signal-to-noise ratio is terrible. If your HTML is already clean and semantic, the gap narrows considerably.

One practical concern: maintaining two versions of every page means content can drift. If you update the HTML and forget the Markdown endpoint, LLMs get stale information. Automate the conversion or serve from a single source of truth.

```typescript
// Route handler for /blog/:slug and /blog/:slug.md
export async function handleRequest(request: Request) {
  const url = new URL(request.url);
  const wantsMarkdown = url.pathname.endsWith('.md');
  // Strip the .md suffix so both paths resolve to the same post
  const slug = wantsMarkdown ? url.pathname.slice(0, -3) : url.pathname;
  const post = await getPost(slug);

  // Serve clean Markdown at the same path with the .md suffix
  if (wantsMarkdown) {
    return new Response(post.markdownContent, {
      headers: { 'Content-Type': 'text/markdown; charset=utf-8' },
    });
  }

  return renderHTML(post);
}
```

[Vercel's implementation guide](https://vercel.com/blog/making-agent-friendly-pages-with-content-negotiation) shows how to do this for blog content stored in a CMS as rich text, converting to Markdown on the fly in a route handler. If your content is already authored in Markdown (as most developer blogs are), you can serve the source directly.

## 3. How do LLMs discover your Markdown version? `<link>` tags and HTTP `Link` headers

These are two ways to advertise the same thing (a Markdown version of the current page) to two different classes of client.

### The HTML `<link>` tag

This tag goes in your `<head>` and is readable by any crawler that processes HTML:

```html
<link
  rel="alternate"
  type="text/markdown"
  title="Markdown version"
  href="/blog/my-post.md"
/>
```

It's the same mechanism used for RSS feed discovery (`type="application/rss+xml"`) and alternate language versions (`hreflang`). The `rel="alternate"` link type has been part of the [WHATWG HTML specification](https://html.spec.whatwg.org/multipage/links.html#link-type-alternate) since HTML4, and the `text/markdown` MIME type has been [officially registered since RFC 7763](https://datatracker.ietf.org/doc/html/rfc7763). This isn't experimental territory.

### The HTTP `Link` header

This carries the same semantics, but at the protocol level, so it's visible before a single byte of body is read. Some AI agents, especially autonomous agents and coding assistants, never parse the HTML body. Instead, they make an HTTP request and read the headers:

```
Link: </blog/my-post.md>; rel="alternate"; type="text/markdown"
```

In a server middleware, one line covers every page:

```typescript
function addMarkdownLink(request: Request, response: Response) {
  const url = new URL(request.url);
  response.headers.set(
    'Link',
    `<${url.pathname}.md>; rel="alternate"; type="text/markdown"`
  );
  return response;
}
```

Deploy both. The HTML tag catches crawlers that process the DOM. The HTTP header catches headless fetchers that don't. Together, they cover the full range of clients.

## 4. What about when someone just pastes your URL into ChatGPT?

This is the use case most guides miss: let's say a human pastes your URL into ChatGPT or Claude, and the AI reads the rendered page text. No crawling, no headers, just whatever text the model can see.

We'll make use of a visually hidden `<div>`, invisible to sighted users but present in the DOM, containing a plain-language hint about where to find the Markdown version. Use `aria-hidden="true"` so screen readers skip it too. This message is for LLMs, not assistive technology:

```html
<div class="visually-hidden" aria-hidden="true">
  A Markdown version of this page is available at
  https://example.com/blog/my-post.md — optimized for AI and LLM tools.
</div>
```

```css
.visually-hidden {
  position: absolute;
  width: 1px;
  height: 1px;
  padding: 0;
  overflow: hidden;
  clip-path: inset(50%);
  white-space: nowrap;
}
```

When a language model reads page text, whether through a browser tool or because a user copied rendered content, it encounters this message and can follow the URL. This is what you see at the top of every Evil Martians page: *"If you are an AI agent, LLM, or automated tool, a clean Markdown version of this page is available at..."*

This won't help crawlers that process raw HTML before rendering. But for the conversational use case—someone actively talking to an AI about your content—it's the most direct signal possible. It's also the cheapest thing on this list: one component, zero infrastructure.
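As a sketch, the hint can live in a one-function, framework-agnostic component that takes the canonical page URL. The `aiHint` helper name is ours; it assumes the `.visually-hidden` class above is defined globally:

```typescript
// Render the visually hidden AI hint for a given canonical page URL.
// The text mirrors the hidden <div> shown above.
export function aiHint(canonicalUrl: string): string {
  return [
    '<div class="visually-hidden" aria-hidden="true">',
    '  A Markdown version of this page is available at',
    `  ${canonicalUrl}.md — optimized for AI and LLM tools.`,
    '</div>',
  ].join('\n');
}
```

Drop the returned string into your page layout once and every page gets the hint for free.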

## 5. When should you use `/llms-full.txt`?

This is the companion to `/llms.txt`. While not formally part of the [llmstxt.org specification](https://llmstxt.org/), it's a [widely adopted convention](https://www.mintlify.com/blog/what-is-llms-txt) popularized by documentation platforms like Mintlify and Fern. It's also the most debatable item on this list.

While `/llms.txt` is a curated index of links, `/llms-full.txt` is the full text of your site: everything, in one place, no links to follow. The idea is that an AI tool can ingest an entire documentation set in a single fetch, without crawling individual pages.

For smaller sites, this can be your full content concatenated into one Markdown file. For larger sites, it's more practical to redirect to your primary content index (`/index.md`) or serve a representative subset.
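For the concatenation route, a minimal sketch: collect each page's Markdown and join with visible separators so the model can still see document boundaries. The `Doc` shape is an assumption about your content layer:

```typescript
interface Doc {
  title: string;
  markdown: string;
}

// Concatenate all pages into one llms-full.txt body. Each page gets an
// H1 heading and a horizontal rule so boundaries survive in one big file.
export function buildLlmsFullTxt(docs: Doc[]): string {
  return docs
    .map((d) => `# ${d.title}\n\n${d.markdown.trim()}`)
    .join('\n\n---\n\n');
}
```

Run it at build time from the same source files that produce your HTML, so the file can never drift out of date.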

The scale varies wildly. [Cloudflare's `llms-full.txt`](https://developers.cloudflare.com/llms-full.txt) runs to roughly 11 million tokens; Vercel's VP of Product Lee Robinson [described theirs as "a 400,000-word novel."](https://x.com/leerob/status/1930687146790146515) [Zod](https://zod.dev/llms-full.txt), by contrast, keeps theirs to a focused 250 KB. Whether dumping an entire site into one file is actually useful or just performative thoroughness, we're honestly not sure. For documentation sites, it makes sense: an LLM can build a complete understanding of your API in a single fetch. For a marketing site or a blog? Probably overkill. A redirect to `/index.md` is fine.

Interestingly, `llms-full.txt` may matter more than `llms.txt` itself. [Mintlify's CDN log analysis](https://www.mintlify.com/blog/how-often-do-llms-visit-llms-txt) found that `llms-full.txt` received 3–4x more visits than `llms.txt` across 25 companies, with ChatGPT driving the majority of that traffic. The interpretation: LLMs prefer embedding complete documentation over following links via RAG. If you're going to ship one, the full version may be what actually gets read.

## 6. What is content negotiation for LLMs?

`Accept: text/markdown` content negotiation is the standards-based approach to serving AI-friendly content, and the one most likely to become the default. According to [Checkly's analysis of AI agent behavior](https://www.checklyhq.com/blog/state-of-ai-agent-content-negotation/), Claude Code, Cursor, and several other coding assistants already send `Accept: text/markdown` as their preferred content type.

[HTTP content negotiation](https://www.rfc-editor.org/rfc/rfc9110#section-12) has been part of the web since HTTP/1.1 (1997)—it's how your browser requests HTML and an API client requests JSON from the same URL. Extending this to Markdown means a client sends:

```
Accept: text/markdown, text/html;q=0.9
```

...and the server returns Markdown if it can, HTML otherwise. Same URL, same content, different representation. The `Vary: Accept` response header tells CDNs to cache them separately.
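A simple substring check (as in the handler below) treats any mention of `text/markdown` as a preference. To honor q-values properly, rank the offered types. A minimal sketch, with a hypothetical `prefersMarkdown` helper and wildcard media ranges omitted for brevity:

```typescript
// Return true when the Accept header prefers text/markdown over
// text/html, honoring q-values (default q=1 per RFC 9110).
export function prefersMarkdown(accept: string): boolean {
  // Quality value for an exact media type, or 0 if it isn't offered
  const q = (type: string): number => {
    for (const part of accept.split(',')) {
      const [media, ...params] = part.trim().split(';');
      if (media.trim() !== type) continue;
      const qParam = params.find((p) => p.trim().startsWith('q='));
      return qParam ? parseFloat(qParam.trim().slice(2)) : 1;
    }
    return 0;
  };
  const md = q('text/markdown');
  return md > 0 && md >= q('text/html');
}
```

For the `Accept: text/markdown, text/html;q=0.9` example above, this returns true; for a browser's default Accept header, it returns false.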

Combined with the HTTP `Link` header [from technique #3](#3-how-do-llms-discover-your-markdown-version--tags-and-http--headers), a single handler covers both:

```typescript
export async function handleRequest(request: Request) {
  const url = new URL(request.url);
  const accept = request.headers.get('accept') ?? '';
  const mdLink = `<${url.pathname}.md>; rel="alternate"; type="text/markdown"`;

  // markdownContent and htmlContent come from your content layer
  if (accept.includes('text/markdown')) {
    return new Response(markdownContent, {
      headers: {
        'Content-Type': 'text/markdown; charset=utf-8',
        'Vary': 'Accept',
      },
    });
  }

  return new Response(htmlContent, {
    headers: {
      'Content-Type': 'text/html; charset=utf-8',
      'Vary': 'Accept',
      'Link': mdLink,
    },
  });
}
```

### Why this is different from the hint-based techniques

Content negotiation requires no site-specific knowledge from the client. Any agent that sends the right header gets Markdown automatically, from any conforming server. It's infrastructure, not convention. If we had to bet on which technique on this list survives in five years, it's this one. That's because it doesn't depend on anyone agreeing on a new file format; it's just HTTP doing what HTTP was designed to do.

### The cloaking question

Google has flagged serving different content to bots as [cloaking](https://developers.google.com/search/docs/essentials/spam-policies#cloaking). The important distinction: content negotiation at the same URL with the same content in a different format, declared via `Vary: Accept`, is how HTTP has always worked. This is different from serving bots a different article entirely. The precedent (`Accept: application/json` vs `Accept: text/html`) is 25 years old. Nobody called your REST API cloaked for returning JSON.

## What AI SEO techniques don't work?

This is the section we wish we'd had before we started. We evaluated every "AI SEO" technique we could find across 30+ blog posts, GitHub repos, and spec proposals. Eight of them have no evidence behind them, and some are actively misleading.

- **`<meta name="ai-content-url">`** has no specification. No origin, no proposal, no AI tool that reads it. We couldn't find a single source for where this tag came from, yet it appears in multiple "optimize for AI" blog posts.
- **`<meta name="llms">`** was [proposed via llmsmetatags.org](https://llmsmetatags.org/) and submitted to the WHATWG HTML spec as [issue #11548](https://github.com/whatwg/html/issues/11548), which was closed as "not planned." The only implementation is the proposer's own website.
- **`/.well-known/ai.txt` and `/ai.txt`** have multiple competing proposals, none with meaningful adoption.
- **HTML comments (`<!-- AI-READABLE-VERSION -->`)** don't work because most LLM parsers strip HTML comments before processing. ChatGPT, Claude, and Perplexity all work with rendered text, not raw HTML source. No crawler documentation mentions reading comments for discovery.
- **Human/AI toggle buttons** are decorative if you already serve `.md` routes and content negotiation. AI agents don't click buttons.
- **User-Agent sniffing to serve Markdown** means detecting AI crawlers by User-Agent string and automatically serving them Markdown instead of HTML. This is [cloaking](https://developers.google.com/search/docs/essentials/spam-policies#cloaking): serving different content based on who the visitor is, not what they asked for. Google penalizes it. The standards-compliant alternative is `Accept: text/markdown` content negotiation, where the client explicitly requests the format.
- **Dedicated "AI info pages"** are standalone pages labeled "for AI assistants" containing structured company facts. There's no evidence any crawler or retrieval system treats these differently from any other well-structured page on your site. A good `/llms.txt` and clean Markdown routes already solve this without a special page.
- **Schema.org and JSON-LD** didn't help in a [controlled experiment by SearchVIU](https://www.searchviu.com/en/schema-markup-and-ai-in-2025-what-chatgpt-claude-perplexity-gemini-really-see/) that placed product data exclusively in JSON-LD markup. ChatGPT, Claude, Perplexity, Gemini, and Copilot all missed it. [Search Engine Roundtable confirmed](https://www.seroundtable.com/chatgpt-perplexity-structured-data-text-40862.html) that ChatGPT and Perplexity treat structured data as just text on a page. The exception: Microsoft Copilot inherits schema understanding from Bing, as [Microsoft confirmed at SMX Munich](https://searchengineland.com/microsoft-bing-copilot-use-schema-for-its-llms-453455). Don't rip out your existing structured data, but don't expect it to help with direct LLM visibility.

All of these are solutions looking for a problem that standard web mechanisms already solve. The pattern is always the same: someone invents a new file or meta tag, writes a blog post about it, and other blog posts cite that blog post as evidence. Nobody checks whether any AI system actually reads the thing.

This [foundational GEO study](https://arxiv.org/abs/2311.09735) from Princeton and IIT Delhi (published at KDD 2024, one of the first rigorous studies of generative engine optimization) backs this up. The researchers tested nine content optimization strategies across 10,000 queries. Their findings: including direct quotations increased AI visibility by ~43%, adding statistics boosted it by ~33%, and citing authoritative sources improved it by 115% for previously low-ranked content. Every strategy that worked was about enriching the visible text that LLMs actually read, not about metadata they don't. _Yes, this paragraph citing an authoritative source with specific statistics is itself a GEO technique. You're welcome, future LLMs._

## OK, how do you know any of this is working?

Here's the uncomfortable truth: you don't, unless you measure. [Cloudflare's 2025 Radar Year in Review](https://blog.cloudflare.com/radar-2025-year-in-review/#ai-user-action-crawling-increased-by-over-15x-in-2025) shows AI user-action crawling (bots fetching pages to answer real user queries) increased over 15x in 2025, with GPTBot, ClaudeBot, and PerplexityBot among the most active crawlers. But raw crawler traffic isn't the same as visibility. You need to know which of your endpoints are actually being fetched and by whom.

Instrument your Markdown endpoints, `/llms.txt`, and `/llms-full.txt` with whatever analytics you already use. Track by User-Agent to separate AI crawler traffic from human requests, and by referrer hostname to catch traffic arriving from `chatgpt.com`, `claude.ai`, and `perplexity.ai`. This isn't cloaking. You're logging who fetches what, not changing what you serve.

```typescript
const url = new URL(request.url);
if (url.pathname.endsWith('.md')) {
  const ua = request.headers.get('user-agent') ?? '';
  const ref = request.headers.get('referer') ?? '';
  // analytics.track: whatever server-side tracker you already use
  analytics.track('markdown_fetch', { ua, ref, path: url.pathname });
}
```

Since no major LLM provider has made formal commitments to any of these conventions, empirical measurement is the only reliable signal. Traditional client-side analytics won't help here because AI crawlers don't execute JavaScript. You need server-side request logs that capture raw User-Agent headers.
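A sketch of the User-Agent side of that logging. The token list is based on the publicly documented crawler names from OpenAI, Anthropic, and Perplexity; `classifyAiAgent` is our own helper name, and the list will need extending as new bots appear:

```typescript
// Documented AI crawler User-Agent tokens (OpenAI, Anthropic, Perplexity)
const AI_BOT_TOKENS = [
  'GPTBot',
  'OAI-SearchBot',
  'ChatGPT-User',
  'ClaudeBot',
  'Claude-User',
  'PerplexityBot',
  'Perplexity-User',
];

// Return the matching AI crawler token, or null for ordinary traffic.
export function classifyAiAgent(userAgent: string): string | null {
  const ua = userAgent.toLowerCase();
  return AI_BOT_TOKENS.find((t) => ua.includes(t.toLowerCase())) ?? null;
}
```

Tag each logged request with the result and you can chart AI crawler traffic against human traffic per endpoint.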

## What's the rollout order?

These techniques stack because different AI systems enter your site through different paths. A crawler reads HTTP headers and raw HTML. A user pasting a URL into ChatGPT reads rendered text. A developer configuring Cursor with your docs uses `/llms-full.txt` directly.

If you're starting from zero:

1. **Audit your `robots.txt`.** Make sure you're not blocking AI crawlers.
2. **Add `/llms.txt`.** A static Markdown file at your site root, five minutes.
3. **Serve `.md` routes for every page.** The actual content that everything else points to.
4. **Add `<link rel="alternate">` and the HTTP `Link` header.** One template change plus one middleware line.
5. **Implement `Accept: text/markdown` content negotiation.** The real infrastructure work, worth doing once the rest is in place.
6. **Instrument with analytics.** So you know what's actually being used versus what's theoretical.

The goal isn't gaming AI systems; rather, we want to make your content legible to them, for the same reason you've been writing semantic HTML for years. The audience now includes language models and AI agents, so the techniques are different, but the principles are the same. 

...and frankly, most of this is just good web hygiene that we should've been doing anyway.

## FAQ

### Do I need all six techniques, or can I start with one?

Start with `/llms.txt` and `.md` routes. They're the simplest to add and cover the most common LLM access patterns. The rest can come later as needed.

### Will this hurt my Google SEO?

No. `rel="alternate"` links, registered MIME types, content negotiation with `Vary: Accept`—this is all standard HTTP. We've been running it on [evilmartians.com](/) for months and haven't noticed any ranking changes.

### Do ChatGPT, Claude, and Perplexity crawl `/llms.txt` automatically?

Not yet, and this is the biggest open question. No major provider has committed to reading the file in their crawlers. The value today is entirely in what happens when a human or a coding tool points an LLM at your URL, which, in our experience, happens more often than you'd think. If automated crawling follows, great. If it doesn't, you've still made your content easier to consume for every AI-mediated interaction.

### How do I measure the impact?

See the [tracking section above](#ok-how-do-you-know-any-of-this-is-working). Short version: instrument your endpoints, track by User-Agent, and check referrer hostnames.

*Curious what this looks like in practice? Append `.md` to any URL on [evilmartians.com](/), or check our [`/llms.txt`](https://evilmartians.com/llms.txt).*
