How to add fast, client-side search to Astro static sites

Building static sites with Astro is a dream (especially for documentation). But what to do when your growing docs need full-text search, but you don’t want to give up that static delight? In this post, see how to bring powerful, fuzzy, and accessible search to Astro-generated sites. (This means no external crawlers and no remote APIs.) We’ll also look at the limits of AI-based and third-party search, demonstrate how to generate a build-time JSON index with Astro’s endpoints, and fine-tune the client-side experience with MiniSearch and Svelte. Here’s your guide to a native-feeling, privacy-friendly search, even on a fully static site.
Astro is an absolute unit for building static multi-page applications and I just love every minute working with it. My infinite love for it is probably a topic for a separate post. But can we just take a moment to appreciate the liberating feeling of writing those components? It’s the blissful sensation that comes when you know the code will execute server-side, at build time; you can depend on as many asynchronous resources as you need, and all you’ll get in the end is super-lightweight pre-built HTML with on-demand JS islands.
Even just writing this is taking me back to the moments of pure joy… But anyway, let’s not get lost in sweet nostalgia. Returning to the topic at hand: once we have a statically-generated site ready: how do we search it?
Hire Evil Martians
With open-source tools used by millions like PostCSS and Autoprefixer, we've shaped the landscape of frontend development. And we can help you create products that developers love and rely on daily, too!
An estimate of the search situation
I had a static site with a bunch of pages generated from MDX files (using Astro content collections, of course). At some point, the collections had grown so large that it was obvious I needed to add search to improve usability.
I quickly denied the first round of search functionality candidates (some pre-made Astro content integrations with baked-in styles and predefined UX) because I needed to fine-tune the search and fit it into our design.
Could AI do it?
A reasonable question to ask about pretty much anything in 2025: can AI do this? And, like with generated images, music, writing, voice, or video, here the answer was also: “yes, buuut…”
Let’s explain: I started looking at some third-party solutions. Since I was eventually adding an AI chatbot anyway, I thought: “Hey, why not use the same integration I was planning to use for that and have it take care of the search functionality, too?”
Well, I experimented with that… but the result wasn’t so great. To make it work, I’d have to push all our docs to a remote server so it could crawl them, and then request the search results client-side, separately for each query. You can imagine the impact that had on performance. Slooow!
But that wasn’t even the main concern. Search algorithm customizability was still limited, and far from ideal. For instance, I couldn’t add weights to the fields to bias the search towards headings. This meant that, in the end, the search results just weren’t cutting it.
So the AI approach failed to elegantly solve the main problem; in fact, the factors above (sluggish UX, wrong-ish results) seemed to be making navigation harder.
Additionally, a notable consideration was that storing search data remotely disconnects it from the current project state. And while there are ways to sync it, I would still have to deal with a second source of truth.
In other words, perfect synchronization was just not there across all environments (e.g., in development), and if I added some new text, the search index wouldn’t instantly pick up those changes.
Back to search roots
It became clear that in order to get more granular search control (while also making things snappy and always a reflection of the docs’ current state), we’d have to use a good old client-side search running on a pre-generated index file.
And this made perfect sense. I already had the whole site generated at build time; now I just had to throw some JSON with our data into the mix, slap some tunable client-side search library on it… and bada-bing bada-boom… a perfectly working solution.
But wait… how to get that JSON?
Well, it shouldn’t be hard at all (that was my first thought, anyway). I already knew it needed to happen at build time, so I could write a custom Vite plugin or an Astro integration that parses all our Markdown content into a JSON file and saves it to the `public` directory.
Easy, right? Wrong.
As I quickly discovered, the `astro:content` collection context is not yet available at the plugin stage, so there’s no way to leverage its built-in methods to load our documents. (Sure, there was the option of loading and parsing them manually, but doing the same task twice in two different ways really didn’t feel like the right solution.)
Luckily, there was a better way.
Astro allows us to create endpoints, which in server mode are called live on a per-request basis. But what happens to them in SSG mode? Well, they’re called at build time and generate static data. That’s just what I needed!
All it takes is an `export async function GET()`. Here, you can expect `astro:content` to be fully supported. So, with some sleight of hand, we loaded our collections’ data and returned it in a format suitable for search indexing:
```ts
import { getCollection } from "astro:content"

export async function GET() {
  const pages = await getCollection("docs")
  const index = await Promise.all(
    pages.map(async (page) => {
      return {
        slug: `/${page.id}`,
        title: page.data.title,
        body: page.body
      }
    })
  )
  return new Response(JSON.stringify(index))
}
```
Stripping the extra stuff
A critical part of this setup was also processing each document’s body from MDX into plain text, to avoid any weird tags and symbols popping up in our search results. That meant removing all the JSX expressions, custom components, table syntax, links, and so on.
The cleanest way to do this is to parse each MDX document into an AST with `unified` and some `remark` plugins.
First, I created a couple of utils to process the MDX.
This is a crucial part of the process when you’re handling MD(X)-based content and want to search it, so I’m going to give you all the code you’ll need for it below.
Let’s begin with a utility that parses raw MD to `mdast`, then to `hast` (since it handles spacing better when compiled to plain text). Finally, we’ll parse it into plain text, stripping all the extra stuff along the way:
```ts
import { unified } from "unified"
import remarkParse from "remark-parse"
import remarkMdx from "remark-mdx"
import remarkRehype from "remark-rehype"
import remarkGfm from "remark-gfm"
import { toText } from "hast-util-to-text"

export const mdxToPlainText = async (mdx: string) => {
  const processor = unified().use(remarkParse).use(remarkMdx).use(remarkGfm)
  const mdast = processor.parse(mdx)
  stripJsx(mdast)
  const hast = await unified().use(remarkRehype).run(mdast)
  const text = toText(hast)
  return { text, mdast, hast }
}
```
This next one is called from the function above (and is actually the one that gets rid of all the MDX JSX nodes):
```ts
import { visit, SKIP } from "unist-util-visit"
import type { Node, Parent } from "unist"

export const stripJsx = (ast: Node) => {
  visit(
    ast,
    [
      "mdxJsxFlowElement",
      "mdxJsxTextElement",
      "mdxjsEsm",
      "mdxFlowExpression",
      "mdxTextExpression"
    ],
    (node, index, parent) => {
      if (!parent) {
        return
      }
      const parentNode = parent as Parent
      const nodeChildren = (node as Parent).children || []
      // Replace the JSX node with its children (if any);
      // if there are none, just remove the node.
      parentNode.children.splice(index, 1, ...nodeChildren)
      return [SKIP, index]
    }
  )
}
```
And then, while I was at it, I took the opportunity to extract all headings into a separate array so that I could add some weight to them later in search:
```ts
import { visit } from "unist-util-visit"
import type { Node } from "unist"
import type { Heading } from "mdast"
import { toString } from "mdast-util-to-string"

export const getHeadings = (ast: Node) => {
  const headings: string[] = []
  visit(ast, "heading", (node: Heading) => {
    headings.push(toString(node))
  })
  return headings
}
```
Once those were in place, I hooked them into the search-index endpoint. In the end, I had all my data parsed into a perfectly indexable format:
```diff
  export async function GET() {
    const pages = await getCollection("docs")
    const index = await Promise.all(
      pages.map(async (page) => {
+       const { text, mdast } = await mdxToPlainText(page.body || "")
+       const headings = getHeadings(mdast)
        return {
          slug: `/${page.id}`,
          title: page.data.title,
+         headings,
          body: text,
        } satisfies SearchIndexItem
      }),
    )
    return new Response(JSON.stringify(index))
  }
```
Client-side search
With the search index taken care of, it was time to connect it with the client and start searching!
After a couple of hours creating the UI with Svelte and fetching the static search index I prepared in the previous step, I was ready to connect it with a library that would do the heavy lifting.
```svelte
<script>
  let query = $state('')
  let searchIndex = {}
  let isInit = false

  const init = async () => {
    if (isInit) return
    isInit = true
    // On first focus of the input,
    // load the JSON with our search index (prepared at build time)
    // and store it client-side in memory.
    try {
      const response = await fetch('/search-index.json')
      const body = await response.json()
      searchIndex = body
    } catch (err) {
      console.error('Error while initializing search', err)
    }
  }
</script>

<input
  onfocus={init}
  bind:value={query}
  class="search-input"
  type="text"
  placeholder="Search"
/>
```
I investigated (and tried) a couple of open-source libraries designed for searching datasets. I liked `minisearch` more than the others, as it ticked all the perfect-candidate boxes:
- Full-text search — crucial for the docs-style nature of our site, it should be able to find each word on each page.
- Boost fields — I wanted some fields to have preference over the others, like so: title > headings > body.
- Fuzzy on demand — at this point I wasn’t sure whether we’d enable fuzzy matching to allow typos, or skip it for more precise results, but I wanted it as an option.
- Memory-conscious — we’re dealing with some pretty long data here, so I didn’t want to keep it in memory more than necessary.
`minisearch` it was! I first initialized the search engine with my index JSON and tweaked it a bit with these options:
```ts
import MiniSearch from 'minisearch'

const miniSearch = new MiniSearch({
  fields: ['title', 'headings', 'body'],
  idField: 'slug',
  storeFields: ['title', 'headings', 'body', 'slug'],
  searchOptions: {
    prefix: true,
    boost: {
      title: 30,
      headings: 20
    },
    combineWith: 'and',
    fuzzy: true
  }
})
```
Then, I configured the reactive effects, aaand… now every time I typed something in the search input, I got an array of relevant results. Ready for digestion, but not quite ready for rendering:
```ts
// Bound to the search input value.
let query = $state('')

// Holds results for each query.
let searchResults = $state([])

// Trim the query to ignore whitespace.
let trimmedQuery = $derived(query.trim())

// I didn't want to run queries for input shorter than 3 symbols,
// because that's not enough length to match meaningful results.
const MIN_QUERY_LENGTH = 3

$effect(() => {
  if (trimmedQuery.length < MIN_QUERY_LENGTH) {
    searchResults = []
    return
  }
  searchResults = miniSearch.search(trimmedQuery) as SearchResultWithFields[]
})

let isInit = false

const init = async () => {
  if (isInit) return
  isInit = true
  try {
    const response = await fetch('/search-index.json')
    const body = await response.json()
    miniSearch.addAll(body)
  } catch (err) {
    console.error('Error while initializing search', err)
  }
}
```
`minisearch` was doing a great job handling queries and selecting the right documents from the dataset, but it was on me to transform the results into a renderable format.
I wanted to show page titles, a relevant heading (if there was a match there), and the part of the body where the best match occurred. I also wanted to highlight the matched part in all of the results.
Having fuzzy search enabled can make the whole task of highlighting a bit tricky, as the match is not necessarily the same as the query. But luckily, `minisearch` returns the array of terms that actually matched in each result. I was able to use this to generate a list of possible phrases, starting from the most optimistic one and gradually descending to the shortest and least appealing. This meant I was able to precisely slice the parts of the text containing the best matches.
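To illustrate the idea, here’s a minimal sketch of that phrase-generation and slicing approach. (This is not the project’s actual code; `buildPhrases` and `makeExcerpt` are hypothetical helpers I’m using for the illustration.)

```typescript
// Build candidate phrases from the matched terms, most specific first:
// the full sequence, then every shorter run, down to single terms.
const buildPhrases = (terms: string[]): string[] => {
  const phrases: string[] = []
  for (let len = terms.length; len > 0; len--) {
    for (let start = 0; start + len <= terms.length; start++) {
      phrases.push(terms.slice(start, start + len).join(" "))
    }
  }
  return phrases
}

// Slice a window of text around the best (longest) phrase that
// actually occurs in the body, and mark the match for highlighting.
const makeExcerpt = (body: string, terms: string[], radius = 60) => {
  const lowerBody = body.toLowerCase()
  for (const phrase of buildPhrases(terms)) {
    const at = lowerBody.indexOf(phrase.toLowerCase())
    if (at === -1) continue
    const from = Math.max(0, at - radius)
    const to = Math.min(body.length, at + phrase.length + radius)
    return {
      before: body.slice(from, at),
      match: body.slice(at, at + phrase.length),
      after: body.slice(at + phrase.length, to)
    }
  }
  // No phrase found verbatim (with fuzzy search, matched terms
  // may differ from what's literally in the text).
  return { before: body.slice(0, radius * 2), match: "", after: "" }
}
```

The three chunks map naturally onto a highlight component: render `match` inside a `<mark>`-style element, with `before` and `after` as plain text around it.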
I had some algorithms at hand that took in raw search results and did all the parsing, slicing, truncating, and highlighting. So a few moments later I had my returned list, perfectly ready to render:
```ts
let parsedResults = $derived(parseSearchResults(searchResults))
```
With this in place, I was ready to add search results to the template:
```svelte
<ul class="search-results-list">
  {#each parsedResults as result, index}
    <li class="search-result-li">
      <a class="search-result-link" href={result.slug} tabindex={-1}>
        <strong>
          <HighlightChunks text={result.title} />
        </strong>
        <span>
          <HighlightChunks text={result.heading.length ? result.heading : result.body} />
        </span>
      </a>
    </li>
  {/each}
</ul>
```
Accessibility
Since searching involves interacting with a text input and selecting the result from a dropdown list, we can’t forget about making it accessible and convenient via keyboard.
To accomplish this, I tracked the currently selected item’s index in the array of results and updated that index on both mouse events and the up and down arrow keys (only when the input is focused!).
I kept the input focused throughout the arrow-selection action, to allow users to continue typing at any moment as they scroll through the results.
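As a rough, framework-agnostic sketch of that selection logic (`nextSelectedIndex` is my name for it, not something from the project), the index update might look like this:

```typescript
type ArrowKey = "ArrowUp" | "ArrowDown"

// Compute the next highlighted index from a key press,
// wrapping around the results list. -1 means "nothing selected".
const nextSelectedIndex = (
  current: number,
  key: ArrowKey,
  resultsCount: number
): number => {
  if (resultsCount === 0) return -1
  if (key === "ArrowDown") return (current + 1) % resultsCount
  // ArrowUp: step back, wrapping to the last result from the top.
  return (current - 1 + resultsCount) % resultsCount
}
```

In the Svelte component, a `keydown` handler on the input calls this for the arrow keys and `preventDefault()`s so the caret doesn’t jump, which is exactly what keeps the input focused while scrolling through results.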
We also wanted to add a shortcut to initiate search. Conventionally, ⌘K (or Ctrl+K on Windows) was the first thing that came to mind. I used Evil Martians’ KeyUX, an awesome open source library that covers pretty much everything you can think of in terms of keyboard navigation, and a worthy addition to every frontend developer’s a11y toolkit.
KeyUX can be initialized globally directly in our main Astro layout component, so that we can use it later in other parts of our app.
I also included a Mac compatibility extension to automatically take care of any Cmd/Ctrl business:
```astro
<script>
  import { startKeyUX, hotkeyKeyUX, hotkeyMacCompat } from "keyux";

  const mac = hotkeyMacCompat();
  startKeyUX(window, [hotkeyKeyUX([mac])]);
</script>
```
And then, back in our search component, all you need to do is simply add an `aria-keyshortcuts` attribute with our hotkey of choice:
```diff
  <input
    onfocus={init}
    bind:value={query}
    class="search-input"
    type="text"
    placeholder="Search"
+   aria-keyshortcuts="ctrl+k"
  />
```
TL;DR
So, what are the key takeaways here? Let’s sum things up:
- Astro is awesome (OK, OK, moving on…)
- Remote and crawler-based search solutions are not as performant as we wish they were. They add a second source of truth that can be tricky to keep in sync, especially across environments.
- For smaller or medium-size doc sites, there is a better alternative in generating a static search index at build time (leveraging Astro’s endpoints).
- Parse the documents to remove Markdown syntax from search results.
- Use a pre-generated index with a classic client-side search for the snappiest and most configurable experience.
- Further improve the UX by additionally parsing raw search results before rendering to highlight the best matches; this is so that users understand why each result was suggested.
- Can’t forget about a11y!
So, hopefully this post will give you the power to go ahead and implement something that’s been sitting in your backlog for way too long and add search to your static docs site!