What is LLMs.txt? The Ultimate Guide

Sarah Chen

Product Development

AI models like ChatGPT, Claude, and Copilot increasingly rely on website text for information. That's true both for real-time searches ("search the web for…" prompts) and for bulk training datasets. LLMs scrape websites for content and then often share it in full in AI overviews, prompt responses, and research. 

Unfortunately, LLMs aren't actually very good at navigating most websites. Most importantly, they lack the context to parse pages, ads, and external links, which means generated content can be imprecise or even wrong. 

LLMs.txt is a proposed standard to help large language models (LLMs) navigate websites. The file is meant to complement existing standards like robots.txt and sitemap.xml. Unlike either of those files, LLMs.txt is written in LLM-friendly Markdown, and it's meant to share context as well as navigation data. That creates a unique opportunity for you to control the context in which LLMs scrape your website and use its content. 

That's the short answer! But keep reading to learn how to implement LLMs.txt on your own website and, more importantly, whether you should bother at all. 


What Is LLMs.txt?

LLMs.txt is a Markdown file used to help LLMs like Gemini, Claude, ChatGPT, and Copilot navigate your website and its content. The idea is to add context to search and site structure, so LLMs can navigate your website on demand to answer user prompts. 

It's similar to robots.txt and sitemap.xml but performs a different function. An LLMs.txt file lets you: 

  • Create a structured overview of website content, so AI crawlers can navigate your site and parse it to answer queries

  • Create a contextual map so AI crawlers can respond to search requests quickly 

  • Define the contexts you'd like AI models to use when answering questions about your content and brand 

  • Help AI crawlers overcome context limitations and better access your site 

The LLMs.txt proposal is about presenting content to LLMs in a Markdown format they can easily consume: no ads, no site navigation, just content with context. That circumvents, for example, the nightmare in which a prospective customer asks ChatGPT what you're offering and it explains your ad content instead. 

LLMs.txt vs Robots.txt vs Sitemap.xml

| LLMs.txt | Robots.txt | Sitemap.xml |
|---|---|---|
| Human readable | Includes pages that are not human readable | Human readable |
| Navigable by LLMs | Navigable by LLMs | Not always navigable by LLMs |
| May include external links | Doesn't include external links | Doesn't include external links |
| Only includes information relevant for context | Includes full documents | Includes full documents |
| Helps search engine crawlers understand content | Controls search engine crawler access | Lists all indexable pages with no context |

Why Use LLMs.txt?

For LLMs

LLMs.txt files give LLMs a context-driven way to parse and navigate your website. That won’t matter as much for LLMs like Claude, which can navigate XML files and therefore your sitemap. However, by adding context as well as navigation, you control how content is processed, mapped, and used. 

For Users

You can think of your LLMs.txt file as a context-driven guide to your website, documentation, and products. Rather than functioning only for LLMs, it can guide developers through software documentation and give you a place to define business structure and context. Of course, it's still a wall of text, so it might be challenging for human readers to navigate as well. 

Should You Use LLMs.txt? 

For now, it's not actually that big of a deal whether you use LLMs.txt or not. Why? AI services aren't actually using it yet. In fact, if you look at directories of websites adding LLMs.txt, the numbers aren't even in the thousands yet. That means it doesn't yet make sense for AI services to start looking for those files. 

In one Reddit conversation, the OP asked: "I've submitted to my blog's root an LLM.txt file earlier this month, but I can't see any impact yet on my crawl logs. Just curious to know if anyone had a tracking system in place, or just if you picked up on anything going on following the implementation.

If you haven't implemented it yet, I am curious to hear your thoughts on that.”

John Mueller, Google's Search Advocate, weighed in on the conversation as well: 

“AFAIK none of the AI services have said they’re using LLMs.TXT (and you can tell when you look at your server logs that they don’t even check for it). To me, it’s comparable to the keywords meta tag – this is what a site-owner claims their site is about … (Is the site really like that? well, you can check it. At that point, why not just check the site directly?)”

In addition, if an LLM decides to cite your LLMs.txt file as a source, the user who clicks through is going to have a harder time navigating your website. Of course, most LLM users aren't asking questions in order to click through to a website, so this may not be a concern at all.

How to Install llms.txt on Your Website 

Markdown is simple, and it's one of the most widely understood formats for LLMs. By providing a Markdown file, you give AI traffic an easy way to understand your site's structure and find the information it needs. 

You'll want to create a root path of /llms.txt (or an additional /llms-full.txt) by saving a Markdown file as llms.txt in your website's root directory. If you're using cPanel, that's under Web Root (public_html). 

  • /llms.txt - This is a fast and simple document designed to help AI crawlers parse your web content quickly and accurately

  • /llms-full.txt - This is a comprehensive file compiling your entire website documentation and context in one place 

Formatting llms.txt

  1. Start with a Heading 1. This should be the name of the site: 

# Title 

  2. Add a block quote with key information about your site: 

> Optional Description(s)

  3. Add more Markdown sections offering detailed information on your site, what you do, and how to interpret the information: 

## Section Name 

  4. Add Markdown sections with Heading 2 containing lists of file URLs, with notes about each file: 

## Optional Sections
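Putting those steps together, a minimal llms.txt might look like the sketch below. The site name, description, and URLs are all hypothetical placeholders:

```markdown
# Example Co

> Example Co makes developer tools. This site contains product pages, documentation, and a blog.

## Documentation

- [Getting Started](https://example.com/docs/start): How to install and configure the product
- [API Reference](https://example.com/docs/api): Endpoints, parameters, and authentication

## Optional

- [Blog](https://example.com/blog): Company news and tutorials
```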

LLMs.txt Examples 

You can see plenty of examples of LLMs.txt across the web. For example, big sites like Anthropic are already implementing it, and there are community-maintained directories of llms.txt files you can browse. 

How to Add an HTTP Header

If you were thinking of LLMs.txt as an AI alternative to robots.txt, it's not: it doesn't control access. The good news is that you can still use robots.txt itself to allow or disallow different types of AI user agents. 

Adding X-Robots-Tag: llms-txt to your server header configuration tells crawlers and spiders that this file is for AI. 

Example (robots.txt rules for AI user agents): 

# Allow AI search and agent use
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: PerplexityBot
User-agent: FirecrawlAgent
User-agent: AndiBot
User-agent: ExaBot
User-agent: PhindBot
User-agent: YouBot
Allow: /

# Disallow AI training data collection
User-agent: GPTBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /
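For the X-Robots-Tag header itself, here is a sketch of how it could be set on an Apache server via .htaccess. This assumes mod_headers is enabled; on nginx you would use an add_header directive instead:

```
# Hypothetical Apache .htaccess snippet (requires mod_headers).
# Sets the X-Robots-Tag header only on responses for /llms.txt.
<Files "llms.txt">
  Header set X-Robots-Tag "llms-txt"
</Files>
```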

Tools to Generate llms.txt 

There are increasingly simple tools available to generate your LLMs.txt file. 

If you're using a documentation tool like Mintlify, it also offers an auto-generation feature for LLMs.txt, which is how big names like Anthropic are generating theirs. 
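To illustrate how simple generation can be, here's a hypothetical Python sketch that builds a minimal llms.txt from a list of pages. The site name, URLs, and descriptions are made-up examples, not any real tool's API:

```python
# Hypothetical generator: builds a minimal llms.txt body from page metadata.
# All site details below are made-up placeholders.

pages = [
    ("Docs home", "https://example.com/docs", "Overview of the product documentation"),
    ("API reference", "https://example.com/docs/api", "Endpoints, parameters, and auth"),
]

def generate_llms_txt(site_name, description, sections):
    """Render an llms.txt body: H1 title, block-quoted summary,
    then one H2 section per named group of links."""
    lines = [f"# {site_name}", "", f"> {description}", ""]
    for section_name, links in sections.items():
        lines.append(f"## {section_name}")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines)

content = generate_llms_txt(
    "Example Co",
    "Example Co makes developer tools.",
    {"Documentation": pages},
)
print(content)
```

The output follows the structure described above: a Heading 1 with the site name, a block quote, then Heading 2 sections listing links with notes.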

How to Test if LLMs are Using Your LLMs.txt file 

The best way to test your LLMs.txt file is to run queries in your favorite LLMs. Start by validating that the root path https://yourwebsite.com/llms.txt loads in a browser. From there, go into an LLM chat and actively ask for information about your website. 

ChatGPT: Ask ChatGPT to search the web, your website, or your LLMs.txt file to answer questions and prompts. 

Copilot: Ask Copilot to search the web, your website, or your LLMs.txt file to answer your questions and prompts. 

Claude: If web search isn't available in your version of Claude, you'll have to wait until an update with new training data. 

Cursor: You can add your LLMs.txt document to Cursor for it to use as context for other documents. 

In every case, you can track success over time by tracking your brand’s visibility and answers across search engines. Tools like LightState track query responses and brand presence in LLM models, allowing you to see where data from your LLMs.txt is being picked up, or not. 

Reasons LLMs Might Not Pick up LLMs.txt 

LLMs.txt is not "the content on your site". Instead, it's what you're saying is on your site. LLMs.txt might go the same way as the keywords meta tag, in that it starts out helpful and quickly becomes a way to spam search optimization. For example, if LLMs relied on the LLMs.txt file, bad actors could use it to cloak the contents of an actual website and try to direct users to a spam site. 

Best Practices for LLMs.txt 

  • Use clear language. If you have an in-house technical writer, ask them to write it

  • Include links in your LLMs.txt and use each link as a reference point in its description 

  • Validate that your LLMs.txt file is served properly and test it in a few LLM chats 

  • Keep it up to date

What’s the Bottom Line? 

An LLMs.txt file won't hurt your website. If you know your way around your server configuration settings, you can probably set one up in a few minutes, so there's little downside. But if this is going to take you hours, you probably want to skip it for now. As far as we know, no major AI service is actually using this standard yet.

With that in mind, you might as well wait till the same plugins that already generate your sitemap.xml and robots.txt also generate llms.txt. As more people implement the standard, more LLM crawlers are likely to start using it, but for now, that’s not yet the case. There’s no rush on this. 

In fact, if you do use it now and an LLM actually picks up on it, it could end up simply directing potential users to your LLMs.txt file instead of to the actually relevant pages. No one wants that.

  • LLMs.txt shows context to AI bots; that's all it's for

  • LLMs.txt is more like a reference file than an access/do not access file like Robots.txt 

  • LLMs.txt is not yet a widely used or accepted standard, so you have time to figure out if you want to use it or not. 

What do you think? Are you using LLMs.txt or not?