What is LLMs.txt? The Ultimate Guide
Sarah Chen
Product Development
AI models like ChatGPT, Claude, and Copilot increasingly rely on website text for information. That’s true both for real-time searches (“search the web for…” prompts) and for bulk training databases. LLMs scrape websites for content and then often reproduce that content in full in AI overviews, prompt responses, and research.
Unfortunately, LLMs aren’t actually very good at navigating most websites. Most importantly, they lack the context to separate content from page navigation, ads, and external links, meaning generated answers can be imprecise or even wrong.
LLMs.txt is a proposed standard to help large language models (AI) navigate websites. The file is meant to complement existing standards like Robots.txt and Sitemap.xml. Unlike either of those files, LLMs.txt is written in LLM-friendly Markdown, and it’s meant to share context as well as navigation data. That creates a unique opportunity for you to control the context in which LLMs scrape your website and use its content.
That’s the short answer! But keep reading to learn about how to implement LLMs.txt into your own website and, more importantly, whether you should bother at all.
What Is llms.txt?
LLMs.txt is a Markdown file that helps LLMs like Gemini, Claude, ChatGPT, and Copilot navigate your website and its content. The idea is to add context to search and site structure, so LLMs can navigate your website on demand to answer user prompts.
It’s similar to robots.txt and sitemap.xml but performs a different function. An LLMs.txt file can:
- Create a structured overview of website content, so AI crawlers can navigate your site and parse it to answer queries
- Create a contextual map, so AI crawlers can respond to search requests quickly
- Define the contexts you’d like AI models to use when answering questions about your content and brand
- Help AI crawlers overcome context limitations and better access your site
The LLMs.txt proposal is about showing context to LLMs using a markdown format, so they can consume that content. That means no ads, no site navigation, just content with context. That circumvents, for example, the nightmare in which a prospective customer asks ChatGPT what you’re offering and it explains ad content instead.
LLMs.txt vs Robots.txt vs Sitemap.xml
| LLMs.txt | Robots.txt | Sitemap.xml |
| --- | --- | --- |
| Human readable | May reference pages that are not human readable | Human readable |
| Navigable by LLMs | Navigable by LLMs | Not always navigable by LLMs |
| May include external links | Doesn’t include external links | Doesn’t include external links |
| Only includes information relevant for context | Points to full documents | Points to full documents |
| Helps AI crawlers understand content | Controls search engine crawler access | Lists all indexable pages with no context |
Why use LLMs.txt
For LLMs:
LLMs.txt files give LLMs a context-driven way to parse and navigate your website. That won’t matter as much for LLMs like Claude, which can navigate XML files and therefore your sitemap. However, by adding context as well as navigation, you control how content is processed, mapped, and used.
For Users:
You can think of your LLMs.txt file as a context-driven guide to your website, documentation, and products. Rather than serving LLMs alone, it can guide developers through software documentation and create opportunities for defining business structure and context. Of course, it’s still a wall of text, so it might be challenging for human readers to navigate.
Should You Use LLMs.txt?
For now, it’s not actually that big of a deal whether you use LLMs.txt or not. Why? AI services aren’t actually using it yet. In fact, if you look at the directories of websites adding LLMs.txt to their sites, the numbers aren’t even in the thousands yet. That means it doesn’t even make sense for AI services to start looking for those files yet.
In one Reddit conversation, the OP asked: “I've submitted to my blog's root an LLM.txt file earlier this month, but I can't see any impact yet on my crawl logs. Just curious to know if anyone had a tracking system in place, or just if you picked up on anything going on following the implementation. If you haven't implemented it yet, I am curious to hear your thoughts on that.”
John Mueller, Google’s search advocate, weighed in on the conversation as well:
“AFAIK none of the AI services have said they’re using LLMs.TXT (and you can tell when you look at your server logs that they don’t even check for it). To me, it’s comparable to the keywords meta tag – this is what a site-owner claims their site is about … (Is the site really like that? well, you can check it. At that point, why not just check the site directly?)”
In addition, if an LLM decides to point to your LLMs.txt file as a source, the user who clicks through is going to have a harder time navigating your website. Of course, most LLM users aren’t asking questions in order to click through to the website, so this may not be a concern at all.
How to Add llms.txt to Your Website
Markdown is one of the most widely and easily understood formats for LLMs. By providing a Markdown file, you give AI traffic an easy way to understand your site’s structure and find the information it needs.
You’ll want to create the file at the root path /llms.txt (and optionally /llms-full.txt) by saving a Markdown file named llms.txt in your website’s root directory. If you’re using cPanel, that should be under Web Root (public_html).
/llms.txt - This is a fast and simple document designed to help AI crawlers parse your web content quickly and accurately
/llms-full.txt - This is a comprehensive file compiling your entire website documentation and context in one place
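As a sketch of that upload step, assuming shell access to your host and that the web root lives at ~/public_html (both assumptions; adjust the path for your setup):

```shell
# Write a minimal llms.txt into the web root.
# WEBROOT is an assumption -- cPanel hosts usually serve from ~/public_html.
WEBROOT="${WEBROOT:-$HOME/public_html}"
mkdir -p "$WEBROOT"
cat > "$WEBROOT/llms.txt" <<'EOF'
# My Site

> Short description of what this site offers.

## Docs
[Getting started](https://example.com/docs/start): Quickstart guide
EOF
```

Once deployed, requesting https://yourwebsite.com/llms.txt in a browser (or with curl) should return the file with a 200 status.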
Formatting llms.txt
Start with a Heading 1. This should be the name of the site:
# Title
Add a block quote with key information about your site:
> Optional description(s)
Add more Markdown sections offering detailed information on your site, what you do, and how to interpret information:
## Section Name
[Link title](https://link_url): Information about the link
Add Heading 2 sections listing URLs of files, including notes about each file:
## Optional Sections
[Link title](https://link_url): Notes about the file
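Putting those pieces together, a complete llms.txt might look like this (the company name and URLs are illustrative):

```markdown
# Example Corp

> Example Corp makes inventory-tracking software for small warehouses.

## Docs
[Quickstart](https://example.com/docs/quickstart): How to install and configure the product
[API reference](https://example.com/docs/api): Endpoints, parameters, and error codes

## Optional Sections
[Company blog](https://example.com/blog): Release notes and announcements
```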
LLMs.txt Examples
You can see plenty of examples of LLMs.txt across the web. For example, big sites like Anthropic are already implementing it.
Community-maintained directories of llms.txt files are also available online.
How to Add an HTTP Header
If you were thinking of LLMs.txt as an LLM alternative to Robots.txt, it’s not: it doesn’t control access. The good news is you can still use your robots.txt file to allow or disallow different types of AI user agents.
Adding X-Robots-Tag: llms-txt to your server header configuration tells crawlers and spiders that this file is for AI.
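As a sketch, assuming an Apache server with mod_headers enabled (both assumptions; nginx and other servers have equivalent directives), the header can be scoped to the file in .htaccess:

```apache
# Send the X-Robots-Tag header for llms.txt only.
# Assumes Apache with mod_headers enabled.
<Files "llms.txt">
  Header set X-Robots-Tag "llms-txt"
</Files>
```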
Example:
# Allow AI search and agent use
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: PerplexityBot
User-agent: FirecrawlAgent
User-agent: AndiBot
User-agent: ExaBot
User-agent: PhindBot
User-agent: YouBot
Allow: /
# Disallow AI training data collection
User-agent: GPTBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /
Tools to Generate llms.txt
There are increasingly simple tools available to generate your LLMs.txt file.
If you’re using a tool like Mintlify for documentation, it also offers an auto-generation feature for llms.txt, which is how big names like Anthropic generate theirs.
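If you’d rather roll your own, a minimal generator can be sketched from an existing sitemap. The function below is a hypothetical helper (not part of any tool mentioned above) that turns sitemap.xml contents into an llms.txt skeleton you can then annotate by hand:

```python
import xml.etree.ElementTree as ET

def llms_txt_from_sitemap(sitemap_xml: str, site_name: str, description: str) -> str:
    """Build an llms.txt skeleton from the contents of a sitemap.xml file."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall(".//sm:loc", ns)]
    lines = [f"# {site_name}", "", f"> {description}", "", "## Pages"]
    # One Markdown link per URL; the descriptions should be filled in manually.
    lines += [f"[{url}]({url}): TODO describe this page" for url in urls]
    return "\n".join(lines) + "\n"
```

The output follows the heading, block quote, and link structure described above; the real value comes from replacing each TODO with genuine context about the page.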
How to Test if LLMs are Using Your LLMs.txt file
The best way to test your LLMs.txt file is to check search queries in your favorite LLMs. Start out by validating that the root path https://yourwebsite.com/llms.txt works. From there, open an LLM chat and actively ask for information about your website.
ChatGPT: Ask ChatGPT to search the web, your website, or your LLMs.txt file to answer questions and prompts.
Copilot: Ask Copilot to search the web, your website, or your LLMs.txt file to answer your questions and prompts.
Claude: At the time of writing, Claude doesn’t search the web, so you’ll have to wait for an update with new training data.
Cursor: You can add your LLMs.txt document to Cursor for it to use as context for other documents.
In every case, you can track success over time by tracking your brand’s visibility and answers across search engines. Tools like LightState track query responses and brand presence across LLMs, letting you see where data from your LLMs.txt is, or isn’t, being picked up.
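You can also check from the server side, as Mueller suggests, by looking for AI crawlers requesting the file in your access logs. A rough sketch (the user-agent list is an assumption and far from exhaustive; it expects Apache/nginx-style log lines):

```python
# Count requests for /llms.txt per AI crawler, from web server access-log lines.
# The user-agent substrings below are assumptions; extend the list as needed.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot", "CCBot"]

def llms_txt_hits(log_lines):
    """Return {agent: request_count} for log lines that fetch /llms.txt."""
    hits = {}
    for line in log_lines:
        if "/llms.txt" not in line:
            continue
        for agent in AI_AGENTS:
            if agent in line:
                hits[agent] = hits.get(agent, 0) + 1
    return hits
```

If the result stays empty week after week, that matches Mueller’s observation that crawlers aren’t checking for the file yet.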
Reasons LLMs Might Not Pick up LLMs.txt
LLMs.txt is not the content on your site; it’s what you say is on your site. LLMs.txt might go the same way as the keywords meta tag, which started out helpful and quickly became a way to spam search optimization. For example, if LLMs relied on the LLMs.txt file, bad actors could use it to cloak the contents of the actual website and direct users to a spam site.
Best Practices for LLMs.txt
- Use clear language. If you have an in-house technical writer, ask them to write it
- Include links in your LLMs.txt and reference each link in its description
- Validate that your LLMs.txt file works properly and test it in a few search engines
- Keep it up to date
What’s the Bottom Line?
An LLMs.txt file won’t hurt your website, and if you know your way around your server configuration settings, you can probably set it up in a few minutes. But if this is going to take you hours, you probably want to skip it for now. As far as we know, no major AI service actually uses this standard yet.
With that in mind, you might as well wait till the same plugins that already generate your sitemap.xml and robots.txt also generate llms.txt. As more people implement the standard, more LLM crawlers are likely to start using it, but for now, that’s not yet the case. There’s no rush on this.
In fact, if you do use it now and an LLM actually picks up on it, it could end up simply directing potential users to your llms.txt file instead of to the actually relevant pages. No one wants that.
- LLMs.txt shows context to AI bots; that’s all it’s for
- LLMs.txt is more like a reference file than an access-control file like Robots.txt
- LLMs.txt is not yet a widely used or accepted standard, so you have time to decide whether to use it
What do you think? Are you using LLMs.txt or not?