
Optimizing websites for the future of search? LLMs.txt explained

Loura Kruger-Zwart

The way we use the internet is changing: Large Language Models like ChatGPT or Claude can direct human users to websites without a search engine in the middle. But how do LLMs know what site information is actually relevant to the prompt? LLMs.txt seems to be the answer—a way to specify exactly what information LLMs and crawlers should gather and ultimately share if someone asks. Read on to find out more.

What is LLMs.txt?

There are all kinds of behind-the-scenes things going on online, including ‘secret messages’ for different kinds of website visitors. If you’ve had a look around our blog before, you’ll know all about the humans.txt file for crediting the humans behind a website’s creation, ads.txt for advertising protocols, and security.txt as the first stop for cybersecurity-related queries. But since the launch of a certain AI helper in November 2022, there’s been a new digital visitor on the online block: Large Language Models (LLMs) are collecting all kinds of information from the internet, about websites, companies, products and services, tips, tricks, and everything in between. But is the information LLMs collect actually the right material from each website? How do they know which webpages are most relevant?

It makes sense that website creators should be able to communicate with LLMs to make sure they’re retrieving high-quality answers, in case future visitors first encounter their products, services, or advice via ChatGPT or another GenAI assistant. Introducing the newest website ‘secret message’: LLMs.txt.

An example structure for a website’s LLMs.txt file, via Martin’s Dev Diary at MartinBowling.com.
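In outline, the format proposed at llmstxt.org is plain markdown: an H1 title, a blockquote summary, optional detail paragraphs, and H2 sections of annotated links, with an ‘Optional’ section marking links an LLM can skip when context is tight. Every name and URL below is a placeholder:

```markdown
# Example Site Name

> A one-sentence summary of what the site or project is about.

A few optional paragraphs with key background details, such as what
you offer and who it is for.

## Documentation

- [Quick start](https://example.com/docs/quickstart.md): How to get up and running
- [API reference](https://example.com/docs/api.md): Full details on every endpoint

## Optional

- [Company history](https://example.com/about/history.md): Secondary reading that can be skipped
```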

What does LLMs.txt do?

As LLMs and crawlers traverse the web and collect information, they typically look at the first pages of your site. They don’t know whether your ‘about us’ page is more important than that obscure blog post you wrote five years ago, and they certainly aren’t going to check every single page on your site either. Just as robots.txt tells crawlers what they may index by setting rules and pointing to sitemaps, the point of LLMs.txt is to help an LLM find the best information about your website in the most direct way. This could be a prioritized list of pages with relevant details about your site, links to documentation, or a collection of quick points and resources. See a comprehensive example here, on an AI observability platform’s website.

How to use LLMs.txt

As with other kinds of .txt files, this one is added to your website root, though its contents are formatted as markdown. According to llmstxt.org, it should open with your website’s title and a short description, plus any important details about your business’s products or services, if applicable. Next come links to documentation, core information, or anything else you want to be sure an LLM could answer or share if asked. Keeping structured information in one easy-to-access place benefits LLMs: they can quickly pull specific details, like software library features, corporate structures, legal information, personal CV details, product categories, or educational course offerings. Jeremy Howard of llmstxt.org writes:

Language models can ingest a lot of information quickly, so it can be helpful to have a single place where all of the key information can be collated—not for training [...], but for helping users accessing the site via AI helpers.
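To make that concrete, here is a minimal sketch of what a filled-in LLMs.txt might look like for an imaginary online store; the shop, claims, and URLs are all invented for illustration:

```markdown
# GreenSprout Garden Supplies

> Online store for organic seeds, tools, and beginner-friendly gardening kits, shipping across Europe.

Orders over €50 ship free; most seed varieties are certified organic.

## Products

- [Seed catalogue](https://greensprout.example/seeds.md): All varieties with planting guides
- [Starter kits](https://greensprout.example/kits.md): Bundles for first-time gardeners

## Support

- [Shipping and returns](https://greensprout.example/shipping.md): Policies, costs, and timelines
- [FAQ](https://greensprout.example/faq.md): Answers to common customer questions
```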

Should all websites have LLMs.txt?

Creating an LLMs.txt file on your website is not difficult and could bring many benefits, so is there any reason not to have one? On the one hand, this protocol is still very new and more or less still a proposal. Most LLMs likely aren’t scanning for it yet, which means the file might not even be seen. Not even Google has an LLMs.txt page yet, but that doesn’t mean LLMs aren’t being considered: Google currently sets rules for tools like ChatGPT, AnthropicAI, PerplexityBot, and Claude in its robots.txt file, as does the Guardian here. Most websites might start this way, adding instructions for LLMs to an existing file rather than creating a whole new one; a sketch of what that looks like follows below. On the other hand, why not get ahead of the curve and do both?
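For illustration, here is roughly what such robots.txt rules look like. The user-agent tokens below (GPTBot, anthropic-ai, PerplexityBot) have been published by the respective vendors, but names change over time, so check each bot’s current documentation before relying on them:

```txt
# Let OpenAI's crawler see everything
User-agent: GPTBot
Allow: /

# Keep Anthropic's crawler out of the members area
User-agent: anthropic-ai
Disallow: /members/

# Block Perplexity's crawler entirely
User-agent: PerplexityBot
Disallow: /
```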

Keen to see which websites have already embraced LLMs.txt? Take a look at the growing list of examples here in the LLMs.txt directory.

A directory of almost 100 websites with LLMs.txt files, via LLMstxt.site.

Subscribe to our newsletter to stay in the loop about the latest insights and developments around web data.
