Update robots.txt: New OpenAI Web Crawler (User-agent: GPTBot)

August 9, 2023 at 4:00 am #10442

[anonymous]

Hi Publii team!

I recently learned that OpenAI allows website managers to opt out of its web crawler, User-agent: GPTBot.

My feature suggestion would be to let Publii users choose whether or not they want their content to be used by OpenAI. The specs can be found here: https://platform.openai.com/docs/gptbot
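
For reference, the opt-out in the linked spec is a short robots.txt group naming GPTBot. Below is a minimal Python sketch that checks such a group with the standard library's robots.txt parser. The helper name `is_allowed` and the example.com URLs are made up for illustration, and this only shows how a compliant crawler would read the rules, not how Publii works internally.

```python
from urllib.robotparser import RobotFileParser

# robots.txt content that opts the whole site out of GPTBot
# and says nothing about any other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/post.html"))
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/post.html"))
```

The first call reports that GPTBot is blocked; the second shows that crawlers not named in the file are unaffected.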

August 9, 2023 at 10:47 pm #10451

[anonymous]

Agreed. I came here this morning after opening Publii to see if I could manually apply the required setting to the robots.txt file. I can’t see a way to do that in Site Settings…

August 10, 2023 at 12:50 pm #10462

Bob

Hey,

We will add it in the upcoming Publii release.

August 19, 2023 at 1:05 pm #10507

[anonymous]

Hi,

I can confirm that in Publii v.0.43 you will be able to block the ChatGPT-User, GPTBot, and CommonCrawl bots (each separately).
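
To illustrate what "separately" could mean in practice, here is a rough Python sketch that turns per-bot toggles into robots.txt groups. It is not Publii's code: the setting keys, the `AI_BOT_TOKENS` mapping, and `build_robots_txt` are hypothetical names, and it assumes Common Crawl is addressed by its `CCBot` user-agent token.

```python
# Hypothetical mapping from a settings key to the robots.txt user-agent token.
# "CCBot" is the token used by the Common Crawl crawler.
AI_BOT_TOKENS = {
    "chatgpt_user": "ChatGPT-User",
    "gptbot": "GPTBot",
    "common_crawl": "CCBot",
}


def build_robots_txt(blocked: dict[str, bool]) -> str:
    """Build robots.txt text with one Disallow group per blocked bot."""
    # Start with a permissive default group for every other crawler.
    groups = ["User-agent: *\nDisallow:"]
    for key, token in AI_BOT_TOKENS.items():
        if blocked.get(key, False):
            groups.append(f"User-agent: {token}\nDisallow: /")
    # Groups are separated by a blank line.
    return "\n\n".join(groups) + "\n"


# Block GPTBot and Common Crawl, but leave ChatGPT-User alone.
print(build_robots_txt({"gptbot": True, "common_crawl": True}))
```

Each bot gets its own group, so turning one toggle off removes only that bot's two lines.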

August 19, 2023 at 4:04 pm #10514

[anonymous]

Fantastic! Thank you Tomasz and Bob!

October 1, 2023 at 11:31 pm #10654

[anonymous]

Hi team! Looks like there’s a new control called Google-Extended that lets publishers manage whether their content is used for Google’s Bard and Vertex AI, via robots.txt.

Announcement: https://blog.google/technology/ai/an-update-on-web-publisher-controls/

Spec: https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers#google-extended
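
As a rough illustration of the request, the Python sketch below appends a Google-Extended group to an existing robots.txt and then checks with the standard library parser that ordinary Googlebot crawling is still allowed. `add_disallow_group` is a hypothetical helper, not anything in Publii, and the starting file content is assumed.

```python
from urllib.robotparser import RobotFileParser


def add_disallow_group(robots_txt: str, token: str) -> str:
    """Append a site-wide Disallow group for `token` unless one already exists."""
    if f"user-agent: {token}".lower() in robots_txt.lower():
        return robots_txt  # already present, leave the file untouched
    return robots_txt.rstrip("\n") + f"\n\nUser-agent: {token}\nDisallow: /\n"


# Assumed starting point: a robots.txt that allows everything.
existing = "User-agent: *\nDisallow:\n"
updated = add_disallow_group(existing, "Google-Extended")

parser = RobotFileParser()
parser.parse(updated.splitlines())
print(parser.can_fetch("Google-Extended", "https://example.com/"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/"))        # True
```

Since the new group names only Google-Extended, the rules that apply to Googlebot stay as they were.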