
Update robots.txt: New OpenAI Web Crawler (User-agent: GPTBot)

  • #10442
    [anonymous]

    Hi Publii team!

    I recently learned that OpenAI lets website managers opt out of its web crawler, which identifies itself as User-agent: GPTBot.

    My feature suggestion is to let Publii users choose whether or not their content may be used by OpenAI. The specs can be found here: https://platform.openai.com/docs/gptbot
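
For reference, the opt-out comes down to a two-line robots.txt group for GPTBot. The sketch below is only an illustration, not anything Publii ships: it holds such a group in a string and checks it with Python's standard `urllib.robotparser`. The helper name `is_allowed` and the example.com URL are placeholders made up for this example.

```python
from urllib.robotparser import RobotFileParser

# robots.txt group that opts the whole site out of GPTBot.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/my-post.html"  # placeholder URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", page))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page))
```

Crawlers that are not named in the file are unaffected, so a rule like this should leave ordinary search indexing alone.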

    #10451
    [anonymous]

    Agreed. I came here this morning after opening Publii to see whether I could manually add the required rule to the robots.txt file. I can’t find a way to do that in Site Settings…

    #10462
    Bob

    Hey,

    We will add it in the upcoming Publii release.

    #10507
    [anonymous]

    Hi,

    I can confirm that in Publii v.0.43 you will be able to block the ChatGPT-User, GPTBot, and CommonCrawl bots (each one separately).
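
As a rough picture of what "separately" could mean in the generated file, here is a sketch that builds robots.txt text from a per-bot choice. It is not Publii's actual implementation: the function name `build_robots_txt` and the `AI_BOTS` tuple are hypothetical, and `CCBot` is used on the assumption that it is the user-agent token of the Common Crawl crawler mentioned above.

```python
from typing import Iterable

# Tokens this sketch knows about. CCBot is assumed to be Common Crawl's token.
AI_BOTS = ("ChatGPT-User", "GPTBot", "CCBot")


def build_robots_txt(blocked: Iterable[str]) -> str:
    """Build robots.txt text that allows everything except the chosen bots."""
    blocked_set = set(blocked)
    unknown = blocked_set - set(AI_BOTS)
    if unknown:
        raise ValueError(f"Unknown bot token(s): {sorted(unknown)}")

    # Default group: an empty Disallow means "nothing is disallowed".
    groups = ["User-agent: *\nDisallow:"]
    # One group per blocked bot, so each can be toggled on its own.
    for bot in AI_BOTS:
        if bot in blocked_set:
            groups.append(f"User-agent: {bot}\nDisallow: /")
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    # Block GPTBot and CCBot but leave ChatGPT-User alone.
    print(build_robots_txt({"GPTBot", "CCBot"}))
```

Keeping one group per bot means that switching a single crawler on or off never touches the rules for the others.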

    #10514
    [anonymous]

    Fantastic! Thank you Tomasz and Bob!

    #10654
    [anonymous]

    Hi team! Looks like there’s a new robots.txt control called Google-Extended for managing whether site content is used by Google’s Bard and Vertex AI.

    Announcement: https://blog.google/technology/ai/an-update-on-web-publisher-controls/

    Spec: https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers#google-extended
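
Until a built-in option exists, a rule for this token could be added to an existing robots.txt by hand or with a small script. The sketch below is one assumed approach, not a Publii feature: the hypothetical helper `add_google_extended_block` appends a Google-Extended opt-out group to robots.txt text and does nothing if a group for that token is already there.

```python
GOOGLE_EXTENDED_GROUP = "User-agent: Google-Extended\nDisallow: /\n"


def add_google_extended_block(robots_txt: str) -> str:
    """Append a Google-Extended opt-out group unless one is already present."""
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split "field: value".
        field, _, value = line.split("#", 1)[0].partition(":")
        if (field.strip().lower() == "user-agent"
                and value.strip().lower() == "google-extended"):
            return robots_txt  # already present, leave the text untouched

    base = robots_txt.rstrip("\n")
    if not base:
        return GOOGLE_EXTENDED_GROUP
    # A blank line separates the new group from the existing ones.
    return f"{base}\n\n{GOOGLE_EXTENDED_GROUP}"


if __name__ == "__main__":
    existing = "User-agent: *\nDisallow:\n"
    print(add_google_extended_block(existing))
```

Because the function checks for an existing group first, running it more than once on the same file does not produce duplicates.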