By donquixote


2019-01-08 19:33:15 8 Comments

The default robots.txt shipped with Drupal contains rules such as this:

Disallow: /user/register/

If I understand correctly, the leading slash means this really has to match the beginning of the path. E.g. a path like /xyz/user/register/ would not match the rule. (Please correct me if I'm wrong)

(For the ending slash I am not sure)

With the language path prefix option, Drupal will have urls like /fr/user/register etc.

It can be assumed that if we want to exclude /user/register, we also want to exclude /fr/user/register and other language versions.

However, a /*/user/register/ might be too generic, as it also would match other urls which we don't want to exclude.
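To make the matching behaviour concrete, here is a minimal sketch of how a crawler that supports wildcards might test a rule against a path. It assumes the common semantics (rules are anchored at the start of the path, `*` matches any run of characters, a trailing `$` anchors the end); the function name `rule_matches` is made up for illustration and is not part of Drupal or any crawler.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Assumption: the pattern is anchored at the start of the path,
    '*' matches any run of characters, and a trailing '$' anchors
    the end of the path.
    """
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored_end:
        regex += "$"
    # re.match only anchors at the start, which gives prefix matching.
    return re.match(regex, path) is not None


if __name__ == "__main__":
    for rule, path in [
        ("/user/register/", "/user/register/"),
        ("/user/register/", "/fr/user/register/"),
        ("/*/user/register/", "/fr/user/register/"),
        ("/*/user/register/", "/node/5/user/register/"),
    ]:
        print(rule, path, rule_matches(rule, path))
```

Under these assumptions, the default rule does not cover `/fr/user/register/`, and the wildcard rule also catches unrelated paths such as `/node/5/user/register/`, which is the over-matching concern raised above. Note also that a rule ending in a slash does not match the same path without the trailing slash.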

Questions:

  • Will Drupal's default robots.txt also exclude language-specific versions of the targeted urls? (according to my own assumption it does not).
  • What would be the simplest / cleanest way to exclude all language versions for a given path?

I think this applies equally to Drupal 7 and Drupal 8.

1 comments

@leymannx 2019-01-08 20:09:13

Whatever you are unhappy with in Drupal's default robots.txt, delete the file and install RobotsTxt, which provides a UI that lets your SEO person edit robots.txt according to your needs.

Use this module when you are running multiple Drupal sites from a single code base (multisite) and you need a different robots.txt file for each one. This module generates the robots.txt file dynamically and gives you the chance to edit it, on a per-site basis, from the web UI.


Look what I found. This duplicate issue on drupal.org, "robots.txt: add wildcarded paths for multilingual sites", leads to this open issue, "Fix path matching in robots.txt", which could actually use your input.

But basically some commenters seem to recommend exactly what you came up with, wildcarding the path prefix.
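If a blanket wildcard feels too broad, an alternative is to spell out one rule per language prefix. The following is only a sketch of that idea: the helper `expand_disallow` and the prefix list are hypothetical, and nothing here is a Drupal or RobotsTxt module API. The generated lines would still have to be pasted into (or served as) robots.txt by whatever means you use.

```python
def expand_disallow(paths, prefixes):
    """Build Disallow lines for each path, bare and once per language prefix.

    `paths` are robots.txt paths with a leading slash (e.g. "/user/register/").
    `prefixes` are language path prefixes without slashes (e.g. "fr").
    Both lists are illustrative inputs supplied by the caller.
    """
    lines = []
    for path in paths:
        # Unprefixed rule, as in the default file.
        lines.append(f"Disallow: {path}")
        # One explicit rule per language prefix.
        for prefix in prefixes:
            lines.append(f"Disallow: /{prefix}{path}")
    return lines


if __name__ == "__main__":
    # Hypothetical prefixes; use the ones configured on your site.
    print("\n".join(expand_disallow(["/user/register/"], ["fr", "de"])))
```

Explicit rules avoid matching unrelated paths, at the cost of regenerating the file whenever a language is added or removed.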

@hotwebmatter 2019-01-08 22:13:50

In his comment above, @Clive seems to agree that it is a Drupal-specific question. Good module recommendation, though!

@Clive 2019-01-09 08:00:39

I removed the meta commentary about the question; that stuff isn't necessary in answers. We have comments for clarification, and meta if it's needed, as alternatives. But this is indeed on topic here: how (if at all) Drupal handles a multi-lingual robots.txt file is very much Drupal-specific.

@leymannx 2019-01-09 08:15:36

@Clive – I humbly disagree. robots.txt is customizable. Multilanguage is optional. You talk with your SEO person about that. This question belongs on Webmasters. Will Meta that.

@donquixote 2019-01-09 10:16:18

We don't have a dedicated SEO person :) I would agree this question can be phrased in a Drupal-agnostic way, if other websites have a similar language selection mode based on path prefix. But I think it does have Q/A value to have a Drupal-specific entry point, specifically to state whether or not the shipped robots.txt has this covered, and what to do next.

@donquixote 2019-01-09 11:30:35

Also I doubt that a user-generated robots.txt is the answer, if that user needs to replicate every rule across languages. This sounds like we would need an automated solution.

@donquixote 2019-01-09 11:32:57

The actual question of "What would be the ideal robots.txt for this situation" is independent of Drupal. But the question of how to produce such a robots.txt e.g. dynamically based on language selection settings, this can again have very Drupal-specific answers.

@leymannx 2019-01-09 11:35:08

@donquixote – Absolutely agreed. But this currently isn't the question. And even if it were the question, it would be too broad, unless OP starts building such a feature and gets stuck at a certain point. But let's not discuss this here, let's head over to Meta: drupal.meta.stackexchange.com/q/3747/15055

@Clive 2019-01-09 12:20:39

@leymannx The two questions are "Will Drupal's default robots.txt also exclude language-specific versions of the targeted urls?" and "What would be the simplest / cleanest way to exclude all language versions for a given path?" The answer to the first is "no", which is specific to Drupal. The answer to the 2nd you've covered for "simplest", and I've half-covered for "cleanest" in the comment above. What about this is not related to Drupal? I really can't understand the objection. Can you be really specific about which language in which question makes you think this isn't about Drupal?

@Clive 2019-01-09 12:40:09

I've read that last comment back and it feels...confrontational. Not my intention, sorry if it came across like that.

@leymannx 2019-01-09 13:15:03

@Clive – No prob, I only see a fact-based discussion going on. But let's continue it over on Meta: drupal.meta.stackexchange.com/q/3747/15055 :)
