By Todd Rowan

2019-03-14 21:28:14

In May of last year we retired a domain and redirected it to a new one, consolidating two sites from a company merger into one.

We shut down the CMS for site1.com and pointed its DNS at site2.com. We built a redirect engine with over a thousand rules that maps any request for site1.com to the appropriate page on site2.com, so it is not possible to get a 200 response from site1.com: you always get a 301, which resolves to either a 200 or a 404 on site2.com.
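For reference, a redirect engine like this can live entirely at the web-server layer. Below is a minimal sketch assuming nginx (the question doesn't name a server); the paths and mapped targets are illustrative placeholders, not our actual rules.

    # Hypothetical nginx sketch of the redirect engine. A map scales to
    # thousands of rules better than individual location blocks would.
    # (map must sit in the http {} context, outside any server block.)
    map $request_uri $site2_uri {
        default               "";                  # no explicit rule
        /old-about            /company/about;      # placeholder rule
        /old-products/widget  /catalog/widget;     # placeholder rule
    }

    server {
        listen 80;
        server_name site1.com www.site1.com;

        # Explicit rule matched: 301 to the mapped page on site2.com.
        if ($site2_uri != "") {
            return 301 https://site2.com$site2_uri;
        }

        # No rule matched: keep the path and let site2.com answer with
        # a 200 or a 404, matching the behavior described above.
        return 301 https://site2.com$request_uri;
    }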

We are now ten months past the site retirement, and I still see 40K+ daily bot requests to site1.com (80% from Bing, but everyone else is in there too). There are no links anywhere on site2.com that reference site1.com, and all sitemaps reference site2.com.

If you search on our primary keywords that were on site1.com before the migration, we still rank on the first page with site2.com URLs, so SEO is not the problem.

I have other site consolidation projects on the way, and I do not want to have to spin up additional resources just to handle redirects for bots.

We currently 301 redirect site1.com/robots.txt to site2.com/robots.txt. Should I instead configure my server to serve a global Disallow at site1.com/robots.txt? That shouldn't affect crawling of site2.com, nor should it hurt SEO, correct?
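If you do go that route, the change is small: answer robots.txt locally on site1.com instead of redirecting it. A sketch, again assuming nginx, is below. One caveat to weigh: once crawling is disallowed, well-behaved bots will stop following the 301s from site1.com URLs, which mainly matters if you were still relying on those redirects to transfer link equity; since the rankings have already moved, that risk looks low here.

    server {
        listen 80;
        server_name site1.com www.site1.com;

        # Serve robots.txt directly with a global Disallow rather than
        # redirecting it, so compliant bots stop crawling site1.com.
        location = /robots.txt {
            default_type text/plain;
            return 200 "User-agent: *\nDisallow: /\n";
        }

        # Everything else still 301s to the new site.
        location / {
            return 301 https://site2.com$request_uri;
        }
    }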

In short, how can I get the bots to stop crawling site1?

Comments

@Chris Rutherfurd 2019-03-15 08:19:34

You are never going to be able to stop bots 100% from trying to access pages on the old domain. You have done the right thing by 301-redirecting the old pages to the new, relevant ones, as this generally passes the old pages' ranking into the calculation of the new site's ranking.

The hard part is that with a site this large there are almost certainly external third-party links still pointing at the old site, and when a crawler follows one of those links, the URL gets added back to the index for re-crawling. There is no way to prevent this entirely, because retired domains and pages quite often come back to life later, whether through the original owner or a new owner with a new subject. As long as you maintain the old domain name and 301-redirect requests for pages that are still valid but have moved, you are doing all you really can. Over time the old links will organically disappear as the pages linking to yours get deleted or archived, or as webmasters update them to point at the new content.

Additionally, you mention that some requests return a 404. If you do not plan to restore those pages, you would be better off returning a 410 Gone status code: it tells crawlers that the page is permanently gone and you have no intention of bringing it back. This often gets the page removed from the index entirely, and when crawlers encounter the link again in the future and receive a 410, the page will not be re-added to the index at all.
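Implementation-wise, a 410 is a one-line rule per retired path. A sketch assuming the same hypothetical nginx setup as above, with a placeholder path:

    # Permanently retired pages return 410 Gone instead of redirecting;
    # /discontinued-product is a placeholder, not a real path.
    location = /discontinued-product {
        return 410;
    }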

@Todd Rowan 2019-03-15 15:07:52

Thanks for this. I guess I'll just ride it out. Re: the 404s, I didn't mean to say we had removed pages, just that requests to the old domain that would have 404'd there now 404 on the new domain.
