How to ignore robots.txt at a spider level in Scrapy

Jul 30, 2020 · by Tim Kamanin

Scrapy has the ROBOTSTXT_OBEY setting that defines whether your spiders should respect robots.txt policies or not. The problem is that this setting is global and applies to all spiders. But what if you want to override it for some spiders?

It turns out it's easy, and the following technique can be used to override any Scrapy setting (not only ROBOTSTXT_OBEY) at a spider level.

All you need to do is add a custom_settings dictionary with the values you want to override to the spider class. In our case it looks like this:

import scrapy

class MyPoliteSpider(scrapy.Spider):
    name = 'my_polite_spider'
    custom_settings = {
        'ROBOTSTXT_OBEY': False,
    }
With ROBOTSTXT_OBEY set to False, MyPoliteSpider won't respect robots.txt policies, which, despite its name, is not very polite...
