Feature request.
It would be good if Scrapy had an easily accessible means of reading settings on a per-spider basis and making them available to the spider. From my many attempts at this so far, all of the components appear, in theory, to be in place. Populating settings is already done:
https://doc.scrapy.org/en/latest/topics/settings.html#populating-the-settings - but the problem is then accessing them.
Ideally this would be done in a fashion that's compatible with scrapyd (so no calling `process.crawl(spider, my_settings)`).
Ideally: a project could have a generic, project-wide settings.py file containing both the standard settings and any custom ones added by the developer. Then, using a command-line argument to select the settings to use, the spider's __init__ method would override specific settings (much as custom_settings does), and these settings would then be accessible throughout the spider via self.settings in the usual way.
Current Problems
custom_settings
Unfortunately custom_settings doesn't seem to be usable for this, because it cannot be set in __init__; it must be declared earlier, as a class attribute, so that Scrapy can read it before the spider is instantiated.
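To illustrate why, here is a minimal pure-Python simulation (not real Scrapy code; the class names only mimic Scrapy's) of the initialization order: the crawler applies custom_settings at the class level, before the spider instance — and hence its __init__ — exists.

```python
# Simulation of Scrapy's ordering: custom_settings is consumed at the
# *class* level before any spider instance is created.

class Settings(dict):
    def setdict(self, d):
        self.update(d)

class Spider:
    custom_settings = None  # read before __init__ ever runs

    @classmethod
    def update_settings(cls, settings):
        settings.setdict(cls.custom_settings or {})

class Crawler:
    def __init__(self, spidercls, settings):
        spidercls.update_settings(settings)  # happens first
        self.settings = settings
        self.spider = spidercls(self)        # __init__ runs after

class EarlySpider(Spider):
    custom_settings = {'DOWNLOAD_DELAY': 2}  # class attribute: applied
    def __init__(self, crawler):
        self.settings = crawler.settings

class LateSpider(Spider):
    def __init__(self, crawler):
        # too late: update_settings() has already run
        self.custom_settings = {'DOWNLOAD_DELAY': 2}
        self.settings = crawler.settings

print(Crawler(EarlySpider, Settings()).settings.get('DOWNLOAD_DELAY'))  # 2
print(Crawler(LateSpider, Settings()).settings.get('DOWNLOAD_DELAY'))   # None
```

The same ordering applies in real Scrapy: by the time __init__ runs, the settings have already been frozen from the class-level custom_settings.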
settings.py
Currently, even if a user is willing to just use a different settings.py file entirely for each spider (thereby duplicating most of it), that's not readily possible either.
```python
os.environ['SCRAPY_SETTINGS_MODULE'] = 'myproject.settings'
these_settings = get_project_settings()
```
The above only gets the settings into the variable these_settings; they are not used by the spider or accessible via self.settings.
Desire for feature
Based on Stack Overflow, this is something a lot of people want. The fact that there are so many answers, all so different, shows there isn't a particularly good way of doing it.
http://stackoverflow.com/questions/9814827/creating-a-generic-scrapy-spider
http://stackoverflow.com/questions/12996910/how-to-setup-and-launch-a-scrapy-spider-programmatically-urls-and-settings
http://stackoverflow.com/questions/35662146/dynamic-spider-generation-with-scrapy-subclass-init-error
http://stackoverflow.com/questions/40510526/how-to-load-different-settings-for-different-scrapy-spiders
http://stackoverflow.com/questions/2396529/using-one-scrapy-spider-for-several-websites
Being able to readily get allowed_domains and start_urls in the same way would also be good.
To read settings in a spider, one can use the Spider.settings attribute. It doesn't work in the __init__ method, but it works e.g. in start_requests.
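As a rough pure-Python sketch of why it behaves this way (a mock, not real Scrapy code): the settings are resolved through the crawler, which is only attached to the spider after instantiation.

```python
# Mock of the relevant mechanics: Spider.settings goes through
# self.crawler, and the crawler is attached *after* __init__.

class Settings(dict):
    pass

class Crawler:
    def __init__(self, settings):
        self.settings = settings

class Spider:
    @property
    def settings(self):
        return self.crawler.settings  # needs self.crawler

    @classmethod
    def from_crawler(cls, crawler):
        spider = cls()            # __init__ runs here: no crawler yet
        spider.crawler = crawler  # from now on self.settings works
        return spider

class MySpider(Spider):
    def __init__(self):
        try:
            self.settings
            self.settings_in_init = True
        except AttributeError:
            self.settings_in_init = False

    def start_requests(self):
        # in real Scrapy this would yield Requests; here it just
        # shows the settings are readable by this point
        return self.settings.get('BOT_NAME')

spider = MySpider.from_crawler(Crawler(Settings(BOT_NAME='demo')))
print(spider.settings_in_init)  # False
print(spider.start_requests())  # demo
```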
If the goal is to change settings, it becomes more complicated. Generally, one can change settings only before other components are configured, so initialization order is important.
There is an undocumented Spider.update_settings method which receives project-wide settings and updates them; maybe we should document it and make it public. But I'm not sure it should be a final solution.
See also discussion at https://github.com/scrapy/scrapy/issues/1305 - there is a proposal to allow changing settings in Spider.__init__, it is backwards incompatible though.
There is also a PR for 'addons' - components which can change settings (https://github.com/scrapy/scrapy/pull/1272).
Spider.update_settings doesn't work as expected. I prefer to change the settings via the command line when starting the spider.
https://doc.scrapy.org/en/latest/topics/settings.html#command-line-options
For example: `scrapy crawl myspider -s LOG_FILE=scrapy.log`
Hi, I just faced the same problem.
My scenario is: I run spiders from a script with

```python
settings = set_settings(get_project_settings(), config)
process = CrawlerProcess(settings)
process.crawl(spider_name)  # crawl() returns a Deferred; start() is called on the process
process.start()
```
where set_settings() changes the project settings depending on the passed config. This works fine until I have a spider in which I need to define custom_settings and merge in the settings defined by set_settings() (DOWNLOADER_MIDDLEWARES, for example).
Calling get_project_settings() in or before the spider's __init__() obviously doesn't work, because it reads the settings from settings.py, which are no longer the relevant ones.
I will be happy to hear any ideas, thanks.
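For reference, here is one plain-dict sketch of such a set_settings() helper (hypothetical — the original is not shown; with a real Scrapy Settings object you would use settings.set() / settings.setdict() instead). It merges dict-valued settings such as DOWNLOADER_MIDDLEWARES rather than replacing them wholesale, so per-run config and existing entries can coexist.

```python
# Hypothetical set_settings() sketch using plain dicts for illustration.
# Dict-valued settings (e.g. DOWNLOADER_MIDDLEWARES) are merged,
# scalar settings are overwritten.

def set_settings(settings, config):
    for key, value in config.items():
        current = settings.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            merged = dict(current)
            merged.update(value)   # config entries win on key clashes
            settings[key] = merged
        else:
            settings[key] = value
    return settings

# Example project settings and per-run config (names are illustrative):
project = {
    'BOT_NAME': 'myproject',
    'DOWNLOADER_MIDDLEWARES': {'myproject.middlewares.Proxy': 543},
}
config = {
    'DOWNLOAD_DELAY': 2,
    'DOWNLOADER_MIDDLEWARES': {'myproject.middlewares.Retry': 550},
}
result = set_settings(project, config)
print(result['DOWNLOADER_MIDDLEWARES'])
# {'myproject.middlewares.Proxy': 543, 'myproject.middlewares.Retry': 550}
```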