The sgmllib module has been removed in Python 3. What can I do to use the SgmlLinkExtractor?
/Users/LokiSharp/wikiSpider/wikiSpider/spiders/articleSpider.py:4: ScrapyDeprecationWarning: Module `scrapy.contrib.linkextractors` is deprecated, use `scrapy.linkextractors` instead
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
/Users/LokiSharp/wikiSpider/wikiSpider/spiders/articleSpider.py:4: ScrapyDeprecationWarning: Module `scrapy.contrib.linkextractors.sgml` is deprecated, use `scrapy.linkextractors.sgml` instead
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
Traceback (most recent call last):
File "/usr/local/bin/scrapy", line 11, in <module>
sys.exit(execute())
File "/usr/local/lib/python3.5/site-packages/scrapy/cmdline.py", line 141, in execute
cmd.crawler_process = CrawlerProcess(settings)
File "/usr/local/lib/python3.5/site-packages/scrapy/crawler.py", line 238, in __init__
super(CrawlerProcess, self).__init__(settings)
File "/usr/local/lib/python3.5/site-packages/scrapy/crawler.py", line 129, in __init__
self.spider_loader = _get_spider_loader(settings)
File "/usr/local/lib/python3.5/site-packages/scrapy/crawler.py", line 325, in _get_spider_loader
return loader_cls.from_settings(settings.frozencopy())
File "/usr/local/lib/python3.5/site-packages/scrapy/spiderloader.py", line 33, in from_settings
return cls(settings)
File "/usr/local/lib/python3.5/site-packages/scrapy/spiderloader.py", line 20, in __init__
self._load_all_spiders()
File "/usr/local/lib/python3.5/site-packages/scrapy/spiderloader.py", line 28, in _load_all_spiders
for module in walk_modules(name):
File "/usr/local/lib/python3.5/site-packages/scrapy/utils/misc.py", line 71, in walk_modules
submod = import_module(fullpath)
File "/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 986, in _gcd_import
File "<frozen importlib._bootstrap>", line 969, in _find_and_load
File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 665, in exec_module
File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
File "/Users/LokiSharp/wikiSpider/wikiSpider/spiders/articleSpider.py", line 4, in <module>
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
File "/usr/local/lib/python3.5/site-packages/scrapy/contrib/linkextractors/sgml.py", line 7, in <module>
from scrapy.linkextractors.sgml import *
File "/usr/local/lib/python3.5/site-packages/scrapy/linkextractors/sgml.py", line 6, in <module>
from sgmllib import SGMLParser
ImportError: No module named 'sgmllib'
SgmlLinkExtractor was deprecated in Scrapy 1.0.
We recommend using LinkExtractor now:
from scrapy.linkextractors import LinkExtractor
It should be faster than SgmlLinkExtractor (it's lxml-backed), and it is Python 3 compatible.
Yep, SgmlLinkExtractor is deprecated even on Python 2, and we don't support it on Python 3. Sorry if it causes issues for you! But as Paul said, LinkExtractor is faster, and supporting SgmlLinkExtractor on Python 3 is hard because sgmllib was removed from the Python 3 standard library.
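If you want to confirm that the missing module is what triggers the ImportError above, a quick standard-library check along these lines should do. The function name `sgmllib_available` is just an illustrative assumption.

```python
import importlib.util


def sgmllib_available():
    """Return True if a module named sgmllib can be imported here."""
    # find_spec returns None when no importable top-level module is found.
    return importlib.util.find_spec("sgmllib") is not None


if __name__ == "__main__":
    print("sgmllib importable:", sgmllib_available())
```

On a stock Python 3 install this is expected to report that the module is not importable, unless a third-party backport has been installed into the environment.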
Use from scrapy.linkextractors import LinkExtractor instead; it is compatible with Python 3.