Scrapy: [Python3.5.2][Scrapy 1.1.2] ImportError: No module named 'sgmllib'

Created on 16 Sep 2016 · 3 Comments · Source: scrapy/scrapy

The sgmllib module has been removed in Python 3. What can I do to use the SgmlLinkExtractor?

/Users/LokiSharp/wikiSpider/wikiSpider/spiders/articleSpider.py:4: ScrapyDeprecationWarning: Module `scrapy.contrib.linkextractors` is deprecated, use `scrapy.linkextractors` instead
  from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
/Users/LokiSharp/wikiSpider/wikiSpider/spiders/articleSpider.py:4: ScrapyDeprecationWarning: Module `scrapy.contrib.linkextractors.sgml` is deprecated, use `scrapy.linkextractors.sgml` instead
  from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/usr/local/lib/python3.5/site-packages/scrapy/cmdline.py", line 141, in execute
    cmd.crawler_process = CrawlerProcess(settings)
  File "/usr/local/lib/python3.5/site-packages/scrapy/crawler.py", line 238, in __init__
    super(CrawlerProcess, self).__init__(settings)
  File "/usr/local/lib/python3.5/site-packages/scrapy/crawler.py", line 129, in __init__
    self.spider_loader = _get_spider_loader(settings)
  File "/usr/local/lib/python3.5/site-packages/scrapy/crawler.py", line 325, in _get_spider_loader
    return loader_cls.from_settings(settings.frozencopy())
  File "/usr/local/lib/python3.5/site-packages/scrapy/spiderloader.py", line 33, in from_settings
    return cls(settings)
  File "/usr/local/lib/python3.5/site-packages/scrapy/spiderloader.py", line 20, in __init__
    self._load_all_spiders()
  File "/usr/local/lib/python3.5/site-packages/scrapy/spiderloader.py", line 28, in _load_all_spiders
    for module in walk_modules(name):
  File "/usr/local/lib/python3.5/site-packages/scrapy/utils/misc.py", line 71, in walk_modules
    submod = import_module(fullpath)
  File "/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 986, in _gcd_import
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 665, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/Users/LokiSharp/wikiSpider/wikiSpider/spiders/articleSpider.py", line 4, in <module>
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
  File "/usr/local/lib/python3.5/site-packages/scrapy/contrib/linkextractors/sgml.py", line 7, in <module>
    from scrapy.linkextractors.sgml import *
  File "/usr/local/lib/python3.5/site-packages/scrapy/linkextractors/sgml.py", line 6, in <module>
    from sgmllib import SGMLParser
ImportError: No module named 'sgmllib'

All 3 comments

SgmlLinkExtractor was deprecated in Scrapy 1.0.
We recommend using LinkExtractor instead.

from scrapy.linkextractors import LinkExtractor

It should be faster than SgmlLinkExtractor (it's lxml-backed), and it is Python 3 compatible.

Yep, SgmlLinkExtractor is deprecated in Python 2, and we don't support it in Python 3. Sorry if it causes issues for you! But as Paul said, LinkExtractor is faster, and supporting SgmlLinkExtractor in Python 3 is hard because sgmllib was removed from the Python 3 standard library.
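
To confirm that the missing module is the cause on a given interpreter, a small check along these lines may help. It is only a sketch: `module_available` is a helper name invented for this example, and the result depends on the environment (it can report True on Python 3 if a third-party backport of sgmllib happens to be installed).

```python
import importlib.util
import sys


def module_available(name):
    """Return True if a top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


# On a stock Python 3 install this is expected to be False, which is
# what triggers the ImportError in the traceback above.
has_sgmllib = module_available("sgmllib")
print("Python", sys.version_info.major, "- sgmllib importable:", has_sgmllib)
```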

Use from scrapy.linkextractors import LinkExtractor; it is compatible with Python 3.
