Gensim: Recent versions of smart-open (1.11.0 and 1.11.1) break gensim in Python 2.7

Created on 9 Apr 2020 · 19 Comments · Source: RaRe-Technologies/gensim

Problem description

What are you trying to achieve? What is the expected result? What are you seeing instead?

Using gensim with Python 2.7. The import is expected to succeed, but it fails with an ImportError.

Steps/code/corpus to reproduce

pip install gensim
...
python
>>> import gensim
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/brukau/tmp/gensim/local/lib/python2.7/site-packages/gensim/__init__.py", line 5, in <module>
    from gensim import parsing, corpora, matutils, interfaces, models, similarities, summarization, utils  # noqa:F401
  File "/home/brukau/tmp/gensim/local/lib/python2.7/site-packages/gensim/parsing/__init__.py", line 4, in <module>
    from .preprocessing import (remove_stopwords, strip_punctuation, strip_punctuation2,  # noqa:F401
  File "/home/brukau/tmp/gensim/local/lib/python2.7/site-packages/gensim/parsing/preprocessing.py", line 42, in <module>
    from gensim import utils
  File "/home/brukau/tmp/gensim/local/lib/python2.7/site-packages/gensim/utils.py", line 45, in <module>
    from smart_open import open
  File "/home/brukau/tmp/gensim/local/lib/python2.7/site-packages/smart_open/__init__.py", line 28, in <module>
    from .smart_open_lib import open, parse_uri, smart_open, register_compressor
  File "/home/brukau/tmp/gensim/local/lib/python2.7/site-packages/smart_open/smart_open_lib.py", line 23, in <module>
    import pathlib
ImportError: No module named pathlib
>>> 
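
The traceback ends at `import pathlib`, a module that only joined the standard library in Python 3.4, so a stock 2.7 interpreter cannot satisfy it. A minimal sketch for checking whether a given interpreter can import a module before relying on it; the helper name `can_import` is hypothetical and not part of gensim or smart_open:

```python
import importlib


def can_import(module_name):
    """Return True if module_name can be imported by this interpreter."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # On Python 2.7 without a backport installed, this reports that
    # pathlib is unavailable; on Python 3.4+ it reports that it is.
    print("pathlib importable:", can_import("pathlib"))
```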

Versions

Please provide the output of:

import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import gensim; print("gensim", gensim.__version__)
from gensim.models import word2vec;print("FAST_VERSION", word2vec.FAST_VERSION)

Linux-5.3.0-45-generic-x86_64-with-Ubuntu-19.10-eoan
('Python', '2.7.17 (default, Nov 7 2019, 10:07:09) \n[GCC 9.2.1 20191008]')
('NumPy', '1.16.1')
('SciPy', '1.2.3')
gensim import fails

bug impact HIGH reach HIGH

All 19 comments

@mpenkov we'll have to pin an older version of smart_open for Gensim. Gensim supports 2.7, and 2.7 is still widely used in the industry, but the latest smart_open releases dropped 2.7 support.

@brukau @bavaria95 why do you use Python2.7?

@mpenkov I also wonder how come CI didn't catch this? A critical blocking issue (for 2.7 users).

@piskvorky we are moving to Py3 only, but apparently not fast enough

@brukau what is not fast enough?

@piskvorky our migration to py3 :)

Alright :)

I see two options:

  1. We drop 2.7 from Gensim, and then this becomes a non-issue. Of course that won't help you right now, until you finish your migration. We plan to drop 2.7 from Gensim anyway in the next release.
  2. We make a quick release that pins an older version of smart_open, one that still supported py2.7.

As a quick fix for yourself now, please install an older version of smart_open manually. The README says smart_open==1.10.0 works with py2.7.
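
The pin can be sketched as a requirement string chosen per interpreter. This is an illustration only: the function name `smart_open_requirement` is hypothetical, and the bounds `>=1.8.1,<1.11` are the ones that appear in the pip output later in this thread, not a general recommendation.

```python
import sys


def smart_open_requirement(version_info=sys.version_info):
    """Return a pip requirement string for smart_open suited to the interpreter."""
    if version_info[0] == 2:
        # Per this thread, smart_open 1.11.0 and 1.11.1 fail to import on
        # Python 2.7, so stay below 1.11 there.
        return "smart_open>=1.8.1,<1.11"
    # Assumption: no upper bound is needed on Python 3.
    return "smart_open>=1.8.1"


if __name__ == "__main__":
    # The printed string can be passed to `pip install`.
    print(smart_open_requirement())
```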

For us it is not a big deal; it can be worked around easily.

I am fine with both options. I suspect more people will complain soon, but these days it is fair to drop Py2 support.

Also, on PyPI Gensim still mentions 2.7 compatibility.

Yes, dropping 2.7 from smart_open caused some unforeseen issues. In Gensim and perhaps other projects too.

Hopefully nothing critical as py2.7 is becoming outdated by the day.

Shouldn't smart_open 1.11.0 and above declare something that generates an error at install time, at least, if installed to a Python version it doesn't support? It seems like python_requires might do the trick: https://packaging.python.org/guides/distributing-packages-using-setuptools/#python-requires

(This might not have saved gensim users, but it'd have created a more interpretable error.)
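
A rough sketch of that suggestion, under stated assumptions: the package name, version, and the `>=3.5` bound below are placeholders, not smart_open's actual metadata. `python_requires` is a real setuptools keyword that pip 9.0 and later honor when selecting a release; the `check_python` guard is a hypothetical extra that gives a readable message when the metadata check is bypassed.

```python
import sys

# Hypothetical metadata for illustration; in a real setup.py these would be
# passed as setup(**SETUP_KWARGS).
SETUP_KWARGS = {
    "name": "example_package",
    "version": "2.0.0",
    # Installers that understand this field skip the release on older interpreters.
    "python_requires": ">=3.5",
}


def check_python(version_info=sys.version_info, minimum=(3, 5)):
    """Raise a readable error instead of a late ImportError deep in a traceback."""
    found = tuple(version_info[:2])
    if found < minimum:
        raise RuntimeError(
            "Python %d.%d+ is required; found %d.%d" % (minimum + found)
        )
```

Calling `check_python()` at the top of a package's `__init__.py` would turn a failure like the `pathlib` traceback above into a one-line explanation.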

Further, dropping support for Python 2.7 seems large enough that it deserves an increment of the first, "major" release number, rather than just a "1.10.x" to "1.11.0" bump. I'd suggest smart_open just skip to major-version 3.0.0 - so the messaging is simple, "smart_open 3.0 needs Python 3".

I agree with both. The 2.7 drop was way too subtle. CC @mpenkov.

I've released 3.8.2. It contains the necessary pin:

(gensim-py2) sergeyich:~ misha$ pip show gensim
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
Name: gensim
Version: 3.8.2
Summary: Python framework for fast Vector Space Modelling
Home-page: http://radimrehurek.com/gensim
Author: Radim Rehurek
Author-email: [email protected]
License: LGPLv2.1
Location: /Users/misha/gensim-py2/lib/python2.7/site-packages
Requires: smart-open, scipy, numpy, six
Required-by:
(gensim-py2) sergeyich:~ misha$ pip show smart-open
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
Name: smart-open
Version: 1.10.0
Summary: Utils for streaming large files (S3, HDFS, GCS, gzip, bz2...)
Home-page: https://github.com/piskvorky/smart_open
Author: Radim Rehurek
Author-email: [email protected]
License: MIT
Location: /Users/misha/gensim-py2/lib/python2.7/site-packages
Requires: bz2file, requests, google-cloud-storage, boto3
Required-by: gensim
(gensim-py2) sergeyich:~ misha$ python -c 'import gensim'
(gensim-py2) sergeyich:~ misha$

Big thank you to @menshikh-iv for getting the CI builds to complete.

I'm getting this error for 3.8.2, on a fresh empty py2.7 environment:

Screen Shot 2020-04-12 at 10 20 44

But pip install gensim==3.8.1 works fine:

Screen Shot 2020-04-12 at 10 24 05

That's odd. Which platform, @piskvorky?

I cannot reproduce in clean environments on MacOS and Ubuntu.

Also, are you sure your network isn't being possessed? The SSL errors and the error message

Could not find index page for numpy

seem particularly suspicious.

Hm, interesting, it works for me on ubuntu 19.10 too

(testp27) ivan@P50:~$ python --version
Python 2.7.17
(testp27) ivan@P50:~$ pip --version
pip 20.0.2 from /home/ivan/.virtualenvs/testp27/local/lib/python2.7/site-packages/pip (python 2.7)
(testp27) ivan@P50:~$ pip install gensim
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
Collecting gensim
  Downloading gensim-3.8.2.tar.gz (23.4 MB)
     |████████████████████████████████| 23.4 MB 5.0 MB/s
Collecting numpy<=1.16.1,>=1.11.3
  Using cached numpy-1.16.1-cp27-cp27mu-manylinux1_x86_64.whl (17.0 MB)
Collecting scipy>=1.0.0
  Downloading scipy-1.2.3-cp27-cp27mu-manylinux1_x86_64.whl (24.8 MB)
     |████████████████████████████████| 24.8 MB 950 bytes/s
Collecting six>=1.5.0
  Using cached six-1.14.0-py2.py3-none-any.whl (10 kB)
Collecting smart_open<1.11,>=1.8.1
  Using cached smart_open-1.10.0.tar.gz (99 kB)
Collecting requests
  Using cached requests-2.23.0-py2.py3-none-any.whl (58 kB)
Collecting boto3
  Downloading boto3-1.12.39-py2.py3-none-any.whl (128 kB)
     |████████████████████████████████| 128 kB 67.1 MB/s
Collecting google-cloud-storage
  Downloading google_cloud_storage-1.27.0-py2.py3-none-any.whl (79 kB)
     |████████████████████████████████| 79 kB 14.6 MB/s
Processing ./.cache/pip/wheels/70/27/4c/cd6a1b48a925dd8bb3640fe6948d2b7cbf88ef0858d5a84f59/bz2file-0.98-py2-none-any.whl
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
  Using cached urllib3-1.25.8-py2.py3-none-any.whl (125 kB)
Collecting certifi>=2017.4.17
  Downloading certifi-2020.4.5.1-py2.py3-none-any.whl (157 kB)
     |████████████████████████████████| 157 kB 35.2 MB/s
Collecting chardet<4,>=3.0.2
  Using cached chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Collecting idna<3,>=2.5
  Using cached idna-2.9-py2.py3-none-any.whl (58 kB)
Collecting jmespath<1.0.0,>=0.7.1
  Using cached jmespath-0.9.5-py2.py3-none-any.whl (24 kB)
Collecting botocore<1.16.0,>=1.15.39
  Using cached botocore-1.15.39-py2.py3-none-any.whl (6.1 MB)
Collecting s3transfer<0.4.0,>=0.3.0
  Using cached s3transfer-0.3.3-py2.py3-none-any.whl (69 kB)
Collecting google-auth<2.0dev,>=1.11.0
  Downloading google_auth-1.13.1-py2.py3-none-any.whl (87 kB)
     |████████████████████████████████| 87 kB 12.0 MB/s
Collecting google-resumable-media<0.6dev,>=0.5.0
  Using cached google_resumable_media-0.5.0-py2.py3-none-any.whl (38 kB)
Collecting google-cloud-core<2.0dev,>=1.2.0
  Using cached google_cloud_core-1.3.0-py2.py3-none-any.whl (26 kB)
Collecting python-dateutil<3.0.0,>=2.1
  Using cached python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
Collecting docutils<0.16,>=0.10
  Using cached docutils-0.15.2-py2-none-any.whl (548 kB)
Collecting futures<4.0.0,>=2.2.0; python_version == "2.7"
  Using cached futures-3.3.0-py2-none-any.whl (16 kB)
Collecting cachetools<5.0,>=2.0.0
  Using cached cachetools-3.1.1-py2.py3-none-any.whl (11 kB)
Requirement already satisfied: setuptools>=40.3.0 in ./.virtualenvs/testp27/lib/python2.7/site-packages (from google-auth<2.0dev,>=1.11.0->google-cloud-storage->smart_open<1.11,>=1.8.1->gensim) (44.1.0)
Collecting pyasn1-modules>=0.2.1
  Using cached pyasn1_modules-0.2.8-py2.py3-none-any.whl (155 kB)
Collecting rsa<4.1,>=3.1.4
  Using cached rsa-4.0-py2.py3-none-any.whl (38 kB)
Collecting google-api-core<2.0.0dev,>=1.16.0
  Using cached google_api_core-1.16.0-py2.py3-none-any.whl (70 kB)
Collecting pyasn1<0.5.0,>=0.4.6
  Using cached pyasn1-0.4.8-py2.py3-none-any.whl (77 kB)
Collecting pytz
  Using cached pytz-2019.3-py2.py3-none-any.whl (509 kB)
Processing ./.cache/pip/wheels/56/af/44/f0c28e985bc224ffb90612f7cdeef432ba4fbd5d15485ab271/googleapis_common_protos-1.51.0-py2-none-any.whl
Collecting protobuf>=3.4.0
  Using cached protobuf-3.11.3-cp27-cp27mu-manylinux1_x86_64.whl (1.3 MB)
Building wheels for collected packages: gensim, smart-open
  Building wheel for gensim (setup.py) ... done
  Created wheel for gensim: filename=gensim-3.8.2-cp27-cp27mu-linux_x86_64.whl size=25041185 sha256=fbd6e810d34d3deeaacb2c30fd2505ca9294ee24ebf23b8d830a620e16237bf7
  Stored in directory: /home/ivan/.cache/pip/wheels/71/8e/8b/571604ba1f56fc578d240a1e7b9fd4bd00cdd8be46f83b01c5
  Building wheel for smart-open (setup.py) ... done
  Created wheel for smart-open: filename=smart_open-1.10.0-py2-none-any.whl size=90638 sha256=2407c32ef3431ccd7cfa5cb32a3971ff349aae0b8e433591d25140a6216404ce
  Stored in directory: /home/ivan/.cache/pip/wheels/ae/26/13/8172396f596bae35773e720fe117b708c7666f705c19b5ba90
Successfully built gensim smart-open
Installing collected packages: numpy, scipy, six, urllib3, certifi, chardet, idna, requests, jmespath, python-dateutil, docutils, botocore, futures, s3transfer, boto3, cachetools, pyasn1, pyasn1-modules, rsa, google-auth, google-resumable-media, pytz, protobuf, googleapis-common-protos, google-api-core, google-cloud-core, google-cloud-storage, bz2file, smart-open, gensim
Successfully installed boto3-1.12.39 botocore-1.15.39 bz2file-0.98 cachetools-3.1.1 certifi-2020.4.5.1 chardet-3.0.4 docutils-0.15.2 futures-3.3.0 gensim-3.8.2 google-api-core-1.16.0 google-auth-1.13.1 google-cloud-core-1.3.0 google-cloud-storage-1.27.0 google-resumable-media-0.5.0 googleapis-common-protos-1.51.0 idna-2.9 jmespath-0.9.5 numpy-1.16.1 protobuf-3.11.3 pyasn1-0.4.8 pyasn1-modules-0.2.8 python-dateutil-2.8.1 pytz-2019.3 requests-2.23.0 rsa-4.0 s3transfer-0.3.3 scipy-1.2.3 six-1.14.0 smart-open-1.10.0 urllib3-1.25.8

I'm worried about these lines:
image

old libssl issue (typical problem on OSX)?

This is MBP El Capitan 10.11.6, the same one I've had for years.

Also, are you sure your network isn't being possessed?

:D Possibly – I created another fresh virtual env, and the exact same command went through now:

Screen Shot 2020-04-12 at 10 45 31

So, disregard… I guess. Hopefully just some weird local network/caching issue. Packets eaten by virus.

Speaking of dependency versions & looking at these output examples:

The version constraint numpy<=1.16.1,>=1.11.3 looks fishy to me. Staying on a 15-month-old version (numpy-1.16.1) of a library as intensely maintained and often improved as numpy seems unwise.

But, I don't see where in our source this is declared. (Maybe it's a side-effect of another dependency?)

If the aim is to pick the latest numpy supporting Python-2.7, that appears to be numpy-1.16.6 dated 2019-12-29.

@gojomo Created #2818 to deal with the numpy issue outside of this PR.

The original problem is solved (we pinned the smart_open version), so I'm closing this issue.
