I noticed that in both line 1177 and line 1179 _it_ was passed as the first argument to itertools.islice().
Note that itertools.islice() expects its first argument to be an _iterable_ (see this), but _it_ is not necessarily an iterable. It is simply an iterator (see line 1172). So there is a type mismatch.
https://github.com/RaRe-Technologies/gensim/blob/8149035e22c3df932a22fc654ae35942d5e2f866/gensim/utils.py#L1145-L1183
I guess an easy fix to this bug would be to directly pass the provided argument, _iterable_. I didn't see any need to create an iterator for it.
I ran into this bug while using LdaModel. The initializer of that class tries to break the corpus, which is an iterable, into chunks using chunkize_serial. I had implemented my own corpus to stream documents from disk, and I got a TypeError claiming that the iterator I implemented was not iterable.
Thanks for taking a look at this!
Can you show the actual error you received, and possibly some minimal code to trigger that error?
Sure, and sorry for getting back to you a bit late! This is the code that triggers the error:
import gensim.utils as utils


class SimpleIterator:
    def __init__(self):
        self.pos = -1
        self.array = [i for i in range(10)]

    def __next__(self):
        self.pos += 1
        if self.pos == 10:
            raise StopIteration
        else:
            return self.array[self.pos]


class SimpleIterable:
    def __iter__(self):
        return SimpleIterator()


iterable = SimpleIterable()
print(list(utils.chunkize_serial(iterable = iterable, chunksize = 2)))
And the error is as follows:
TypeError Traceback (most recent call last)
~/Documents/programs/ngramGen.py in <module>
22 iterable = SimpleIterable()
----> 23 print(list(utils.chunkize_serial(iterable = iterable, chunksize = 2)))
~/Library/Python/3.7/lib/python/site-packages/gensim/utils.py in chunkize_serial(iterable, chunksize, as_numpy, dtype)
1177 wrapped_chunk = [[np.array(doc, dtype=dtype) for doc in itertools.islice(it, int(chunksize))]]
1178 else:
-> 1179 wrapped_chunk = [list(itertools.islice(it, int(chunksize)))]
1180 if not wrapped_chunk[0]:
1181 break
TypeError: 'SimpleIterator' object is not iterable
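For reference, the same kind of TypeError can be reproduced without gensim, since itertools.islice() calls iter() on its first argument. The sketch below is a minimal illustration only; `NextOnly` is a hypothetical class that, like `SimpleIterator` above, defines `__next__()` but no `__iter__()`.

```python
import itertools


class NextOnly:
    """Hypothetical class: defines __next__() but not __iter__()."""

    def __next__(self):
        raise StopIteration


try:
    # islice() needs to obtain an iterator from its argument, which
    # fails here because NextOnly has no __iter__() method.
    itertools.islice(NextOnly(), 2)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```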
Sorry, but is there any feedback on this?
Thanks for the clear example code!
Technically, all iterators are also supposed to be iterable. For example, see: https://docs.python.org/3/glossary.html#term-iterator – which explains (emphasis added):
iterator
An object representing a stream of data. Repeated calls to the iterator’s __next__() method (or passing it to the built-in function next()) return successive items in the stream. When no more data are available a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its __next__() method just raise StopIteration again. _Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted._ One notable exception is code which attempts multiple iteration passes. A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop. Attempting this with an iterator will just return the same exhausted iterator object used in the previous iteration pass, making it appear like an empty container.
I suspect if your iterator were to implement __iter__(), the error would go away.
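As a rough sketch of that suggestion, the iterator from the example could gain an `__iter__()` method that returns itself. The `chunks` helper below is a hypothetical stand-in for the chunking loop, written only so the snippet runs without gensim; it is not gensim's actual implementation.

```python
import itertools


class SimpleIterator:
    def __init__(self):
        self.pos = -1
        self.array = list(range(10))

    def __iter__(self):
        # Returning self is what makes an iterator usable as an iterable.
        return self

    def __next__(self):
        self.pos += 1
        if self.pos >= len(self.array):
            raise StopIteration
        return self.array[self.pos]


class SimpleIterable:
    def __iter__(self):
        return SimpleIterator()


def chunks(iterable, chunksize):
    """Hypothetical helper: yield lists of at most `chunksize` items."""
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, chunksize))
        if not chunk:
            break
        yield chunk


result = list(chunks(SimpleIterable(), 2))
print(result)  # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
```

The `>=` comparison in `__next__()` is a small change from the original `== 10` check, so that an exhausted iterator keeps raising StopIteration on later calls, as the glossary entry describes.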
Yeah, I get it! I will try adding __iter__() to see if it works!
Closing this issue for now. But if you have fresh results showing a problem actually exists, feel free to update here with those for more consideration.