While going through the tutorial, I am getting TypeError: object of type 'numpy.float64' has no len(). I have reproduced this on both Python 2.7 and Python 3.5 with the following code:
from pomegranate import *
import numpy as np
d1 = DiscreteDistribution({'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25})
d2 = DiscreteDistribution({'A': 0.10, 'C': 0.40, 'G': 0.40, 'T': 0.10})
gmm = GeneralMixtureModel( [d1, d2] )
seq = list('CGACTACTGACTACTCGCCGACGCGACTGCCGTCTATACTGCGCATACGGC')
gmm_predictions = gmm.predict( np.array(seq) )
ERROR
TypeError Traceback (most recent call last)
<ipython-input-1-8becbf5be0a3> in <module>()
7
8 seq = list('CGACTACTGACTACTCGCCGACGCGACTGCCGTCTATACTGCGCATACGGC')
----> 9 gmm_predictions = gmm.predict( np.array(seq) )
pomegranate/bayes.pyx in pomegranate.bayes.BayesModel.predict()
TypeError: object of type 'numpy.float64' has no len()
How can this be fixed?
For Sunday fun, I decided to do a git bisect to probe when this started happening.
The following commit ran successfully and the following commit is where it fails.
Here are the script and Makefile:
bisect.bash.txt
Makefile.txt
Managed to narrow down the line where the error is coming from:
In [8]: try: gmm_predictions = gmm.predict( np.array(seq) )
...: except:
...: import traceback
...: traceback.print_exc()
...:
Traceback (most recent call last):
File "<ipython-input-8-88c80acd3363>", line 1, in <module>
try: gmm_predictions = gmm.predict( np.array(seq) )
File "pomegranate/bayes.pyx", line 425, in pomegranate.bayes.BayesModel.predict
TypeError: object of type 'numpy.float64' has no len()
Coming from here: https://github.com/jmschrei/pomegranate/blob/master/pomegranate/bayes.pyx#L425
The code says:
if not self.is_vl_:
    X_ndarray = _check_input(X, self.keymap)
    X_ptr = <double*> X_ndarray.data
    n, d = len(X_ndarray), len(X_ndarray[0])  ## <===== here is the error
    if d != self.d:
        raise ValueError("sample only has {} dimensions but should have {} dimensions".format(d, self.d))
else:
    X_ndarray = X
    n, d = len(X_ndarray), self.d
To isolate the failing call, I converted n, d = len(X_ndarray), len(X_ndarray[0])
into
n = len(X_ndarray)
d = len(X_ndarray[0])
After splitting the line into two parts, it is clear that the error occurs because X_ndarray is one-dimensional. X_ndarray[0] is then a numpy.float64 scalar rather than a row, so len(X_ndarray[0]) raises the TypeError.
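The failure can be reproduced with plain NumPy, independent of pomegranate. The snippet below is a minimal sketch; the array `a` is an illustrative stand-in for X_ndarray, not data taken from the library.

```python
import numpy as np

# Illustrative stand-in for the 1-D array that reaches the failing line.
a = np.array([1.0, 3.0, 0.0, 1.0])

first = a[0]  # indexing a 1-D float array yields a numpy.float64 scalar

try:
    len(first)  # scalars have no length, so this raises TypeError
    message = None
except TypeError as exc:
    message = str(exc)

print(a.ndim, type(first).__name__, message)
```

Because `a.ndim` is 1, the element `a[0]` is a scalar, and calling `len()` on it gives the same message as in the traceback.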
I am not sure what the purpose of doing this is:
Perhaps something is going on in the _check_input function:
To invoke this function we do:
keymap = [{key: i for i, key in enumerate(set(functools.reduce(lambda x, y: x+y, [d.keys() for d in gmm.distributions])))}]
print(keymap)
[{'A': 0, 'C': 1, 'T': 2, 'G': 3}]
a = utils._check_input(np.array(seq), keymap)
a.shape
Out[17]: (51,)
print(a)
[ 1. 3. 0. 1. 2. 0. 1. 2. 3. 0. 1. 2. 0. 1. 2. 1. 3. 1.
1. 3. 0. 1. 3. 1. 3. 0. 1. 2. 3. 1. 1. 3. 2. 1. 2. 0.
2. 0. 1. 2. 3. 1. 3. 1. 0. 2. 0. 1. 3. 3. 1.]
This confirms that _check_input returns a one-dimensional array.
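Judging only from the output above, the encoding step appears to map each symbol to its index in the keymap. The following is a rough sketch of that behavior in plain NumPy, not the actual _check_input implementation; the name `encode` is hypothetical, and the keymap is copied from the printed value above.

```python
import numpy as np

# Keymap copied from the printed output above.
keymap = [{'A': 0, 'C': 1, 'T': 2, 'G': 3}]
seq = list('CGACTACTGACTACTCGCCGACGCGACTGCCGTCTATACTGCGCATACGGC')

def encode(symbols, keymap):
    """Hypothetical stand-in: map each symbol to its index as a float."""
    return np.array([keymap[0][s] for s in symbols], dtype=np.float64)

a = encode(seq, keymap)
print(a.shape)  # one entry per symbol, so the array is 1-D
```

An encoding like this yields one number per symbol, which is why the result has a single dimension.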
@jmschrei Can you take a look at this?
Thanks for your work in looking into this. I was taking a brief break from developing but am back now.
I fixed this issue by recasting the 1D array as a 2D array in the utils.pyx file. It seems to work on my machine now.
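For readers who want to see the idea, recasting a 1D array as a 2D array can be sketched in plain NumPy as follows. This is only an illustration of the approach, not the actual change made in utils.pyx; the helper name `as_2d` is hypothetical.

```python
import numpy as np

def as_2d(X):
    """Hypothetical helper: treat a 1-D array as n samples of 1 dimension."""
    X = np.asarray(X, dtype=np.float64)
    if X.ndim == 1:
        X = X.reshape(-1, 1)  # shape (n,) becomes (n, 1)
    return X

encoded = np.array([1.0, 3.0, 0.0, 1.0, 2.0])  # illustrative 1-D input
X2 = as_2d(encoded)

# The line that previously failed now works, since X2[0] is a row of length 1.
n, d = len(X2), len(X2[0])
print(n, d)
```

Arrays that are already 2D pass through unchanged, so a guard like this would not affect multivariate input.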