Pomegranate: GeneralMixtureModel.predict() gives TypeError: object of type 'numpy.float64' has no len()

Created on 17 Sep 2017 · 5 Comments · Source: jmschrei/pomegranate

While going through the tutorial, I get TypeError: object of type 'numpy.float64' has no len(). I have verified the behavior on both Python 2.7 and Python 3.5:

from pomegranate import *
import numpy as np

d1 = DiscreteDistribution({'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25})
d2 = DiscreteDistribution({'A': 0.10, 'C': 0.40, 'G': 0.40, 'T': 0.10})
gmm = GeneralMixtureModel( [d1, d2] )

seq = list('CGACTACTGACTACTCGCCGACGCGACTGCCGTCTATACTGCGCATACGGC')
gmm_predictions = gmm.predict( np.array(seq) )

ERROR

TypeError                                 Traceback (most recent call last)
<ipython-input-1-8becbf5be0a3> in <module>()
      7 
      8 seq = list('CGACTACTGACTACTCGCCGACGCGACTGCCGTCTATACTGCGCATACGGC')
----> 9 gmm_predictions = gmm.predict( np.array(seq) )

pomegranate/bayes.pyx in pomegranate.bayes.BayesModel.predict()

TypeError: object of type 'numpy.float64' has no len()

How can this be fixed?

tuxdna

All 5 comments

For Sunday fun, I decided to do a git bisect to probe when this started happening.
The following commit ran successfully and the following commit is where it fails.

Here are the script and Makefile:
bisect.bash.txt
Makefile.txt

jkleckner on 17 Sep 2017

I managed to narrow down the line the error is coming from:

In [8]: try: gmm_predictions = gmm.predict( np.array(seq) )
   ...: except:
   ...:     import traceback
   ...:     traceback.print_exc()
   ...:     
Traceback (most recent call last):
  File "<ipython-input-8-88c80acd3363>", line 1, in <module>
    try: gmm_predictions = gmm.predict( np.array(seq) )
  File "pomegranate/bayes.pyx", line 425, in pomegranate.bayes.BayesModel.predict
TypeError: object of type 'numpy.float64' has no len()

Coming from here: https://github.com/jmschrei/pomegranate/blob/master/pomegranate/bayes.pyx#L425

The code says:

        if not self.is_vl_:
            X_ndarray = _check_input(X, self.keymap)
            X_ptr = <double*> X_ndarray.data
            n, d = len(X_ndarray), len(X_ndarray[0])   ## <===== here is the error
            if d != self.d:
                raise ValueError("sample only has {} dimensions but should have {} dimensions".format(d, self.d))
        else:
            X_ndarray = X
            n, d = len(X_ndarray), self.d

tuxdna on 17 Sep 2017

Converted n, d = len(X_ndarray), len(X_ndarray[0]) into

n = len(X_ndarray)
d = len(X_ndarray[0])

After splitting the line into two parts, it is clear that the error occurs because X_ndarray is one-dimensional. X_ndarray[0] is then a numpy.float64 scalar rather than a row, so len(X_ndarray[0]) raises the TypeError.
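
The failure can be reproduced with NumPy alone, without pomegranate. This is a minimal sketch, assuming only that X_ndarray is a one-dimensional float64 array; the variable names and values are illustrative.

```python
import numpy as np

# Stand-in for the encoded sequence: a 1-D float64 array (illustrative values).
x_1d = np.array([1.0, 3.0, 0.0, 1.0, 2.0])

n = len(x_1d)      # works: the number of elements
first = x_1d[0]    # indexing a 1-D array yields a numpy.float64 scalar

try:
    d = len(first)             # a scalar has no length
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(type(first).__name__, "->", error_message)
```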

tuxdna on 17 Sep 2017

I am not sure what the purpose of this is:

https://github.com/jmschrei/pomegranate/blob/5652aa44bae7442a1d42f60424128ea8356266d7/pomegranate/bayes.pyx#L208

Perhaps something is going on in this function:

https://github.com/jmschrei/pomegranate/blob/5652aa44bae7442a1d42f60424128ea8356266d7/pomegranate/utils.pyx#L309

To invoke this function we do:

keymap = [{key: i for i, key in enumerate(set(functools.reduce(lambda x, y: x+y, [d.keys() for d in gmm.distributions])))}]

print(keymap)
[{'A': 0, 'C': 1, 'T': 2, 'G': 3}]


a = utils._check_input(np.array(seq), keymap)

a.shape
Out[17]: (51,)

print(a)
[ 1.  3.  0.  1.  2.  0.  1.  2.  3.  0.  1.  2.  0.  1.  2.  1.  3.  1.
  1.  3.  0.  1.  3.  1.  3.  0.  1.  2.  3.  1.  1.  3.  2.  1.  2.  0.
  2.  0.  1.  2.  3.  1.  3.  1.  0.  2.  0.  1.  3.  3.  1.]

This verifies that _check_input returns a one-dimensional array.
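
Judging only from the output above, the encoding step appears to map each symbol to its index in the keymap and return an array with the same shape as the input. The sketch below imitates that behavior in plain NumPy. Note that encode_with_keymap is a hypothetical helper, not pomegranate's _check_input, and the keymap is copied from the printed output.

```python
import numpy as np

def encode_with_keymap(symbols, keymap):
    """Hypothetical stand-in: map each symbol to its integer code, stored as float64."""
    return np.array([keymap[s] for s in symbols], dtype=np.float64)

keymap = {'A': 0, 'C': 1, 'T': 2, 'G': 3}  # copied from the output above
seq = list('CGACTACTGACTACTCGCCGACGCGACTGCCGTCTATACTGCGCATACGGC')

encoded = encode_with_keymap(seq, keymap)
print(encoded.shape)   # a single axis, so there is no second dimension to measure
print(encoded[:5])
```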

@jmschrei Can you take a look at this?

tuxdna on 18 Sep 2017

Thanks for your work in looking into this. I was taking a brief break from developing but am back now.

I fixed this issue by recasting the 1D array as a 2D array in the utils.pyx file. It seems to work on my machine now.
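
The patch itself is not shown in this thread. As a rough illustration of what recasting a 1D array as a 2D array means in NumPy, the sketch below uses a hypothetical helper, as_2d_column, that treats each element as one sample with a single dimension. The actual change in utils.pyx may differ.

```python
import numpy as np

def as_2d_column(x):
    """Hypothetical helper: reshape 1-D input to (n, 1); leave 2-D input unchanged."""
    x = np.asarray(x, dtype=np.float64)
    if x.ndim == 1:
        x = x.reshape(-1, 1)
    return x

x_2d = as_2d_column([1.0, 3.0, 0.0, 1.0, 2.0])
n, d = len(x_2d), len(x_2d[0])   # the expression that failed on 1-D input
print(n, d)
```

With shape (n, 1), len(x_2d[0]) evaluates to 1 instead of raising.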