Numba: argument X: Cannot type list element of <class 'dict'>

Created on 9 Nov 2020  ·  3 Comments  ·  Source: numba/numba

I am working with large files of JSON lines of the form:
jsn = [{'a':1.23,'b':324.3242,'c':-.2343242},{'a':21.23,'b':3.3242,'c':-3.2343242}] # this is example of one line
The goal is to concatenate the dict values into a NumPy array.

With @njit, Numba raises: argument 2: Cannot type list element of <class 'dict'>
I understand the issue and have been trying to find a workaround. Since #4848 is one of the few posts on a similar issue (but without code), I am posting a minimal example here:

import numpy as np
from numba import njit

@njit   # works nicely w/o njit
def store_func(data_store,rows_lim,jsn):
    for ii in range(len(jsn)):
        data_store = np.concatenate((data_store[-rows_lim:,:], np.array([[ jsn[ii]['a'],jsn[ii]['b'],jsn[ii]['c'] ]])),axis=0)   
    return data_store

def on_message(data_store,rows_lim):
    jsn = [{'a':1.23,'b':324.3242,'c':-.2343242},{'a':21.23,'b':3.3242,'c':-3.2343242}] # the json line
    data_store = store_func(data_store,rows_lim,jsn)

data_store = np.ones([0,3],float) # init
rows_lim   = 1000
%timeit on_message(data_store,rows_lim)   # jupyter notebook

Now imagine millions of on_message calls, each with hundreds of jsn elements (len(jsn) >> 100). Is there any reasonable workaround? Thank you, and thank you for the awesome library.

question

All 3 comments

Thanks for the report. Numba's discussion forum, https://numba.discourse.group/c/numba/, is a great place to ask this sort of thing: it has a "What is this error message?" category along with a place for asking for more general help. As a brief answer: if the goal is to speed up the creation of a NumPy array from JSON records, Numba is not going to be able to do much with your code (see what Numba is good at in the 5 minute guide to Numba), and the operations are likely memory bound anyway. With this in mind, something like this should help:

import numpy as np

DATA_SIZE = 10000

def store_func(data_store,rows_lim,jsn):
    for ii in range(len(jsn)):
        data_store = np.concatenate((data_store[-rows_lim:,:], np.array([[ jsn[ii]['a'],jsn[ii]['b'],jsn[ii]['c'] ]])),axis=0)
    return data_store

jsn = [{'a':1.23,'b':324.3242,'c':-.2343242}] * DATA_SIZE

data_store = np.ones([0,3], np.float64)
rows_lim   = 1000
gold = store_func(data_store, rows_lim , jsn)

%timeit store_func(data_store, rows_lim, jsn)

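def quicker_store_func(rows_lim, jsn):
    ljsn = len(jsn)
    # store_func keeps at most rows_lim old rows plus the newly appended one,
    # so its result holds the last min(ljsn, rows_lim + 1) records.
    n = min(ljsn, rows_lim + 1)
    start = ljsn - n
    # single allocation, filled in place
    data_store = np.empty((n, 3))
    for ii in range(n):
        rec = jsn[start + ii]
        data_store[ii] = rec['a'], rec['b'], rec['c']
    return data_store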

check = quicker_store_func(rows_lim, jsn)

np.testing.assert_allclose(gold, check)

%timeit quicker_store_func(rows_lim, jsn)

which gives a 64x improvement:

138 ms ± 160 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
2.15 ms ± 9.57 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This improvement comes largely from two things.

  1. The size of the output array is known up front, so creating it once and writing into it directly is a lot more efficient: it is a single allocation and it is accessed linearly.
  2. There are no temporaries in the quicker function: the values are read from the dictionary and written directly into the output. In the original, the temporary arrays from np.array, the temporary lists from [[jsn[ii]['a']... etc]], and the further temporary arrays from calling concatenate in a loop (where the output array is thrown away and reassigned on every iteration) all add up.
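The same two ideas can be carried across many on_message calls. The following is only a sketch, assuming rows_lim is meant as "keep the most recent rows": a buffer is allocated once and each record is written into it in place, wrapping around when it is full. The names RollingStore, append_records and snapshot are hypothetical and do not come from the thread or from any library.

```python
import numpy as np


class RollingStore:
    """Keep the most recent `capacity` rows in one preallocated array.

    Hypothetical helper for illustration only.
    """

    def __init__(self, capacity, n_cols=3):
        self.buf = np.empty((capacity, n_cols), dtype=np.float64)
        self.capacity = capacity
        self.count = 0  # total number of rows ever written

    def append_records(self, records):
        # Write each record straight into its slot; no temporary arrays.
        for rec in records:
            self.buf[self.count % self.capacity] = rec['a'], rec['b'], rec['c']
            self.count += 1

    def snapshot(self):
        """Return the stored rows, oldest first, as a new array."""
        if self.count <= self.capacity:
            return self.buf[:self.count].copy()
        head = self.count % self.capacity  # index of the oldest row
        return np.concatenate((self.buf[head:], self.buf[:head]))


store = RollingStore(capacity=1000)
jsn = [{'a': 1.23, 'b': 324.3242, 'c': -.2343242},
       {'a': 21.23, 'b': 3.3242, 'c': -3.2343242}]
store.append_records(jsn)   # call this from on_message
latest = store.snapshot()   # array of shape (2, 3) here
```

The only copy happens in snapshot, so it is worth calling it only when the ordered rows are actually needed rather than on every message.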

Hope this helps.

As for what the error message means: Numba doesn't support Python dictionaries, so they have to be converted to numba.typed.Dict instances first: https://numba.readthedocs.io/en/stable/reference/pysupported.html#typed-dict

Thank you, I appreciate the answer.
Good point (2) regarding the temporaries. Keeping it simple is always a good approach, or at least the first thing to try :)
