Numba: unexpected error using jit over a function that uses a local variable on numba 0.35.0 python 3.5

Created on 2 Feb 2018 · 11 Comments · Source: numba/numba

Hello, everybody,

I found unexpected behavior in numba 0.35.0 (python 3.5.4 and numpy 1.13.3), or at least behavior worth taking into account. I have tried to illustrate it with the following examples:

from numba import jit
import numpy as np

@jit
def foo1():
    a = np.empty(10, dtype=float)

foo1() # works

@jit
def foo2():
    a = np.empty(10, dtype=float)
    for i in range(5):
        pass
    return i  # or do something with i

foo2() # fails

@jit
def foo3():
    a = np.empty(10, dtype=np.float64)  # note the use of numpy.float64
    for i in range(5):
        pass
    return i  # or  do something with i

foo3() # works

@jit
def foo4():
    a = np.empty(10, dtype=float)
    i = 0
    for i in range(5):
        pass
    return i  # or do something with i

foo4() # works

I'm not an expert, but it seems to be related to name scope, or to C-style block scope.

bug

All 11 comments

I ran into the same problem.

A numba function which contains this code always returns -1. It's almost as if alterations to best_index are discarded when the for loop is exited.

best_index = -1
best_distance_to_quantile = 10000
for i in range(0, len(predicted_probability_1)):
    distance_to_quantile = np.abs(quantile[i] - (predicted_probability_1[i])) 
    if distance_to_quantile < best_distance_to_quantile:
        best_distance_to_quantile = distance_to_quantile
        best_index = i
return i

@mmngreco thanks for the report. I can reproduce. Liveness analysis is failing for foo2() as i is not in the live variables at return time.
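To make "not in the live variables at return time" concrete, here is a minimal sketch of a backward liveness computation over a hand-written control-flow graph for foo2(). It is a toy model only: the block names, the `use`/`defs` sets and the `liveness` helper are assumptions made for illustration and are not Numba's internal data structures. It shows what a correct analysis should conclude, namely that i must still be live when the loop exits.

```python
# Toy backward liveness analysis (illustrative only, not Numba's code).
# Each block lists the variables it reads before writing ("use"),
# the variables it writes ("defs"), and its successor blocks.
CFG = {
    "entry":  {"use": set(), "defs": {"a"}, "succ": ["header"]},
    "header": {"use": set(), "defs": set(), "succ": ["body", "exit"]},
    "body":   {"use": set(), "defs": {"i"}, "succ": ["header"]},  # i = next item
    "exit":   {"use": {"i"}, "defs": set(), "succ": []},          # return i
}


def liveness(cfg):
    """Return (live_in, live_out) per block, iterating to a fixed point."""
    live_in = {name: set() for name in cfg}
    live_out = {name: set() for name in cfg}
    changed = True
    while changed:
        changed = False
        for name, block in cfg.items():
            # A variable is live-out if any successor needs it on entry.
            out = set()
            for succ in block["succ"]:
                out |= live_in[succ]
            # It is live-in if read here, or needed later and not overwritten here.
            new_in = block["use"] | (out - block["defs"])
            if out != live_out[name] or new_in != live_in[name]:
                live_out[name], live_in[name] = out, new_in
                changed = True
    return live_in, live_out


live_in, live_out = liveness(CFG)
print("live at return:", live_in["exit"])
print("live on leaving the loop body:", live_out["body"])
```

In this model i is live both at the return and on leaving the loop body, while a is never live because it is assigned and never read. The failure reported here corresponds to an analysis that drops i from those sets.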

@sharpe5 thanks for your comment. Please could you provide a full minimal working reproducer so we can check if the underlying issue is the same, thanks.

Sure, will have this ready in a few hours.

Try this code. I kept removing code to get a minimal example, so the original intent of the code may be slightly obscure.

It's pretty simple: best_index should be 0 on exit (not 1), so it should return quantile[0], which is 20.

# Numba: numba==0.38.0
# Python: 3.6.2
# OS: Win 10 x64
import numba
import pandas as pd
import numpy as np

# @numba.jit
def numba_bug(quantile: numba.float32[:]) -> pd.Series:
    """
    Intent: helper function to pick the expert quantile for each prediction.
    """
    best_index = 1 # Bug in numba: as soon as a variable is marked as global, modifications in inner scope have no effect.
    best_distance = 10000
    for i in range(0, len(quantile)):
        distance_to_quantile = 4
        if distance_to_quantile < best_distance:
            best_distance = distance_to_quantile
            best_index = i
    # print(i)
    # print(best_index)
    # print(quantile[best_index])
    return pd.Series([quantile[best_index]], index=['Quantile'])


df = numba_bug(quantile=[20, 1000000])
df.head()

Correct output (best_index is 0, so quantile[0] is 20):

# Correct output
Out[37]: 
Quantile    20
dtype: int64

If we uncomment @numba.jit, then the output is incorrect (best_index is 1, so quantile[1] is 1000000):

# INCORRECT output
Out[37]: 
Quantile    1000000
dtype: int64

If we then comment out this line, then numba throws an exception:

# best_index = 1 # Bug in numba: as soon as a variable is marked as global, modifications in inner scope have no effect.

Output:

@numba.jit
def numba_bug(quantile: numba.float32[:]) -> pd.Series:
    """
    Intent: helper function to pick the expert quantile for each prediction.
    """
    # best_index = 1 # Bug in numba: as soon as a variable is marked as global, modifications in inner scope have no effect.
    best_distance = 10000
    for i in range(0, len(quantile)):
        distance_to_quantile = 4
        if distance_to_quantile < best_distance:
            best_distance = distance_to_quantile
            best_index = i
    # print(i)
    # print(best_index)
    # print(quantile[best_index])
    return pd.Series([quantile[best_index]], index=['Quantile'])

df = numba_bug(quantile=[20, 1000000])
df.head()

Traceback (most recent call last):
  File "C:\Python36\lib\site-packages\numba\errors.py", line 491, in new_error_context
    yield
  File "C:\Python36\lib\site-packages\numba\lowering.py", line 216, in lower_block
    self.lower_inst(inst)
  File "C:\Python36\lib\site-packages\numba\objmode.py", line 65, in lower_inst
    value = self.lower_assign(inst)
  File "C:\Python36\lib\site-packages\numba\objmode.py", line 159, in lower_assign
    return self.lower_expr(value)
  File "C:\Python36\lib\site-packages\numba\objmode.py", line 339, in lower_expr
    index = self.loadvar(expr.index.name)
  File "C:\Python36\lib\site-packages\numba\objmode.py", line 520, in loadvar
    assert name in self._live_vars, name
AssertionError: best_index
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Python36\lib\site-packages\IPython\core\interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-53-6a2346969421>", line 19, in <module>
    df = numba_bug(quantile=[20, 1000000])
  File "C:\Python36\lib\site-packages\numba\dispatcher.py", line 360, in _compile_for_args
    raise e
  File "C:\Python36\lib\site-packages\numba\dispatcher.py", line 311, in _compile_for_args
    return self.compile(tuple(argtypes))
  File "C:\Python36\lib\site-packages\numba\dispatcher.py", line 618, in compile
    cres = self._compiler.compile(args, return_type)
  File "C:\Python36\lib\site-packages\numba\dispatcher.py", line 83, in compile
    pipeline_class=self.pipeline_class)
  File "C:\Python36\lib\site-packages\numba\compiler.py", line 871, in compile_extra
    return pipeline.compile_extra(func)
  File "C:\Python36\lib\site-packages\numba\compiler.py", line 365, in compile_extra
    return self._compile_bytecode()
  File "C:\Python36\lib\site-packages\numba\compiler.py", line 802, in _compile_bytecode
    return self._compile_core()
  File "C:\Python36\lib\site-packages\numba\compiler.py", line 789, in _compile_core
    res = pm.run(self.status)
  File "C:\Python36\lib\site-packages\numba\compiler.py", line 251, in run
    raise patched_exception
  File "C:\Python36\lib\site-packages\numba\compiler.py", line 243, in run
    stage()
  File "C:\Python36\lib\site-packages\numba\compiler.py", line 436, in stage_objectmode_frontend
    cres = self.frontend_looplift()
  File "C:\Python36\lib\site-packages\numba\compiler.py", line 426, in frontend_looplift
    lifted=tuple(loops), lifted_from=None)
  File "C:\Python36\lib\site-packages\numba\compiler.py", line 885, in compile_ir
    lifted_from=lifted_from)
  File "C:\Python36\lib\site-packages\numba\compiler.py", line 373, in compile_ir
    return self._compile_ir()
  File "C:\Python36\lib\site-packages\numba\compiler.py", line 809, in _compile_ir
    return self._compile_core()
  File "C:\Python36\lib\site-packages\numba\compiler.py", line 789, in _compile_core
    res = pm.run(self.status)
  File "C:\Python36\lib\site-packages\numba\compiler.py", line 251, in run
    raise patched_exception
  File "C:\Python36\lib\site-packages\numba\compiler.py", line 243, in run
    stage()
  File "C:\Python36\lib\site-packages\numba\compiler.py", line 650, in stage_objectmode_backend
    self._backend(lowerfn, objectmode=True)
  File "C:\Python36\lib\site-packages\numba\compiler.py", line 626, in _backend
    lowered = lowerfn()
  File "C:\Python36\lib\site-packages\numba\compiler.py", line 599, in backend_object_mode
    self.flags)
  File "C:\Python36\lib\site-packages\numba\compiler.py", line 1016, in py_lowering_stage
    lower.lower()
  File "C:\Python36\lib\site-packages\numba\lowering.py", line 135, in lower
    self.lower_normal_function(self.fndesc)
  File "C:\Python36\lib\site-packages\numba\lowering.py", line 176, in lower_normal_function
    entry_block_tail = self.lower_function_body()
  File "C:\Python36\lib\site-packages\numba\lowering.py", line 201, in lower_function_body
    self.lower_block(block)
  File "C:\Python36\lib\site-packages\numba\lowering.py", line 216, in lower_block
    self.lower_inst(inst)
  File "C:\Python36\lib\contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "C:\Python36\lib\site-packages\numba\errors.py", line 499, in new_error_context
    six.reraise(type(newerr), newerr, tb)
  File "C:\Python36\lib\site-packages\numba\six.py", line 659, in reraise
    raise value
numba.errors.LoweringError: Failed at object (object mode frontend)
Failed at object (object mode backend)
best_index
File "<ipython-input-53-6a2346969421>", line 16:
def numba_bug(quantile: numba.float32[:]) -> pd.Series:
    <source elided>
    # print(quantile[best_index])
    return pd.Series([quantile[best_index]], index=['Quantile'])
    ^
[1] During: lowering "$48.5 = getitem(value=quantile, index=best_index)" at <ipython-input-53-6a2346969421> (16)
-------------------------------------------------------------------------------
This should not have happened, a problem has occurred in Numba's internals.
Please report the error message and traceback, along with a minimal reproducer
at: https://github.com/numba/numba/issues/new
If you need help writing a minimal reproducer please see:
http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports
If more help is needed please feel free to speak to the Numba core developers
directly at: https://gitter.im/numba/numba
Thanks in advance for your help in improving Numba!

If we then comment out @numba.jit, we still get the correct result (uncommenting it generates the exception above).

# @numba.jit
def numba_bug(quantile: numba.float32[:]) -> pd.Series:
    """
    Intent: helper function to pick the expert quantile for each prediction.
    """
    # best_index = 1 # Bug in numba: as soon as a variable is marked as global, modifications in inner scope have no effect.
    best_distance = 10000
    for i in range(0, len(quantile)):
        distance_to_quantile = 4
        if distance_to_quantile < best_distance:
            best_distance = distance_to_quantile
            best_index = i
    # print(i)
    # print(best_index)
    # print(quantile[best_index])
    return pd.Series([quantile[best_index]], index=['Quantile'])


df = numba_bug(quantile=[20, 1000000])
df.head()

Correct output:

# Correct output
Out[54]: 
Quantile    20
dtype: int64

If I replace the return pd.Series with a simple return quantile[best_index], then everything always works perfectly with or without numba.

I feel that this issue is some odd interaction between pd.Series() and numba.

Thanks @sharpe5. What you report is a similar but slightly different issue, though the root cause is the same: best_index is considered local/dead as it doesn't escape the lifted loop. As a result, the outer scope is not updated, or the variable is considered dead and so inaccessible. Setting the decorator to @numba.jit(looplift=False) "fixes" it, but at the cost of not having loop lifting.
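The failure mode described above can be mimicked in plain Python. The sketch below is a toy model and does not use Numba: the hypothetical `lifted_loop` stands in for a loop that has been extracted into its own function, and the hypothetical `outputs` argument stands in for the set of variables the analysis believes are still needed after the loop. Leaving best_index out of that set reproduces both symptoms reported in this thread.

```python
def lifted_loop(state, outputs):
    """Run the loop on a private copy of the caller's variables,
    then copy back only the names listed in `outputs`."""
    local = dict(state)
    for i in range(len(local["quantile"])):
        distance_to_quantile = 4
        if distance_to_quantile < local["best_distance"]:
            local["best_distance"] = distance_to_quantile
            local["best_index"] = i
    for name in outputs:
        state[name] = local[name]


def run(initial_best_index, outputs):
    """Mimic numba_bug(): set up the outer scope, run the lifted loop,
    then read best_index from the outer scope."""
    state = {"quantile": [20, 1000000], "best_distance": 10000}
    if initial_best_index is not None:
        state["best_index"] = initial_best_index
    lifted_loop(state, outputs)
    return state["quantile"][state["best_index"]]


# best_index is written back, so the outer scope sees the update.
correct = run(1, outputs={"best_distance", "best_index"})

# best_index is wrongly treated as dead: the outer scope keeps its old value.
stale = run(1, outputs={"best_distance"})

# Without the initial assignment the name never exists in the outer scope,
# which is analogous to the AssertionError on best_index in the traceback.
try:
    run(None, outputs={"best_distance"})
    missing = False
except KeyError:
    missing = True

print(correct, stale, missing)
```

Here `correct` is 20 and `stale` is 1000000, matching the correct and incorrect outputs shown earlier in the thread, and the third call fails because best_index was never defined outside the loop.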

Thanks for your quick reply. As mentioned in the final two sentences of my example, it seems odd that everything works fine with or without numba if quantile[best_index] is returned.

It's only if pd.Series is returned that the problem exhibits itself.

I found that about half of my functions did not return the correct results
if numba was activated, probably as most of them returned pd.Series, so I
ended up commenting numba out altogether.

Having said that, numba is just amazing, and is far more flexible for
interactive computing compared to Cython (which itself excels at producing
fast code within packages). Keep up the good work, I have a huge amount of
respect for the contributors to numba.

No problem :) Thanks for your support!

The behaviour you are seeing is probably due to object mode vs. no python mode, which is explained here: http://numba.pydata.org/numba-doc/latest/user/performance-tips.html#no-python-mode-vs-object-mode. Essentially, Numba doesn't "know" about Pandas, so it cannot compile your entire function to machine code. Instead, it tries to work out if it can compile any loops in it and "lift" them out, then runs those in machine code and the rest in the interpreter. Unfortunately, you hit a bug in loop lifting. To get really good performance, you could isolate your compute into functions that run in no python mode and then just use the results from those in, e.g., the pandas Series bit. In the case of your sample, this should have the desired effect.

import numba
import pandas as pd
import numpy as np

# this compiles to machine code
@numba.jit(nopython=True)
def npm_function(quantile: numba.float32[:]) -> int:
    best_distance = 10000
    for i in range(0, len(quantile)):
        distance_to_quantile = 4
        if distance_to_quantile < best_distance:
            best_distance = distance_to_quantile
            best_index = i
    return best_index

# standard python interpreter function to make the numba call and
# then push the result into a Series
def object_mode_function(quantile: numba.float32[:]) -> pd.Series:
    best_index = npm_function(quantile)
    return pd.Series([quantile[best_index]], index=['Quantile'])

df = object_mode_function(quantile=[20, 1000000])
print(df.head())

Excellent, that explains everything - in general, make life easy for numba and get optimum performance
by giving it functions which can be compiled to 100% machine code. Thanks
again!

Closed by #3230.

Nice work guys! I really do mean this when I say that Numba is such a fine piece of engineering, it should be displayed in the Louvre as a work of art :)
