Incubator-mxnet: Deserialization problem with gluon `ValueError: There are multiple outputs with name ...`

Created on 11 Oct 2018 · 10 Comments · Source: apache/incubator-mxnet

Description

For a simple HybridBlock, exporting the symbol and then deserializing it fails with mxnet 1.3 when an embedding layer is used multiple times. This used to work with mxnet 1.2.

It may or may not be related to this issue: https://github.com/apache/incubator-mxnet/issues/12783

Environment info (Required)

----------Python Info----------
Version      : 3.6.3
Compiler     : GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)
Build        : ('default', 'Mar 20 2018 21:25:13')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 18.1
Directory    : /Users/.../.pyenv/versions/3.6.3/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.3.0
Directory    : /Users/.../.pyenv/versions/3.6.3/Python.framework/Versions/3.6/lib/python3.6/site-packages/mxnet
Commit Hash   : b3be92f4a48bce62a5a8424271871c2f81c8f7f1
----------System Info----------
Platform     : Darwin-16.7.0-x86_64-i386-64bit
system       : Darwin
node         : ...
release      : 16.7.0
version      : Darwin Kernel Version 16.7.0: Thu Jun 21 20:07:39 PDT 2018; root:xnu-3789.73.14~1/RELEASE_X86_64
----------Hardware Info----------
machine      : x86_64
processor    : i386
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.brand_string: Intel(R) Core(TM) i7-7660U CPU @ 2.50GHz'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0440 sec, LOAD: 0.9637 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0922 sec, LOAD: 1.0902 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0733 sec, LOAD: 0.8020 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0545 sec, LOAD: 0.6035 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0400 sec, LOAD: 1.1772 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0443 sec, LOAD: 0.2266 sec.

Package used (Python/R/Scala/Julia):
I'm using Python.

Error Message:


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-628e9346e9c4> in <module>()
     21 test_op.export('/tmp/bla')
     22 
---> 23 mx.gluon.SymbolBlock.imports('/tmp/bla-symbol.json', param_file='/tmp/bla-0000.params', input_names=['data0', 'data1'])

~/.pyenv/versions/3.6.3/Python.framework/Versions/3.6/lib/python3.6/site-packages/mxnet/gluon/block.py in imports(symbol_file, input_names, param_file, ctx)
   1021             input_names = [input_names]
   1022         inputs = [symbol.var(i) for i in input_names]
-> 1023         ret = SymbolBlock(sym, inputs)
   1024         if param_file is not None:
   1025             ret.collect_params().load(param_file, ctx=ctx)

~/.pyenv/versions/3.6.3/Python.framework/Versions/3.6/lib/python3.6/site-packages/mxnet/gluon/block.py in __init__(self, outputs, inputs, params)
   1049         row_sparse_storage = ndarray.ndarray._STORAGE_TYPE_STR_TO_ID['row_sparse']
   1050         for i in out:
-> 1051             for j in i.get_internals():
   1052                 assert(j.attr("__storage_type__") != str(row_sparse_storage)), \
   1053                     "SymbolBlock doesn't support Parameter '%s' because its storage " \

~/.pyenv/versions/3.6.3/Python.framework/Versions/3.6/lib/python3.6/site-packages/mxnet/symbol/symbol.py in <genexpr>(.0)
     91         <Symbol _plus0>
     92         """
---> 93         return (self[i] for i in self.list_outputs())
     94 
     95     def __add__(self, other):

~/.pyenv/versions/3.6.3/Python.framework/Versions/3.6/lib/python3.6/site-packages/mxnet/symbol/symbol.py in __getitem__(self, index)
    515                 if name == index:
    516                     if idx is not None:
--> 517                         raise ValueError('There are multiple outputs with name \"%s\"' % index)
    518                     idx = i
    519             if idx is None:

ValueError: There are multiple outputs with name "testop1_embedding0_fwd_output"

Minimum reproducible example

import mxnet as mx
from mxnet import gluon

class TestOp(gluon.HybridBlock):
    def __init__(self, n_in, n_out):
        super().__init__()
        with self.name_scope():
            self.embed = mx.gluon.nn.Embedding(n_in, n_out)

    def hybrid_forward(self, F, x, y):
        a = self.embed(x)
        b = self.embed(y)
        return a + b

test_op = TestOp(n_in=5, n_out=2)
test_op.initialize()
test_op.hybridize()

test_op(mx.nd.array([0,1,2]), mx.nd.array([1,2,3]))

test_op.export('/tmp/bla')

gluon.SymbolBlock.imports(
    '/tmp/bla-symbol.json',
    param_file='/tmp/bla-0000.params', 
    input_names=['data0', 'data1'])

Steps to reproduce

Run the code.

Bug Gluon

All 10 comments

@srochel @lupesko

@mxnet-label-bot [Bug, Gluon]

This seems similar to #12783

Looks like the problem occurs with Dense as well, so the issue probably lies in the +:

import mxnet as mx

class MyBlock(mx.gluon.HybridBlock):
    def __init__(self):
        super().__init__()
        with self.name_scope():
            self.model = mx.gluon.nn.Dense(units=5)

    def hybrid_forward(self, F, x, y):
        return self.model(x) + self.model(y)

block = MyBlock()
block.initialize()
block.hybridize()

output = block(mx.nd.random_normal(shape=(100,)), mx.nd.random_normal(shape=(100,)))

block.export(path="./model", epoch=0)
symbol = mx.gluon.SymbolBlock.imports(
    symbol_file="./model-symbol.json",
    input_names=["data0", "data1"],
    param_file="./model-0000.params",
    ctx=mx.Context.default_ctx
)

gives

Traceback (most recent call last):
  File "2018-10-11-very-weird-issue.py", line 23, in <module>
    ctx=mx.Context.default_ctx
  File "[...]/lib/python3.6/site-packages/mxnet/gluon/block.py", line 1023, in imports
    ret = SymbolBlock(sym, inputs)
  File "[...]/lib/python3.6/site-packages/mxnet/gluon/block.py", line 1051, in __init__
    for j in i.get_internals():
  File "[...]/lib/python3.6/site-packages/mxnet/symbol/symbol.py", line 93, in <genexpr>
    return (self[i] for i in self.list_outputs())
  File "[...]/lib/python3.6/site-packages/mxnet/symbol/symbol.py", line 517, in __getitem__
    raise ValueError('There are multiple outputs with name \"%s\"' % index)
ValueError: There are multiple outputs with name "myblock0_dense0_fwd_output"

I observed that the issue occurs when two inputs pass through the same block. I am trying to understand the root cause, since this works fine with mxnet 1.2.1.

https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/block.py#L1050
This check was added after 1.2.1. It calls get_internals() on each output symbol, which then finds duplicate names and fails, so the duplicate-name problem is exposed from 1.3 onwards.
Duplicate output names have always occurred when two inputs pass through the same block; I am not sure why this has not caused issues for our users before.
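
The lookup that fails can be sketched in plain Python, without mxnet. The function below is a hypothetical stand-in (`find_output_index` is not an mxnet name) modelled on the `__getitem__` lines visible in the traceback above. The list of names is illustrative, apart from the duplicated name taken from the error message, and the "cannot find" message is an assumption, since the traceback cuts off before it.

```python
def find_output_index(output_names, index):
    """Hypothetical stand-in for the string lookup shown in the traceback."""
    idx = None
    for i, name in enumerate(output_names):
        if name == index:
            if idx is not None:
                # A second match means the name is ambiguous.
                raise ValueError('There are multiple outputs with name "%s"' % index)
            idx = i
    if idx is None:
        # Assumed wording; the traceback does not show this branch's body.
        raise ValueError('Cannot find output that matches name "%s"' % index)
    return idx


# Illustrative internal output names for a block applied to two inputs.
# Only the duplicated name is taken from the reported error message.
names = [
    "data0",
    "testop1_embedding0_weight",
    "testop1_embedding0_fwd_output",
    "data1",
    "testop1_embedding0_fwd_output",
]

try:
    # Iterating over the internals looks up every name in turn.
    for n in names:
        find_output_index(names, n)
except ValueError as err:
    print(err)
```

Because iterating over the internals looks up each output by name, a single repeated name is enough to make the whole iteration fail, which would explain why the error only appeared once that iteration was added.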

Working on the root cause and a solution with the help of @safrooze and @zhreshold.

@szha

@lostella while the root cause is being investigated, one workaround in this case would be to use different blocks with shared parameters:

class MyBlock(mx.gluon.HybridBlock):
    def __init__(self):
        super().__init__()
        with self.name_scope():
            self.model0 = mx.gluon.nn.Dense(units=5)
            self.model1 = mx.gluon.nn.Dense(units=5, params=self.model0.collect_params())

    def hybrid_forward(self, F, x, y):
        return self.model0(x) + self.model1(y)
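
A quick way to check whether a symbol is affected is to look for repeated names among its internal outputs. The helper below is a minimal sketch in plain Python. `duplicated_names` is a hypothetical name, and both name lists are illustrative: the first uses the duplicated name from the error above, the second assumes the second block gets its own `dense1` prefix. With mxnet installed, the list could instead come from `sym.get_internals().list_outputs()`.

```python
from collections import Counter


def duplicated_names(output_names):
    """Return the sorted names that occur more than once in output_names."""
    counts = Counter(output_names)
    return sorted(name for name, n in counts.items() if n > 1)


# One block applied to both inputs: the forward output name repeats.
shared_block_names = [
    "data0",
    "myblock0_dense0_fwd_output",
    "data1",
    "myblock0_dense0_fwd_output",
]

# Two blocks with shared parameters (assumed naming): each forward op is distinct.
two_block_names = [
    "data0",
    "myblock0_dense0_fwd_output",
    "data1",
    "myblock0_dense1_fwd_output",
]

print(duplicated_names(shared_block_names))
print(duplicated_names(two_block_names))
```

An empty result means name-based lookup over the internals should not hit the ambiguity described in this issue.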

I use mxnet 1.3.0 and hit this problem too. In my code it was caused by the call 'sym.get_internals()'; after I deleted it, the code ran.

Need to check whether this issue is resolved by https://github.com/apache/incubator-mxnet/issues/14619#issuecomment-504249531

The "Minimum reproducible example" works for me on 1.5 and current master. This can probably be closed?
