Incubator-mxnet: what does mx.sym.Embedding() do for vocab index data

Created on 19 Oct 2016 · 5 Comments · Source: apache/incubator-mxnet

Hi,
I am developing a text classifier using a CNN, based on the Python example provided. I need to understand what mx.sym.Embedding() does. I have looked at https://github.com/dmlc/mxnet/issues/2176 but could not follow it.

Here are my steps. I take a sentence, "Mxnet is cool", pad it to a fixed length of 56, and then convert each word to its vocabulary index. Hence the input to the CNN looks like [1, 3450, 2303, 0, 0, ..., 0].
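
For reference, the preprocessing described above can be sketched roughly like this. The vocabulary dict, the "<pad>" token, and the helper name are made up for illustration and are not taken from the example code; the indices are simply the ones quoted above.

```python
PAD = "<pad>"  # hypothetical padding token, mapped to index 0

def sentence_to_indices(sentence, vocab, max_len=56):
    """Pad (or truncate) a sentence to max_len tokens, then map each token to its index."""
    tokens = sentence.lower().split()[:max_len]
    tokens += [PAD] * (max_len - len(tokens))
    # Assumes every token is in vocab; an unknown word would raise KeyError here.
    return [vocab[t] for t in tokens]

# Toy vocabulary using the indices quoted in the post.
vocab = {PAD: 0, "mxnet": 1, "is": 3450, "cool": 2303}

ids = sentence_to_indices("Mxnet is cool", vocab)
print(ids[:5], len(ids))
```

A real pipeline would also need an entry for out-of-vocabulary words, which this sketch leaves out.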

This is fed into mx.sym.Embedding() as per the example:
embed_layer = mx.sym.Embedding(data=input_x, input_dim=vocab_size, output_dim=num_embed, name='vocab_embed')

Here vocab_size is 18765 and num_embed is 300.

My question is: what is num_embed, and what will the output of mx.sym.Embedding() be? What will the result look like?

All 5 comments

Use infer_shape to help you debug/understand.

# network definition.
input_x = mx.sym.variable('input_x')
embed_layer = mx.sym.Embedding(data=input_x, input_dim=vocab_size, output_dim=num_embed, name='vocab_embed')

# infer_shape
input_shape = {'input_x' : (batch_size, xx, xx)} # choose it by yourself.
embed_layer.infer_shape(**input_shape) 
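
As a rough illustration of what the layer computes (this is a plain-Python sketch, not MXNet code): an embedding is a lookup table with input_dim rows and output_dim columns, and each vocabulary index in the input is replaced by the matching row. So an input of shape (batch_size, seq_len) becomes (batch_size, seq_len, output_dim). The function names and the toy sizes below are assumptions chosen to keep the example small; in the question the sizes would be vocab_size=18765, num_embed=300 and seq_len=56, and the table values would be learned during training rather than random.

```python
import random

def make_embedding_table(vocab_size, num_embed, seed=0):
    """Build a vocab_size x num_embed table of small random floats (stand-in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(num_embed)]
            for _ in range(vocab_size)]

def embed(batch, table):
    """Replace every vocabulary index in the batch with its row from the table."""
    return [[table[idx] for idx in sentence] for sentence in batch]

# Toy sizes so the sketch runs instantly.
vocab_size, num_embed, seq_len = 10, 4, 6
table = make_embedding_table(vocab_size, num_embed)

# Two padded "sentences" of vocabulary indices (0 is the padding index here).
batch = [[1, 7, 3, 0, 0, 0],
         [2, 2, 9, 5, 0, 0]]

out = embed(batch, table)
shape = (len(out), len(out[0]), len(out[0][0]))
print(shape)  # (2, 6, 4), i.e. (batch_size, seq_len, num_embed)
```

Note that the same index always maps to the same vector, so every padding position gets the row for index 0.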

Thanks for the help. I am getting the error below. I have a training dataset "x_train", and I am not able to figure out what the error is.
I appreciate all your help.
Code:

# network definition.
input_x = mx.sym.variable('x_train')
embed_layer = mx.sym.Embedding(data=input_x, input_dim=vocab_size, output_dim=num_embed, name='vocab_embed')

# infer_shape
input_shape = {'input_x' : (batch_size, 1, 5)} # choose it by yourself.
embed_layer.infer_shape(**input_shape)

Error:

AttributeError Traceback (most recent call last)
in ()
1 # network definition.
----> 2 input_x = mx.sym.variable('x_train')
3 embed_layer = mx.sym.Embedding(data=input_x, input_dim=vocab_size, output_dim=num_embed, name='vocab_embed')
4
5 # infer_shape

AttributeError: module 'mxnet.symbol' has no attribute 'variable'

Use mx.symbol.Variable (capital V).

We just pushed some details for the doc of the embedding layers in #3596. They should appear in the online doc once merged. Hope this clarifies the point.

Thanks a lot!
