Hello,
This might be very trivial. I am trying to build a regression nnet that predicts 2 values per observation, but learning systematically fails. I thought LinearRegressionOutput supported multiple outputs, but I might be wrong?
Here is an example with the iris dataset:
```
data(iris)

data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, num_hidden = 16)
fc2 <- mx.symbol.FullyConnected(fc1, num_hidden = 2)
out <- mx.symbol.LinearRegressionOutput(data = fc2)

model <- mx.model.FeedForward.create(X = t(data.matrix(iris[, -c(1, 2)])),
                                     y = t(data.matrix(iris[, c(1, 2)])),
                                     symbol = out,
                                     num.round = 10,
                                     learning.rate = 0.001,
                                     momentum = 0.9,
                                     eval.metric = mx.metric.rmse,
                                     array.batch.size = 10,
                                     array.layout = "colmajor")
```

Output:

```
Start training with 1 devices
[1] Train-rmse=NaN
[2] Train-rmse=NaN
[3] Train-rmse=NaN
[4] Train-rmse=NaN
[5] Train-rmse=NaN
[6] Train-rmse=NaN
[7] Train-rmse=NaN
[8] Train-rmse=NaN
[9] Train-rmse=NaN
[10] Train-rmse=NaN
```
And if I try this nnet with my dataset, R crashes after a while without displaying any error.
Thank you for your help,
Michel
Same error here. I would like to know too. Trying to use (multivariate) LogisticRegression with multiple outputs...
Try reducing your learning rate. Does that help?
no... even at learning.rate=1e-20 or 1e-99
Is my syntax OK? I've seen people do complicated stuff elsewhere with Group, bind and executors???
Try setting up a monitor on the weights and gradients so that you can see the values in the logging.
yikes... I only started with mxnet a week ago... any pointers on how to set up a monitor (in R)? I wonder if the label was ever meant to be a matrix rather than a single vector of labels. I peeked into the source R-package/R/model.R, where the label is frequently referred to via 'length(y)' (e.g. in mx.model.init.iter), which suggests it was never expected to be a (multivariate) matrix output.
weird... say 1 out of 10 times, especially after switching GPU device and setting the learning rate very small, like 1e-6, I get no NaNs. But then the NaNs start coming back. Even using the CPU. Weird.
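I don't think the R package exposes the Python-style weight/gradient Monitor, but you can at least log the training metric per batch and per epoch with the callback helpers from the R docs, which helps pin down when things blow up. A rough, untested sketch (plug in your own symbol and data; `out`, `train.x` and `train.y` below are placeholders):

```
# Rough sketch: record the per-epoch training metric in a logger object and
# print progress every few batches. This does not show raw weights/gradients,
# but it shows when the metric first becomes NaN.
library(mxnet)
logger <- mx.metric.logger$new()

model <- mx.model.FeedForward.create(
  symbol = out,                      # placeholder: your network symbol
  X = train.x, y = train.y,          # placeholder: your data
  num.round = 10,
  learning.rate = 0.001,
  eval.metric = mx.metric.rmse,
  array.batch.size = 10,
  batch.end.callback = mx.callback.log.train.metric(5),
  epoch.end.callback = mx.callback.log.train.metric(5, logger)
)

print(logger$train)                  # per-epoch training metric values
```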
Getting same error here. Multiple regression output would be fantastic.
Has there been any movement on this? I'd be interested to see the correct way to handle this!
I'm not an R person, but here is a Python example showing a multiple regression setup with a 30-valued output on some artificial data. This model converges:
```
import mxnet as mx
from sklearn.cross_validation import train_test_split
import logging
import pdb
import numpy as np

def get_mlp():
    """
    Multi-layer perceptron with a 30-unit LinearRegressionOutput.
    The label Variable is named 'softmax_label' to match NDArrayIter's
    default label name.
    """
    outLabl = mx.sym.Variable('softmax_label')
    data = mx.symbol.Variable('data')
    flat = mx.symbol.Flatten(data=data)
    fc1 = mx.symbol.FullyConnected(data=flat, name='fc1', num_hidden=100)
    act1 = mx.symbol.Activation(data=fc1, name='relu1', act_type="relu")
    fc2 = mx.symbol.FullyConnected(data=act1, name='fc2', num_hidden=30)
    net = mx.sym.LinearRegressionOutput(data=fc2, label=outLabl, name='linreg1')
    return net

#
# Create artificial data: 2140 samples of 1x96x96 inputs with 30 targets each
#
X = np.ones((2140, 9216)).reshape((2140, 1, 96, 96))
y = 0.6 * np.ones((2140, 30))

#
# Setup iterators
#
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
trainIter = mx.io.NDArrayIter(data=X_train, label=y_train, batch_size=64)
valIter = mx.io.NDArrayIter(data=X_test, label=y_test, batch_size=64)

#
# Multidevice kvstore setup and logging
#
kv = mx.kvstore.create('local')
head = '%(asctime)-15s Node[' + str(kv.rank) + '] %(message)s'
logging.basicConfig(level=logging.DEBUG, format=head)

#
# Get model and train
#
net = get_mlp()
model = mx.model.FeedForward(
    ctx=mx.gpu(),
    symbol=net,
    num_epoch=15,
    learning_rate=0.001,
    momentum=0.9,
    wd=0.00001,
    initializer=mx.init.Xavier(factor_type="in", magnitude=2.34),
)
model.fit(X=trainIter, eval_data=valIter,
          batch_end_callback=mx.callback.Speedometer(1, 50),
          epoch_end_callback=None, eval_metric='rmse')

#
# Prediction
#
valIter.reset()
for prediction in model.predict(valIter):
    print prediction
pdb.set_trace()
```
Had exactly the same problem (in R); however, switching from Windows 10 to Ubuntu 16 and building the latest versions of mxnet and the R package solved the problem for me.
Hi, I needed a multiple-output regression in mxnet using R, so I have translated the example Python code provided by @Piyush3dB into R (although I use a different example). Code and results below:
Results:

```
[1] "Train MSE:"
        y1         y2         y3 
0.05941047 0.07813825 0.01670451 
[1] "Test MSE:"
        y1         y2         y3 
0.05959333 0.08441392 0.01816598 
```
Plots: actual output vs. model response for each of the three outputs, on the training and test sets (images not reproduced here).
Code:
```
## Translated to R from the Python example at:
## https://github.com/dmlc/mxnet/issues/2138#issuecomment-222812951
library(mxnet)

# MXNET settings:
nRounds <- 300
nHidden <- 30
optimizer <- "rmsprop"
array.layout <- "rowmajor"
ctx <- mx.cpu()
initializer <- mx.init.Xavier()

# Data settings:
nObservations <- 2000
noiseLvl <- 0.5
nOutput <- 3
set.seed(42)
mx.set.seed(42)

get_mlp <- function() {
  # multi-layer perceptron with an nOutput-unit linear regression output
  label <- mx.symbol.Variable('label')
  data <- mx.symbol.Variable('data')
  flat <- mx.symbol.Flatten(data = data)
  fc1 <- mx.symbol.FullyConnected(data = flat, name = 'fc1', num_hidden = nHidden)
  act1 <- mx.symbol.Activation(data = fc1, name = 'tanh1', act_type = "tanh")
  fc2 <- mx.symbol.FullyConnected(data = act1, name = 'fc2', num_hidden = nOutput)
  net <- mx.symbol.LinearRegressionOutput(data = fc2, label = label, name = 'lro')
  return(net)
}

# Generate some random data
df <- data.frame(x1 = rnorm(nObservations),
                 x2 = rnorm(nObservations),
                 x3 = rnorm(nObservations),
                 x4 = rnorm(nObservations))
expts <- list()
for (outIdx in 1:nOutput) {
  expts[[outIdx]] <- sample(0:3, 4, replace = TRUE)
  df[[paste0("y", outIdx)]] <- df$x1^expts[[outIdx]][1] +
    df$x2^expts[[outIdx]][2] + df$x3^expts[[outIdx]][3] +
    df$x4^expts[[outIdx]][4] + noiseLvl * rnorm(nObservations)
}
respCols <- paste0("y", 1:nOutput)

# Scale data to zero-mean unit-variance
df <- data.frame(scale(df))

# Split into training and test sets
test.ind <- seq(1, nObservations, 10)  # 1 in 10 samples for testing
train.x <- data.matrix(df[-test.ind, -which(names(df) %in% respCols)])
train.y <- data.matrix(df[-test.ind, respCols])
test.x <- data.matrix(df[test.ind, -which(names(df) %in% respCols)])
test.y <- data.matrix(df[test.ind, respCols])

# Setup iterators (data and label are transposed because the iterator
# expects column-major arrays)
trainIter <- mx.io.arrayiter(data = t(train.x), label = t(train.y))
valIter <- mx.io.arrayiter(data = t(test.x), label = t(test.y))

# Get model and train
net <- get_mlp()
model <- mx.model.FeedForward.create(X = trainIter,
                                     eval.data = valIter,
                                     ctx = ctx,
                                     symbol = net,
                                     num.round = nRounds,
                                     initializer = initializer,
                                     optimizer = optimizer,
                                     array.layout = array.layout)

# Prediction
train.Response <- t(predict(model, train.x, array.layout = array.layout))
test.Response <- t(predict(model, test.x, array.layout = array.layout))

# Results (per-output mean squared error)
print("Train MSE:")
print(colMeans((train.Response - train.y)^2))
print("Test MSE:")
print(colMeans((test.Response - test.y)^2))

# Plots: actual vs. predicted for each output, train and test
par(mfrow = c(nOutput, 2))
for (outIdx in 1:nOutput) {
  plot(train.y[, outIdx], train.Response[, outIdx],
       xlab = "Actual output", ylab = "Model Response",
       main = paste0("train perf. output ", outIdx))
  abline(0, 1)
  plot(test.y[, outIdx], test.Response[, outIdx],
       xlab = "Actual output", ylab = "Model Response",
       main = paste0("test perf. output ", outIdx))
  abline(0, 1)
}
```
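For reference, the same iterator-based pattern applied back to the iris example from the top of the thread would look roughly like this (untested sketch; the key points are giving LinearRegressionOutput an explicit label Variable and feeding the 2-column target matrix through mx.io.arrayiter rather than the y= argument):

```
# Untested sketch: the original iris example rewritten so the 2-column target
# matrix goes through an array iterator instead of the length(y)-based code
# path in mx.model.FeedForward.create.
library(mxnet)
data(iris)

x <- data.matrix(iris[, -c(1, 2)])   # Petal.Length, Petal.Width, Species (as numeric code)
y <- data.matrix(iris[, c(1, 2)])    # Sepal.Length, Sepal.Width as the 2 targets
train.iter <- mx.io.arrayiter(data = t(x), label = t(y), batch.size = 10)

label <- mx.symbol.Variable("label")
data  <- mx.symbol.Variable("data")
fc1   <- mx.symbol.FullyConnected(data, num_hidden = 16)
fc2   <- mx.symbol.FullyConnected(fc1, num_hidden = 2)
out   <- mx.symbol.LinearRegressionOutput(data = fc2, label = label, name = "lro")

model <- mx.model.FeedForward.create(symbol = out,
                                     X = train.iter,
                                     ctx = mx.cpu(),
                                     num.round = 10,
                                     learning.rate = 0.001,
                                     momentum = 0.9,
                                     initializer = mx.init.Xavier())
```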