Dlib: Example of DCGAN

Created on 21 May 2019 · 67 Comments · Source: davisking/dlib

Hi, I would like to contribute a DCGAN example to dlib.

I have implemented a version of the PyTorch DCGAN in C++.

However, I need guidance on a few things I don't know how to do, and I am wondering how I should proceed. Should I attach my current code here (around 150 lines), or make a pull request, even though the code is not yet able to learn anything? Maybe @edubois can help out, since he stated that he managed to make it work in https://github.com/davisking/dlib/issues/1261

Thanks for your hard work on dlib.

enhancement

All 67 comments

Warning: this issue has been inactive for 35 days and will be automatically closed on 2019-07-05 if there is no further activity.

If you are waiting for a response but haven't received one it's possible your question is somehow inappropriate. E.g. it is off topic, you didn't follow the issue submission instructions, or your question is easily answerable by reading the FAQ, dlib's official compilation instructions, dlib's API documentation, or a Google search.

I'm still working on this... I'm going to write about the network architecture

  • loss_binary_log
  • discriminator
  • generator
  • input

Here's what they look like:

// convolution and transposed convolution with custom padding
template<long num_filters, long kernel_size, int stride, int padding, typename SUBNET>
using conp = add_layer<con_<num_filters, kernel_size, kernel_size, stride, stride, padding, padding>, SUBNET>;
template<long num_filters, long kernel_size, int stride, int padding, typename SUBNET>
using contp = add_layer<cont_<num_filters, kernel_size, kernel_size, stride, stride, padding, padding>, SUBNET>;
// the generator
template<typename SUBNET>
using generator_type =
    htan<contp<1, 4, 2, 1,
    relu<bn_con<contp<64, 4, 2, 1,
    relu<bn_con<contp<128, 3, 2, 1,
    relu<bn_con<contp<256, 4, 1, 0,
    SUBNET>>>>>>>>>>>;
// the discriminator
template<typename SUBNET>
using discriminator_type =
    loss_binary_log<
    affine<conp<1, 3, 1, 0,
    prelu<bn_con<conp<256, 4, 2, 1,
    prelu<bn_con<conp<128, 4, 2, 1,
    prelu<conp<64, 4, 2, 1,
    SUBNET>>>>>>>>>>>;
// and the whole network
using net_type =
    discriminator_type<
    // tag5 gets the mask from tag1, multiplies it by the noise and adds it to the image
    tag5<add_prev2<mult_prev4<extract<input_size * input_size * 2, 1, input_size, input_size, skip1<
    // tag4 generates an image from the noise
    tag4<generator_type<
    // tag3 gets the noise from tag1
    tag3<extract<input_size * input_size, 100, 1, 1, skip1<
    // tag2 gets the image from tag1
    tag2<extract<0, 1, input_size, input_size,
    // tag1 contains the image, the noise and a mask
    tag1<input<std::array<matrix<float>, 3>>>
    >>>>>>>>>>>>>;

I know the architecture looks a bit weird, but the main idea is that the input has 3 channels:

  1. a real image
  2. random noise
  3. a mask (zeros or ones)

Then, when I train with real images or random noise I set the mask to ignore the input I don't care about.
The main problem I face is that when I back-propagate the error with real images, where the generator is not involved, I don't know how to stop the back-propagation at tag5. Maybe visit_layers_until_tag could help, but I haven't managed to make it work.
By the way, the network trains a pretty good discriminator, which after a few iterations has a loss of around 10e-9, but the generator sucks...

Any help or guidance is appreciated, but I will continue digging :)

You might be better off with two separate net objects and to alternate
between training them.

Thanks for the suggestion, that was my first approach, but I couldn't make it work.
I'll give it another try :)

Hi @arrufat, have you please finally published a GAN network using DLIB on your GitHub home page? Btw, thank you for the definition of the Resxxx networks you give, it is very useful and I am currently computing some models that could be useful to the community (e.g. gender and age). I will submit them to @davisking if the results obtained are interesting, in order to enhance the current framework.

@Cydral, yes, that's the whole point of this: I want to make a working DCGAN example and share it with everyone, but first, and most importantly, I need to get it working, which doesn't seem trivial. As soon as I get something, I'll update this issue.
Also, thanks for your kind words on my ResNet implementations. I want to simplify the code a little bit, mostly by defining the models as templates that depend only on bn_con or affine, so that I don't have to duplicate models for training and inference. Here's what I have in mind, but I'm still thinking about it:

namespace resnet50
{
    // resnet backbone definition goes here
    template<template<typename> class BN>
    using model = loss_multiclass_log<fc<1000, backbone<BN, input_rgb_image>>>;

    using train = model<bn_con>;
    using infer = model<affine>;
}

Then you could use it in your own code like:

resnet50::train net;

or

resnet50::infer net;
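
The template-template idea above can be tried without dlib. The following is only a sketch under assumptions: bn_con_stub, affine_stub, input_stub and the toy_resnet namespace are hypothetical stand-ins invented for illustration, not dlib types, and the two-layer "backbone" is a placeholder for a real ResNet definition.

```cpp
#include <string>

// Hypothetical stand-ins for dlib's bn_con and affine layer templates.
template <typename SUBNET>
struct bn_con_stub { static std::string name() { return "bn_con<" + SUBNET::name() + ">"; } };

template <typename SUBNET>
struct affine_stub { static std::string name() { return "affine<" + SUBNET::name() + ">"; } };

// Hypothetical stand-in for an input layer.
struct input_stub { static std::string name() { return "input"; } };

namespace toy_resnet
{
    // The backbone is written once and parameterized on the normalization layer.
    template <template <typename> class BN, typename INPUT>
    using backbone = BN<BN<INPUT>>;

    template <template <typename> class BN>
    using model = backbone<BN, input_stub>;

    using train = model<bn_con_stub>;  // batch normalization while training
    using infer = model<affine_stub>;  // fixed affine transform at inference
}
```

With these definitions, toy_resnet::train::name() spells out the bn_con variant and toy_resnet::infer::name() the affine variant, so one backbone definition serves both cases.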

Hi @arrufat, it does indeed sound good and it may actually avoid overwriting a model with the part as it has happened to me in the past! For the models I would like to propose, I work more on layer reduction, the idea being to have so-called minimalist but nevertheless effective models. I already did lots of tests using Resnet-18 and I think I will also try a Resnet-12 type, also by minimizing the size of the input image as much as possible.

For the DCGAN, I considered what you reported and found the approach interesting but reinjecting the loss value into the network seems difficult in this manner.

On my side, I'm trying to see if the loss function (e.g. loss_multiclass_log_per_pixel) could not be modified to include a network (discriminator) on the one hand and if the output value of this network could simply not be returned by the loss function for the weight adjustment, on the other hand.
According to the Dlib implementation, because the loss function has both the current/target image and the input image, training the discriminator at this level should be possible, allowing to maintain the current loss back-propagation mechanism. Does that seem appropriate?

That seems an odd choice, but maybe not weirder than my joint network architecture. Let's see how it turns out.

I saw Davis' comment and it is certain that building two separate networks and alternating learning and inference (for the discriminator) by forwarding the output value to the generator is certainly easier. I'm still looking for a way to do that by reusing Dlib's existing primitives but it's not so easy for me. At least, we have a U-Net model that can probably be reused for the generator (https://github.com/davisking/dlib/blob/master/examples/dnn_semantic_segmentation_ex.h)...

Here's my work in progress for the DCGAN implementation with two separate networks, but it doesn't work (yet).
It's what I tried first, before merging them both. https://gist.github.com/arrufat/062b8847b7f87465efd96d627dadf1ad

Thanks for sharing, my mate! I leave a comment attached to this code.

Hi @arrufat , I updated the code here: https://gist.github.com/Cydral/92be4e848551429ec1a6919d6d813c08.

I used another approach to define the G and D networks, but in the end it's very close to your own code. This seems to work overall... except for the back-propagation of the loss tensor values. Maybe @davisking could advise us on that?

By the way, it would only work for a single plane for the moment; I had initially made a version to infer a RGB image but I have a problem to get a 3D matrix formalizing a reconstructed image from the generator outputs. Maybe it is necessary to extract each scalar values of k and rebuild a RGB image from these different planes (I haven't tried yet)?

If you are trying to backprop from one network into another you want to get
the gradients via get_final_data_gradient(), not get_gradient_input(),
since you want the gradients with respect to the inputs to the network,
which is not what get_gradient_input() is giving you.
get_gradient_input() is the input to the backprop procedure for a
network. It's not an output of the network.

Thank you for your advice. We will continue our work on this basis.

Warning: this issue has been inactive for 35 days and will be automatically closed on 2019-09-11 if there is no further activity.

Sorry to bother, but I've been reading the documentation and trying several things, and when I try to backprop from one net into the other:

dis_trainer.train_one_step(mini_batch_fake_samples, mini_batch_fake_labels);
resizable_tensor loss_fake = discriminator.subnet().get_final_data_gradient();
generator.subnet().back_propagate_error(loss_fake);

I get this:

Error detected at line 662.
Error detected in file _deps/dlib-src/dlib/cuda/cudnn_dlibapi.cpp.
Error detected in function void dlib::cuda::batch_normalize_conv_gradient(double, const dlib::tensor&, const dlib::tensor&, const dlib::tensor&, const dlib::tensor&, const dlib::tensor&, dlib::tensor&, dlib::tensor&, dlib::tensor&).

Failing expression was src.k() == (long)means.size().

I know it's my fault and I am using the API incorrectly, but I can't figure out how to do it properly...

Warning: this issue has been inactive for 35 days and will be automatically closed on 2019-10-18 if there is no further activity.

After reading the documentation more thoroughly, I think I've found a way to properly backpropagate the loss from one network to the other.

// get the loss after inputting images from the generator using noise samples
const resizable_tensor& out_fake = discriminator.subnet().subnet().get_final_data_gradient();
// convert the input noises into a tensor by using the generator input layer (13, in my case)
resizable_tensor noises_tensor;
layer<13>(generator).to_tensor(noises.begin(), noises.end(), noises_tensor);
// make a forward call
generator(noises);
generator.subnet().subnet().back_propagate_error(noises_tensor, out_fake);

All this works until the network is serialized:

Error detected at line 1522.
Error detected in file _deps/dlib-src/dlib/cuda/cudnn_dlibapi.cpp.
Error detected in function void dlib::cuda::tanh_gradient(dlib::tensor&, const dlib::tensor&, const dlib::tensor&).

Failing expression was have_same_dimensions(dest,gradient_input) == true && have_same_dimensions(dest,grad) == true.

However, I don't know if there might be a more straightforward way to backpropagate the loss from one network to the other.
Suggestions are welcome :)

back_propagate_error() requires that have_same_dimensions(gradient_input, get_output())==true. Are you sure that out_fake has the same dimensions as generator.subnet().subnet().get_output()?

Thanks for the reply.
I've just verified and they have the same size:

const resizable_tensor& out_fake = discriminator.subnet().subnet().get_final_data_gradient();
std::cout << "out_fake: " <<
              out_fake.num_samples() << "x" <<
              out_fake.k() << "x" <<
              out_fake.nr() << "x" <<
              out_fake.nc() << std::endl;

const resizable_tensor& out_gen = generator.subnet().subnet().get_output();
std::cout << "out_gen:  " <<
              out_gen.num_samples() << "x" <<
              out_gen.k() << "x" <<
              out_gen.nr() << "x" <<
              out_gen.nc() << std::endl;

the output is:

out_fake: 8x1x28x28
out_gen:  8x1x28x28

I will update the code to my github fork soon.

You can find a compilable example here.

Thanks for sharing the code – I too am very excited about this new development!

Looks to me like synchronizing the network being trained to disk may empty out_fake = discriminator.subnet().subnet().get_final_data_gradient().

With this small sanity check, was at least able to train much longer: https://github.com/reunanen/dlib/commit/ac6d8e7fad329d5a75c8619b710aa1832b870d92

So, I've realized that I might have been back-propagating the error without updating the network parameters.
I've updated the code to perform these actions:

// convert the noises array to a tensor
resizable_tensor noises_tensor;
generator.to_tensor(noises.begin(), noises.end(), noises_tensor);
// forward it to the network
generator.subnet().forward(noises_tensor);
// back-propagate the error with the loss from the discriminator
generator.subnet().subnet().back_propagate_error(noises_tensor, out_fake);
// update the network parameters using the generator trainer's solvers
auto solvers = gen_trainer.get_solvers();
generator.subnet().subnet().update_parameters(make_sstack<adam>(solvers), gen_trainer.get_learning_rate());

It's not working yet, but I'm not 100% sure I'm doing it right.
Anyway I feel we're getting closer :)

Oh yeah, that part is important :)

With the latest update, I've managed to speed up the training and generate images like this one:

img_1

But I'm running out of ideas on how to make it work well... right now I can't see what I'm doing wrong...

Unluckily I haven't been able to make this work... I've read the docs several times and I can't see what I am doing wrong.
If somebody has some spare time to look at the code, I would really appreciate it :)

It's very possible it's not a software issue. Take it from someone who has
attempted to reproduce many many papers, it's often hard and often things
don't work the way paper authors suggest. There are often important tricks
they leave out that are needed to make things work right, or the method is
very narrow and only works on the data they used, rather than on more
general stuff, despite implicit suggestions to the contrary in the paper.
You should start out by trying to exactly reproduce some published setup,
so same data and settings and everything, if you can.

Thank you for your suggestion.
I also have some experience implementing papers, and many times there are tricks that the authors left out (on purpose?), so making things work is usually not straightforward.
For this reason I wanted to implement the DCGAN code from the PyTorch C++ examples.
I am using the same dataset, optimizer, parameters and batch size.
I will debug the PyTorch example more thoroughly to make sure I didn't miss any detail.

However, I would say that I'm not sure I am updating the network appropriately. I did it the way I understood from the documentation and how it's used in dlib itself, but I've never seen it being used to update a network with gradients coming from another one.

I will let you know if I make some progress, so that other people that want to create a GAN using dlib don't have to struggle too much :)

First of all, thank you for your contribution.
I made some fixes:

  • added leakyRelu
  • added a BinaryCrossEntropyLoss function
  • modified the DCGAN training sequence

    using generator_type =
        loss_binary_cross_entropy < fc_no_bias <1,
        htan<contp<1, 4, 2, 1,
        relu<bn_con<contp<64, 4, 2, 1,
        relu<bn_con<contp<128, 3, 2, 1,
        relu<bn_con<contp<256, 4, 1, 0,
        input<noise_t>
        >>>>>>>>>>>>>;
    using discriminator_type =
        loss_binary_cross_entropy <fc_no_bias<1,
        conp<1, 3, 1, 0,
        leakyrelu<bn_con<conp<256, 4, 2, 1,
        leakyrelu<bn_con<conp<128, 4, 2, 1,
        leakyrelu<conp<64, 4, 2, 1,
        input<matrix<unsigned char>>
        >>>>>>>>>>>;

If you look at the PyTorch C++ code, the generator ultimately learns from the final gradient that comes out of the discriminator.
Rather than taking that final gradient from the discriminator's training step, it is important to take it from the test loss.
The code is:

dis_trainer.test_one_step(mini_batch_fake_samples, mini_batch_fake_labels);
resizable_tensor noises_tensor;
layer<13>(generator).to_tensor(noises.begin(), noises.end(), noises_tensor);
generator(noises);
const resizable_tensor& out_fake = discriminator.subnet().subnet().get_final_data_gradient();

The part above is very important.
I'll clean up the source ASAP and put it in my repository.

Thank you for your contribution.
Let's continue to develop Dlib together.

This is the result of the MNIST DCGAN:
DCGAN

I just updated the dcgan example code in my repository

Please contact me if there is a problem!

example code is
https://github.com/intellizd/dlib-dcgan_example/blob/master/examples/dnn_dcgan_train_ex.cpp

dcgan-dlib whole source is
https://github.com/intellizd/dlib-dcgan_example

thanks

Awesome. Super cool to see this working :)

Agreed, thank you so much for looking into this @intellizd
Will you submit a PR? I think you should :)

I use dlib frequently.
I'm glad to contribute this time because of the DCGAN issue.
This time it's an MNIST DCGAN sample, but I'd like to try various new things with dlib.
I want to contribute to various features of dlib.

First,
I want to extend the MNIST GAN to a CIFAR GAN.
Davis, I would appreciate it if you would consider me a contributor.

@intellizd that would be awesome :)

You should make the mnist DCGAN thing into a little example so other people can learn from it. That would be super cool and educational for many users.

@davisking, thank you for the compliment.

I learned a lot from this DCGAN issue.
Thank you to @arrufat for opening this issue.
I recommend that dlib draw up a development roadmap like other deep learning frameworks.
Why don't we develop that roadmap together with contributors who are willing to participate?

We can make a bunch of issues that are "help wanted". There are already a few. I generally encourage others to work on whatever interests them. But some obvious things to do are to add new layer types like dilated convolution. There are a few things like that that have been added to cuDNN but not yet made part of dlib's layer catalogue. Burning down that list is a good place to start.

@davisking thanks for your comment about dlib's direction.
I'll follow your direction.

Are you sure of the part of code:
resizable_tensor noises_tensor;
layer<13>(generator).to_tensor(noises.begin(), noises.end(), noises_tensor);
generator(noises);
const resizable_tensor& out_fake = discriminator.subnet().subnet().get_final_data_gradient();
... because networks are different:
out_fake: 128x1x96x96
noises_tensor: 128x100x1x1
In such a situation, an assert is normally raised:
Failing expression was have_same_dimensions(dest,gradient_input) == true && have_same_dimensions(dest,grad) == true

@Cydral That's right.
Perhaps the generator and discriminator layers should be paired and matched,
such as the generator's htan <-> the discriminator's conp.

Have you changed the layer structure?
If you changed it, show me the structure.

@intellizd, not really. I used the definitions below to align my own code with the fixes you reported previously:
`using generator_type = loss_binary_log htan relu relu relu input

; using discriminator_type = loss_binary_log conp<1, 3, 1, 0,
relu relu relu input>
;`
I only set the image size to 96pix.

In a previous comment, you reported:

dis_trainer.test_one_step(mini_batch_fake_samples, mini_batch_fake_labels);
resizable_tensor noises_tensor;
layer<13>(generator).to_tensor(noises.begin(), noises.end(), noises_tensor);
generator(noises);
const resizable_tensor& out_fake = discriminator.subnet().subnet().get_final_data_gradient();

According to the network definition, layer<13> is the input layer in your code, and the generator's input definition doesn't match the structure of out_fake, hence my question. Could you please check the code you posted to confirm that it is the version used to generate the images from the MNIST DCGAN run? Thanks in advance.

That layer corresponds to the architecture for the MNIST data,
which only handles images of 28*28 pixels.
You can change the image size and try again.

Sorry, I didn't get the point. Why will this work with a 28x28pix image because, except if I'm mistaken, we will try to use the final data gradient of a num_samplesx100x1x1 (z latent layer) network to backpropagate it inside a num_samplesx1ximage_sizeximage_size? BTW, I tested also directly with your own code and I have an interruption during the second loop, coming from the discriminator training... I'm trying to figure out why

Sorry, you can forget my question. This is clear but I had not realized that the latent vector finally gave input_size=1. The output size is therefore independent of the size of the input vector in this case.
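
The dimension question above can be checked by hand. The sketch below assumes the usual output-size formulas for convolution and transposed convolution (I believe these match dlib's con_ and cont_, but treat that as an assumption) and applies them to the kernel, stride and padding values of the generator and discriminator posted earlier in the thread. The helper names are hypothetical.

```cpp
// Output side length of a convolution (integer division truncates).
constexpr long conv_out(long in, long kernel, long stride, long padding)
{
    return (in + 2 * padding - kernel) / stride + 1;
}

// Output side length of a transposed convolution.
constexpr long contp_out(long in, long kernel, long stride, long padding)
{
    return stride * (in - 1) + kernel - 2 * padding;
}

// Generator: contp<256,4,1,0> -> contp<128,3,2,1> -> contp<64,4,2,1> -> contp<1,4,2,1>
constexpr long generator_out(long in)
{
    return contp_out(contp_out(contp_out(contp_out(in, 4, 1, 0), 3, 2, 1), 4, 2, 1), 4, 2, 1);
}

// Discriminator: conp<64,4,2,1> -> conp<128,4,2,1> -> conp<256,4,2,1> -> conp<1,3,1,0>
constexpr long discriminator_out(long in)
{
    return conv_out(conv_out(conv_out(conv_out(in, 4, 2, 1), 4, 2, 1), 4, 2, 1), 3, 1, 0);
}

// 1 -> 4 -> 7 -> 14 -> 28 for the generator, 28 -> 14 -> 7 -> 3 -> 1 for the discriminator.
static_assert(generator_out(1) == 28, "a 1x1 latent input yields a 28x28 image");
static_assert(discriminator_out(28) == 1, "a 28x28 image yields a single output");
```

Under these assumptions the generator output size depends only on the 1x1 spatial size of the latent input and the layer parameters, not on the 100 noise channels.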

Warning: this issue has been inactive for 35 days and will be automatically closed on 2020-01-20 if there is no further activity.

@intellizd are you planning on submitting a PR with an example on how to train these kind of networks? If not and if you don't mind, I might give it a go :)

@arrufat the MNIST and CIFAR-10 DCGANs are complete, with a CUDA version.
This will be my first PR to dlib. I added BINARY_CROSS_ENTROPY, including a CUDA version, to dlib's loss functions.
I'll submit the PR as soon as I can. I have been busy these days, so I haven't gotten to it yet.
Thanks for your advice.

following

@arrufat
Dlib-DCGAN repository:
https://github.com/intellizd/dlib-dcgan_example
I have just updated the DCGAN examples (MNIST, CIFAR 64x64).
A CUDA version of binary cross entropy and a DCGAN sample for 64*64 images were added.
I'm a novice at PRs,
so I hope you will take the lead on this PR (DCGAN examples).
Please optimize it and submit the PR.

@intellizd Oh great, thanks for contributing to this! I will definitely have a look :)
However, it seems that instead of forking dlib's repository, you copied the dlib tree into a new repository.
I'll try to fix that :)

@arrufat To submit a PR, I suppose I should have forked the repository.
I didn't know this. Thank you for letting me know. I'll try.

Warning: this issue has been inactive for 35 days and will be automatically closed on 2020-03-20 if there is no further activity.

@intellizd any updates on this? If you want, I can try to make a PR or break the contribution into several ones, since you added leaky_relu and other stuff to make it work. I was wondering whether we actually need leaky_relu or whether we can just set the right slope and set the learning_rate_multiplier for the prelu activation to 0.

@arrufat LeakyReLU is described in the original DCGAN paper, so I implemented it.
The current version of dlib provides prelu, and it doesn't look much different from LeakyReLU. Is it just the slope and the initial value?
However, it would be nice to have several activation functions in dlib.
I have verified it and there seems to be no problem.
A CUDA version would also be implemented.
I would like us to work on this and submit the PR together.
Where is your forked dlib repository? I will follow your repo.

@intellizd, my fork is here: https://github.com/arrufat/dlib

I will create a leaky_relu branch and start working on that PR first, since it will be easier to review.

@arrufat I created a branch for the binary_cross_entropy_loss function and I will try to submit a PR for it.

@intellizd, I was wondering if that is necessary, I think that loss_multiclass_log is the binary cross entropy loss function when you set the number of classes to 2.

Yes, the log losses are cross entropy losses. The more canonical name for these things is log or logistic loss. The deep learning community had a bunch of weird non-textbook names for things. Sadly they have now become common which makes things confusing for outsiders.

So to be clear, multinomial logistic regression is the same thing as the cross entropy method.
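
For the two-class case, the equivalence can be checked numerically. This is a minimal sketch, assuming the textbook definitions: binary cross entropy on a sigmoid probability with labels in {0, 1}, and logistic loss on the raw score with labels in {-1, +1}. The function names are hypothetical and are not dlib or PyTorch APIs.

```cpp
#include <cmath>

// Sigmoid: maps a real-valued score to a probability in (0, 1).
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Binary cross entropy for a probability p in (0, 1) and a label t in {0, 1}.
double binary_cross_entropy(double p, int t)
{
    return -(t * std::log(p) + (1 - t) * std::log(1.0 - p));
}

// Logistic (log) loss for a raw score z and a label y in {-1, +1}.
double logistic_loss(double z, int y)
{
    return std::log1p(std::exp(-y * z));
}
```

For any score z, binary_cross_entropy(sigmoid(z), 1) agrees with logistic_loss(z, +1), and binary_cross_entropy(sigmoid(z), 0) agrees with logistic_loss(z, -1), up to floating-point rounding. The only differences are the label encoding and whether the sigmoid is applied before the loss or folded into it.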

Thank you for your good advice.
You are right. I know it's the same function. However, it would also be more convenient to give functions names that are more recognizable to other users.

When I implemented the DCGAN example in dlib, I analyzed the core part of the PyTorch C++ example.

PyTorch uses binary_cross_entropy; its output range is 0 to 1, which suits the sigmoid activation function.

I first tried dlib's binary log loss for the DCGAN example, but it branches between 1 and -1, so I wrote a binary cross entropy whose output is 0 to 1.

These issues need to be reconciled by someone with a high level of insight.

@intellizd Oh, I think I got your point, loss_multiclass_log with 2 classes has 2 outputs, but for this case we need one output only that varies between 0 and 1, but the loss_binary_log varies between -1 and 1 by default.
Maybe we can try:

  • replacing the sig with htan in the discriminator, so that the output range is between -1 and 1.
  • try to make it work with loss_multiclass_log.

I have one suggestion for @davisking: Instead of adding a new loss layer, would you accept a PR that:

  • allows modifying the output values (negative_label and positive_label) of loss_binary_log, defaulting to -1.f and 1.f, so that we could set them just like we can set the threshold for loss_metric
  • when deserializing a previous version of loss_binary_log, initializes negative_label and positive_label to -1.f and 1.f, respectively, to keep backwards compatibility

So, one should be able to write:

using net_type = loss_binary_log<...>;
net_type net(loss_binary_log_(0.f, 1.f));

And then, here, instead of checking the ground truth is not 0, we check that is not the midpoint between positive_label and negative_label?

I would not accept such a PR, as it's not right. The output of loss_binary_log is not between -1 and 1, it's between -infinity and infinity, because it is a log odds ratio. That is, it is log(probability_true_class/probability_false_class). If you want to get the probability_true_class probability out of it just convert to that. You do this by applying 1/(1 + exp(-logodds)), which with some algebra you can see yields probability_true_class. I.e. you apply a sigmoid.
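
The algebra mentioned above can be written out as follows. The symbols are introduced here for illustration only: p stands for probability_true_class, ℓ for the log odds, and probability_false_class is assumed to be 1 − p.

```latex
\begin{align*}
\ell &= \log\frac{p}{1-p}
\quad\Longrightarrow\quad
e^{-\ell} = \frac{1-p}{p}, \\
\frac{1}{1+e^{-\ell}} &= \frac{1}{1+\frac{1-p}{p}} = \frac{p}{p+(1-p)} = p .
\end{align*}
```

So applying the sigmoid to the log odds recovers p, and a log odds of 0 corresponds to p = 1/2.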

Oh, I completely missed the point, sorry about that 😅.

No worries :)

I have managed to make it work, you can try the working version here.

I will add more comments and make it look more like an example and submit a PR. Thank you all for your help.

This is what snapshots every 5000 training iterations look like:

tiled_image

@arrufat Congratulations. This issue you created a year ago is coming to an end. soon we can see the dcgan examples in the dlib. I got your back.

Let me explain what I was missing:

  • changed prelu to leaky_relu
  • changed sig to htan, since loss_binary_log chooses between positive and negative numbers
  • got the gradient from the discriminator with fake images but true labels: this tells the generator how it should update itself.

With regards to the last point, I missed it because in the PyTorch example, they do this:

// Train discriminator with fake images.
torch::Tensor fake_labels = torch::zeros(batch.data.size(0), device);
torch::Tensor d_loss_fake =
          torch::binary_cross_entropy(fake_output, fake_labels);

// Train generator.
fake_labels.fill_(1);
torch::Tensor g_loss =
          torch::binary_cross_entropy(fake_output, fake_labels);

But I missed the fact that they change the value of fake_labels, instead of using real_labels.
It sounds stupid, but I skipped that fake_labels.fill_(1); statement, so I was training with fake_labels in the dlib code instead of real_labels, which wasn't giving any constructive feedback to the generator...
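
Why the label choice matters can be seen from the gradient of the loss with respect to the discriminator's score. The sketch below assumes the standard logistic loss log(1 + exp(-y * z)) with labels y in {-1, +1}; the function names are hypothetical, and a single scalar score stands in for the whole discriminator output.

```cpp
#include <cmath>

// Gradient of log(1 + exp(-y * z)) with respect to the score z,
// for a label y in {-1, +1}.
double logistic_loss_gradient(double z, int y)
{
    return -y / (1.0 + std::exp(y * z));
}

// One gradient-descent step on the score. This is only a stand-in for the
// direction that back-propagation would pass on to the generator.
double descend(double z, int y, double learning_rate)
{
    return z - learning_rate * logistic_loss_gradient(z, y);
}
```

Take a fake image that the discriminator scores at z = -2, i.e. it is confident the image is fake. With the real label (+1) the gradient is negative, so a descent step raises the score toward "real", which is the direction the generator needs. With the fake label (-1) the gradient is positive and much smaller in magnitude, so the step pushes the score further toward "fake" and tells the generator nothing useful.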

Also, @intellizd, I tried using test_one_step, like you do, but it didn't work for me, since that call does not populate the get_final_data_gradient().
I needed to call back_propagate_error and then it worked. So I don't understand how you made it work in your code. Or maybe I am missing something else?
