As of #288, semantic segmentation is doable using dlib. However, I suppose that having the information of the early layers directly available at the later layers should improve accuracy. I'm thinking of essentially DenseNet, or at least something similar.
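For context on what "early layers directly available at the later layers" means in DenseNet: inside a dense block, each layer sees the concatenation of the block input and the outputs of all earlier layers, so the channel count grows by a fixed growth rate per layer. The following is a minimal C++ sketch of that bookkeeping. The function names are hypothetical and no dlib calls are used. The example sizes assume the first dense block of DenseNet-121 as configured in torchvision's reference implementation (64 input channels, growth rate 32, 6 layers).

```cpp
#include <cassert>
#include <vector>

// Channels seen by each layer of one dense block (hypothetical helper).
// Assumption: every layer emits `growth_rate` new feature maps, and its input
// is the block input concatenated with all earlier layers' outputs.
std::vector<int> dense_block_input_channels(int in_channels, int growth_rate, int num_layers)
{
    assert(in_channels > 0 && growth_rate > 0 && num_layers >= 0);
    std::vector<int> inputs;
    int channels = in_channels;
    for (int l = 0; l < num_layers; ++l) {
        inputs.push_back(channels);  // what layer l receives
        channels += growth_rate;     // layer l appends its own output
    }
    return inputs;
}

// Channels leaving the block: block input plus every layer's new maps.
int dense_block_output_channels(int in_channels, int growth_rate, int num_layers)
{
    assert(in_channels > 0 && growth_rate > 0 && num_layers >= 0);
    return in_channels + growth_rate * num_layers;
}
```

With `(64, 32, 6)` the layers see 64, 96, 128, 160, 192 and 224 channels, and the block outputs 256.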
Apparently some implementation exists already, but it's not public. To me it sounds like a massive waste of time if we all implement our own thing, so I would love the idea of (again) having a public(-domain) example, e.g. on top of #943.
@davisking, you indicated here that you'd be adding some implementation. Are you still planning to do this? If yes, then I'd prefer to wait for your implementation. If not, then I might try and see if I can contribute something myself, using the building blocks that dlib already offers.
I never ended up making a densenet layer. I refactored some of dlib's code and the interface of the tensor_conv object so that it was possible to implement it though. So the remaining steps should be relatively easy (but still tedious since testing these kinds of things is always tedious).
But in any case, I think you should be able to make a layer that contains multiple tensor_conv objects.
I take it then that I can't use the concat layer to build a densenet?
Not efficiently. You should write a custom layer that does it.
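One way to see why chaining generic concatenations is costly is to count element copies. The sketch below is a rough cost model, not a measurement of dlib's `concat` layer. It assumes that, when concatenation is used, every layer's input is materialized as a fresh tensor by copying the block input and all earlier outputs. A custom layer can instead preallocate one buffer and let each layer write its new channels into its own slice. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rough cost model (assumption): extra element copies spent assembling the
// inputs of an L-layer dense block with c0 input channels, growth rate k and
// spatial size h x w, beyond the convolutions' own output writes.

// Naive: layer l gets a freshly concatenated tensor of (c0 + l*k) channels.
std::int64_t naive_concat_copies(std::int64_t c0, std::int64_t k, std::int64_t L,
                                 std::int64_t h, std::int64_t w)
{
    assert(c0 > 0 && k > 0 && L > 0 && h > 0 && w > 0);
    std::int64_t copies = 0;
    for (std::int64_t l = 0; l < L; ++l)
        copies += (c0 + l * k) * h * w;
    return copies;  // quadratic in L
}

// Preallocated: the block input is copied into the shared buffer once; each
// layer then writes its k channels directly into its slice (no extra copies).
std::int64_t preallocated_copies(std::int64_t c0, std::int64_t k, std::int64_t L,
                                 std::int64_t h, std::int64_t w)
{
    assert(c0 > 0 && k > 0 && L > 0 && h > 0 && w > 0);
    return c0 * h * w;  // independent of L
}
```

For a 64-channel, 56x56 input with growth rate 32 and 6 layers, this model gives 2,709,504 copies for the naive approach versus 200,704 for the preallocated one, and the gap widens as blocks get deeper.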
Warning: this issue has been inactive for 131 days and will be automatically closed on 2018-09-07 if there is no further activity.
If you are waiting for a response but haven't received one it's likely your question is somehow inappropriate. E.g. you didn't follow the issue submission instructions, or your question is easily answerable by reading the FAQ, dlib's documentation, or a Google search.
Still relevant. Leaving this comment, so that the issue will not be automatically closed.
Indeed. I'm adding a tag that will prevent the issue bot from closing it.
Hi, just wanted to express my interest in this. Has anything been implemented?
I am looking to use dlib to try to reproduce the U-Net results.

If nothing has been worked on, I can try to take a look into it (I am new to the codebase, so will take me a while).
You should be able to define a new layer that represents a dense block. You can probably do it using the tensor_conv object and not need to talk directly to cuDNN or anything like that.
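To illustrate the idea of one layer that owns several convolutions, here is a hypothetical plain-C++ sketch of a dense block. It does not use dlib: a simple 1x1 convolution followed by ReLU stands in for `tensor_conv`, and the types `conv1x1` and `dense_block` are invented for illustration. The block preallocates its full output once. The input goes into the first channels, and each convolution reads all channels written so far and appends its own output directly after them, so no concatenation copies are needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a real convolution: 1x1 conv + ReLU.
// Layout: one sample, channel-major (channel, pixel), flat vector.
struct conv1x1 {
    int in_channels, out_channels;
    std::vector<float> weights;  // out_channels x in_channels

    // Reads channels [0, in_channels) of buf and writes out_channels channels
    // starting at channel out_offset of the same buffer.
    void forward_in_place(std::vector<float>& buf, int plane, int out_offset) const {
        for (int o = 0; o < out_channels; ++o)
            for (int p = 0; p < plane; ++p) {
                float sum = 0.f;
                for (int i = 0; i < in_channels; ++i)
                    sum += weights[o * in_channels + i] * buf[i * plane + p];
                buf[(out_offset + o) * plane + p] = sum > 0.f ? sum : 0.f;  // ReLU
            }
    }
};

// Hypothetical dense block holding several convolutions in one layer.
struct dense_block {
    int in_channels, growth_rate;
    std::vector<conv1x1> layers;

    dense_block(int c0, int k, int num_layers, float init_weight)
        : in_channels(c0), growth_rate(k) {
        for (int l = 0; l < num_layers; ++l) {
            int c_in = c0 + l * k;  // block input + l earlier outputs
            layers.push_back({c_in, k,
                std::vector<float>(static_cast<std::size_t>(k) * c_in, init_weight)});
        }
    }

    int out_channels() const {
        return in_channels + growth_rate * static_cast<int>(layers.size());
    }

    // input: in_channels x plane values; returns out_channels x plane values.
    std::vector<float> forward(const std::vector<float>& input, int plane) const {
        assert(input.size() == static_cast<std::size_t>(in_channels) * plane);
        std::vector<float> buf(static_cast<std::size_t>(out_channels()) * plane, 0.f);
        std::copy(input.begin(), input.end(), buf.begin());
        // Layer l's output lands right after the c0 + l*k channels it read.
        for (const auto& layer : layers)
            layer.forward_in_place(buf, plane, layer.in_channels);
        return buf;
    }
};
```

A real dlib layer would also need a backward pass and parameter handling. This sketch only shows the forward data layout.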
@goldbattle I have implemented U-net, and already for a while I've been wanting to upgrade the semantic-segmentation example (see #943) to use it (even though the accuracy benefit may not be incredibly high). So stay tuned. (This got delayed because I wanted to try DenseNet also, but maybe I'll leave that to a separate PR, and try to expedite a U-net PR.)
@davisking For inference of arbitrary images, I found it necessary to add a new resize_to_prev layer which, well, resizes a tensor to have the size of an earlier layer. Otherwise I had an issue when first downscaling and then upscaling – the sizes simply did not match.
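The size mismatch is easy to reproduce with integer arithmetic. The sketch below assumes each downsampling step floors `n / 2` and each upsampling step doubles, so odd sizes lose a pixel on the round trip. It then shows a hypothetical `resize_to_prev`-style step that resizes a single-channel map to the size of an earlier map using floor-aligned nearest-neighbour sampling. The actual layer may interpolate differently, and the function names are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: stride-2 downsampling floors, 2x upsampling doubles.
int down_then_up(int n, int levels) {
    for (int i = 0; i < levels; ++i) n /= 2;
    for (int i = 0; i < levels; ++i) n *= 2;
    return n;
}

// Hypothetical resize-to-earlier-size step: resample a rows x cols map to
// target_rows x target_cols (the earlier layer's size) with floor-aligned
// nearest-neighbour sampling, so a skip connection's sizes match again.
std::vector<float> resize_nearest(const std::vector<float>& src, int rows, int cols,
                                  int target_rows, int target_cols) {
    assert(src.size() == static_cast<std::size_t>(rows) * cols);
    std::vector<float> dst(static_cast<std::size_t>(target_rows) * target_cols);
    for (int r = 0; r < target_rows; ++r) {
        int sr = r * rows / target_rows;      // source row for output row r
        for (int c = 0; c < target_cols; ++c) {
            int sc = c * cols / target_cols;  // source column for output column c
            dst[static_cast<std::size_t>(r) * target_cols + c] =
                src[static_cast<std::size_t>(sr) * cols + sc];
        }
    }
    return dst;
}
```

For example, a 225-pixel side comes back as 224 after one level, and a 500-pixel side comes back as 496 after three levels. Resizing to the stored earlier size removes the mismatch regardless of the input size.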
Just to clarify, I meant that using U-net might not improve the results of the PASCAL VOC 2012 dataset (used in #943) a lot. But on certain other data that I've tried it on (not public), it's pretty good (compared to something that doesn't have such skip connections, that is).
Makes sense, resize_to_prev sounds useful :)
> Not efficiently. You should write a custom layer that does it.
Is this still relevant? I have implemented all the DenseNet architectures from the paper and ran a small benchmark against the official PyTorch implementation of DenseNet121 with a 1x3x224x224 input tensor:
| Implementation | Inference time | FPS | #params | VRAM |
|:-------:|-----------|--------|-----------|---------|
| dlib | 12.282 ms | 81.417 | 7,897,960 | 534 MiB |
| PyTorch | 23.246 ms | 43.018 | 7,978,856 | 669 MiB |
As you can see, the dlib implementation is almost twice as fast as PyTorch and uses about 20% less VRAM.
I know this is a simple test, but still, how much more could we gain by implementing a custom layer?
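The headline ratios follow directly from the table. This small check recomputes them from the table's own numbers. It adds no new measurements, and the struct and function names are illustrative only.

```cpp
#include <cassert>

// Rows copied from the DenseNet121 benchmark table (1x3x224x224 input).
struct bench_row { double latency_ms, fps, params, vram_mib; };

constexpr bench_row dlib_row    {12.282, 81.417, 7897960.0, 534.0};
constexpr bench_row pytorch_row {23.246, 43.018, 7978856.0, 669.0};

// Latency speed-up of dlib over PyTorch (> 1 means dlib is faster).
double latency_speedup() { return pytorch_row.latency_ms / dlib_row.latency_ms; }

// Fraction of PyTorch's VRAM that dlib saves.
double vram_saving() { return 1.0 - dlib_row.vram_mib / pytorch_row.vram_mib; }

// Relative difference in parameter count.
double param_difference() { return 1.0 - dlib_row.params / pytorch_row.params; }
```

This gives a speed-up of about 1.89x and a VRAM saving of about 20%. The two models' parameter counts differ by roughly 1%.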
_PS: Note how compact dlib is to define these networks. It's so satisfying..._
Great work @arrufat! 💪
From my point of view, your example implementation in dlib-users/dnn meets the definition-of-done criteria for this issue (although I haven't tried it out yet), so I'm closing this ticket now.
Perhaps feature requests like this one could go there instead in the future?
Whoever needs a custom layer (for performance or other reasons) can create a new ticket. :)
Yeah, that’s a good way to do it.