Hello,
I would like to ask if you could write something about how to use YOLO with LSTM on sequential data.
It seems yolo has lstm support:
https://github.com/AlexeyAB/darknet/blob/master/cfg/lstm.train.cfg
Is the LSTM config only for training (adjusting the weights), and are predictions then done with a standard YOLO config?
Thank you very much + Greetings,
Holger
Use these models: https://github.com/AlexeyAB/darknet/issues/3114#issuecomment-494148968
How to train LSTM: https://github.com/AlexeyAB/darknet/issues/3114#issuecomment-494154586
Use the same cfg-file for Training and Prediction.
https://github.com/AlexeyAB/darknet/blob/master/cfg/lstm.train.cfg is for text generation: https://pjreddie.com/darknet/rnns-in-darknet/
Sorry, I need to ask again - this is not clear to me.
My current understanding:
https://github.com/AlexeyAB/darknet/blob/master/cfg/lstm.train.cfg is for generating texts
That's why there are no width / height parameters (which confused me).
https://github.com/AlexeyAB/darknet/files/3199770/yolo_v3_tiny_lstm.cfg.txt
This is a (YOLOv3-tiny) LSTM config I can use for training and prediction (as usual - nothing special here).
In train.txt, the images (grabbed frames) must be in the same order as they appeared in the video (well, it's called sequential for a reason - right? ^^).
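For example, a small sketch of how such a train.txt could be written (file names here are hypothetical); a natural sort keeps the frames in video order, so that frame_10.jpg does not sort before frame_2.jpg:

```python
import re

def natural_key(name):
    # Split the name into text and number chunks so numeric parts
    # compare as integers: "frame_2.jpg" < "frame_10.jpg".
    return [int(s) if s.isdigit() else s for s in re.split(r"(\d+)", name)]

def write_train_txt(frame_names, out_path="train.txt"):
    # Sort frames into the order they appeared in the video,
    # then write one image path per line, as darknet expects.
    ordered = sorted(frame_names, key=natural_key)
    with open(out_path, "w") as f:
        f.write("\n".join(ordered) + "\n")
    return ordered

# Hypothetical example:
frames = ["frame_10.jpg", "frame_2.jpg", "frame_1.jpg"]
print(write_train_txt(frames))  # ['frame_1.jpg', 'frame_2.jpg', 'frame_10.jpg']
```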
If an object is present in one frame and occluded by another object in the next frame, should I still label it?
Up to here it's clear, I think.
What confuses me most is this statement from you:
> train.txt - first 80% of frames (80% from video 1 + 80% from video 2, if you use frames from 2 videos)
So in train.txt I list 80% of the frames from video 1,
followed by 80% of the frames from video 2.
But the frames from video 2 are not sequential to the frames from video 1, yet they are listed in the same train.txt. How is this supposed to work - do I need to insert a stop word, or do I need two train.txt files and to resume training?
Or am I just overcomplicating things again and it is actually fine?
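The split described above can be sketched in Python (a sketch only; file names are hypothetical, and each per-video list is assumed to already be in sequential frame order):

```python
def split_frames(videos, train_fraction=0.8):
    """videos: a list of per-video frame lists, each in sequential order.
    Returns (train, valid): the first 80% of each video's frames,
    concatenated, for train.txt; the remaining 20% for valid.txt."""
    train, valid = [], []
    for frames in videos:
        cut = int(len(frames) * train_fraction)
        train.extend(frames[:cut])   # first 80% of this video
        valid.extend(frames[cut:])   # last 20% of this video
    return train, valid

# Hypothetical example with two videos of 5 frames each:
video1 = [f"v1_{i:03d}.jpg" for i in range(5)]
video2 = [f"v2_{i:03d}.jpg" for i in range(5)]
train, valid = split_frames([video1, video2])
# train: 80% of video 1's frames followed by 80% of video 2's frames
```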
Thank you very much again,
Greetings, Holger
Fine.
Hmm, so the "border" between the last frame of video 1 and the first frame of video 2 can be ignored and does not have to be treated specially (in separate train files, for example)?
I am just a bit paranoid since I failed a lot during my last training attempts and want to prevent problems.
On the other hand, if video 1 ends with a black frame (it usually does) and video 2 starts with a black frame, that's even perfect, so nothing can go wrong there, I guess.
Thank you both.
And labeling hidden/occluded objects only makes sense if they appear again in later frames, I guess?
> But the frames from video 2 are not sequential to the frames from video 1, yet they are listed in the same train.txt. How is this supposed to work - do I need to insert a stop word, or do I need two train.txt files and to resume training?
No. Rare context changes are possible.
> If an object is present in one frame and occluded by another object in the next frame, should I still label it?
Yes. If you want to detect occluded objects.
> Hmm, so the "border" between the last frame of video 1 and the first frame of video 2 can be ignored and does not have to be treated specially (in separate train files, for example)?
Yes.
> And labeling hidden/occluded objects only makes sense if they appear again in later frames, I guess?
It makes sense if you want to detect occluded objects.