I couldn't find a detailed explanation of max_queue_size (default = 10) and the mechanism behind it, along with the other generator-related parameters: workers and use_multiprocessing.
_Question 1:_
I might be totally wrong and want to get your feedback on my understanding of it. I thought that multiple generator instances (the producers) are launched to feed data into a queue that is created and maintained by the model.fit_generator() function, while data is grabbed from the queue and sent to the GPU for training (the consumer). If training on the GPU is not the bottleneck, then the more data the generator can yield, the faster the overall process would be. I learned that by default max_queue_size = 10; how should I choose a proper max_queue_size once the generator is thread-safe?
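To make the mental model above concrete, here is a minimal producer/consumer sketch; it is only an illustration of the idea, not Keras internals, and all names and numbers in it are made up.

```python
import queue
import threading

batch_queue = queue.Queue(maxsize=10)  # plays the role of max_queue_size

def producer(generator):
    # Generator side: keeps the queue topped up with batches.
    for batch in generator:
        batch_queue.put(batch)         # blocks while the queue is full

def consumer(steps):
    # Training side: pulls one batch per step, like the GPU in fit_generator.
    for _ in range(steps):
        batch = batch_queue.get()      # blocks while the queue is empty
        # ... the model would train on `batch` here ...

threading.Thread(target=producer, args=(iter(range(100)),), daemon=True).start()
consumer(steps=100)
```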
_Question 2:_
Also, is there a way to measure whether the bottleneck is the generator (producer) or GPU training (consumer)? I use verbose=1 to print the progress bar, and I also print how many rows the single-threaded generator has yielded. Right now it always looks like:
number of rows yielded = (max_queue_size + number of steps processed so far) * batch_size
So it seems like the queue is always full, which means the bottleneck is the consumer/GPU training?
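One simple way I could check this, sketched below; this is illustrative only, and it assumes the data_genereator defined further down with a batch_size of 512: time the producer on its own and compare it with the s/step shown in the progress bar.

```python
import time

gen = data_genereator(X_train, batch_size=512)    # the generator from the code below
start = time.time()
for _ in range(10):
    next(gen)                                     # produce 10 batches with no GPU involved
print("seconds per batch from the generator:", (time.time() - start) / 10)
# If this is much smaller than the s/step in the progress bar, the consumer (GPU) is
# the bottleneck; if it is comparable or larger, the producer (generator) is.
```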
_Question 3:_
The accuracy within Epoch 1 improves on almost every step, from 0.6719 to 0.87316. However, within Epoch 2, after about half the steps completed, the accuracy started to decline from 0.9551 to 0.8317. The trend continued in Epoch 3 and now the accuracy is 0.6633.
The input is a 3D array of shape (number of rows) x 1000 timesteps x 2400 (the length of the word2vec vectors). The word2vec model is trained from scratch and contains a 200k-word vocabulary.
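For scale, some quick back-of-the-envelope arithmetic, assuming float32 inputs and the batch_size of 512 used in the code below:

```python
# One batch of shape (512, 1000, 2400) stored as float32:
bytes_per_batch = 512 * 1000 * 2400 * 4
print(bytes_per_batch / 2**30, "GiB")  # ~4.6 GiB per batch, so a full queue of 10 such batches is ~46 GiB
```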
I only use one LSTM layer followed by a Dropout layer and then the output. The code and layer structure are as follows:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

data = pd.read_csv("data.csv", header=0, delimiter="\t", quoting=3, encoding="utf-8")
y = data.label
X_train, X_test, y_train, y_test = train_test_split(data, y, test_size=0.2)

def data_genereator(data, batch_size):
    num_rows = int(data.shape[0])
    # Initialize a counter
    counter = 0
    while True:
        for content, label in zip(data['content'], data['label']):
            # transform() (defined elsewhere by the author) turns a text into its word2vec representation.
            # Note: this writes into the X_train / y_train objects created by train_test_split above.
            X_train[counter % batch_size] = transform(content)
            y_train[counter % batch_size] = np.asarray(label)
            counter = counter + 1
            if counter % batch_size == 0:
                yield X_train, y_train

model = Sequential()
model.add(LSTM(64, input_shape=(1000, 2400), return_sequences=False,
               kernel_initializer='he_normal', dropout=0.15,
               recurrent_dropout=0.15, implementation=2))
model.add(Dropout(0.3))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

training_generator = data_genereator(X_train, batch_size=512)
validation_generator = data_genereator(X_test, batch_size=512)

model.fit_generator(training_generator,
                    steps_per_epoch=8856,
                    validation_data=validation_generator,
                    epochs=10,
                    verbose=1,
                    workers=1,
                    use_multiprocessing=False,
                    validation_steps=2216)
Epoch 1/10
1/8856 [..............................] - ETA: 87:38:53 - loss: 0.6919 - acc: 0.6719 yield at counter 6144
...
8856/8856 [==============================] - 63350s 7s/step - loss: 0.3740 - acc: 0.8316 - val_loss: 0.1098 - val_acc: 0.9590
Epoch 2/10
1/8856 [..............................] - ETA: 13:53:45 - loss: 0.1892 - acc: 0.9297 yield at counter 4540180
...
4937/8856 [===============>..............] - ETA: 6:20:03 - loss: 0.1261 - acc: 0.9551 yield at counter 7067412
...
8856/8856 [==============================] - 60512s 7s/step - loss: 0.3364 - acc: 0.8317 - val_loss: 0.5985 - val_acc: 0.6876
Epoch 3/10
1/8856 [..............................] - ETA: 14:45:26 - loss: 0.6098 - acc: 0.6816 yield at counter 9074728
...
4681/8856 [==============>...............] - ETA: 6:45:02 - loss: 0.6130 - acc: 0.6633 yield at counter 11470888
I am also quite interested in Question 3; I had the same experience with one of my model trainings!
Question 1
P.S.: I found this answer about queue size, which totally makes sense!
https://stackoverflow.com/a/36989864/4496896
2 - max_queue_size is just the size of the multiprocessing queue.
The queue size can be sized according to Little's Law.
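As a rough illustration of applying Little's Law, L = lambda * W, to this queue; the numbers below are invented, not measured.

```python
# Little's Law: average queue occupancy L = arrival rate (lambda) * time spent in the queue (W).
batches_per_second = 4.0   # lambda: how fast the generator yields batches (assumed)
seconds_in_queue = 2.0     # W: how long a batch waits before the GPU consumes it (assumed)
steady_state_batches = batches_per_second * seconds_in_queue   # L = 8
print(steady_state_batches)
# A max_queue_size a bit above L (e.g. the default of 10 here) keeps the consumer fed
# without holding extra batches in memory for no benefit.
```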
Hey, I'm the one that built the V2 of this stuff, so I'll try to answer.
I'm starting a Wiki page for this; it's pretty basic, so feel free to propose changes:
https://github.com/keras-team/keras/wiki/Understanding-parallelism-in-Keras
Important: when using use_multiprocessing=False, you're blocked by the GIL 90% of the time.
Also, prefer keras.utils.Sequence (a sketch follows this reply).
Question 1: As @CMCDragonkai said, Little's Law is a good choice, but 2 times the number of workers is a great starting point.
Question 2: I would suggest looking at your GPU usage; if it's consistently at 100%, the GPU is the bottleneck.
Question 3: Your generator is weird; it mutates X_train and y_train in place. I'm pretty sure it's not doing what you want.
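Below is a minimal keras.utils.Sequence sketch of what this reply suggests. It is illustrative only: it assumes the same transform() helper and the content/label columns from the original code, and the worker/queue numbers below just apply the "2 times the workers" starting point, not a recommendation.

```python
import numpy as np
from keras.utils import Sequence

class ContentSequence(Sequence):
    """Batches are addressed by index, so it is safe with workers > 1 and multiprocessing."""

    def __init__(self, frame, batch_size):
        self.frame = frame.reset_index(drop=True)
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch.
        return int(np.ceil(len(self.frame) / self.batch_size))

    def __getitem__(self, idx):
        chunk = self.frame.iloc[idx * self.batch_size:(idx + 1) * self.batch_size]
        X = np.stack([transform(c) for c in chunk['content']])  # transform() as in the original post
        y = chunk['label'].values
        return X, y

train_seq = ContentSequence(X_train, batch_size=512)
workers = 4                                      # assumed; leave some CPU cores free
model.fit_generator(train_seq,
                    epochs=10,
                    workers=workers,
                    use_multiprocessing=True,    # safe because the Sequence is indexed, not stateful
                    max_queue_size=2 * workers)  # the "2x workers" starting point from above
```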
I've noticed that the Keras workers parameter is a bit strange. If you set it to 1 and enable multiprocessing, 2 child processes are created. One of the child processes does absolutely nothing: it has 0.0% CPU usage, and stracing it shows that it's stuck on a futex syscall.
So when I gave it workers = 12, it produced 24 child processes.
The extra worker process seems useless. What is it for? The docs only say that worker processes are used to generate "input" for the neural network. But the redundant child processes do absolutely nothing.
Are you sure those workers are not just the validation workers? During training, the worker used for validation will be idle.
Also, are you sure that they are processes and not threads?
Depending on the size of the queue, there are multiple threads per process.
You can see the actual processes with:
import multiprocessing
multiprocessing.active_children()
Those workers are definitely processes. I used htop to switch between showing and hiding user threads; in this case I had it set to not show them.
Hey @CMCDragonkai and @Dref360, I am new to DL and currently using Keras to build my first few models, and I am still confused about how to use queue size, workers, and use_multiprocessing. Can you please give me an example of how you would set them if you had 2x GPU (V100/P100) and an 8-core CPU? Or is it better to keep the default values?
Thanks!!
@Dref360
> Important: when using use_multiprocessing=False, you're blocked by the GIL 90% of the time.
So why is use_multiprocessing=False the default?
Using multiprocessing is advanced usage and brings some complications: you need to care about process-safe resources, the way processes are created on Windows, etc. So using threads is better from a UX point of view.
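One concrete example of such a complication, as an illustrative sketch: when worker processes are spawned rather than forked, e.g. on Windows, the main script is re-imported in each worker, so the training call has to sit behind a __main__ guard. The call below reuses the fit_generator arguments from the original post with made-up worker numbers.

```python
# Illustrative only: with use_multiprocessing=True and spawn-style process creation,
# each worker re-imports this module, so training must live under this guard.
if __name__ == "__main__":
    model.fit_generator(training_generator,
                        steps_per_epoch=8856,
                        epochs=10,
                        workers=4,               # made-up number, just for illustration
                        use_multiprocessing=True,
                        max_queue_size=8)
```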
@Dref360
What could be the reason that, with use_multiprocessing=False, I see a lower CPU load than with use_multiprocessing=True, using the same number of workers?
https://stackoverflow.com/questions/57381464/keras-using-multiply-cores-in-batch-generator
Because when it's set to False, we use threads, and there is no real thread parallelism in Python (because of the GIL), so you won't see much improvement.
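A tiny self-contained illustration of that point, nothing Keras-specific and with arbitrary workload sizes: pure-Python CPU-bound work does not get faster with threads, but it does with processes.

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def busy(n):
    # Pure-Python CPU-bound loop; it holds the GIL the whole time.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    for pool_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.time()
        with pool_cls(max_workers=4) as pool:
            list(pool.map(busy, [5_000_000] * 4))
        print(pool_cls.__name__, round(time.time() - start, 2), "seconds")
```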