I am using CPU_ONLY mode, and syncedmem.hpp reports a problem when I train the ImageNet network; the CIFAR-10 network works fine. Can anybody help me? Thanks.
The error message:
I0516 15:16:38.358605 11079 net.cpp:217] Network initialization done.
I0516 15:16:38.358611 11079 net.cpp:218] Memory required for data: 343008808
I0516 15:16:38.358701 11079 solver.cpp:42] Solver scaffolding done.
I0516 15:16:38.358732 11079 solver.cpp:222] Solving CaffeNet
I0516 15:16:38.358738 11079 solver.cpp:223] Learning Rate Policy: step
I0516 15:16:38.358748 11079 solver.cpp:266] Iteration 0, Testing net (#0)
I0516 15:51:24.069187 11079 solver.cpp:315] Test net output #0: accuracy = 0.5002
I0516 15:51:24.069303 11079 solver.cpp:315] Test net output #1: loss = 0.983944 (* 1 = 0.983944 loss)
F0516 15:51:24.069330 11079 syncedmem.hpp:19] Check failed: _ptr host allocation of size 158297088 failed
*** Check failure stack trace: ***
@ 0xb74d2efc (unknown)
@ 0xb74d2e13 (unknown)
@ 0xb74d285f (unknown)
@ 0xb74d58b0 (unknown)
@ 0xb75f0609 caffe::SyncedMemory::mutable_cpu_data()
@ 0xb76a8752 caffe::Blob<>::mutable_cpu_data()
@ 0xb76633a1 caffe::BasePrefetchingDataLayer<>::Forward_cpu()
@ 0xb7697918 caffe::Net<>::ForwardFromTo()
@ 0xb7697bcd caffe::Net<>::ForwardPrefilled()
@ 0xb7697c7b caffe::Net<>::Forward()
@ 0xb76e3f5e caffe::Solver<>::Step()
@ 0xb76e48dd caffe::Solver<>::Solve()
@ 0x804d970 train()
@ 0x804bbd9 main
@ 0xb7209a83 (unknown)
@ 0x804c238 (unknown)
Aborted (core dumped)
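For what it's worth, the failed size in this trace decodes cleanly. A quick back-of-the-envelope check (my own arithmetic, not from the Caffe source), assuming float32 data and the standard CaffeNet input shape:

```python
# The failed allocation of 158297088 bytes matches exactly one batch of
# 256 float32 images at CaffeNet's 3x227x227 input -- i.e. the prefetch
# buffer for the default ImageNet batch size.
batch, channels, height, width = 256, 3, 227, 227
bytes_per_float = 4

batch_bytes = batch * channels * height * width * bytes_per_float
print(batch_bytes)  # 158297088
```

So the allocation that dies is just one data batch; lowering `batch_size` in the data layer shrinks it proportionally.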
I get the same issue. This is a modified (read: smaller) version of the bvlc_reference_caffenet model. I am loading up 9 training HDF5 files and 1 test with the following dimensions:
foodnet_test_00000001.h5
data Dataset {11610, 3, 227, 227}
label Dataset {11610, 4}
foodnet_train_00000001.h5
data Dataset {13500, 3, 227, 227}
label Dataset {13500, 4}
foodnet_train_00000002.h5
data Dataset {13500, 3, 227, 227}
label Dataset {13500, 4}
foodnet_train_00000003.h5
data Dataset {13500, 3, 227, 227}
label Dataset {13500, 4}
foodnet_train_00000004.h5
data Dataset {13500, 3, 227, 227}
label Dataset {13500, 4}
foodnet_train_00000005.h5
data Dataset {13500, 3, 227, 227}
label Dataset {13500, 4}
foodnet_train_00000006.h5
data Dataset {13500, 3, 227, 227}
label Dataset {13500, 4}
foodnet_train_00000007.h5
data Dataset {13500, 3, 227, 227}
label Dataset {13500, 4}
foodnet_train_00000008.h5
data Dataset {9990, 3, 227, 227}
label Dataset {9990, 4}
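For scale, here is a rough footprint of one of these files if it is loaded fully into host memory as float32 (as far as I can tell, Caffe's HDF5 data layer reads a whole file at once rather than streaming it):

```python
# Approximate host memory needed to hold one 13500-image training file
# as float32 (data only, ignoring labels and overhead).
images = 13500
floats_per_image = 3 * 227 * 227

bytes_total = images * floats_per_image * 4
print(bytes_total)          # 8347698000 bytes
print(bytes_total / 2**30)  # ~7.77 GiB per training file
```

With the test file (~6.7 GiB) loaded on top of a training file, it is easy to exhaust RAM on a modest machine, which would explain why disabling the test phase helps.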
I've also tried running with the patch from #2473, but the problem persists. Note that this only happens if I enable the test phase; once I disable it, things go smoothly!
Thanks for your answer. I've reduced the training dataset, but the problem still exists. I have no idea about this, or whether it is related to the structure of the net. @mynameisfiber,
how do I disable the test phase? Thanks!
@zkl123 I commented out these lines:
#test_iter: 1000
#test_interval: 1000
in solver.prototxt.
+1 I am using CPU only.
This problem may be related to your PC's OS. I changed to a 64-bit system and the problem was solved!
Closing as this looks like a usage issue/platform problem with memory size.
Please ask usage questions on the caffe-users list.
Thanks!
Does anyone know this type of problem?
2015-08-25 16:34:22 [20150825-163104-971b] [ERROR] Train Caffe Model: Check failed: *ptr host allocation of size 116160000 failed
2015-08-25 16:34:28 [20150825-163104-971b] [ERROR] Train Caffe Model task failed with error code -6
Job Status Error
Initialized at 04:31:04 PM (1 second)
Running at 04:31:05 PM (3 minutes, 25 seconds)
Error at 04:34:31 PM
(Total - 3 minutes, 26 seconds)
Train Caffe Model Error
Initialized at 04:31:04 PM (1 second)
Running at 04:31:06 PM (3 minutes, 23 seconds)
Error at 04:34:29 PM
(Total - 3 minutes, 24 seconds)
ERROR: Check failed: *ptr host allocation of size 116160000 failed
conv2 needs backward computation.
pool1 needs backward computation.
norm1 needs backward computation.
relu1 needs backward computation.
conv1 needs backward computation.
label_data_1_split does not need backward computation.
data does not need backward computation.
This network produces output accuracy
This network produces output loss
Collecting Learning Rate and Weight Decay.
Network initialization done.
Memory required for data: 831529208
Solver scaffolding done.
Starting Optimization
Solving
Learning Rate Policy: step
Iteration 0, Testing net (#0)
Test net output #0: accuracy = 0.34
Test net output #1: loss = 1.09667 (* 1 = 1.09667 loss)
Check failed: *ptr host allocation of size 116160000 failed
I solved this problem by increasing the swap area:
http://askubuntu.com/questions/178712/how-to-increase-swap-space
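Before resizing swap, it may help to check how much headroom the machine actually has. A minimal sketch reading `/proc/meminfo` on Linux (field names assumed from the standard procfs layout; this won't work on other platforms):

```python
def meminfo_kib():
    """Parse /proc/meminfo into a {field: KiB} dict (Linux only)."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])  # values are reported in kB
    return info

m = meminfo_kib()
for key in ("MemTotal", "MemFree", "SwapTotal", "SwapFree"):
    print(key, m.get(key, 0) // 1024, "MiB")
```

If `MemFree + SwapFree` is smaller than the allocation size in the `Check failed` message, the failure is expected and more swap (or a smaller batch) is the fix.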
I have also met the same problem during testing. Have you solved it?
I had the same issue (64bit, Ubuntu 14.04): "... syncedmem.hpp: xx] Check failed: ptr host allocation of size xxx failed ...".
I could not solve it by 1) increasing the swap area, nor 2) reducing the size of the training dataset.
Does anyone have any advice on this problem?
I've got the same error (Check failed: *ptr host allocation of size -1655996416 failed).
One of the features of the HDF5 format is the ability to use large datasets directly from the HDD without loading all the data into memory. I'd like to use the same dataset with different frameworks, and HDF5 is currently the most widely supported format. For example, the same dataset used with Keras on an even bigger network takes only ~1.5 GB of memory. "Shrink the dataset or increase memory" is very strange advice, since this is obviously a bug in Caffe.
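A side note on that negative size: it looks like a 32-bit signed overflow of the requested byte count, which would fit the earlier report that moving to a 64-bit system made the failure go away. A quick check (my own arithmetic, not taken from the Caffe code):

```python
# Reinterpret the reported size as an unsigned 32-bit value: a request of
# roughly 2.46 GiB would wrap around to this negative number when stored
# in a signed 32-bit int.
reported = -1655996416
as_uint32 = reported % 2**32

print(as_uint32)                    # 2638970880
print(round(as_uint32 / 2**30, 2))  # ~2.46 GiB
```

A >2 GiB single allocation is also near the limit of what a 32-bit process can address at all, so on a 32-bit build this request could never succeed regardless of swap.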
I have the same problem....
@darren1231 I solved it by decreasing the batch size of both train and test.
@inferrna Thanks. I solved it by increasing memory, but I had the same issue again with more data. I finally used Keras for the data and did the computation without problems.