I am using CPU_ONLY mode, and syncedmem.hpp reports a problem when I train the ImageNet network; the CIFAR-10 network works fine. Can anybody help me? Thanks.
The error message:
I0516 15:16:38.358605 11079 net.cpp:217] Network initialization done.
I0516 15:16:38.358611 11079 net.cpp:218] Memory required for data: 343008808
I0516 15:16:38.358701 11079 solver.cpp:42] Solver scaffolding done.
I0516 15:16:38.358732 11079 solver.cpp:222] Solving CaffeNet
I0516 15:16:38.358738 11079 solver.cpp:223] Learning Rate Policy: step
I0516 15:16:38.358748 11079 solver.cpp:266] Iteration 0, Testing net (#0)
I0516 15:51:24.069187 11079 solver.cpp:315] Test net output #0: accuracy = 0.5002
I0516 15:51:24.069303 11079 solver.cpp:315] Test net output #1: loss = 0.983944 (* 1 = 0.983944 loss)
F0516 15:51:24.069330 11079 syncedmem.hpp:19] Check failed: _ptr host allocation of size 158297088 failed
*** Check failure stack trace: ***
@ 0xb74d2efc (unknown)
@ 0xb74d2e13 (unknown)
@ 0xb74d285f (unknown)
@ 0xb74d58b0 (unknown)
@ 0xb75f0609 caffe::SyncedMemory::mutable_cpu_data()
@ 0xb76a8752 caffe::Blob<>::mutable_cpu_data()
@ 0xb76633a1 caffe::BasePrefetchingDataLayer<>::Forward_cpu()
@ 0xb7697918 caffe::Net<>::ForwardFromTo()
@ 0xb7697bcd caffe::Net<>::ForwardPrefilled()
@ 0xb7697c7b caffe::Net<>::Forward()
@ 0xb76e3f5e caffe::Solver<>::Step()
@ 0xb76e48dd caffe::Solver<>::Solve()
@ 0x804d970 train()
@ 0x804bbd9 main
@ 0xb7209a83 (unknown)
@ 0x804c238 (unknown)
Aborted (core dumped)
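For what it's worth, the failed size in this trace decodes cleanly. A quick back-of-the-envelope check (my own arithmetic, not from the Caffe source), assuming float32 data and the standard CaffeNet input shape:

```python
# The failed allocation of 158297088 bytes matches exactly one batch of
# 256 float32 images at CaffeNet's 3x227x227 input -- i.e. the prefetch
# buffer for the default ImageNet batch size.
batch, channels, height, width = 256, 3, 227, 227
bytes_per_float = 4

batch_bytes = batch * channels * height * width * bytes_per_float
print(batch_bytes)  # 158297088
```

So the allocation that dies is just one data batch; lowering `batch_size` in the data layer shrinks it proportionally.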
I get the same issue. This is a modified (read: smaller) version of the bvlc_reference_caffenet model. I am loading up 9 training HDF5 files and 1 test with the following dimensions:
foodnet_test_00000001.h5
data Dataset {11610, 3, 227, 227}
label Dataset {11610, 4}
foodnet_train_00000001.h5
data Dataset {13500, 3, 227, 227}
label Dataset {13500, 4}
foodnet_train_00000002.h5
data Dataset {13500, 3, 227, 227}
label Dataset {13500, 4}
foodnet_train_00000003.h5
data Dataset {13500, 3, 227, 227}
label Dataset {13500, 4}
foodnet_train_00000004.h5
data Dataset {13500, 3, 227, 227}
label Dataset {13500, 4}
foodnet_train_00000005.h5
data Dataset {13500, 3, 227, 227}
label Dataset {13500, 4}
foodnet_train_00000006.h5
data Dataset {13500, 3, 227, 227}
label Dataset {13500, 4}
foodnet_train_00000007.h5
data Dataset {13500, 3, 227, 227}
label Dataset {13500, 4}
foodnet_train_00000008.h5
data Dataset {9990, 3, 227, 227}
label Dataset {9990, 4}
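For scale, here is a rough footprint of one of these files if it is loaded fully into host memory as float32 (as far as I can tell, Caffe's HDF5 data layer reads a whole file at once rather than streaming it):

```python
# Approximate host memory needed to hold one 13500-image training file
# as float32 (data only, ignoring labels and overhead).
images = 13500
floats_per_image = 3 * 227 * 227

bytes_total = images * floats_per_image * 4
print(bytes_total)          # 8347698000 bytes
print(bytes_total / 2**30)  # ~7.77 GiB per training file
```

With the test file (~6.7 GiB) loaded on top of a training file, it is easy to exhaust RAM on a modest machine, which would explain why disabling the test phase helps.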
I've also tried running with the patch from #2473, but the problem persists. Note that this only happens if I enable the test phase; once I disable it, things go smoothly!
Thanks for your answer. I've reduced the training dataset, but the problem still exists. I have no idea about this, or whether it is related to the structure of the net. @mynameisfiber,
how do I disable the test phase? Thanks!
@zkl123 I commented out these lines:
#test_iter: 1000
#test_interval: 1000
in solver.prototxt.
+1 I am using CPU only.
This problem may be related to your PC's OS. I changed to a 64-bit system and the problem was solved!
Closing as this looks like a usage issue/platform problem with memory size.
Please ask usage questions on the caffe-users list.
Thanks!
Does anyone know this type of problem?
2015-08-25 16:34:22 [20150825-163104-971b] [ERROR] Train Caffe Model: Check failed: *ptr host allocation of size 116160000 failed
2015-08-25 16:34:28 [20150825-163104-971b] [ERROR] Train Caffe Model task failed with error code -6
Job Status Error
Initialized at 04:31:04 PM (1 second)
Running at 04:31:05 PM (3 minutes, 25 seconds)
Error at 04:34:31 PM
(Total - 3 minutes, 26 seconds)
Train Caffe Model Error
Initialized at 04:31:04 PM (1 second)
Running at 04:31:06 PM (3 minutes, 23 seconds)
Error at 04:34:29 PM
(Total - 3 minutes, 24 seconds)
ERROR: Check failed: *ptr host allocation of size 116160000 failed
conv2 needs backward computation.
pool1 needs backward computation.
norm1 needs backward computation.
relu1 needs backward computation.
conv1 needs backward computation.
label_data_1_split does not need backward computation.
data does not need backward computation.
This network produces output accuracy
This network produces output loss
Collecting Learning Rate and Weight Decay.
Network initialization done.
Memory required for data: 831529208
Solver scaffolding done.
Starting Optimization
Solving
Learning Rate Policy: step
Iteration 0, Testing net (#0)
Test net output #0: accuracy = 0.34
Test net output #1: loss = 1.09667 (* 1 = 1.09667 loss)
Check failed: *ptr host allocation of size 116160000 failed
I solved this problem by increasing the swap area:
http://askubuntu.com/questions/178712/how-to-increase-swap-space
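Before resizing swap, it may help to check how much headroom the machine actually has. A minimal sketch reading `/proc/meminfo` on Linux (field names assumed from the standard procfs layout; this won't work on other platforms):

```python
def meminfo_kib():
    """Parse /proc/meminfo into a {field: KiB} dict (Linux only)."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])  # values are reported in kB
    return info

m = meminfo_kib()
for key in ("MemTotal", "MemFree", "SwapTotal", "SwapFree"):
    print(key, m.get(key, 0) // 1024, "MiB")
```

If `MemFree + SwapFree` is smaller than the allocation size in the `Check failed` message, the failure is expected and more swap (or a smaller batch) is the fix.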
I have also met the same problem during testing. Have you solved it?
I had the same issue (64bit, Ubuntu 14.04): "... syncedmem.hpp: xx] Check failed: ptr host allocation of size xxx failed ...".
I could not solve it by 1) increasing the swap area, nor 2) reducing the size of the training dataset.
Does anyone have any advice on this problem?
I've got the same error (Check failed: *ptr host allocation of size -1655996416 failed).
One of the features of the HDF5 format is the ability to use large datasets directly from the HDD without loading all the data into memory. I'd like to use the same dataset with different frameworks, and HDF5 is currently the most widely supported format. For example, the same dataset used with Keras on an even bigger network takes only ~1.5 GB of memory. "Shrink the dataset or increase memory" is very strange advice, since this is obviously a bug in Caffe.
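A side note on that negative size: it looks like a 32-bit signed overflow of the requested byte count, which would fit the earlier report that moving to a 64-bit system made the failure go away. A quick check (my own arithmetic, not taken from the Caffe code):

```python
# Reinterpret the reported size as an unsigned 32-bit value: a request of
# roughly 2.46 GiB would wrap around to this negative number when stored
# in a signed 32-bit int.
reported = -1655996416
as_uint32 = reported % 2**32

print(as_uint32)                    # 2638970880
print(round(as_uint32 / 2**30, 2))  # ~2.46 GiB
```

A >2 GiB single allocation is also near the limit of what a 32-bit process can address at all, so on a 32-bit build this request could never succeed regardless of swap.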
I have the same problem....
@darren1231 I solved it by decreasing the batch size of both train and test.
@inferrna Thanks. I solved it by increasing memory, but I had the same issue again with more data. I finally used Keras for the data and did the computation without problems.