The .solverstate files saved by caffe use relative pathnames to find the relevant parameter file. This means that resuming from a solverstate in the wrong directory can cause a crash, or worse (if there are parameter files in the current directory), silently use unexpected parameters.
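We could:
1. Ignore and document this issue
2. Use absolute pathnames in solverstate files (but then moving them around becomes difficult)
3. Look for parameters in the same directory as the solverstate file
4. Embed parameters in the solverstate files (or vice versa) (but this couples things that might not want to be coupled)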
@longjon's 3 has my vote since it is reasonable to bundle a solverstate and its related params in the same directory and this option is simple to implement and document.
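For illustration, option 3 could look roughly like the sketch below. It is only a sketch under stated assumptions: the helper names (`DirName`, `BaseName`, `ResolveParamsNextToState`) are hypothetical and not part of Caffe, and only `/` is treated as a path separator.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Caffe): returns the directory part of a
// path, including the trailing '/', or "" if the path has no directory part.
std::string DirName(const std::string& path) {
  const std::string::size_type pos = path.find_last_of('/');
  return pos == std::string::npos ? std::string() : path.substr(0, pos + 1);
}

// Hypothetical helper: returns the final component of a path.
std::string BaseName(const std::string& path) {
  const std::string::size_type pos = path.find_last_of('/');
  return pos == std::string::npos ? path : path.substr(pos + 1);
}

// Sketch of option 3: discard whatever directory was recorded for the
// parameter file and look for a file of the same name next to the
// solverstate file that is being resumed.
std::string ResolveParamsNextToState(const std::string& solverstate_path,
                                     const std::string& recorded_param_path) {
  assert(!recorded_param_path.empty());
  return DirName(solverstate_path) + BaseName(recorded_param_path);
}
```

With this approach, resuming from `/runs/b/net_iter_100.solverstate` whose recorded parameter path is `snapshots/net_iter_100` would look for `/runs/b/net_iter_100`, regardless of the current working directory.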
Whenever possible, we should fail early and fail often, so that users do not waste precious time running the wrong networks. A UUID signature would help avoid accidentally loading the wrong parameter files.
I vote for 3; there cannot be two parameter files with the same name in the same directory.
On Mar 21, 2014, at 4:39 PM, longjon wrote:
The .solverstate files saved by caffe use relative pathnames to find the relevant parameter file. This means that resuming from a solverstate in the wrong directory can cause a crash, or worse (if there are parameter files in the current directory), silently use unexpected parameters.
We could:
1. Ignore and document this issue
2. Use absolute pathnames in solverstate files (but then moving them around becomes difficult)
3. Look for parameters in the same directory as the solverstate file
4. Embed parameters in the solverstate files (or vice versa) (but this couples things that might not want to be coupled)
The parameter snapshot file name is not informative enough. Users may mistakenly copy in unrelated files from other experiments and overwrite the ones that took hours or even days to train.
NetParameter net_param;
// For intermediate results, we will also dump the gradient values.
net_->ToProto(&net_param, param_.snapshot_diff());
string filename(param_.snapshot_prefix());
const int kBufferSize = 20;
char iter_str_buffer[kBufferSize];
snprintf(iter_str_buffer, kBufferSize, "_iter_%d", iter_);
filename += iter_str_buffer;
LOG(INFO) << "Snapshotting to " << filename;
WriteProtoToBinaryFile(net_param, filename.c_str());
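To make the naming concrete, the filename step of the snippet above can be pulled out into a standalone function. This is a minimal sketch for illustration: `SnapshotFilename` is a hypothetical name, not a Caffe function, and it reproduces only the string construction, not the serialization.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical free function mirroring the naming logic above:
// returns snapshot_prefix + "_iter_" + iter.
std::string SnapshotFilename(const std::string& snapshot_prefix, int iter) {
  assert(iter >= 0);
  // "_iter_" (6 chars) plus at most 10 digits plus the terminator fits in 20.
  const int kBufferSize = 20;
  char iter_str_buffer[kBufferSize];
  std::snprintf(iter_str_buffer, kBufferSize, "_iter_%d", iter);
  return snapshot_prefix + iter_str_buffer;
}
```

For a prefix of `lenet` at iteration 5000 this yields `lenet_iter_5000`. The name carries only the prefix and the iteration count, so two experiments that share a prefix produce identical file names, and a relative prefix gives a name that is resolved against the working directory of the process.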
Sorry for opening this issue again, but what do you mean by a relative path? As I understood it, if the solverstate and the caffemodel are in the same folder, the solverstate would record the path of the caffemodel as './caffemodel'. However, when I move both files to a different folder I get a 'caffemodel not found' error, at least using the Python API. Is this a bug, or does "relative path" mean something else here?
I have the same problem as @VasLem, when I move both files to a different folder, and then restore the solverstate, caffe complains about not being able to find the caffemodel in its original location. Unless I'm missing something, this doesn't seem like a relative path. Or do most people not run into this issue?
+1. If I move both the .solverstate and the .caffemodel to a different directory and try to resume training, caffe cannot find the .caffemodel file.
+1