Caffe: solverstate depends on the current directory

Created on 22 Mar 2014 · 9 Comments · Source: BVLC/caffe

The .solverstate files saved by caffe use relative pathnames to find the relevant parameter file. This means that resuming from a solverstate in the wrong directory can cause a crash, or worse (if there are parameter files in the current directory), silently use unexpected parameters.

We could:

1. Ignore and document this issue
2. Use absolute pathnames in solverstate files (but then moving them around becomes difficult)
3. Look for parameters in the same directory as the solverstate file
4. Embed parameters in the solverstate files (or vice versa) (but this couples things that might not want to be coupled)
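
Option 3 could be sketched roughly as follows. This is only an illustration, assuming C++17 and `std::filesystem`; `ResolveNextToSolverstate` is a hypothetical helper, not an existing Caffe function. It discards whatever directory was recorded for the parameter file and looks for a file of the same name beside the solverstate.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper (not part of Caffe): given the path of a .solverstate
// file and the parameter-file path recorded inside it, return the path of a
// file with the same name located next to the solverstate.
std::filesystem::path ResolveNextToSolverstate(
    const std::filesystem::path& solverstate_path,
    const std::filesystem::path& recorded_param_path) {
  // Keep only the file name; the recorded directory may no longer be valid
  // once the files have been moved.
  return solverstate_path.parent_path() / recorded_param_path.filename();
}
```

With this rule the current directory no longer matters: moving the solverstate and its parameter file together keeps them loadable, at the cost of requiring that they always live side by side.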

documentation

longjon

Most helpful comment

Sorry for opening this issue again, but what do you mean by relative path? From what I understood, if, say, the solverstate and caffemodel are in the same folder, then the solverstate would record the path of the caffemodel as './caffemodel'. However, when I move both files to a different folder I get a 'caffemodel not found' error, at least using the Python API. Is this a bug, or does "relative path" mean something else here?

VasLem on 17 May 2017

👍2

All 9 comments

We could:

1. Document the appropriate usage, but do not ignore the issue.
2. Create a SnapshotManager that takes responsibility for saving and loading the solverstate and parameter files, with more flexible policies.
3. To avoid loading the wrong parameter files, verify them with UUID signatures, a slightly more expensive md5 checksum, or something similar.
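
A minimal sketch of the verification idea, under stated assumptions: it uses an FNV-1a 64-bit hash purely as a cheap stand-in for a UUID or md5 signature (it is not cryptographic), and both `Fnv1a64` and `ParamsMatchSignature` are hypothetical names, not Caffe APIs. The signature would be computed when the snapshot is written, stored with the solverstate, and compared again before resuming.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// FNV-1a 64-bit hash, used here only as a cheap stand-in for the UUID or md5
// signature suggested above. It is not cryptographic.
std::uint64_t Fnv1a64(const std::string& bytes) {
  std::uint64_t hash = 0xcbf29ce484222325ULL;  // FNV offset basis
  for (unsigned char c : bytes) {
    hash ^= c;
    hash *= 0x100000001b3ULL;  // FNV prime
  }
  return hash;
}

// Hypothetical check: compare the signature recorded at snapshot time with
// the signature of the parameter bytes that are about to be loaded.
bool ParamsMatchSignature(const std::string& param_bytes,
                          std::uint64_t recorded_signature) {
  return Fnv1a64(param_bytes) == recorded_signature;
}
```

A mismatch would let the loader stop immediately instead of silently training from the wrong parameters.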

kloudkl on 22 Mar 2014

@longjon's option 3 has my vote, since it is reasonable to bundle a solverstate and its related params in the same directory, and this option is simple to implement and document.

shelhamer on 22 Mar 2014

Whenever possible, we should fail early and fail often, so that users do not waste precious time running the wrong networks. A UUID signature helps avoid accidentally loading the wrong parameter files.

kloudkl on 22 Mar 2014

I vote for 3: there cannot be two parameter files with the same name in the same directory.

sguada on 22 Mar 2014

The parameter snapshot file name is not informative enough. Users may mistakenly copy and paste unrelated files from different experiments and overwrite the ones that took hours or even days to train.

  NetParameter net_param;
  // For intermediate results, we will also dump the gradient values.
  net_->ToProto(&net_param, param_.snapshot_diff());
  string filename(param_.snapshot_prefix());
  const int kBufferSize = 20;
  char iter_str_buffer[kBufferSize];
  snprintf(iter_str_buffer, kBufferSize, "_iter_%d", iter_);
  filename += iter_str_buffer;
  LOG(INFO) << "Snapshotting to " << filename;
  WriteProtoToBinaryFile(net_param, filename.c_str());
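
To make the naming concern concrete, the filename logic from the excerpt can be isolated in a small function. This is a sketch only: `SnapshotFilename` is a hypothetical wrapper, and its two arguments stand in for `param_.snapshot_prefix()` and `iter_`.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Builds the snapshot name the same way as the excerpt above:
// <snapshot_prefix>_iter_<iter>. Hypothetical wrapper, not a Caffe function.
std::string SnapshotFilename(const std::string& snapshot_prefix, int iter) {
  const int kBufferSize = 20;
  char iter_str_buffer[kBufferSize];
  std::snprintf(iter_str_buffer, kBufferSize, "_iter_%d", iter);
  return snapshot_prefix + iter_str_buffer;
}
```

The name depends only on the prefix and the iteration count, so two experiments that share a prefix produce identically named snapshots at the same iteration, and nothing in the name tells them apart.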

kloudkl on 23 Mar 2014

VasLem on 17 May 2017

👍2

I have the same problem as @VasLem: when I move both files to a different folder and then restore the solverstate, caffe complains that it cannot find the caffemodel in its original location. Unless I'm missing something, this doesn't behave like a relative path. Or do most people not run into this issue?