dep init freeze for hours

Created on 4 Aug 2017  ·  50Comments  ·  Source: golang/dep

What version of Go (go version) and dep (git describe --tags) are you using?

go version go1.8 linux/amd64
dep latest version

What dep command did you run?

$ dep init -v
No Output

What did you expect to see?

The new files
*Gopkg.toml
*Gopkg.lock
*vendor(folder)

What did you see instead?

Nothing the command stuck there for hours and I have to press ctr+c to exit and have back the terminal.

init

Most helpful comment

if you're running into this problem, a workaround should be to use the latest tip and set DEPNOLOCK=1 in the environment.

just, make very sure that you don't run two dep processes at once. it's reasonably likely to corrupt your GOPATH/pkg/dep/sources cache 💥 🤷‍♂️

All 50 comments

hi! thanks for the issue. this is quite odd.

did you run this on an existing project, or something new? if the former, what if any tool were you using previously? is this reproducible?

no places in dep jump to mind where it could hang prior to generating any output. unless...@matjam, maybe a lockfile thing?

strace output would also probably help enormously to clear this up.

might be.

maybe I should add some stderr logging if its waiting for >60 seconds .. at least until you figure out the notification framework thing.

I have the same problem! I use go in docker container and it is a very simple program.

@matjam yeah, i'll relent and break my rule a bit more - we can add some print statements around the lock. probably fire one right away if it decides we have to wait, then...maybe one every 10s or so thereafter?

@sdboyer okay, I'll give you a PR with something that does that. I'll use 15s if only to be contrary.

We are just assuming that it's the locking thing, but, it's the most likely culprit.

I did do some research; the locking library is mostly correct in it's behaviour; there is a potential race there, but I've not been able to quantify what the real risk is of hitting it.

What is the addage? Race conditions are so rare that they always happen? Something like that.

ok, we have the warning message printing now when the lockfile is busy. @jonahfang, @cpapidas - could you update and see if you're still experiencing the indefinite hang?

@sdboyer It keep saying and still hang:

waiting for lockfile /go/pkg/dep/sm.lock: Lockfile created, but doesn't exist

@jonahfang OK, can we get some more details about the system?

  • What kernel/distribution?
  • Can you show us the mount options (output of mount)?
  • Is this being run inside a container?
  • is there an existing sm.lock file?

@sdboyer Its this code in the lockfile package:

    // return value intentionally ignored, as ignoring it is part of the algorithm
    _ = os.Link(tmplock.Name(), name)

    fiTmp, err := os.Lstat(tmplock.Name())
    if err != nil {
        return err
    }
    fiLock, err := os.Lstat(name)
    if err != nil {
        // tell user that a retry would be a good idea
        if os.IsNotExist(err) {
            return ErrNotExist
        }
        return err
    }

So it's failing to link.

@nightlyone any ideas?

@jonahfang also, can you try the following

$ go get github.com/nightlyone/lockfile
$ cd $GOPATH/src/github.com/nightlyone/lockfile
$ go test

you should see output like this:

$ go test
PASS
ok      github.com/nightlyone/lockfile  0.011s

@matjam , the test result of nightlyone/lockfile as follows:

/go/src/github.com/nightlyone/lockfile $ go test
Error locking lockfile:  link /go/src/github.com/nightlyone/lockfile/test_lockfile.pid.397504553 /go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestBasicLockUnlock (0.00s)
--- FAIL: TestRogueDeletion (0.00s)
        lockfile_test.go:129: link /go/src/github.com/nightlyone/lockfile/test_lockfile.pid.695546947 /go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestRogueDeletionDeadPid (0.00s)
        lockfile_test.go:158: link /go/src/github.com/nightlyone/lockfile/test_lockfile.pid.451082950 /go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestRemovesStaleLockOnDeadOwner (0.01s)
        lockfile_test.go:202: link /go/src/github.com/nightlyone/lockfile/test_lockfile.pid.451935720 /go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestInvalidPidLeadToReplacedLockfileAndSuccess (0.01s)
        lockfile_test.go:231: unexpected error: link /go/src/github.com/nightlyone/lockfile/test_lockfile.pid.088922970 /go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
FAIL
exit status 1
FAIL    github.com/nightlyone/lockfile  0.043s

@jonahfang yeah something is wrong with that filesystem.

still need to see output of "mount"

I am being run inside a container, there is not an existing sm.lock file. I use image registry.cn-hangzhou.aliyuncs.com/fangzx/govim_1.0.

I run docker image in MacBookPro using Docker Toolbox:

docker run \
  --rm \
  -it \
  --name godemo \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v $PWD:/go \
  -v $PWD/.vim:/home/dev \
  registry.cn-hangzhou.aliyuncs.com/fangzx/govim_1.0

My guess is your container doesn't allow writes to the filesystem for whatever reason?

But I use glide and godep well.

But when it's not in a container, it works fine. So...

I use glide and godep in the same docker container well.

That really means nothing, just because other things work, doesn't mean the container isn't broken, or the way you're using it isn't broken, or that there isn't something wrong with your kernel, or ...

That "operation not permitted" message is coming from the kernel. It's a response to a system call for doing file operations on /go/src/github.com/nightlyone/lockfile/test_lockfile.pid.397504553

if the kernel is saying "no" that's a problem with either the container you're using, or docker, or something else on your system.

My docker container use ubuntu:14.04,and filesystem mounted througth VirtualBox volume.

It's because it's trying to do a hard link inside the container. That container, or your system, doesn't allow that, so it's failing.

@sdboyer I think we need to change the way we do locking, this is another issue that's being caused by the hard link in the lockfile package; there was the ReFS issue #913 and now this.

@jonahfang are you running Windows 10 with ReFS for the underlying filesystem by any chance?

No, I use mac only.

Well, the container, or something, is not allow hard links in the filesystem, so that's why it's failing.

@jonahfang Can you please grab this library and test it in your container. You should get the following output:

$ go get gopkg.in/check.v1
$ go get github.com/theckman/go-flock
$ cd $GOPATH/src/github.com/theckman/go-flock
$ go test
OK: 7 passed
PASS
ok      github.com/theckman/go-flock    0.012s

@matjam go test passed!

@matjam Any progress of this issue?

I need someone with ReFS on Windows to test the same module before I can recommend to @sdboyer that we switch to go-flock.

You'll have to find a workaround in the meantime. Not running inside a the container that you're using, for example. Other containers seem to handle links just fine.

Was there a patch for this? I have this exact problem, running a docker container inside an ubuntu trusty vagrant box on a mac.
I can create files inside the container just fine, it just won't do it via go dep.
It does the same thing at the vagrant box level as well. Vagrant mounts the source directory so I don't think it can hard link.

Same issue on Termux (Android Linux distro). It's running busybox, and I can "solve" this using proot:

proot --link2symlink dep init

Any chance we can use symlinks instead to avoid this workaround?

Update :

$ proot --link2symlink dep ensure -add github.com/src-d/go-git
all dirs lacked any go code

And nothing in the vendor folder... So, better but not a workaround yet.

Well, It looks like the linkat syscall get the EPERM as the following:

linkat(AT_FDCWD, "/shared/gopath/pkg/dep/sm.lock.735551337", AT_FDCWD, "/shared/gopath/pkg/dep/sm.lock", 0) = -1 EPERM (Operation not permitted)

/shared (vboxsf) is mounted by virtualbox and my host machine is the Mac osx.
It's recommended to set gopath local file system:)

@cpapidas Would appreciate it if you could test this PR and confirm that it works for you.

@matjam I had tried your PR and I always get the same error.

$ dep init
waiting for lockfile /root/gopath/pkg/dep/sm.lock
waiting for lockfile /root/gopath/pkg/dep/sm.lock

also if I run the dep with "official" master version I get the following error

$ dep init
waiting for lockfile /root/gopath/pkg/dep/sm.lock: Lockfile created, but doesn't exist

@cpapidas can you give us information about what system you're running dep on?

Is this being run inside a container?

Yes, I run this inside a docker container. All the golang configuration (GOPATH, etc ...) exists only in the container.

docker version Docker version 17.06.2-ce, build cec0b72

Also, have in mind that I run the docker from mac OS 10.12 or Vagrant 1.9.3.

dockerfile

FROM ubuntu:16.04

WORKDIR /root

RUN apt-get update \
  && apt-get install curl git -y \
  && cd ~ \
  && curl -O https://storage.googleapis.com/golang/go1.8.linux-amd64.tar.gz \
  && tar xvf go1.8.linux-amd64.tar.gz \
  && chown -R root:root ./go \
  && mv go /usr/local

ENV GOROOT /usr/local/go
ENV GOPATH /root/gopath
ENV GOBIN /root/gopath/bin
ENV PATH $PATH:$GOROOT/bin:$GOPATH/bin

COPY run /usr/local/bin/run
RUN chmod +x /usr/local/bin/run

CMD ["/usr/local/bin/run"]

The run file just builds and run the golang program.

Please if you want anything else, let me know.

@cpapidas Can you give me the output of "docker info"?

$ docker info

Containers: 7
 Running: 0
 Paused: 0
 Stopped: 7
Images: 51
Server Version: 17.06.2-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 67
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-77-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.797GiB
Name: ubuntu-xenial
ID: BAX7:PGP6:LPUZ:7QVL:VGGP:YVLI:34R5:Y3SE:SSSC:BCKK:2DWS:ZDYM
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Okay. So, apparently aufs doesn't do either fnctl style flocking or hard linking in a standard way.

The notes in the "incompatible" section are relevant, if not particularly clear: http://aufs.sourceforge.net/aufs.html

@sdboyer we will need to do something else. We can't use a global lock file like this as a way to arbitrate who gets to write changes.

Please note that the bug seems to only reproduce on non-Linux docker hosts. So I believe it is specific to vboxfs and not to aufs.

@nightlyone actually I can reproduce in Termux, which is not running Docker.

I'm facing the same problem with Mac OS (host) -> Vagrant (virtualbox) -> Ubuntu 16 (guest). No docker. The $GOPATH folder inside ubuntu is a virtual synced folder from the host OS.

Running dep on the same folder directly on the host OS works.

if you're running into this problem, a workaround should be to use the latest tip and set DEPNOLOCK=1 in the environment.

just, make very sure that you don't run two dep processes at once. it's reasonably likely to corrupt your GOPATH/pkg/dep/sources cache 💥 🤷‍♂️

The DEPNOLOCK=1 workaround fixes the problem in virtualbox+docker.
Thanks @ayang64 and @sdboyer.

FWIW ran into the same issue reported here while building github.com/sachaos/todoist on FreeBSD with zfs filesystem which is extremely common on FreeBSD. Might this be a stdlib bug?

I added my notes in case this helps future people landing on this erro, work-around works.

> uname -a
FreeBSD wintermute.skunkwerks.at 12.0-CURRENT FreeBSD 12.0-CURRENT #17 r325010+e50db1d5c304(master): Thu Oct 26 13:59:17 UTC 2017     root@wintermute:/usr/obj/usr/src/sys/GENERIC  amd64

> go version
go version go1.9.1 freebsd/amd64

> dep version
dep:
 version     : devel
 build date  : 
 git hash    : 
 go version  : go1.9.1
 go compiler : gc
 platform    : freebsd/amd64
> dep ensure -v
dep ensure
waiting for lockfile /repos/go/pkg/dep/sm.lock: Lockfile created, but doesn't exist
waiting for lockfile /repos/go/pkg/dep/sm.lock: Lockfile created, but doesn't exist
waiting for lockfile /repos/go/pkg/dep/sm.lock: Lockfile created, but doesn't exist
waiting for lockfile /repos/go/pkg/dep/sm.lock: Lockfile created, but doesn't exist
waiting for lockfile /repos/go/pkg/dep/sm.lock: Lockfile created, but doesn't exist
waiting for lockfile /repos/go/pkg/dep/sm.lock: Lockfile created, but doesn't exist
waiting for lockfile /repos/go/pkg/dep/sm.lock: Lockfile created, but doesn't exist

requested test repos

> go get github.com/nightlyone/lockfile
>  cd $GOPATH/src/github.com/nightlyone/lockfile
> go test
Error locking lockfile:  link /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid.431102591 /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestBasicLockUnlock (0.00s)
--- FAIL: TestRogueDeletion (0.00s)
    lockfile_test.go:129: link /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid.949387529 /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestRogueDeletionDeadPid (0.00s)
    lockfile_test.go:158: link /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid.448157908 /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestRemovesStaleLockOnDeadOwner (0.00s)
    lockfile_test.go:202: link /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid.959225638 /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
--- FAIL: TestInvalidPidLeadToReplacedLockfileAndSuccess (0.00s)
    lockfile_test.go:231: unexpected error: link /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid.332173640 /repos/go/src/github.com/nightlyone/lockfile/test_lockfile.pid: operation not permitted
FAIL
exit status 1
FAIL    github.com/nightlyone/lockfile  0.004s

>  go get gopkg.in/check.v1
> go get github.com/theckman/go-flock
> cd $GOPATH/src/github.com/theckman/go-flock
> go test
OK: 7 passed
PASS
ok      github.com/theckman/go-flock    0.003s

results when using workaround:

> env DEPNOLOCK=1 dep ensure -v
Gopkg.lock was already in sync with imports and Gopkg.toml
(1/22) Wrote golang.org/x/sys@master
(2/22) Wrote golang.org/x/text@master
(3/22) Wrote github.com/magiconair/[email protected]
(4/22) Wrote github.com/spf13/afero@master
(5/22) Wrote gopkg.in/yaml.v2@v2
(6/22) Wrote github.com/spf13/[email protected]
(7/22) Wrote github.com/spf13/[email protected]
(8/22) Wrote github.com/stretchr/[email protected]
(9/22) Wrote github.com/hashicorp/hcl@master
(10/22) Wrote github.com/fatih/[email protected]
(11/22) Wrote github.com/davecgh/[email protected]
(12/22) Wrote github.com/satori/[email protected]
(13/22) Wrote github.com/mitchellh/mapstructure@master
(14/22) Wrote github.com/mattn/[email protected]
(15/22) Wrote github.com/pelletier/[email protected]
(16/22) Wrote github.com/pmezard/[email protected]
(17/22) Wrote github.com/mattn/[email protected]
(18/22) Wrote github.com/pkg/browser@master
(19/22) Wrote github.com/spf13/[email protected]
(20/22) Wrote github.com/fsnotify/[email protected]
(21/22) Wrote github.com/spf13/jwalterweatherman@master
(22/22) Wrote github.com/urfave/[email protected]

\o/ thanks!

oi. yeah, another reason to swap out the lib we use to do this 😢

This issue is still occurring, has there been any progress at to change locking mechanism?

This is happening on linux, no container

@RichyHBM Did you try export DEPNOLOCK=1 in the shell before running go dep as a workaround?
That is what we have done until the mechanism changes.

Yes, that works as expected but it shouldnt be the fix

Lock mechanism still problem with docker(golang:1.10-alpine) on Windows10Pro(hyper-v enable).
Btw workaround is ok.

I get this on windows on an exFAT filesystem as well

Was this page helpful?
0 / 5 - 0 ratings