Terraform: Deleting an environment fails on Windows

Created on 20 Jun 2017 · 11 Comments · Source: hashicorp/terraform

Terraform Version

Terraform v0.9.8

Debug Output

Pasted here as it is very short:

2017/06/20 15:35:27 [WARN] Invalid log level: "1". Defaulting to level: TRACE. Valid levels are: [TRACE DEBUG INFO WARN ERROR]
2017/06/20 15:35:28 [INFO] Terraform version: 0.9.8  8d560482c34e865458fd884cb0790b4f73f09ad1
2017/06/20 15:35:28 [INFO] Go runtime version: go1.8
2017/06/20 15:35:28 [INFO] CLI args: []string{"C:\\Users\\vjimenez\\scoop\\apps\\terraform\\current\\terraform.exe", "env", "delete", "xxx"}
2017/06/20 15:35:28 [DEBUG] Attempting to open CLI config file: C:\Users\vjimenez\AppData\Roaming\terraform.rc
2017/06/20 15:35:28 [DEBUG] File doesn't exist, but doesn't need to. Ignoring.
2017/06/20 15:35:28 [INFO] CLI command args: []string{"env", "delete", "xxx"}
2017/06/20 15:35:28 [DEBUG] command: loading backend config file: E:\test\terraform-env-test\test
2017/06/20 15:35:28 [INFO] command: backend config not found, returning nil: E:\test\terraform-env-test\test
2017/06/20 15:35:28 [DEBUG] command: no data state file found for backend config
2017/06/20 15:35:28 [DEBUG] New state was assigned lineage "95d13254-4b4b-4114-aec7-7ce92a0acb52"
2017/06/20 15:35:28 [INFO] command: backend initialized: <nil>
2017/06/20 15:35:28 [INFO] command: backend <nil> is not enhanced, wrapping in local
2017/06/20 15:35:28 [DEBUG] plugin: waiting for all plugin processes to complete...
remove terraform.tfstate.d\xxx\terraform.tfstate: The process cannot access the file because it is being used by another process.

Expected Behavior

The environment should be deleted.

Actual Behavior

The environment is not deleted, and the error message is misleading. The file terraform.tfstate.d\xxx\terraform.tfstate cannot be in use by another process, because it does not exist (terraform env new does not create a state file).

Steps to Reproduce

  1. terraform env new xxx
  2. terraform env select default
  3. terraform env delete xxx

Important Factoids

The problem only occurs when running on Windows (the same steps on Linux work as expected). I am using Windows 10 (version 1703).

bug cli windows

All 11 comments

I just manually created an empty terraform.tfstate.d\xxx\terraform.tfstate file, and then attempted to delete the environment. I still got exactly the same error.

The error occurs with the master branch (v0.10.0) as well.

After doing some investigation I have found the reason for the error.

The error happens when os.RemoveAll is called:

func (b *Local) DeleteState(name string) error {
    // ......
    return os.RemoveAll(filepath.Join(b.stateWorkspaceDir(), name))
}

(see https://github.com/hashicorp/terraform/blob/master/backend/local/backend.go#L168)

DeleteState is called from:

func (c *WorkspaceDeleteCommand) Run(args []string) int {
    // ......
    err = b.DeleteState(delEnv)
    if err != nil {
        c.Ui.Error(err.Error())
        return 1
    }
    // ......
}

(see https://github.com/hashicorp/terraform/blob/master/command/workspace_delete.go#L123)

Just before trying to delete the state file (which does not exist at that point), Terraform attempts to lock it:

    if c.stateLock {
        lockCtx, cancel := context.WithTimeout(context.Background(), c.stateLockTimeout)
        defer cancel()

        // Lock the state if we can
        lockInfo := state.NewLockInfo()
        lockInfo.Operation = "workspace delete"
        lockID, err := clistate.Lock(lockCtx, sMgr, lockInfo, c.Ui, c.Colorize())
        if err != nil {
            c.Ui.Error(fmt.Sprintf("Error locking state: %s", err))
            return 1
        }
        defer clistate.Unlock(sMgr, lockID, c.Ui, c.Colorize())
    }

(see https://github.com/hashicorp/terraform/blob/master/command/workspace_delete.go#L115)

This ends up calling Lock in state/local.go, which in turn calls createStateFiles. That call creates the state file and leaves it open until Unlock is called, at which point the file is closed. The problem is that Unlock is called only after the attempt to delete the state file.

func (s *LocalState) Lock(info *LockInfo) (string, error) {
    // ......
    if s.stateFileOut == nil {
        if err := s.createStateFiles(); err != nil {
            return "", err
        }
    }
    // ......
}

(see https://github.com/hashicorp/terraform/blob/master/state/local.go#L155)

I suppose a potential solution would be to unlock the state file before actually deleting the environment. This should work fine for local state files. But, I am not sure whether this is the right approach.

After more investigation, it seems the problem is related to a particular behavior in Windows.

The following code (creating, deleting and then closing a file) works on Linux, but on Windows it fails with the same error that I get when trying to delete a Terraform workspace.

package main

import (
  "fmt"
  "os"
)

func main() {
  file, err := os.Create("test.txt")
  if err != nil {
    panic(fmt.Sprintf("Error opening file: %s", err))
  }

  err = os.Remove("test.txt")
  if err != nil {
    panic(fmt.Sprintf("Error deleting file: %s", err))
  }

  err = file.Close()
  if err != nil {
    panic(fmt.Sprintf("Error closing file: %s", err))
  }
}

Error message:

panic: Error deleting file: remove test.txt: The process cannot access the file because it is being used by another process.

This is because a delete on Windows cannot be processed until all handles to the file are closed.

Exactly. This is what the last code snippet that I posted shows. In any case, Terraform should find a way to circumvent the different behavior in Windows.

Thanks for digging in and figuring out the root cause here, @betabandido!

Indeed it seems like some different behavior is needed on Windows here. We should look into whether it's actually necessary to hold the file handle open in order to keep the lock held, or whether there's some other way we can represent the lock that would allow us to use a _different_ file handle as the lock holder.

For locking purposes, opening a file in a temporary directory might indeed work.

But, somehow I expect a lock to be mostly useful in a situation where the state file is shared among multiple users (i.e., when using a remote backend). Is there any real need for a lock when using the local backend?

Indeed the lock is not as useful locally, since it just protects against concurrent _local_ runs. However, I think we would not want to take that away at this point since we've established that it exists, and we would also not want it to work only on certain platforms.

We'll have a look into what other ways we can implement this lock that are compatible with Windows. If all else fails then we will consider removing the local locking, but retaining it and making it work would be preferable.

Yes, local locking was specifically introduced to avoid the possibility of multiple processes executing in parallel, due either to operator error or to errors within Terraform (Terraform runs multiple processes as it is). It also takes into account certain supported configurations of shared filesystems, where multiple hosts may have "local" access to shared states.

We should be able to work around this. Hopefully we can eventually find a way to run the full test suite on Windows and prevent these sorts of regressions.

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
