1.1.3
While running packer on CentOS 6 (and possibly other platforms), I cannot use Ctrl-C (SIGINT)
to cancel the packer process while it is "waiting for Instance to become ready". I discovered during a packer build that I didn't have the DescribeInstanceStatus
permission and wanted to cancel out. The process just held in place no matter how many times I hit Ctrl-C.
Once packer established that the instance was Ready, I was able to ctrl-c in later portions of the packer build without issue.
Some other things to add:
This may be something introduced in Packer 1.1.3 but I'm unsure. I don't remember seeing "waiting for instance to become ready" indefinite stalls like this happening with 1.1.2.
This made me wonder if someone was using sigprocmask(2)/pthread_sigmask(3) incorrectly, or is intentionally not handling said signals. Examining all the forked processes that packer generates (there are several) with FreeBSD's procstat -i does not show any processes with signals that have SIG_IGN set, so this seems to be more a signal handler decision than anything else.
Review of /tmp/packer-logXXXXXXXXX files shows that at least one of the packer processes does detect SIGINT being received, but chooses to do nothing about it:
2018/01/15 22:54:33 Closing stdin because interrupt received.
2018/01/15 22:54:33 Stopping build: amazon-ebs
2018/01/15 22:54:33 packer: 2018/01/15 22:54:33 Received interrupt signal (count: 1). Ignoring.
2018/01/15 22:54:33 packer: 2018/01/15 22:54:33 Received interrupt signal (count: 1). Ignoring.
2018/01/15 22:54:33 packer: 2018/01/15 22:54:33 Received interrupt signal (count: 1). Ignoring.
2018/01/15 22:54:33 packer: 2018/01/15 22:54:33 Received interrupt signal (count: 1). Ignoring.
2018/01/15 22:54:33 packer: 2018/01/15 22:54:33 Cancelling the step runner...
2018/01/15 22:54:34 packer: 2018/01/15 22:54:34 Received interrupt signal (count: 2). Ignoring.
2018/01/15 22:54:34 packer: 2018/01/15 22:54:34 Received interrupt signal (count: 2). Ignoring.
2018/01/15 22:54:34 packer: 2018/01/15 22:54:34 Received interrupt signal (count: 2). Ignoring.
2018/01/15 22:54:34 packer: 2018/01/15 22:54:34 Received interrupt signal (count: 2). Ignoring.
2018/01/15 22:54:34 packer: 2018/01/15 22:54:34 Received interrupt signal (count: 3). Ignoring.
2018/01/15 22:54:34 packer: 2018/01/15 22:54:34 Received interrupt signal (count: 3). Ignoring.
2018/01/15 22:54:34 packer: 2018/01/15 22:54:34 Received interrupt signal (count: 3). Ignoring.
2018/01/15 22:54:34 packer: 2018/01/15 22:54:34 Received interrupt signal (count: 3). Ignoring.
2018/01/15 22:54:34 packer: 2018/01/15 22:54:34 Received interrupt signal (count: 4). Ignoring.
2018/01/15 22:54:34 packer: 2018/01/15 22:54:34 Received interrupt signal (count: 4). Ignoring.
2018/01/15 22:54:34 packer: 2018/01/15 22:54:34 Received interrupt signal (count: 4). Ignoring.
2018/01/15 22:54:34 packer: 2018/01/15 22:54:34 Received interrupt signal (count: 4). Ignoring.
2018/01/15 22:55:26 packer: 2018/01/15 22:55:26 Received interrupt signal (count: 5). Ignoring.
2018/01/15 22:55:26 packer: 2018/01/15 22:55:26 Received interrupt signal (count: 5). Ignoring.
2018/01/15 22:55:26 packer: 2018/01/15 22:55:26 Received interrupt signal (count: 5). Ignoring.
2018/01/15 22:55:26 packer: 2018/01/15 22:55:26 Received interrupt signal (count: 5). Ignoring.
2018/01/15 22:55:26 packer: 2018/01/15 22:55:26 Received interrupt signal (count: 6). Ignoring.
2018/01/15 22:55:26 packer: 2018/01/15 22:55:26 Received interrupt signal (count: 6). Ignoring.
2018/01/15 22:55:26 packer: 2018/01/15 22:55:26 Received interrupt signal (count: 6). Ignoring.
2018/01/15 22:55:26 packer: 2018/01/15 22:55:26 Received interrupt signal (count: 6). Ignoring.
...rinse lather repeat, up to a count of 12...
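To illustrate the difference between SIG_IGN and a handler that catches the signal but takes no action, here is a minimal Go sketch of the pattern the log lines suggest. It is not Packer's actual handler: the names `listen`, `logAndIgnore`, and `simulate` are made up for illustration, and the interrupts are fed in synthetically so the example terminates on its own.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
)

// listen shows how a real program would subscribe to SIGINT. Once
// signal.Notify is called, Ctrl-C no longer terminates the process by
// default; the signal is delivered on the channel instead.
// (Hypothetical helper, not used by the simulation below.)
func listen() <-chan os.Signal {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, os.Interrupt)
	return sigs
}

// logAndIgnore reads signals until the channel is closed and returns one
// log line per signal. It never cancels any work and never exits, which
// is the behaviour the log excerpt above appears to show.
func logAndIgnore(sigs <-chan os.Signal) []string {
	var lines []string
	count := 0
	for sig := range sigs {
		count++
		lines = append(lines, fmt.Sprintf("Received %v signal (count: %d). Ignoring.", sig, count))
	}
	return lines
}

// simulate feeds n synthetic interrupts through logAndIgnore so the
// example can run without anyone pressing Ctrl-C.
func simulate(n int) []string {
	sigs := make(chan os.Signal, n)
	for i := 0; i < n; i++ {
		sigs <- os.Interrupt
	}
	close(sigs)
	return logAndIgnore(sigs)
}

func main() {
	for _, line := range simulate(3) {
		fmt.Println(line)
	}
}
```

A process written this way would not show SIGINT as ignored in procstat -i, because a handler is installed; the signal is simply dropped at the application level.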
This specific problem (not the general signal handler issue) is limited to 1.1.3 when used with AWS. Rolling back to 1.1.2 resolves it (you can download old versions by changing the version number in two places in the download URL). References for my statements:
The ignoring of SIGINT/SIGSTOP etc. still needs investigation.
I also noticed this running packer 1.1.3 on Windows last week. It appeared that packer only processed the interrupt after it had connected to the EC2 instance (the cleanup routine then began).
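hmm, probably need to cancel the waiter context when we detect a signal, thanks for the report!
https://github.com/hashicorp/packer/blob/master/builder/amazon/common/step_run_source_instance.go#L181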
edit: I've confirmed that this is the problem. This solves it:
diff --git a/builder/amazon/common/step_run_source_instance.go b/builder/amazon/common/step_run_source_instance.go
index aba0e9ab5..cf1557c54 100644
--- a/builder/amazon/common/step_run_source_instance.go
+++ b/builder/amazon/common/step_run_source_instance.go
@@ -1,6 +1,7 @@
 package common
 import (
+	"context"
 	"encoding/base64"
 	"fmt"
 	"io/ioutil"
@@ -178,7 +179,17 @@ func (s *StepRunSourceInstance) Run(state multistep.StateBag) multistep.StepActi
 	describeInstance := &ec2.DescribeInstancesInput{
 		InstanceIds: []*string{aws.String(instanceId)},
 	}
-	if err := ec2conn.WaitUntilInstanceRunning(describeInstance); err != nil {
+	ctx, cancel := context.WithCancel(context.Background())
+
+	go func() {
+		for {
+			if _, ok := state.GetOk(multistep.StateCancelled); ok {
+				cancel()
+			}
+		}
+	}()
+
+	if err := ec2conn.WaitUntilInstanceRunningWithContext(ctx, describeInstance); err != nil {
 		err := fmt.Errorf("Error waiting for instance (%s) to become ready: %s", instanceId, err)
 		state.Put("error", err)
 		ui.Error(err.Error())
will work on a more general solution
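For anyone adapting the quick patch: as posted, the watcher goroutine polls in a tight loop with no sleep and keeps running after cancel() has been called. Below is a rough, self-contained sketch of the same idea that polls on a ticker and exits once the wait is over. It is not the fix that was eventually merged. `waitUntilReady` is a stand-in for a context-aware waiter such as WaitUntilInstanceRunningWithContext, and `isCancelled` is a hypothetical substitute for checking the state bag.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// waitUntilReady stands in for a context-aware waiter. In this sketch it
// only returns early if ctx is cancelled; otherwise it gives up waiting
// after five seconds and reports success.
func waitUntilReady(ctx context.Context) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-time.After(5 * time.Second):
		return nil
	}
}

// runCancellable starts the waiter and, in the background, checks
// isCancelled every poll interval. When the flag is set it cancels the
// context so the waiter returns promptly. The watcher goroutine exits as
// soon as the context is done, so it neither spins nor leaks.
func runCancellable(isCancelled func() bool, poll time.Duration) error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // also stops the watcher when the waiter finishes first

	go func() {
		ticker := time.NewTicker(poll)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				if isCancelled() {
					cancel()
					return
				}
			}
		}
	}()

	return waitUntilReady(ctx)
}

// demo simulates an interrupt arriving 50ms into the wait and returns
// the waiter's result as a string.
func demo() string {
	var flag int32
	time.AfterFunc(50*time.Millisecond, func() { atomic.StoreInt32(&flag, 1) })

	err := runCancellable(func() bool { return atomic.LoadInt32(&flag) == 1 }, 10*time.Millisecond)
	if err != nil {
		return err.Error()
	}
	return "ready"
}

func main() {
	fmt.Println(demo()) // prints "context canceled"
}
```

The poll interval is an arbitrary choice here; a channel that is closed on cancellation would avoid polling altogether, if the surrounding code can provide one.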
this has been fixed