Packer: Publishing to Azure Shared Image Gallery Not Reporting Success

Created on 11 Jul 2019 · 28 Comments · Source: hashicorp/packer

Packer Version - 1.4.2
Platform - CircleCI/Docker

All, when building an image and publishing it to a Shared Image Gallery, I'm getting a context timeout error, as shown in the gist. Given that the operation succeeds (the image is published and I can spin up a VM from the Azure portal), it would seem there's some kind of problem with Packer detecting that the copy was successful.

Gist - https://gist.github.com/timbrammer/4d0ad0fa96d77441a5317493162a2144

bug builder/azure regression

Most helpful comment

@jzuchora87 if you've got a chance, can you test this new build? https://circleci.com/gh/hashicorp/packer/8267#artifacts/containers/0

It sets the default to 60 minutes and then adds a configuration option -- you can set "shared_image_gallery_timeout": "120m", and I think that should do the trick.

Testing now - thanks!

[image attachment]

All 28 comments

@timbrammer getting the same error here. Has the version fully replicated to all the regions you've specified? For me they are taking a stupidly long time to replicate, and I think Packer is timing that operation out, as I don't think the command returns until the replication has completed.

I'm not completely sure about the timing, but my runs show as "Completed" in the console before Packer times out. That's what led me to question either the status output from the API, or the way Packer is reading the status.

@timbrammer so the replication status for the image version you created is showing completed for your gallery before Packer errors out?

I'm going to do another run now to verify this, but that has been my experience thus far, yes.

EDIT: I was incorrect. The console is still showing in progress, and the Packer build has failed.

I have had the same problem. In my case, though, I verified that when the replication succeeds within 15 minutes, Packer works fine (it detects the completion correctly). So for me, the 15-minute timeout on the Shared Image Gallery replication seems to be the problem. I have asked for more info about this on the mailing list, but no response yet. I couldn't figure out a way to change the timeout through the CLI or config file for the builder.

I can confirm I'm having the same issue: the process completes successfully, including the replication, but reports a failed status. I was hoping that replicating to only one region might help, with another task outside of the image build updating the replication to the other regions, but still no luck.

Error Message:
==> azure-arm: Publishing to Shared Image Gallery ...
==> azure-arm: -> MDI ID used for SIG publish : '/subscriptions/SUB-ID/resourceGroups/rgOSSecurityCenter/providers/Microsoft.Compute/images/IMG-VER'
==> azure-arm: -> SIG publish resource group : 'RGNAME'
==> azure-arm: -> SIG gallery name : 'SIGNAME'
==> azure-arm: -> SIG image name : 'IMGNAME'
==> azure-arm: -> SIG image version : 'IMGVER'
==> azure-arm: -> SIG replication regions : '[eastus]'
==> azure-arm:
==> azure-arm: Future#WaitForCompletion: context has been cancelled: StatusCode=200 -- Original Error: context deadline exceeded
==> azure-arm:
==> azure-arm: Cleanup requested, deleting resource group ...
==> azure-arm: Resource group has been deleted.
Build 'azure-arm' errored: Future#WaitForCompletion: context has been cancelled: StatusCode=200 -- Original Error: context deadline exceeded
==> Some builds didn't complete successfully and had errors:
--> azure-arm: Future#WaitForCompletion: context has been cancelled: StatusCode=200 -- Original Error: context deadline exceeded
==> Builds finished but no artifacts were created.

Same issue here with both the 1.4.2 and 1.4.3-dev versions. I see the image successfully published to the image gallery and I can spawn VMs off of it; however, Packer times out and reports a failed job. We're using it as part of our DevOps pipelines, so it has to report success.
My logs (quite similar to the others) below:
==> azure-arm: Capturing image ...
==> azure-arm: -> Compute ResourceGroupName : 'sassTest'
==> azure-arm: -> Compute Name : 'pkrvm5lqnao9z1a'
==> azure-arm: -> Compute Location : 'westus2'
==> azure-arm: -> Image ResourceGroupName : 'sassTest'
==> azure-arm: -> Image Name : 'sassImage-201907250540'
==> azure-arm: -> Image Location : 'westus2'
==> azure-arm: Publishing to Shared Image Gallery ...
==> azure-arm: -> MDI ID used for SIG publish : '/subscriptions/3f9d29b8--692912ea6f8f/resourceGroups/sassTest/providers/Microsoft.Compute/images/sassImage-201907250540'
==> azure-arm: -> SIG publish resource group : 'sig-dev'
==> azure-arm: -> SIG gallery name : 'sigDev'
==> azure-arm: -> SIG image name : 'someName'
==> azure-arm: -> SIG image version : '9.0.1564076430'
==> azure-arm: -> SIG replication regions : '[westus2]'
==> azure-arm:
==> azure-arm: Future#WaitForCompletion: context has been cancelled: StatusCode=200 -- Original Error: context deadline exceeded
==> azure-arm:

I ran into the same issue when I had multiple regions for replication. It does take a lot of time.

Did some quick investigation -- looks like we can fix this internally by setting the PollingDuration on the GalleryImageVersionsClient.Client. Pull down one of the patched builds here and let me know if that resolves the issue for y'all.

(to be clear -- in this test build, you don't need to do anything; I've hardcoded the polling duration to be infinite, to see if that resolves the issue. Just run your build with whichever binary matches your OS architecture, and let me know if the issue persists)
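For anyone curious what that internal change looks like, here is a minimal sketch, not Packer's actual source. It assumes the Azure SDK for Go's compute package (the import path and API version below are illustrative): GalleryImageVersionsClient embeds an autorest.Client whose PollingDuration bounds Future#WaitForCompletion, and the autorest default is 15 minutes, which matches the timeouts reported above.

```go
package azurearm

import (
	"time"

	// Illustrative import; the exact API version Packer vendors may differ.
	"github.com/Azure/azure-sdk-for-go/services/compute/mgmt/2019-03-01/compute"
)

// newGalleryImageVersionsClient returns a client whose long-running-operation
// polling deadline is raised beyond autorest's 15-minute default, so a slow
// Shared Image Gallery replication no longer surfaces as
// "context deadline exceeded" while the Azure-side operation is still running.
func newGalleryImageVersionsClient(subscriptionID string, timeout time.Duration) compute.GalleryImageVersionsClient {
	client := compute.NewGalleryImageVersionsClient(subscriptionID)
	client.PollingDuration = timeout // e.g. 60 * time.Minute, or a user-supplied value
	return client
}
```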


Testing now, will update shortly. Thank you!


I was able to replicate to 14 regions (eastus, westus, northcentralus, eastus2, southcentralus, westus2, westeurope, northeurope, canadacentral, centralus, uksouth, westcentralus, ukwest, canadaeast) and it reported back success. Thanks @SwampDragons!

That's awesome news, @jzuchora87! How long did the replication take to complete? I think I want to increase the default PollingDuration in addition to making it configurable.

It worked for me too. I didn't measure the replication time, but now the deployments are successful.

Just over an hour as reported from the Azure activity logs. Thanks again!



@jzuchora87 if you've got a chance, can you test this new build? https://circleci.com/gh/hashicorp/packer/8267#artifacts/containers/0

It sets the default to 60 minutes and then adds a configuration option -- you can set "shared_image_gallery_timeout": "120m", and I think that should do the trick.
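For what it's worth, that value is an ordinary Go duration string; as far as I can tell, Packer decodes duration-typed options with Go's time.ParseDuration (treat that as an assumption), so spellings like "90m" or "1h30m" should work as well:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Assumption: Packer parses duration options such as
	// shared_image_gallery_timeout with time.ParseDuration,
	// so any of these spellings would be accepted.
	for _, s := range []string{"60m", "120m", "1h30m"} {
		d, err := time.ParseDuration(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s parses to %v\n", s, d)
	}
}
```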


Testing now - thanks!

[image attachment]

@SwampDragons that worked just fine with the 120-minute timeout.

Thanks for letting me know! I'll merge and close, and this option will be available in 1.4.3, which should come out in ~2 weeks.

This setting is not working for me. I am receiving this error after about 30 minutes.

Builders:

"builders": [
    {
      "azure_tags": {
        "billingCode": "{{user `billing_code`}}",
        "buildNumber": "{{user `build_number`}}",
        "osVersion": "{{user `os_version`}}"
      },
      "client_id": "{{user `client_id`}}",
      "client_secret": "{{user `client_secret`}}",
      "communicator": "ssh",
      "image_offer": "CentOS",
      "image_publisher": "OpenLogic",
      "image_sku": "7.6",
      "location": "{{user `location`}}",
      "managed_image_name": "{{user `managed_image_name`}}",
      "managed_image_resource_group_name": "{{user `managed_image_resource_group_name`}}",
      "name": "nginx-prod-{{user `location` | lower}}-{{timestamp}}-{{user `os_version`}}",
      "os_disk_size_gb": "{{user `os_disk_size_gb`}}",
      "os_type": "Linux",
      "shared_image_gallery_destination": {
        "gallery_name": "{{user `shared_image_gallery_name`}}",
        "image_name": "{{user `shared_image_gallery_image_name`}}",
        "image_version": "{{user `shared_image_gallery_image_version`}}",
        "replication_regions": [
          "East US",
          "West Europe",
          "Australia East"
        ],
        "resource_group": "{{user `shared_image_gallery_resource_group`}}"
      },
      "shared_image_gallery_timeout": "1h",
      "subscription_id": "{{user `subscription_id`}}",
      "temp_resource_group_name": "{{user `temp_resource_group_name`}}",
      "tenant_id": "{{user `tenant_id`}}",
      "type": "azure-arm",
      "vm_size": "{{user `vm_size`}}"
    }
  ]

Packer version 1.4.5:

C:\hostedtoolcache\windows\packer\1.4.5\x64\packer.exe --version
1.4.5
Current installed packer version is 1.4.5.

Error logs:

==> nginx-prod-eastus-1574193366-7.6.20190708: Publishing to Shared Image Gallery ...
==> nginx-prod-eastus-1574193366-7.6.20190708:  -> MDI ID used for SIG publish     : '/subscriptions/***/resourceGroups/prod-09-nginx-image-rg/providers/Microsoft.Compute/images/prod-09-us1-278352-nginx-image'
==> nginx-prod-eastus-1574193366-7.6.20190708:  -> SIG publish resource group     : 'prod-09-nginx-image-rg'
==> nginx-prod-eastus-1574193366-7.6.20190708:  -> SIG gallery name     : 'prod09nginxsig'
==> nginx-prod-eastus-1574193366-7.6.20190708:  -> SIG image name     : 'nginx'
==> nginx-prod-eastus-1574193366-7.6.20190708:  -> SIG image version     : '1.0.278352'
==> nginx-prod-eastus-1574193366-7.6.20190708:  -> SIG replication regions    : '[East US West Europe Australia East]'
==> nginx-prod-eastus-1574193366-7.6.20190708:
==> nginx-prod-eastus-1574193366-7.6.20190708: Future#WaitForCompletion: context has been cancelled: StatusCode=200 -- Original Error: context deadline exceeded
==> nginx-prod-eastus-1574193366-7.6.20190708: Provisioning step had errors: Running the cleanup provisioner, if present...
==> nginx-prod-eastus-1574193366-7.6.20190708: 
==> nginx-prod-eastus-1574193366-7.6.20190708: Cleanup requested, deleting resource group ...
==> nginx-prod-eastus-1574193366-7.6.20190708: Resource group has been deleted.

Build 'nginx-prod-eastus-1574193366-7.6.20190708' errored: Future#WaitForCompletion: context has been cancelled: StatusCode=200 -- Original Error: context deadline exceeded

Is there something different I need to do?


I'm also having the same issue. Having a timeout of "120m" has no effect.

Same issue here. Packer 1.4.5 and the 120m setting has no effect; Packer stops itself after about 20m of waiting. Can you please reopen, @SwampDragons?

I am having the same issue on Packer 1.4.5.

Builder:

 "builders": [{
    "type": "azure-arm",
    "client_id": "x",
    "client_secret": "x",
    "tenant_id": "x",
    "subscription_id": "99fa3e6c-b736-413e-9d10-5c73505f385d",
    "managed_image_resource_group_name": "sqlstream",
    "managed_image_name": "sqlstream-base",
    "os_type": "Linux",
    "image_publisher": "Canonical",
    "image_offer": "UbuntuServer",
    "image_sku": "18.04-LTS",
    "azure_tags": {
        "dept": "Engineering",
        "task": "Image deployment"
    },
    "location": "East US",
    "vm_size": "Standard_DS2_v2"
  }],

Error logs:

```
==> azure-arm: Querying the machine's properties ...
==> azure-arm: -> ResourceGroupName : 'packer-Resource-Group-o6ydtc9ng7'
==> azure-arm: -> ComputeName : 'pkrvmo6ydtc9ng7'
==> azure-arm: -> Managed OS Disk : '/subscriptions/99fa3e6c-b736-413e-9d10-5c73505f385d/resourceGroups/packer-Resource-Group-o6ydtc9ng7/providers/Microsoft.Compute/disks/pkroso6ydtc9ng7'
==> azure-arm: Querying the machine's additional disks properties ...
==> azure-arm: -> ResourceGroupName : 'packer-Resource-Group-o6ydtc9ng7'
==> azure-arm: -> ComputeName : 'pkrvmo6ydtc9ng7'
==> azure-arm: Powering off machine ...
==> azure-arm: -> ResourceGroupName : 'packer-Resource-Group-o6ydtc9ng7'
==> azure-arm: -> ComputeName : 'pkrvmo6ydtc9ng7'

==> azure-arm:
==> azure-arm: Future#WaitForCompletion: context has been cancelled: StatusCode=200 -- Original Error: context deadline exceeded
==> azure-arm:
==> azure-arm: Cleanup requested, deleting resource group ...
==> azure-arm: Resource group has been deleted.
Build 'azure-arm' errored: Future#WaitForCompletion: context has been cancelled: StatusCode=200 -- Original Error: context deadline exceeded

==> Some builds didn't complete successfully and had errors:
--> azure-arm: Future#WaitForCompletion: context has been cancelled: StatusCode=200 -- Original Error: context deadline exceeded
```


Hi @johnsout, is your build publishing to a Shared Image Gallery? I don't see it as part of your configuration, so I want to confirm that your issue is the same as the originally reported one. If you are not publishing to a Shared Image Gallery, have you tried setting polling_duration_timeout to give Packer more time to complete?

Folks, a pull request to address this issue has been submitted for approval. At your convenience, can you please take a minute to download the patched binary available here and confirm that the fix is working as expected? Cheers!

The fix for the PollingDuration regression has been merged. The changes are set to be in the next release of Packer.

The latest Packer release (1.5.0) has fixed this issue for me.

Thank you!

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
