Another "transient" error with DataTransferStep; see #1045 and #1058.
RunId:9b0b0f9e-3972-47a1-a874-76f525b7a5ed
PipelineRunId: 5c6d9ccc-b2ff-48e9-9f4c-9da2fa976789
[2020-07-16 14:52:50Z] Parsing command text
[2020-07-16 14:52:50Z] Parsed command line. Will be submitting job : {
"Command": 2,
"CopyCommand": null,
"WaitCommand": null,
"DataCopyCommand": {
"ComputeName": "adf",
"AzureDataFactoryConfig": null,
"SourceDataId": "aeee628e-b447-4f5f-84ce-ec185e9966ca",
"DestinationDataId": "90b0fa56-90d3-4180-941d-1f6b817576d2",
"OutputDataId": "8f250391-6994-4f36-a346-9f658cc32f65",
"CopyOperationEntity": null,
"CopyOptions": "{\"source_reference_type\": \"directory\", \"destination_reference_type\": \"directory\"}",
"PolicyValidationStatus": 0
},
"IsDataManagementEnabled": true,
"ComplianceCluster": null,
"EuclidWorkspaceId": null
}
[2020-07-16 14:52:51Z] Copy source: Blob storage account: avaprditsmlstorage, directory: dealpipeline/azureml/226d50ae-0ff0-4f9d-a181-ea8737e5a2f1/gold_data1, filename: , binary copy: True, using SAS: False
[2020-07-16 14:52:51Z] Copy sink: Blob storage account: avaprditsmlstorage, directory: dealoutput/deal-master/2020-06-30 07.25.31/stage1, filename: , binary copy: True, using SAS: False
[2020-07-16 14:52:54Z] RunId:[eb421ff2-2d53-4a0c-a579-e799806b8adb] ParentRunId:[5c6d9ccc-b2ff-48e9-9f4c-9da2fa976789] ComputeTarget:[ADF]
[2020-07-16 14:53:15Z] Parsing command text
[2020-07-16 14:53:15Z] Parsed command line. Will be submitting job : {
"Command": 2,
"CopyCommand": null,
"WaitCommand": null,
"DataCopyCommand": {
"ComputeName": "adf",
"AzureDataFactoryConfig": null,
"SourceDataId": "aeee628e-b447-4f5f-84ce-ec185e9966ca",
"DestinationDataId": "90b0fa56-90d3-4180-941d-1f6b817576d2",
"OutputDataId": "8f250391-6994-4f36-a346-9f658cc32f65",
"CopyOperationEntity": null,
"CopyOptions": "{\"source_reference_type\": \"directory\", \"destination_reference_type\": \"directory\"}",
"PolicyValidationStatus": 0
},
"IsDataManagementEnabled": true,
"ComplianceCluster": null,
"EuclidWorkspaceId": null
}
[2020-07-16 14:53:23Z] Data transfer job failed with unexpected error:
CopyOperationEntity is not populated in DataCopyCommand and stack trace:
at Microsoft.Aether.DataTransferCloud.JobProcessing.Actions.CancelJobAction.GetCopyOperationEntity() in d:\dbs\sh\Ae\0710_161847\cmd\i\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.JobProcessing\Actions\CancelJobAction.cs:line 55
at Microsoft.Aether.DataTransferCloud.JobProcessing.Actions.CancelJobAction.ExecuteAsync()
at Microsoft.Aether.DataTransferCloud.JobProcessing.DataCopyJobProcessor.ProcessJobAsync(DataTransferJobMetadata job) in d:\dbs\sh\Ae\0710_161847\cmd\i\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.JobProcessing\DataCopyJobProcessor.cs:line 45
This happened again today on our daily scheduled ML pipeline. We'd be 100% down if our team lead didn't babysit the run and re-submit it until the step passed.
Pipeline: 2e86e9bc-1720-407f-a666-6ba8b766e1c6
StepRun: a56cb941-e200-48c0-81bf-32d8121af45d
executionlogs.txt:
[2020-07-17 14:24:01Z] Parsing command text
[2020-07-17 14:24:01Z] Parsed command line. Will be submitting job : {
"Command": 2,
"CopyCommand": null,
"WaitCommand": null,
"DataCopyCommand": {
"ComputeName": "adf",
"AzureDataFactoryConfig": null,
"SourceDataId": "122d3465-4246-4cf5-ad05-78a01a849af6",
"DestinationDataId": "64500dfa-0f38-4cfe-81fb-d688e2be9925",
"OutputDataId": "ebf5a56d-0b26-46ae-aa4f-67b165c29f7c",
"CopyOperationEntity": null,
"CopyOptions": "{\"source_reference_type\": \"directory\", \"destination_reference_type\": \"directory\"}",
"PolicyValidationStatus": 0
},
"IsDataManagementEnabled": true,
"ComplianceCluster": null,
"EuclidWorkspaceId": null
}
[2020-07-17 14:24:13Z] Copy source: Blob storage account: avaprditsmlstorage, directory: dealpipeline/azureml/b7c1da4c-ad1b-4eaf-b616-fc5ddc3638d6/gold_data6, filename: , binary copy: True, using SAS: False
[2020-07-17 14:24:13Z] Copy sink: Blob storage account: avaprditsmlstorage, directory: dealoutput/deal-master/2020-06-30 07.25.31/stage6, filename: , binary copy: True, using SAS: False
[2020-07-17 14:25:53Z] Failed to start copy operation because of error:
Failure in StartCopyOperation while calling service DataFactory; HttpMethod: POST; Response StatusCode: ; Exception type:
System.Threading.Tasks.TaskCanceledException|-System.IO.IOException|-System.Net.Sockets.SocketException, stack trace:
at Microsoft.Aether.DataTransferCloud.DataFactory.DataFactoryClient.StartCopyOperationAsync(IPipelineResource source, IPipelineResource sink) in d:\dbs\sh\Ae\0710_161847\cmd\m\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.DataFactory\DataFactoryClient.cs:line 128
at Microsoft.Aether.DataTransferCloud.CopyService.DataFactoryCopyService.StartCopyOperationAsync(DataReference sourceDataReference, DataReference destinationDataReference, CopyOperationOptions copyOptions) in d:\dbs\sh\Ae\0710_161847\cmd\27\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.CopyService\DataFactoryCopyService.cs:line 67
[2020-07-17 14:25:53Z] Data transfer job failed with unexpected error:
Failure in StartCopyOperation while calling service DataFactory; HttpMethod: POST; Response StatusCode: ; Exception type:
System.Threading.Tasks.TaskCanceledException|-System.IO.IOException|-System.Net.Sockets.SocketException and stack trace:
at Microsoft.Aether.DataTransferCloud.DataFactory.DataFactoryClient.StartCopyOperationAsync(IPipelineResource source, IPipelineResource sink) in d:\dbs\sh\Ae\0710_161847\cmd\m\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.DataFactory\DataFactoryClient.cs:line 128
at Microsoft.Aether.DataTransferCloud.CopyService.DataFactoryCopyService.StartCopyOperationAsync(DataReference sourceDataReference, DataReference destinationDataReference, CopyOperationOptions copyOptions) in d:\dbs\sh\Ae\0710_161847\cmd\27\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.CopyService\DataFactoryCopyService.cs:line 67
at Microsoft.Aether.DataTransferCloud.JobProcessing.Actions.SubmitJobAction.ExecuteAsync() in d:\dbs\sh\Ae\0710_161847\cmd\i\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.JobProcessing\Actions\SubmitJobAction.cs:line 45
at Microsoft.Aether.DataTransferCloud.JobProcessing.DataCopyJobProcessor.ProcessJobAsync(DataTransferJobMetadata job) in d:\dbs\sh\Ae\0710_161847\cmd\i\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.JobProcessing\DataCopyJobProcessor.cs:line 45
@swanderz I'm looking into these and the other issues you reported recently.
@akshay-0, unfortunately this bug is still rearing its ugly head despite the hotfix.
PipelineRun: dfa2b1e0-7e62-471e-889b-748b2dff7561
StepRunId : 410f3332-caa3-4979-95da-301ca13b89c1
[2020-07-22 14:45:24Z] Parsing command text
[2020-07-22 14:45:24Z] Parsed command line. Will be submitting job : {
"Command": 2,
"CopyCommand": null,
"WaitCommand": null,
"DataCopyCommand": {
"ComputeName": "adf",
"AzureDataFactoryConfig": null,
"SourceDataId": "cfad3a15-bdac-45ad-bfd8-f66b4f57d315",
"DestinationDataId": "14aced40-2cb4-4f6e-8ee4-94d2371012d0",
"OutputDataId": "3f826907-8bca-4e30-9427-2a03c9c5d8dc",
"CopyOperationEntity": null,
"CopyOptions": "{\"source_reference_type\": \"directory\", \"destination_reference_type\": \"directory\"}",
"PolicyValidationStatus": 0
},
"IsDataManagementEnabled": true,
"ComplianceCluster": null,
"EuclidWorkspaceId": null
}
[2020-07-22 14:45:24Z] Copy source: Blob storage account: avaprditsmlstorage, directory: dealpipeline/azureml/6db30a2e-1507-4092-bda3-1cd4c2b42fba/best_run_data5, filename: , binary copy: True, using SAS: False
[2020-07-22 14:45:24Z] Copy sink: Blob storage account: avaprditsmlstorage, directory: dealoutput/deal-master/latest/stage5, filename: , binary copy: True, using SAS: False
[2020-07-22 14:45:28Z] RunId:[410f3332-caa3-4979-95da-301ca13b89c1] ParentRunId:[dfa2b1e0-7e62-471e-889b-748b2dff7561] ComputeTarget:[ADF]
[2020-07-22 14:47:24Z]
Data transfer job failed with unexpected error:
Failure in ValidateDataFactoryReady while calling service DataFactory; HttpMethod: GET; Response StatusCode: ;
Exception type: System.Threading.Tasks.TaskCanceledException|-System.IO.IOException|-System.Net.Sockets.SocketException and stack trace:
at Microsoft.Aether.DataTransferCloud.DataFactory.DataFactoryClient.ValidateDataFactoryReadyAsync() in d:\dbs\sh\Ae\0720_121258_0\cmd\u\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.DataFactory\DataFactoryClient.cs:line 79
at Microsoft.Aether.DataTransferCloud.CopyService.DataFactoryCopyServiceFactory.CreateCopyServiceAsync(DataTransferJobMetadata job) in d:\dbs\sh\Ae\0720_121258_0\cmd\2i\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.CopyService\DataFactoryCopyServiceFactory.cs:line 62
at Microsoft.Aether.DataTransferCloud.JobProcessing.Actions.JobActionFactory.GetJobActionAsync(DataTransferJobMetadata job, StateMachine stateMachine) in d:\dbs\sh\Ae\0720_121258_0\cmd\1j\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.JobProcessing\Actions\JobActionFactory.cs:line 41
at Microsoft.Aether.DataTransferCloud.JobProcessing.DataCopyJobProcessor.ProcessJobAsync(DataTransferJobMetadata job) in d:\dbs\sh\Ae\0720_121258_0\cmd\1j\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.JobProcessing\DataCopyJobProcessor.cs:line 44
Looking into it.
@akshay-0, we got another one last night:
run id: 1b89acb4-1a40-4078-bc9e-c4de0810a22f
pipeline id: 5bd1f504-1901-4bd5-82d3-bb5c82228af1
[2020-07-27 14:36:51Z] Parsing command text
[2020-07-27 14:36:51Z] Parsed command line. Will be submitting job : {
"Command": 2,
"CopyCommand": null,
"WaitCommand": null,
"DataCopyCommand": {
"ComputeName": "adf",
"AzureDataFactoryConfig": null,
"SourceDataId": "83d4e1c3-a65d-4266-9ee1-fedbbecf24c4",
"DestinationDataId": "bbefe549-f67c-421c-8933-bc1167a12ec2",
"OutputDataId": "c56eb8ea-3e12-4dd0-b1b1-dae03b8f52ae",
"CopyOperationEntity": null,
"CopyOptions": "{\"source_reference_type\": \"directory\", \"destination_reference_type\": \"directory\"}",
"PolicyValidationStatus": 0
},
"IsDataManagementEnabled": true,
"ComplianceCluster": null,
"EuclidWorkspaceId": null
}
[2020-07-27 14:36:51Z]
Data transfer job failed with unexpected error:
Failure in GetComputeResourceAndSecrets GetEntityWithAuthHeader while calling service WorkspaceResource; HttpMethod: GET; Response StatusCode: Unauthorized; Exception type:
Microsoft.RelInfra.Extensions.HttpRequestDetailException and stack trace:
at Microsoft.Aether.BlueBox.WorkspaceResourcesClient.WorkspaceResourcesClient.GetComputeResourceAndSecretsAsync(WorkspaceIdentity workspace, String computeName, CreatedBy createdBy, CancellationToken cancellationToken) in d:\dbs\sh\Ae\0720_121258_0\cmd\22\src\aether\platform\backendV2\BlueBox\WorkspaceResourcesClient\Microsoft.Aether.BlueBox.WorkspaceResourcesClient\WorkspaceResourcesClient.cs:line 97
at Microsoft.Aether.DataTransferCloud.CopyService.DataFactoryCopyServiceFactory.GetDataFactoryConfigFromComputeAsync(DataTransferJobMetadata job) in d:\dbs\sh\Ae\0722_110658\cmd\a\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.CopyService\DataFactoryCopyServiceFactory.cs:line 88
at Microsoft.Aether.DataTransferCloud.CopyService.DataFactoryCopyServiceFactory.GetDataFactoryConfigAsync(DataTransferJobMetadata job) in d:\dbs\sh\Ae\0722_110658\cmd\a\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.CopyService\DataFactoryCopyServiceFactory.cs:line 73
at Microsoft.Aether.DataTransferCloud.CopyService.DataFactoryCopyServiceFactory.CreateCopyServiceAsync(DataTransferJobMetadata job) in d:\dbs\sh\Ae\0722_110658\cmd\a\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.CopyService\DataFactoryCopyServiceFactory.cs:line 48
at Microsoft.Aether.DataTransferCloud.JobProcessing.Actions.JobActionFactory.GetJobActionAsync(DataTransferJobMetadata job, StateMachine stateMachine) in d:\dbs\sh\Ae\0722_110658\cmd\v\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.JobProcessing\Actions\JobActionFactory.cs:line 41
at Microsoft.Aether.DataTransferCloud.JobProcessing.DataCopyJobProcessor.ProcessJobAsync(DataTransferJobMetadata job) in d:\dbs\sh\Ae\0722_110658\cmd\v\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.JobProcessing\DataCopyJobProcessor.cs:line 47
This seems like a different issue from the ones I saw before. I'll follow up with the right team. Just to double-check: were you able to run the data transfer successfully later, using the same Data Factory resource and the same workspace?
@akshay-0 yes, if we run it again it works. My teammate is just sick of having to re-run the pipeline manually every time it fails. We're kicking off work on the OutputDataset private preview with @MayMSFT and team soon; if that succeeds, we won't need any DataTransferSteps.
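Until the underlying incident is fixed, the manual "re-run until it passes" routine could be automated. Below is a minimal Python sketch, assuming you wrap your own pipeline submission in a hypothetical `run_once()` callable that blocks until the run finishes and returns an empty string on success or the failed step's error text otherwise. The error markers are copied from the logs in this thread; treating them (including `Unauthorized`) as retryable is an assumption based on re-runs succeeding, not documented Azure ML behavior.

```python
import time
from typing import Callable

# Error signatures taken from the failures in this thread. Treating them as
# transient is an assumption (re-runs succeeded), not documented behavior.
TRANSIENT_MARKERS = (
    "TaskCanceledException",
    "SocketException",
    "CopyOperationEntity is not populated",
    "Response StatusCode: InternalServerError",
    "Response StatusCode: Unauthorized",
)


def is_transient(error_text: str) -> bool:
    """Return True if the error text contains any known transient marker."""
    return any(marker in error_text for marker in TRANSIENT_MARKERS)


def run_with_retries(
    run_once: Callable[[], str],
    max_attempts: int = 3,
    base_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call the hypothetical run_once() until it returns "" (success).

    Returns the number of attempts used. Raises RuntimeError on a
    non-transient error or once max_attempts is reached.
    """
    attempt = 0
    while True:
        attempt += 1
        error_text = run_once()
        if not error_text:
            return attempt
        if attempt >= max_attempts or not is_transient(error_text):
            raise RuntimeError(f"attempt {attempt} failed: {error_text[:200]}")
        # Exponential backoff: base, 2*base, 4*base, ...
        sleep(base_delay_s * 2 ** (attempt - 1))


if __name__ == "__main__":
    # Simulated runs: one socket failure, then success.
    outcomes = iter(["Exception type: System.Net.Sockets.SocketException", ""])
    print(run_with_retries(lambda: next(outcomes), sleep=lambda s: None))  # prints 2
```

Retrying only on known signatures means a genuinely broken pipeline (bad paths, missing permissions) still fails fast instead of being re-submitted over and over.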
@akshay-0, another failure today in GetComputeResourceAndSecrets:
run_id: a25af474-32d7-43c1-b606-cbe8360e1abd
[2020-08-06 13:47:34Z] Parsing command text
[2020-08-06 13:47:34Z] Parsed command line. Will be submitting job : {
"Command": 2,
"CopyCommand": null,
"WaitCommand": null,
"DataCopyCommand": {
"ComputeName": "adf",
"AzureDataFactoryConfig": null,
"SourceDataId": "daa455e3-3e55-4f8c-a39b-12d58fe11c4b",
"DestinationDataId": "4c17a197-e286-4311-a935-69aec2c808a8",
"OutputDataId": "6a889fff-6c32-4947-a088-b00db94eda82",
"CopyOperationEntity": null,
"CopyOptions": "{\"source_reference_type\": \"directory\", \"destination_reference_type\": \"directory\"}",
"PolicyValidationStatus": 0
},
"IsDataManagementEnabled": true,
"ComplianceCluster": null,
"EuclidWorkspaceId": null
}
[2020-08-06 13:47:34Z]
Data transfer job failed with unexpected error:
Failure in GetComputeResourceAndSecrets GetEntityWithAuthHeader while calling service WorkspaceResource; HttpMethod: GET; Response StatusCode: InternalServerError; Exception type:
Microsoft.RelInfra.Extensions.HttpRequestDetailException; Exception type: Microsoft.RelInfra.Common.Exceptions.ServiceInvocationException|-Microsoft.RelInfra.Extensions.HttpRequestDetailException; Stack trace:
at Microsoft.Aether.BlueBox.WorkspaceResourcesClient.WorkspaceResourcesClient.GetComputeResourceAndSecretsAsync(WorkspaceIdentity workspace, String computeName, CreatedBy createdBy, CancellationToken cancellationToken) in d:\dbs\sh\Ae\0728_124130\cmd\a\src\aether\platform\backendV2\BlueBox\WorkspaceResourcesClient\Microsoft.Aether.BlueBox.WorkspaceResourcesClient\WorkspaceResourcesClient.cs:line 98
at Microsoft.Aether.DataTransferCloud.CopyService.DataFactoryCopyServiceFactory.GetDataFactoryConfigFromComputeAsync(DataTransferJobMetadata job) in d:\dbs\sh\Ae\0728_124130\cmd\1o\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.CopyService\DataFactoryCopyServiceFactory.cs:line 91
at Microsoft.Aether.DataTransferCloud.CopyService.DataFactoryCopyServiceFactory.GetDataFactoryConfigAsync(DataTransferJobMetadata job) in d:\dbs\sh\Ae\0728_124130\cmd\1o\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.CopyService\DataFactoryCopyServiceFactory.cs:line 76
at Microsoft.Aether.DataTransferCloud.CopyService.DataFactoryCopyServiceFactory.CreateCopyServiceAsync(DataTransferJobMetadata job) in d:\dbs\sh\Ae\0728_124130\cmd\1o\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.CopyService\DataFactoryCopyServiceFactory.cs:line 46
at Microsoft.Aether.DataTransferCloud.JobProcessing.Actions.JobActionFactory.GetJobActionAsync(DataTransferJobMetadata job, StateMachine stateMachine) in d:\dbs\sh\Ae\0729_020020_0\cmd\4\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.JobProcessing\Actions\JobActionFactory.cs:line 41
at Microsoft.Aether.DataTransferCloud.JobProcessing.DataCopyJobProcessor.ProcessJobAsync(DataTransferJobMetadata job) in d:\dbs\sh\Ae\0729_020020_0\cmd\4\src\aether\platform\backendV2\BlueBox\Clouds\DataTransferCloudK8s\DataTransferCloudK8s.JobProcessing\DataCopyJobProcessor.cs:line 47
@swanderz Thank you for your feedback. We have raised an incident for this and are investigating. Please let us know if the issue persists.