Home: Work around restore timeouts on systems with 1 CPU

Created on 26 Mar 2018 · 8 Comments · Source: NuGet/Home

I have seen dotnet restore fail in two different ways on slow machines. On a Raspberry Pi it fails with "A task was canceled". In a Docker container on an Azure A1 VM it never completes, and the dotnet process stays at about 4% CPU indefinitely.

Azure A1 VM

Setup

  1. Create an Azure A1 VM
    A. OS: Ubuntu 16.04 LTS
    B. Disk Type: HDD
    C. Size: A1 (not A1_v2)
  2. Install docker
    A. https://get.docker.com/
  3. git clone https://github.com/mikeharder/MinPlaintext
  4. cd MinPlaintext
  5. git checkout bf88a6651ebb5e767da986309b6ddc454e538ef4
  6. cd docker/2.0
  7. ./build.sh

Error

dotnet restore never completes. The build log stops at the restore step, and top shows the dotnet process sitting at about 4% CPU indefinitely:

Step 5/12 : COPY MinPlaintext/*.csproj ./MinPlaintext/
 ---> Using cache
 ---> a4c9bde04097
Step 6/12 : RUN dotnet restore
 ---> Running in 9349291ccafa
  Restoring packages for /app/MinPlaintext/MinPlaintext.csproj...
   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
  5424 root      20   0 2927216  96784  55592 S  4.3  5.7   0:08.64 dotnet

Workaround

Edit Dockerfile and change dotnet restore to dotnet restore --disable-parallel.
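
The Dockerfile edit can also be scripted. The following is a minimal sketch, assuming GNU sed (as shipped with the Ubuntu 16.04 VM above) and a hypothetical helper name, patch_restore; it is not part of the MinPlaintext repository.

```shell
# Hypothetical helper: add --disable-parallel to every "dotnet restore"
# in a Dockerfile. Assumes GNU sed; BSD/macOS sed needs -i '' instead of -i.
patch_restore() {
  # $1: path to the Dockerfile to edit in place.
  # The first expression strips an existing flag, so running the helper
  # twice does not duplicate it; the second expression adds the flag.
  sed -i \
    -e 's/dotnet restore --disable-parallel/dotnet restore/' \
    -e 's/dotnet restore/dotnet restore --disable-parallel/' \
    "$1"
}

# Usage, from the directory that holds the Dockerfile:
# patch_restore Dockerfile
```

Stripping the flag before re-adding it keeps the helper safe to run more than once on the same file.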

Raspberry Pi

dotnet restore fails with "A task was canceled" on Raspberry Pi. I suspect the root cause is that the machine is too slow, so HTTP requests time out. The workaround is to use dotnet restore --disable-parallel, but I think dotnet restore should work by default.
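
Until restore handles slow machines by default, a build script can pick the flag itself. This is a rough sketch, assuming a POSIX shell and the coreutils nproc command; restore_flags is a hypothetical helper name, and the single-CPU threshold is an assumption taken from the issue title, not NuGet's own logic. A slow multi-core machine would still need the flag passed by hand.

```shell
# Hypothetical helper: print extra flags for dotnet restore based on CPU count.
restore_flags() {
  # $1: number of online CPUs; defaults to what nproc reports.
  cpus="${1:-$(nproc)}"
  # Assumption: only single-CPU systems get the serial restore.
  if [ "$cpus" -le 1 ]; then
    echo "--disable-parallel"
  fi
}

# Usage (left unquoted on purpose, so an empty result adds no argument):
# dotnet restore $(restore_flags)
```

Taking the CPU count as an optional argument makes it possible to check the helper on any machine without changing the hardware.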

Setup

  1. SSH into Raspberry Pi
  2. Build 2.1-sdk image locally (unless it's been published to https://hub.docker.com/r/microsoft/dotnet-nightly/)
    A. wget https://raw.githubusercontent.com/MichaelSimons/dotnet-docker/arm-sdk/2.1/sdk/stretch/arm32v7/Dockerfile
    B. docker build -t microsoft/dotnet-nightly:2.1-sdk .
  3. docker run -it --rm microsoft/dotnet-nightly:2.1-sdk
  4. mkdir mvc
  5. cd mvc
  6. dotnet new mvc --no-restore
  7. echo '<?xml version="1.0" encoding="utf-8"?><configuration><packageSources><add key="DotnetCore" value="https://dotnet.myget.org/F/dotnet-core/api/v3/index.json" /><add key="AspNetCore" value="https://dotnet.myget.org/F/aspnetcore-dev/api/v3/index.json" /></packageSources></configuration>' > NuGet.config

Error

root@2896796e0e87:/mvc# dotnet restore
  Restoring packages for /mvc/mvc.csproj...
  Restoring packages for /mvc/mvc.csproj...
  Retrying 'FindPackagesByIdAsync' for source 'https://api.nuget.org/v3-flatcontainer/microsoft.netcore.targets/index.json'.
  A task was canceled.
  Retrying 'FindPackagesByIdAsync' for source 'https://dotnetmyget.blob.core.windows.net/artifacts/aspnetcore-dev/nuget/v3/flatcontainer/microsoft.aspnetcore.antiforgery/index.json'.
  A task was canceled.
  ...

Workaround

root@2896796e0e87:/mvc# dotnet restore --disable-parallel
  Restoring packages for /mvc/mvc.csproj...
  Installing Microsoft.NETCore.App 2.1.0-preview2-26313-01.
  Installing Microsoft.AspNetCore.App 2.1.0-preview2-30338.
  ...

Labels: Restore, Performance, DCR

All 8 comments

We've faced similar issues before.
Duplicate of this one

The issue is as you articulated: more connections than the device can handle.

Not sure we can do much because we need parallelism in the average case as it significantly improves the performance.

We'd need to reevaluate if 16 is really a good number and whether we do get benefits with it compared to something like 8.

We should definitely look into finding the right number of parallel operations. Besides that, we can also look into configuring the HTTP timeout to better suit our requirements.

This can be duped with https://github.com/NuGet/Home/issues/4538, right?

Proposed improvement of the s2i assemble script: https://github.com/redhat-developer/s2i-dotnetcore/pull/228

@dtivel @rrelyea the PR used to close this issue is performing a workaround for single CPU systems. There is room for improvement.

Agree, we should still run with this issue and do a better job for slow machines. With that in mind, this issue could also be used to combine all the existing parallel switches or configurations.

@dtivel can you re-open the issue?

If needed, let's open a new issue, and link to this one.
Our release note process is based on closed issues, and I'd like to document this in our 5.0-p2 release notes. Changed the title to reflect what we did.
