EF Core: Connection between dockerized application and Azure SQL hangs after EF Core upgrade to 3.1

Created on 17 Mar 2020 · 18 Comments · Source: dotnet/efcore

Our solution consists of almost 40 projects, 6 of which are runnable applications. In the simplest scenario we only need two applications running: a web API and a worker, which communicate through RabbitMQ in a command/command-handler pattern. We use netcoreapp3.1 (to which we recently migrated from 2.2) and EF Core 3.1 (to which we migrated afterwards). After the EF Core migration we encountered a strange issue: our worker app just stops executing code at a random place during our custom-made seeding (and we can't really do anything without seeding, so that's as far as we got when it comes to commands). There is no exception and no timeout. The last log we see from the worker always says EFStatementsLogger.Log : EFInfo=[Executing DbCommand, and the SQL query shown in the log is never seen in the Azure Data Studio profiler, so it never reaches the database. The query at which the application stops changes from seeding to seeding; there is no pattern, and we've been trying to find the source for 2 weeks now.

We narrowed down the issue to Linux + EF Core 3.1. Let me quickly walk you through the investigation (tests were done repeatedly and the results were consistent):

  • running applications locally without docker and using sql express works - rules out many things but it's kinda 'works for me' so we can't rely on that
  • running applications locally without docker and using azure sql works - rules out azure sql
  • running dockerized applications on azure and using dockerized mssql express as a container instance - doesn't work
  • running applications on linux vm without docker and using azure sql doesn't work - rules out docker

We were able to narrow down the commits that could've caused the issue to six, two of which are pure migrations (we flattened a huge number of migrations into one, twice during that task). About 90% of the changes in the remaining four commits concern EF Core fluent API usage that stopped working after the EF Core upgrade: indexes, includes, custom projections (nothing big, mostly changes like replacing string.Equals(a, b, StringComparison.CurrentCultureIgnoreCase); with a.ToLower() == b.ToLower();).
I could provide you with the diff from these four commits if you'd like, but it's going to be a big one.

Our setup (that's been working for over a year now on ef core 2.*):

  • every application is dockerized separately
  • all applications run as dockerized web apps on azure

The issue is quite annoying and made us move back to windows apps on azure (where everything works just as it did before ef core upgrade). Since our solution is huge, there is no possibility to make a small POC project. We went through all breaking changes a few times but didn't notice anything wrong. Could you point us in any direction here? Any hint, any place worth checking? Anything you need will be supplied.

closed-external customer-reported

All 18 comments

Team, could you please help troubleshoot this issue?

@cheenamalhotra Could this be something with SqlClient? I think there have been some issues with Docker/Linux.

@Arcanst

May I know which Linux Docker image you are using in your application? I can try to test SqlClient connectivity in that image.

Sure!
Build: mcr.microsoft.com/dotnet/core/sdk:3.1-buster
Runtime: more complicated. I basically merged two .NET Core images into our base one; here is the Dockerfile for the base image that's used by our applications:

FROM debian:buster

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        ca-certificates \
        \
# .NET Core dependencies
        libc6 \
        libgcc1 \
        libgssapi-krb5-2 \
        libicu63 \
        libssl1.1 \
        libstdc++6 \
        zlib1g \
    && rm -rf /var/lib/apt/lists/*

# Configure web servers to bind to port 80 when present
ENV ASPNETCORE_URLS=http://+:80 \
    # Enable detection of running in a container
    DOTNET_RUNNING_IN_CONTAINER=true

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        curl \
    && rm -rf /var/lib/apt/lists/*

# Install .NET Core
RUN dotnet_version=3.1.1 \
    && curl -SL --output dotnet.tar.gz https://dotnetcli.azureedge.net/dotnet/Runtime/$dotnet_version/dotnet-runtime-$dotnet_version-linux-x64.tar.gz \
    && dotnet_sha512='991a89ac7b52d3bf6c00359ce94c5a3f7488cd3d9e4663ba0575e1a5d8214c5fcc459e2cb923c369c2cdb789a96f0b1dfb5c5aae1a04df6e7f1f365122072611' \
    && echo "$dotnet_sha512  dotnet.tar.gz" | sha512sum -c - \
    && mkdir -p /usr/share/dotnet \
    && tar -ozxf dotnet.tar.gz -C /usr/share/dotnet \
    && rm dotnet.tar.gz \
    && ln -s /usr/share/dotnet/dotnet /usr/bin/dotnet

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        curl \
    && rm -rf /var/lib/apt/lists/*

# Install ASP.NET Core
RUN aspnetcore_version=3.1.1 \
    && curl -SL --output aspnetcore.tar.gz https://dotnetcli.azureedge.net/dotnet/aspnetcore/Runtime/$aspnetcore_version/aspnetcore-runtime-$aspnetcore_version-linux-x64.tar.gz \
    && aspnetcore_sha512='cc27828cacbc783ef83cc1378078e14ac558aec30726b36c4f154fad0d08ff011e7e1dfc17bc851926ea3b0da9c7d71496af14ee13184bdf503856eca30a89ae' \
    && echo "$aspnetcore_sha512  aspnetcore.tar.gz" | sha512sum -c - \
    && tar -ozxf aspnetcore.tar.gz -C /usr/share/dotnet ./shared/Microsoft.AspNetCore.App \
    && rm aspnetcore.tar.gz

ENV SSH_PASSWD "root:Docker!"
RUN apt-get update \
    && apt-get install -y --no-install-recommends dialog \
    && apt-get update \
    && apt-get install -y --no-install-recommends openssh-server \
    && echo "$SSH_PASSWD" | chpasswd

COPY ["sshd_config", "/etc/ssh/"]
COPY ["init.sh", "/usr/local/bin"]
RUN chmod u+x /usr/local/bin/init.sh
EXPOSE 8000 2222

Up to the SSH installation, it's just a copy-paste from 3.1.2-buster-slim, which is no longer available on Docker Hub, as far as I can see.

Just want to add that we also reproduced the issue in an Azure virtual machine created from Ubuntu Server 18.04 LTS, where the following was done:

wget -q https://packages.microsoft.com/config/ubuntu/18.04/packages-microsoft-prod.deb -O packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb
sudo add-apt-repository universe
sudo apt-get update
sudo apt-get install apt-transport-https
sudo apt-get update
sudo apt-get install dotnet-sdk-3.1
sudo apt-get install aspnetcore-runtime-3.1
sudo apt-get install dotnet-runtime-3.1

A portable version of the app was deployed over SSH, and the issue was reproduced by running the application with the dotnet command.

Can you also share the version of the Azure DB/server you're connecting to?
[Output of SELECT @@VERSION]

This doesn't seem right as we run CI tests in SqlClient from Ubuntu to Azure all the time!

Can you also verify if you're able to connect to Azure DB using just SqlClient driver from both docker/Ubuntu VM?

You can use this app for validation:
TestLinuxDocker.zip

SELECT @@VERSION gives Microsoft SQL Azure (RTM) - 12.0.2000.8 Feb 14 2020 18:30:14 Copyright (C) 2019 Microsoft Corporation

I can't check that app right now - will do it tomorrow and come back with results.

Let me just specify one thing: it's not like we don't have a connection. We are able to truncate the entire database from the code and then recreate it using migrations (we do it programmatically). When seeding starts, the worker always creates some objects in the database before it hangs.

We even thought the issue could be caused by too large a transaction, because at first we had the entire seeding (which took about 2 minutes locally) implemented as one huge DB transaction. But after making the seeding create a separate transaction for each object inserted into the DB, the problem wasn't solved.

I'm thinking if it was possible to somehow use ef core's source for our application (it would be much easier if the issue were reproducible on a developer's desk) to add more logs and see where exactly it hangs (apparently it's not a blocking call, because our applications still send heartbeats to RabbitMQ and so on).

Thanks @Arcanst I think this would then fall back to @ajcvickers for EF side of investigations first.

@ajcvickers

As @Arcanst mentioned, this doesn't look like a connection problem but rather a problem in a particular flow with EF Core APIs. I think you can take over from here to reproduce the problem, and if it turns out to be with one of the SqlClient API flows, please let us know with a repro. :)

Best Regards,
Cheena

@ajcvickers We could provide a repro project that would be a stripped-down version of our application solution. It would still be quite big, and we would have to put significant effort into it, so before we do so we would prefer to have confirmation from you that it actually would be useful/necessary for you to identify root cause.

@Arcanst

I'm thinking if it was possible to somehow use ef core's source for our application

EF Core can be built from source easily--see https://github.com/dotnet/efcore/blob/master/docs/getting-and-building-the-code.md

You can build NuGet packages locally with build -pack. However, note that the NuGet package versions don't change between builds, which means you'll have to flush NuGet package caches (it's in the NuGet settings in VS) each time you rebuild the packages.
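For those working from the command line rather than VS, the cache flush described above can be sketched roughly as follows. This assumes the dotnet CLI is on the PATH; the `run` wrapper, the `DRY_RUN` variable and the `flush_nuget_caches` name are hypothetical helpers made up for this illustration, while `dotnet nuget locals all --clear` is the standard CLI command for clearing local NuGet caches.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Clear every local NuGet cache so that rebuilt packages with an unchanged
# version number are picked up again on the next restore.
flush_nuget_caches() {
  run dotnet nuget locals all --clear
}
```

Calling `flush_nuget_caches` after each local package rebuild, followed by a restore of the application, should make the application pick up the freshly built packages; `DRY_RUN=1 flush_nuget_caches` only prints the command.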

confirmation from you that it actually would be useful/necessary for you to identify root cause

I don't have any ideas as to what is going on here, so I can't be certain that we will be able to identify the root cause even if we can reproduce the issue. That being said, it certainly seems unlikely that we will be able to root cause this _without_ being able to reproduce it.

If you don't want to post the code publicly, then feel free to send it to avickers at microsoft.com.

@AndriySvyryd @roji Any ideas here?

What version of Microsoft.Data.SqlClient are you using? If it's not 1.1.1 could you upgrade it and see whether that makes any difference?

Microsoft.Data.SqlClient, Version=1.0.19269.1 - we'll try upgrading it and let you know as soon as possible, thanks

Upgrading Microsoft.Data.SqlClient to 1.1.1 solved the issue, which can now be closed - thanks
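For anyone hitting the same hang, a rough way to check which Microsoft.Data.SqlClient version a project actually resolves (it usually arrives transitively through the EF Core SQL Server provider) might look like the sketch below. The helper names `resolved_sqlclient_version` and `version_lt` and the `PROJECT` variable are hypothetical; the parsing assumes the usual table layout printed by `dotnet list package --include-transitive`, and `sort -V` assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `dotnet list package --include-transitive` output
# on stdin and print the resolved Microsoft.Data.SqlClient version (last column).
resolved_sqlclient_version() {
  awk '$1 == ">" && $2 == "Microsoft.Data.SqlClient" { print $NF; exit }'
}

# Hypothetical helper: succeed when version $1 is strictly older than version $2.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage, guarded so nothing runs unless the dotnet CLI exists and PROJECT is set.
if command -v dotnet >/dev/null 2>&1 && [ -n "${PROJECT:-}" ]; then
  v="$(dotnet list "$PROJECT" package --include-transitive | resolved_sqlclient_version)"
  if [ -n "$v" ] && version_lt "$v" "1.1.1"; then
    echo "Microsoft.Data.SqlClient $v is older than 1.1.1"
    echo "To pin it: dotnet add \"$PROJECT\" package Microsoft.Data.SqlClient --version 1.1.1"
  fi
fi
```

Adding the explicit package reference makes 1.1.1 a top-level dependency of the project, which takes precedence over the older version pulled in transitively.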

Please reconsider taking M.D.S. 3.1.1 in a 3.1 patch release.

@ErikEJ We discussed it before, but this is certainly another data point.

@ErikEJ We're going to do this and take it for approval.

@bricelam I'll probably be able to create a PR for this next week, but feel free to do it if you get time.

Filed #20378 to track updating the dependency.
