Runtime: macOS: Cannot add to X509Store [regression from 3.1]

Created on 19 Jul 2020  路  45Comments  路  Source: dotnet/runtime

I cannot seem to write to an X509Store in macOS. Code to reproduce:

using System.Security.Cryptography.X509Certificates;

namespace scratch {
    class Program {
        static void Main(string[] args) {
            using X509Certificate2 cert = new X509Certificate2("cert.pfx", "test", X509KeyStorageFlags.Exportable);
            using X509Store store = new X509Store(StoreName.My, StoreLocation.CurrentUser);
            store.Open(OpenFlags.ReadWrite);
            store.Add(cert);
        }
    }
}

This seems to always fail on store.Add(cert). Exception:

Unhandled exception. Interop+AppleCrypto+AppleCommonCryptoCryptographicException: UNIX[Undefined error: 0]
   at Interop.AppleCrypto.X509StoreAddCertificate(SafeKeychainItemHandle certOrIdentity, SafeKeychainHandle keychain)
   at Internal.Cryptography.Pal.StorePal.AppleKeychainStore.Add(ICertificatePal cert)
   at System.Security.Cryptography.X509Certificates.X509Store.Add(X509Certificate2 certificate)
   at scratch.Program.Main(String[] args) in /Users/kjones/Projects/scratch/Program.cs:line 9

The unit tests for this appear to be in X509StoreMutableTests_OSX. When I attempt to run them, all of them are skipped because the "can I write to the store?" conditional fails:

Writing precondition failed with kPOSIXErrorBase, skipping tests.

This is not an SSH session, or a VM, or anything out of the ordinary, it's a login zsh shell.

area-System.Security os-mac-os-x regression-from-last-release

Most helpful comment

As a test, if that code is changed to:

```c#
SafeKeychainHandle keychain = persist
? Interop.AppleCrypto.SecKeychainCopyDefault()
: Interop.AppleCrypto.CreateTemporaryKeychain();

System.GC.SuppressFinalize(keychain);
return ImportPkcs12(rawData, password, exportable, keychain);
```

Then the problem goes away, too.

All 45 comments

Tagging subscribers to this area: @bartonjs, @vcsjones, @krwq
Notify danmosemsft if you want to be subscribed.

Some other background on this, removing from the keychain does work, so it seems not related to the keychain being locked or in an inconsistent state. I did note that Big Sur is getting changes to how importing certificates _as root_ is handled, but saw nothing describing this exact use case.

Other thing to note: I can't reproduce this in netcoreapp3.1. I wonder if this has anything to do with how the binaries are signed and/or if 5 previews are signed differently than 3.1 official builds. Or, something broke between 3.1 and 5.0.

Thank you, @vcsjones! Are you planning to investigate this further? If not, we'll get someone from the team assigned to it so we can get it into 5.0 (probably RC2).

@jeffhandley

Are you planning to investigate this further?

Yeah, but I haven't made a ton of progress. I think a good first step would be to see if someone other than me can repro it, and do the unit tests actually run (not just skipped). Maybe it just doesn't like me.

I can still repro it on Big Sur beta 4, I'm installing 5 right now.

@vcsjones I got some guidance on how we could determine if this is related to signing/notarization or if there's a potential functional regression here. I'm trying to get my machine set up to be able to check this, but it looks like it'll take me some time, so I wanted to share with you in case this is quick for you to verify.

We would need to compare when running with _a signed_ apphost to running with _an unsigned_ apphost.

Hm, I think I can pull down 3.1 and build an unsigned version and see if it starts failing.

Thanks, @vcsjones. Here's what I understand:

  • If it fails with an _unsigned_ apphost:

    • Entitlements are not enforced, so there's a product regression

  • If it fails _only_ using a signed apphost:

    • There might be a new entitlement that needs to be added

I set this repro up on my MacBook while still on Catalina, and I was able to repro to get the same error as reported.

  • OS: macOS Catalina
  • SDK: dotnet-sdk-5.0.100.rc.1.20422.1-osx-x64.pkg

Repro Steps:

  1. dotnet new console
  2. Update Program.cs to have the same code as is shown in the issue description
  3. openssl req -x509 -days 365 -newkey rsa:4096 -keyout cert.key -out cert.crt (using "test" as the PEM pass phrase)
  4. openssl pkcs12 -export -in cert.crt -inkey cert.key -out cert.pfx (using "test" as the export password)
  5. dotnet run

I'm going to try this against .NET Core 3.1 on this setup.

It runs fine against .NET Core 3.1 on my Catalina machine but fails with .NET 5.0 on Catalina.

Combinations / Results

  • <TargetFramework>netcoreapp3.1</TargetFramework>

    • Without <UseAppHost>true</UseAppHost>



      • bin/Debug/netcoreapp3.1/x509.dll: code object is not signed at all (codesign -d bin/Debug/netcoreapp3.1/x509.dll)


      • dotnet run succeeds


      • dotnet run bin/Debug/netcoreapp3.1/x509.dll succeeds



    • With <UseAppHost>true</UseAppHost>



      • bin/Debug/netcoreapp3.1/x509.dll: code object is not signed at all (codesign -d bin/Debug/netcoreapp3.1/x509.dll)


      • dotnet run succeeds


      • dotnet run bin/Debug/netcoreapp3.1/x509.dll succeeds



  • <TargetFramework>net5.0</TargetFramework>

    • Without <UseAppHost>true</UseAppHost>



      • bin/Debug/net5.0/x509.dll: code object is not signed at all (codesign -d bin/Debug/net5.0/x509.dll)


      • dotnet run fails


      • dotnet run bin/Debug/net5.0/x509.dll fails



    • With <UseAppHost>true</UseAppHost>



      • bin/Debug/net5.0/x509.dll: code object is not signed at all (codesign -d bin/Debug/net5.0/x509.dll)


      • dotnet run fails


      • dotnet run bin/Debug/net5.0/x509.dll fails



@jeffschwMSFT do you have suggestions on where to go next from here in the investigation?

If you think it is a functional change that causes this, I can run a git-bisect overnight to try and identify the commit that regressed the behavior.

https://github.com/dotnet/runtime/commits/master/src/libraries/Native/Unix/System.Security.Cryptography.Native.Apple currently goes back to 3.1 on page 1, nothing (by title) looks suspicious, except possibly renaming the shims (maybe there's a manifest somewhere that needs to be updated?).

@wfurt did a lot of deeper investigations into certs on OSX. I think our next step is to understand what OSX API is failing and why.

cc @javiercn

@jeffhandley Also, I think when you ran dotnet run bin/Debug/net5.0/x509.dll, it should have been dotnet bin/Debug/net5.0/x509.dll (no "run"). The former is still doing dotnet run and just passing the path as the first args to Main. I don't think it changes the results when I tried your steps.

what OSX API is failing

SecKeychainItemCreateCopy

https://github.com/dotnet/runtime/blob/ec59f65100d741539b00e308294cc53f04fc326b/src/libraries/Native/Unix/System.Security.Cryptography.Native.Apple/pal_keychain.c#L354

and why

The resulting OS Status is kPOSIXErrorBase. That isn't a very descriptive error code.

Native stack:

debug   15:13:29.064492-0400    0x00007fff3614fa2d Security::CommonError::LogBacktrace() + 181
debug   15:13:29.064522-0400    0x00007fff3614fd51 Security::UnixError::UnixError(int, bool) + 277
debug   15:13:29.064546-0400    0x00007fff3614fdae Security::UnixError::throwMe(int) + 36
debug   15:13:29.064568-0400    0x00007fff35eb7394 Security::AtomicBufferedFile::open() + 696
debug   15:13:29.064593-0400    0x00007fff35eb5e21 Security::DbModifier::getDbVersion(bool) + 331
debug   15:13:29.064619-0400    0x00007fff35eb5ca5 Security::AppleDatabase::dbOpen(Security::DbContext&) + 57
debug   15:13:29.064645-0400    0x00007fff35eb55bf Security::Database::_dbOpen(Security::DatabaseSession&, unsigned int, Security::AccessCredentials const*, void const*) + 113
debug   15:13:29.064670-0400    0x00007fff35eb4932 Security::DatabaseManager::dbOpen(Security::DatabaseSession&, Security::DbName const&, unsigned int, Security::AccessCredentials const*, void const*) + 64
debug   15:13:29.064693-0400    0x00007fff35eb445d Security::DatabaseSession::DbOpen(char const*, cssm_net_address const*, unsigned int, Security::AccessCredentials const*, void const*, long&) + 189
debug   15:13:29.064717-0400    0x00007fff35ec61b3 cssm_DbOpen(long, char const*, cssm_net_address const*, unsigned int, cssm_access_credentials const*, void const*, long*) + 118
debug   15:13:29.064764-0400    0x00007fff35ec5f7c CSSM_DL_DbOpen + 152
debug   15:13:29.064862-0400    0x00007fff35ec5298 Security::CssmClient::DbImpl::open() + 248
debug   15:13:29.064965-0400    0x00007fff35ffb0a9 SSDatabaseImpl::load(Security::DLDbIdentifier const&) + 53
debug   15:13:29.065066-0400    0x00007fff35ec636d SSDLSession::DbOpen(char const*, cssm_net_address const*, unsigned int, Security::AccessCredentials const*, void const*, long&) + 273
debug   15:13:29.065133-0400    0x00007fff35ec61b3 cssm_DbOpen(long, char const*, cssm_net_address const*, unsigned int, cssm_access_credentials const*, void const*, long*) + 118
debug   15:13:29.065244-0400    0x00007fff35ec5f7c CSSM_DL_DbOpen + 152
debug   15:13:29.065339-0400    0x00007fff35ec5298 Security::CssmClient::DbImpl::open() + 248
debug   15:13:29.065439-0400    0x00007fff35ec5162 Security::CssmClient::DbImpl::activate() + 80
debug   15:13:29.065521-0400    0x00007fff35ff3e5c Security::CssmClient::DbImpl::handle() + 56
debug   15:13:29.065588-0400    0x00007fff35ed2a17 Security::CssmClient::DbUniqueRecordImpl::get(Security::CssmClient::DbAttributes*, Security::CssmDataContainer*) + 155
debug   15:13:29.065705-0400    0x00007fff360b5263 Security::KeychainCore::KeyItem::copyTo(Security::KeychainCore::Keychain const&, Security::KeychainCore::Access*) + 2611
debug   15:13:29.065787-0400    0x00007fff360f7096 SecKeychainItemCreateCopy + 257

It _looks_ like it is having trouble opening the keychain in the temporary directory.

Perhaps related log output:

open /private/var/folders/1s/8zmwdp4555j61y_lwd9vvm2w0000gn/T/e792f9d4e6db4762987911c6f22970c8.keychain: Undefined error: 0

@jeffhandley Also, I think when you ran dotnet run bin/Debug/net5.0/x509.dll, it should have been dotnet bin/Debug/net5.0/x509.dll (no "run"). The former is still doing dotnet run and just passing the path as the first args to Main. I don't think it changes the results when I tried your steps.

Good catch. I reran with dotnet .../x509.dll in all 4 scenarios and it did not change the results.

I'm wondering if some of this may depend on system integrity setting. I have 11.0 system up and I can take a look.
There was at least one other case where 11.0 broke binary compatibility with 10.x. That _could_ explain kPOSIXErrorBase.

Running log stream --process <PID_OF_SECURITYD> may give some more insight @vcsjones.

Running log stream --process may give some more insight @vcsjones.

Yeah, that's where my stack in https://github.com/dotnet/runtime/issues/39603#issuecomment-678681034 came from.

I'm wondering if some of this may depend on system integrity setting

Maybe? I'm just thrown by the fact that 3.1 works.

I have repro using latest daily 5.0 build:

bash-3.2$ ~/dotnet/dotnet run
Unhandled exception. Interop+AppleCrypto+AppleCommonCryptoCryptographicException: UNIX[Undefined error: 0]
   at Interop.AppleCrypto.X509StoreAddCertificate(SafeKeychainItemHandle certOrIdentity, SafeKeychainHandle keychain)
   at Internal.Cryptography.Pal.StorePal.AppleKeychainStore.Add(ICertificatePal cert)
   at System.Security.Cryptography.X509Certificates.X509Store.Add(X509Certificate2 certificate)
   at scratch.Program.Main(String[] args) in /Users/furt/certs/Program.cs:line 9

I will take a look and share findings.

It seems like this is somehow related to deleting temporary keychain _before_ finishing the import @bartonjs. If I comment out that code, (turning AppleCryptoNative_SecKeychainDelete to NOP) import works as expected.
Does it ring any bell?

I also verified that 3.1.6 works. I don't have 3.1 build on my 11.0 machine so I did not check if the sequence is the same.

deleting temporary keychain before finishing the import

Interesting. Thanks for getting to the bottom of that.

Does it ring any bell?

Not particularly... but that must mean we changed the managed code for it... possibly as part of the PFX rewrite? But I'm surprised an order of operations would have changed.

I wonder if #39086 is the same root cause.

I tried to do binary search and even if I go back to December/19, adding to store is still failing. So perhaps this is something prior that or it is related to some other build changes.

The problem is basically here:

https://github.com/dotnet/runtime/blob/afa4b2f344192c3609dd3c1419f68d99d10d7d62/src/libraries/System.Security.Cryptography.X509Certificates/src/Internal/Cryptography/Pal.OSX/AppleCertificatePal.cs#L106-L113

We import the X509Certificate2 in to a temporary keychain (because PersistKeySet is not set). I think the temporary keychain that the key gets imported in to needs to live as long as the returned ICertificatePal.

Did that part changed? I thought I saw similar code in 3.1 but I may be mistaken.
Also, log seems to complain about old-style keychain outside of ~/Library/Keychains.
I was wondering if the location also may make difference of there is some other way how to avoid that warning.
(But I did not have time (yet?) for experiments)

As a test, if that code is changed to:

```c#
SafeKeychainHandle keychain = persist
? Interop.AppleCrypto.SecKeychainCopyDefault()
: Interop.AppleCrypto.CreateTemporaryKeychain();

System.GC.SuppressFinalize(keychain);
return ImportPkcs12(rawData, password, exportable, keychain);
```

Then the problem goes away, too.

It seems like holding ref to the keychain is no longer good enough.

if the location also may make difference

It doesn't. I tried. (I also tried changing the keychain temp file extension to keychain-db in case for some reason that mattered)

Did that part changed?

I don't think so. I still haven't been able to nail down why 3.1 is working, but I thought I might be better off looking in to 5.0. It's entirely possible 3.1 is working by luck.

Is there a possibility of a GC change during 5.0 that could have affected this?

I asked @JulieLeeMSFT and @mangod9 to take a look. Sometimes things that have have lifetime issues can 'work' for many years before a slight change in allocations that yield the failure - we often see this with Timers, etc.

Thanks, @jeffschwMSFT!

Even if it was introduced by outside factors, I think the fix would likely be along the lines of what @vcsjones illustrated. Are there any objections to moving forward with that as the fix?

Are there any objections to moving forward with that as the fix?

That wasn't a complete fix unfortunately, all it did was stop the temp .keychain file from being deleted. That change as-is will leave the disk cluttered with temporary keychains.

I think a correct fix would be to transfer ownership of the SafeTemporaryKeychainHandle to the AppleCertificatePal iif the PKCS12 file is loaded as non-persisted and the PAL is constructed with a SafeSecIdentityHandle. When the PAL is disposed, _then_ we can clean up the temp keychain.

I can give that a shot.

I think we need to take a closer look. Based on the code pattern (embedded within a using), I am surprised that this is failing out of scope.

@jkotas @davidwrighton

I don't think this is a GC issue - I threw in the SuppressFinalize so the GC wouldn't clean up the handle (which deletes the temp keychain) and force it to leak. That snippet was solely there to at least confirm that is the source of some object-lifetime issue.

Got it; thanks for the clarification, @vcsjones.

I'll take a look at the codegen side to see what we're doing for GC tracking.

, I am surprised that this is failing out of scope.

Note that this is all new code in 5.0. ImportPkcs12 call at https://github.com/dotnet/runtime/issues/39603#issuecomment-681047300 did not exist in 3.1. I think it is most likely a bug in the new code, it is unlikely to be a bug in the core runtime.

Since this fails with debug codegen it's also unlikely to be a jit GC tracking issue.

GC info leading up to the call to ImportPkcs12 looks reasonable:

000122 E829FBFFFF           call     AppleCrypto:CreateTemporaryKeychain():System.Security.Cryptography.Apple.SafeTemporaryKeychainHandle
000127 48894590             mov      gword ptr [rbp-70H], rax
00012B EB09                 jmp      SHORT G_M61524_IG08
                        ;; bbWeight=1    PerfScore 13.50
G_M61524_IG07:
00012D E8CEFAFFFF           call     AppleCrypto:SecKeychainCopyDefault():System.Security.Cryptography.Apple.SafeKeychainHandle
000132 48894590             mov      gword ptr [rbp-70H], rax
                        ;; bbWeight=1    PerfScore 2.00
G_M61524_IG08:
000136 488B7D90             mov      rdi, gword ptr [rbp-70H]
00013A 48897DC0             mov      gword ptr [rbp-40H], rdi
00013E 488B7DC0             mov      rdi, gword ptr [rbp-40H]
000142 48897DB8             mov      gword ptr [rbp-48H], rdi
                        ;; bbWeight=1    PerfScore 4.00
G_M61524_IG09:
000146 488B7DF0             mov      rdi, bword ptr [rbp-10H]
00014A 488B75F8             mov      rsi, qword ptr [rbp-08H]
00014E 488B55E8             mov      rdx, gword ptr [rbp-18H]
000152 8B4DCC               mov      ecx, dword ptr [rbp-34H]
000155 4C8B45C0             mov      r8, gword ptr [rbp-40H]
000159 E85AC0FFFF           call     Internal.Cryptography.Pal.AppleCertificatePal:ImportPkcs12(System.ReadOnlySpan`1[Byte],Microsoft.Win32.SafeHandles.SafePasswordHandle,bool,System.Security.Cryptography.Apple.SafeKeychainHandle):Internal.Cryptography.ICertificatePal
00015E 488945B0             mov      gword ptr [rbp-50H], rax

;; stack slots rbp-70, rbp-40, rbp-48 are untracked

Register slot id for reg rax = 17.
Register slot id for reg rsi = 18.
Register slot id for reg rdi = 19.
Register slot id for reg rdx = 20.
Register slot id for reg r8 = 21.
Register slot id for reg rcx = 22.

Set state of slot 17 at instr offset 0x127 to Live.
Set state of slot 17 at instr offset 0x12d to Dead.
Set state of slot 17 at instr offset 0x132 to Live.
Set state of slot 17 at instr offset 0x136 to Dead.
Set state of slot 19 at instr offset 0x13a to Live.
Set state of slot 19 at instr offset 0x146 to Dead.
Set state of slot 16 at instr offset 0x14a to Live.
Set state of slot 20 at instr offset 0x152 to Live.
Set state of slot 21 at instr offset 0x159 to Live.
Set state of slot 17 at instr offset 0x15e to Live.

Hey @vcsjones -- @wfurt is going to take a look into this this week. Thanks for your help on the investigation; you've given us a great place to continue from. :100:

I'm editing the title to remove "Big Sur" since this repro'd for me on Catalina as well.

@wfurt is going to take a look into this this week.

Thanks a ton. Sorry I've just been underwater with other things lately.

reopening to 5.0. Master is now for 6.0

I was wondering what regressed it, and I guess it was probably the big PFX reader/writer change: https://github.com/dotnet/runtime/commit/af6ab5f364871cb3f5d301d5f9c6b786bccb8a89 https://github.com/dotnet/corefx/pull/42226

Fixed in 5.0.0 as well with #41882.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

iCodeWebApps picture iCodeWebApps  路  3Comments

EgorBo picture EgorBo  路  3Comments

omajid picture omajid  路  3Comments

Timovzl picture Timovzl  路  3Comments

bencz picture bencz  路  3Comments