Nixpkgs: NFS4 server with Kerberos does not work

Created on 3 Nov 2019 · 17 comments · Source: NixOS/nixpkgs

I have a working NFS4 setup with a client and a server that both run NixOS.
I have a working Kerberos setup on the same machines as well (the NFS server is the Kerberos KDC).

Now I wanted to enable kerberos authentication for NFS, but could not get it to work. Because I didn't think NixOS was to blame, I submitted a call for help with all the details of my setup to:
https://serverfault.com/questions/989749/how-should-i-proceed-debugging-nfs4kerberos

Today I brought in a Debian 10 laptop and set it up as an NFS4 server as well.
Without Kerberos, everything interoperates: Debian and NixOS work as each other's clients and servers just fine. With Kerberos added, a NixOS client can mount the Debian NFS server!
However, the Debian client cannot access the NixOS NFS server and runs into the same issue as the NixOS client.

So it appears the problem lies in the NixOS NFS server's interaction with Kerberos.

To Reproduce
Steps to reproduce the behavior:

  1. setup an nfs client and server
  2. setup kerberos
  3. try to mount an nfs share that needs kerberos authentication
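For reference, step 3 boils down to a mount invocation like the following (hostnames, export path, and the principal name are placeholders taken from the setup described below):

```shell
# Acquire a Kerberos ticket first (example user principal)
kinit alice@LAN

# Attempt the Kerberos-authenticated NFSv4 mount; against the
# NixOS server this fails with
# "mount.nfs: an incorrect mount option was specified"
mount -t nfs4 -o sec=krb5 bluebox.lan:/mnt /bluebox
```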

Metadata

  • system: "x86_64-linux"
  • host os: Linux 5.3.7, NixOS, 19.09.git.28e5506 (Loris)
  • multi-user?: yes
  • sandbox: yes
  • version: nix-env (Nix) 2.3
  • channels(root): "nixos-19.09beta606.3ba0d9f75cc"
  • nixpkgs: /nix/var/nix/profiles/per-user/root/channels/nixos

I do not usually update via the channel but run straight from a git checkout,
currently based on the channel of October 21, 2019.

Maintainer information:

# a list of nixos modules affected by the problem
module:
- services/network-filesystems/nfsd.nix
bug

All 17 comments

Can you tell me the configuration you used? I've never set up Kerberos, but I'll give it a go.
If we can make it work, NFSv4 + Kerberos would make for a good NixOS test.

Sorry for the late reply!

both machines have:

    networking.hosts."127.0.1.1" = [ "${config.networking.hostName}.lan" ];

    krb5 = {
        enable = true;
        domain_realm.".lan" = "LAN";
        libdefaults.default_realm = "LAN";
        realms."LAN" = {
            admin_server = "bluebox.lan";
            kdc = "bluebox.lan";
        };
    };

bluebox (the server):

    networking = {
        hostName = "bluebox";
        firewall = {
            enable = true;
            allowedTCPPorts = [
                                2049 # nfs
                                88 # kerberos
                                749 # kerberos admin
                              ];
      };
    };
    services = {
        kerberos_server = {
            enable = true;
            realms."LAN" = {};
        };
        nfs.server = {
            enable = true;
            exports = ''
                /mnt bluescreen.lan(rw,no_root_squash,crossmnt,sec=krb5) 
            '';
        };
    };

bluescreen (the client):

    fileSystems."/bluebox" = {
        device = "bluebox.lan:/mnt";
        fsType = "nfs";
        options = [ "x-systemd.automount" "noauto" "nfsvers=4.2" "sec=krb5"];
    };

The other setup (which is manual/imperative) is Kerberos itself. Described here

Everything works fine if I remove sec=krb5, so it's clear the issue lies there.
The client side seems fine as well, because the same thing (including sec=krb5) works when I point the filesystem at a Debian-based machine. So it must be something on the nfsd (server) side of NixOS.

So, I studied manuals a bit and tried to reproduce your setup. This is as far as I got:

import <nixpkgs/nixos/tests/make-test.nix>

({pkgs, lib, ...}:

with lib;

let
  krb5 = 
    { enable = true;
      domain_realm."nfs.test"   = "NFS.TEST";
      libdefaults.default_realm = "NFS.TEST";
      realms."NFS.TEST" =
        { admin_server = "server.nfs.test";
          kdc = "server.nfs.test";
        };
    };

  hosts =
    ''
      192.168.1.1 client.nfs.test
      192.168.1.2 server.nfs.test
    '';

in

{
  name = "nfsv4-with-kerberos";

  nodes = {

    client = { lib, ... }:
      { inherit krb5;

        networking.extraHosts = hosts;

        fileSystems = lib.mkVMOverride
          { "/data" = {
              device  = "server.nfs.test:/";
              fsType  = "nfs";
              options = [ "nfsvers=4" "sec=krb5p" ];
            };
          };
      };

    server = { lib, ...}:
      { inherit krb5;

        networking.extraHosts = hosts;

        networking.firewall.enable = false;
        networking.firewall.allowedTCPPorts = [
          2049 # nfs
          88   # kerberos
          749  # kerberos admin
        ];

        services.kerberos_server.enable = true;
        services.kerberos_server.realms =
          { "NFS.TEST".acl = 
            [ 
              { access = "all"; principal = "nfs/server.nfs.test"; }
              { access = "all"; principal = "nfs/client.nfs.test"; }
              { access = "all"; principal = "admin/admin"; }
            ];
          };

        services.nfs.server.enable = true;
        services.nfs.server.createMountPoints = true;
        services.nfs.server.exports =
          ''
            /data client(rw,no_root_squash,fsid=0,sec=krb5p)
          '';
      };

  };

  testScript =
    ''
      # set up kerberos database
      $server->succeed("kdb5_util create -s -r NFS.TEST -P supersecretpassword");
      $server->succeed("systemctl restart kadmind.service kdc.service");
      $server->waitForUnit("kadmind.service");
      $server->waitForUnit("kdc.service");

      # add principals
      $server->succeed("kadmin.local add_principal -pw hunter2 nfs/server.nfs.test");
      $server->succeed("kadmin.local add_principal -pw hunter2 nfs/client.nfs.test");
      $server->succeed("kadmin.local add_principal -pw admin   admin/admin");

      # check on the nfs server
      $server->waitForUnit("nfs-server");
      $server->succeed("systemctl start network-online.target");
      $server->waitForUnit("network-online.target");

      # start client
      $client->succeed("systemctl start network-online.target");
      $client->waitForUnit("network-online.target");

      # test kerberos
      $client->succeed("echo hunter2 | kinit nfs/server.nfs.test");
      $client->succeed("echo hunter2 | kinit nfs/client.nfs.test");

      # test nfs mount
      $client->waitForUnit("data.mount");
    '';
})

Ticketing and unauthenticated NFS seem OK, but I too can't get the NFS client to mount the export.
I'm sure I'm doing something wrong, but with the good ol' Unix-style error reporting it's not easy to find out what.

The error (which doesn't make any sense to me) coming from mount.nfs is

mount.nfs: an incorrect mount option was specified

and seems to be caused by passing -o sec=krb5 (or any other sec= flavour, really).

I'm pinging the people who contributed Kerberos support to NixOS; they may know something more.

@eqyiel, @kwohlfahrt

Hi @rnhmjoj
Thanks for trying to replicate this.

Note1: you don't need to add these nfs principals to the ACL. The ACL is only used for principals that want to change the Kerberos database itself. admin/admin is useful, although it's the default, so the entire block can be removed (changed to realms."NFS.TEST" = {}, which declares the realm but keeps everything at its default).
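In other words, the server-side realm declaration from the test can be reduced to something like this (a sketch of the simplification described above):

```nix
services.kerberos_server = {
  enable = true;
  # Declares the NFS.TEST realm with all defaults; no ACL entries
  # are needed for the nfs/* service principals, since the ACL only
  # governs who may modify the Kerberos database.
  realms."NFS.TEST" = {};
};
```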

Note2: It seems you are missing steps to add the principals to each machine's keytab:

# add to keytabs (below the add_principal calls)
$server->succeed("kadmin.local ktadd nfs/server.nfs.test");
$client->succeed("echo hunter2 | kadmin -p admin/admin ktadd nfs/client.nfs.test");

Of course this might become a tricky race condition: when the client starts, it might try to mount before Kerberos is in a usable state. Perhaps adding noauto to the mount point and manually invoking mount /data from your test script will help.
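A sketch of that workaround on the client side (the same mount as in the test, with noauto added so the test script decides when the mount actually happens):

```nix
fileSystems."/data" = {
  device  = "server.nfs.test:/";
  fsType  = "nfs";
  # noauto: don't mount at boot; the test script runs the mount
  # explicitly once the Kerberos keytabs are in place.
  options = [ "nfsvers=4" "sec=krb5p" "noauto" ];
};
```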

Note3: you normally don't use passwords for non-human principals but rather add_principal -randkey; that said, I can see how passwords are useful in the context of the test script.
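For example, outside a test environment the service principals would typically be created with random keys and exported straight into the keytab (principal name as used elsewhere in this thread):

```shell
# Create the service principal with a random key instead of a password
kadmin.local add_principal -randkey nfs/server.nfs.test

# Export its key into the default keytab (/etc/krb5.keytab)
kadmin.local ktadd nfs/server.nfs.test
```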

I learned something new from your example: fsid=0 is very useful for removing the root path prefix.

Note1: you don't need to add these nfs principals to the ACL. The ACL is only used for principals that want to change the Kerberos database itself. admin/admin is useful, although it's the default, so the entire block can be removed (changed to realms."NFS.TEST" = {}, which declares the realm but keeps everything at its default).

This explains a lot of things. I didn't really get the point of the ACLs before, thank you.

Of course this might become a tricky race condition: when the client starts, it might try to mount before Kerberos is in a usable state. Perhaps adding noauto to the mount point and manually invoking mount /data from your test script will help.

Yeah, I learned about this a little too late.

Note3: you normally don't use passwords for non-human principals but rather add_principal -randkey; that said, I can see how passwords are useful in the context of the test script.

You're right, but doesn't that require copying the keytab over to the client? It's kind of tricky to move files between hosts in the simulation...

Don't mind my last sentence: I realised you can copy it using kadmin. So, here's the updated version.

import <nixpkgs/nixos/tests/make-test.nix>

({pkgs, lib, ...}:

with lib;

let
  krb5 = 
    { enable = true;
      domain_realm."nfs.test"   = "NFS.TEST";
      libdefaults.default_realm = "NFS.TEST";
      realms."NFS.TEST" =
        { admin_server = "server.nfs.test";
          kdc = "server.nfs.test";
        };
    };

  hosts =
    ''
      192.168.1.1 client.nfs.test
      192.168.1.2 server.nfs.test
    '';

in

{
  name = "nfsv4-with-kerberos";

  nodes = {

    client = { lib, ... }:
      { inherit krb5;


        networking.extraHosts = hosts;

        fileSystems = lib.mkVMOverride
          { "/data" = {
              device  = "server.nfs.test:/";
              fsType  = "nfs";
              options = [ "nfsvers=4" "sec=krb5p" "noauto" ];
            };
          };
      };

    server = { lib, ...}:
      { inherit krb5;


        networking.extraHosts = hosts;

        networking.firewall.allowedTCPPorts = [
          2049 # nfs
          88   # kerberos
          749  # kerberos admin
        ];

        services.kerberos_server.enable = true;
        services.kerberos_server.realms =
          { "NFS.TEST".acl =
            [ { access = "all"; principal = "admin/admin"; } ];
          };

        services.nfs.server.enable = true;
        services.nfs.server.createMountPoints = true;
        services.nfs.server.exports =
          ''
            /data client(rw,no_root_squash,fsid=0,sec=krb5p)
          '';
      };

  };

  testScript =
    ''
      # set up kerberos database
      $server->succeed("kdb5_util create -s -r NFS.TEST -P supersecretpassword");
      $server->succeed("systemctl restart kadmind.service kdc.service");
      $server->waitForUnit("kadmind.service");
      $server->waitForUnit("kdc.service");

      # create principals
      $server->succeed("kadmin.local add_principal -randkey nfs/server.nfs.test");
      $server->succeed("kadmin.local add_principal -randkey nfs/client.nfs.test");
      $server->succeed("kadmin.local add_principal -pw admin admin/admin");

      # add principals to server keytab
      $server->succeed("kadmin.local ktadd nfs/server.nfs.test");
      $server->succeed("klist -k | grep nfs/server");

      # start client
      $client->succeed("systemctl start network-online.target");
      $client->waitForUnit("network-online.target");

      # add principals to client keytab
      $client->succeed("echo admin | kadmin -p admin/admin ktadd nfs/client.nfs.test");
      $client->succeed("klist -k | grep nfs/client");

      # test nfs mount
      $client->succeed("systemctl restart data.mount");
      $client->waitForUnit("data.mount");
    '';
})

The error is still the same:

client# [  103.697996] mount[748]: mount.nfs: an incorrect mount option was specified

I think rpc-gssd.service is needed for Kerberos NFS mounts. This doesn't quite fix the above test, but it at least produces a different error.

From what I remember of Debian, rpc-svcgssd is also required, but that doesn't seem to be packaged in NixOS. I thought it might only be needed for NFSv3, but that's rpc.mountd, so it looks like we need to package rpc-svcgssd. I'm on it :)

Progress - the filesystem mounts, but no writes are possible, even when I have kinit the correct user. I've posted my work so far in #73989, and will have another look tomorrow - advice welcome.

@kwohlfahrt Thank you! I can't help you much with Kerberos, but I'll try reviewing your PR.

Not much progress today. I suspect something to do with ID mapping, since everything is owned by nobody. Many sources suggest idmapd must be running on the client _and_ server, but more recent documentation suggests nfsidmap is needed on the client instead of idmapd. Starting idmapd on the client (requires enabling services.nfs) makes the UIDs correct, but write permission is still denied.

I suspect the missing factor is that nfs-utils does not pull in the external libnfsidmap, which the README suggests is necessary (and old versions cause a bug similar to what we see). However, libnfsidmap was merged into nfs-utils in 2017, which is after the last release of standalone libnfsidmap in 2014. So that shouldn't be the issue, but Debian still uses it for some reason, as well as an old version of nfs-utils (1.3.4)?

I think I was barking up the wrong tree. The example in the PR is now working, with two caveats:

  1. file IDs are all messed up. This can be fixed by running nfs-idmapd on the client, but docs suggest this is no longer necessary, as it has been replaced by the nfsidmap binary. I'm going to try a few different distros to see what they do.
  2. It is not possible to write to files as root, even if the corresponding Kerberos principal exists. This may be intended behavior.

I think the first issue is a dupe of #68106.

I have Kerberos + NFS working in the PR linked above. There is still a bit of sorting out regarding how the module should be structured so we can make this easy to use, but it is nearly done.

@kwohlfahrt and @rnhmjoj thank you so much!
I will try out this PR this weekend, but it seems a lot has been achieved :)

With #73989 merged I guess this should be closed.

yes, seems to work fine for me!

thanks a lot, I would have never figured out these underlying issues myself
