Jupyterhub: pam_start triggers kerberos cache to be created with suffix uid=0

Created on 9 Nov 2016 · 23 Comments · Source: jupyterhub/jupyterhub

Mostly a note to self for the morning ...

The pam_start() call in PAMAuthenticator#pre_spawn_start does cause kinit to run on a host configured with Kerberos-backed PAM auth. However, because the UID of the Hub user at the time this call is made is 0 (root) when using the default spawner, the Kerberos ticket cache ends up being written to a file like so:

-rw------- 1 alice alice 508 Nov  9 07:32 krb5cc_0

where alice is the proper owner, but the suffix on the file references the UID of the root user who triggered the kinit. When alice tries to issue any other kerberos commands (e.g., klist) from within a notebook, no tokens appear to be present because the default cache location is /tmp/krb5cc_%{uid}.
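To make the mismatch concrete, here is a minimal sketch, assuming the /tmp/krb5cc_%{uid} naming pattern mentioned above. The helper name default_ccache_path and alice's uid of 1000 are illustrative assumptions; the function only mirrors the naming pattern and does not query the Kerberos library.

```python
import os


def default_ccache_path(uid, directory="/tmp"):
    """Return the path that a krb5cc_%{uid} template would resolve to.

    Hypothetical helper: it imitates the naming pattern only and does not
    read any real Kerberos configuration.
    """
    return os.path.join(directory, "krb5cc_{}".format(uid))


# Cache written while the Hub process runs as root (uid 0)
written_by_hub = default_ccache_path(0)
# Cache that klist would look for when run by alice (uid 1000 is assumed)
expected_by_alice = default_ccache_path(1000)

print("written: ", written_by_hub)
print("expected:", expected_by_alice)
print("match:   ", written_by_hub == expected_by_alice)
```

The two paths differ, which is why alice's klist finds nothing even though a cache she owns exists on disk.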

Kerberos does support an effective UID parameter and we could switch the pre_spawn_start to do a os.seteuid. This feels hacky. There's probably a better answer.
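For reference, the seteuid idea could look roughly like the sketch below. The effective_ids name is hypothetical and this is not what PAMAuthenticator does today; only os.setegid and os.seteuid are real standard-library calls. The demo switches to the process's own IDs so it runs without privileges; in the Hub it would wrap the PAM call with the target user's uid and gid.

```python
import os
from contextlib import contextmanager


@contextmanager
def effective_ids(uid, gid):
    """Temporarily switch the effective uid/gid, then restore them.

    Hypothetical helper for illustration. Switching to another user's IDs
    requires a privileged process (e.g., the Hub running as root).
    """
    saved_uid, saved_gid = os.geteuid(), os.getegid()
    try:
        # Change the group first, while we still have the privilege to do so
        os.setegid(gid)
        os.seteuid(uid)
        yield
    finally:
        # Regain the original effective uid first, then restore the group
        os.seteuid(saved_uid)
        os.setegid(saved_gid)


# Unprivileged demo: "switch" to the IDs we already have.
with effective_ids(os.geteuid(), os.getegid()):
    # In the Hub, the PAM call would go here (assumption, not tested here)
    print("effective uid inside block:", os.geteuid())
```

Using the effective rather than the real uid means the process can switch back afterwards, which is the property that makes this less drastic than a full setuid.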

parente

Most helpful comment

Good point, and this is turning into a very useful reference, so thanks for recording everything!

I'm mainly thinking about being able to attack this from both sides: It's great that you've figured out how to configure kerberos to work well with JupyterHub's events, but I'd also like to have an answer for people for whom changing their kerberos config to accommodate JupyterHub isn't an option, even if it's just describing what a custom KerberosPAMAuthenticator would need to do, and not necessarily going through the implementation.

minrk on 18 Nov 2016

👍2

All 23 comments

I'm going to see how the sudospawner behaves as well before taking any action on changes in the default spawner.

parente on 9 Nov 2016

@parente I'm also curious to see if you've looked at the systemdspawner and how that interacts with this

yuvipanda on 10 Nov 2016

I have not. I talked to @minrk and @willingc this morning about creating a new repo jupyterhub/jupyterhub-example-kerberos where I plan to push the initial configs for the default and sudospawners. We can build that out with more configs to see how they behave (+ the Spark bit once we have auth and ticketing working correctly at least).

I'll try to seed the repo with what I've got so far this evening.

parente on 10 Nov 2016

With sudospawner and the hub running as a local jupyter user, the result is worse:

-rw------- 1 jupyter jupyter 529 Nov 10 06:10 krb5cc_pam_gJSpJI
-rw------- 1 jupyter jupyter 523 Nov 10 06:12 krb5cc_pam_p353y5

The caches are now owned by the jupyter user instead of the true two users I logged in as, and the UID suffix on the files is now what looks like a temporary pam generated string. It's interesting that the behavior changed at all from the default spawner case since I'm still using PAMAuthenticator.
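A quick way to check the ownership and naming of the caches from Python is sketched below. The function names list_ccaches and misowned_ccaches are hypothetical, and the krb5cc_* glob pattern is simply taken from the listings in this thread.

```python
import glob
import os


def list_ccaches(directory="/tmp"):
    """Return (filename, owner_uid) pairs for files named krb5cc_* in directory."""
    results = []
    for path in sorted(glob.glob(os.path.join(directory, "krb5cc_*"))):
        results.append((os.path.basename(path), os.stat(path).st_uid))
    return results


def misowned_ccaches(expected_uid, directory="/tmp"):
    """Return names of caches whose owner is not expected_uid."""
    return [name for name, uid in list_ccaches(directory) if uid != expected_uid]


if __name__ == "__main__":
    for name, uid in list_ccaches():
        print(name, "owned by uid", uid)
```

With the sudospawner setup described above, misowned_ccaches called with a logged-in user's uid would be expected to report both krb5cc_pam_* files, since they belong to the jupyter user.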

Getting the example cleaned up and pushed now.

parente on 10 Nov 2016

👍1

I pushed what I have so far to https://github.com/jupyterhub/jupyterhub-example-kerberos with an initial list of goals that I'd like the example to cover in the README. Happy to iterate with anyone interested over there. We can return to this issue or others in the Hub once we have a better grasp on what's really a bug vs a configuration issue vs what requires new auth/spawner behavior.

parente on 10 Nov 2016

👍1

@parente should pam_start be moving to LocalProcessSpawner.start after setuid happens, rather than in the Authenticator?

minrk on 10 Nov 2016

Could be.

I wanted to look into why sudospawner is generating credential caches with the wrong ownership and a pseudorandom UID suffix first (even with c.PAMAuthenticator.open_sessions = False) before making any changes.

parente on 11 Nov 2016

The pamela.authenticate call appears to be what's triggering the creation of the cred cache, not the open_session.

https://github.com/minrk/pamela/blob/master/pamela.py#L274

parente on 11 Nov 2016

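import os
import pwd

import pamela

username = 'alice'
password = 'alice'
service = 'login'

# Look up the uid and gid of the user being authenticated
user = pwd.getpwnam(username)
uid = user.pw_uid
gid = user.pw_gid

# Switch to the target user BEFORE authenticating (gid first, then uid)
os.setgid(gid)
os.setuid(uid)
# Authenticate and set credentials within a single PAM transaction
pamela.authenticate(username, password, service=service, resetcred=True)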

creates a correctly named and permissioned ccache, but smells funny because of the gid/uid switch BEFORE auth. I started unrolling what pamela.authenticate does to do a start, auth, set cred, open session, end (all in one transaction) based on notes in a couple of places about the ccache being improperly constructed when the ops are done out of order or in different transactions (e.g., under caveats here https://www.eyrie.org/~eagle/software/pam-krb5/pam-krb5.html). So far, the only thing I've found that works properly is setting the uid before pam_setcred, within the same transaction as the pam_authenticate call.

There are other ways of configuring the Kerberos ccache naming based on euid. Maybe PAMAuthenticator can seteuid/setegid before authenticating? That still feels weird. Maybe pamela.authenticate needs to change to intersperse seteuid/setegid into the transaction?

I'll keep experimenting. If anyone has another idea to try, I'm all ears.

parente on 11 Nov 2016

When using the PAMAuthenticator and the LocalProcessSpawner, the kerberos cred cache receives the correct permissions, correct filename, and persists when:

I change the pamela library to use PAM_ESTABLISH_CRED instead of PAM_REINITIALIZE_CRED, and
configure pam_krb5 auth to retain_after_close

I've implemented this over in jupyterhub/jupyterhub-example-kerberos for the time being after trying numerous other krb5, pam, and jupyterhub configurations and getting nowhere with retaining the cred cache on session open rather than on authentication.

I have no idea what impact changing pamela would have on other setups (e.g., AFS?) that might rely on the current parameters and behavior. Making pamela more configurable is probably the right way to go, along with subclassing PAMAuthenticator in the example repo to patch what's needed until we know more.

Does that approach sound reasonable @minrk ?

parente on 14 Nov 2016

Sounds sensible to me

minrk on 16 Nov 2016

Thanks. I took a timeout but getting back to it soon.

parente on 16 Nov 2016

I think I have a complete accounting of what's going on now. I set up a container with the default pam_krb5 (Kerberos) configuration to auth against a Kerberos Key Distribution Center (KDC) in another container. I modified the authenticate function in pamela.py slightly to print the state of the PAM context. Then I ran the pamela CLI as shown below. I annotated the output with comments on what's happening along the way:

> python pamela.py -a alice

# Within the authentication function, log the PAM environment variable list
# after calling pam_start but before doing anything else to confirm that it is 
# empty. (And it is).
pam_envlist before PAM_AUTHENTICATE: None

# I enter the password for alice
Password:

# Looking at the environment again, I see pam_krb5 stores an environment 
# variable referring to a temporary ticket cache on disk.
pam_envlist after PAM_AUTHENTICATE b'PAM_KRB5CCNAME=/tmp/krb5cc_pam_osqbIP'

# Next I call pam_setcred with the PAM_ESTABLISH_CRED flag which fixes the
# UID naming problem noted in a comment above. The call also moves the
# temporary ticket cache to a permanent location identified by alice's uid=1000
# and a random suffix. It also sets a different env var to point to it and unsets
# the first env var.
pam_envlist after PAM_SETCRED with PAM_ESTABLISH_CRED b'KRB5CCNAME=FILE:/tmp/krb5cc_1000_qixILg'

# I do an os.listdir('/tmp') just to confirm the cache exists on disk
/tmp before PAM_OPEN_SESSION ['krb5cc_1000_qixILg']

# Now I call PAM_OPEN_SESSION using the same handle I used in 
# authenticate and setcred
Last login: Thu Nov 17 04:01:15 UTC 2016
Welcome to Ubuntu 14.04.5 LTS (GNU/Linux 4.4.20-moby x86_64)

 * Documentation:  https://help.ubuntu.com/

# After opening the session, I do another listdir on /tmp to make sure the 
# ticket cache is still there.
/tmp after PAM_OPEN_SESSION ['krb5cc_1000_qixILg']

# Now here's the rub: When pam_end returns and I do another os.listdir('/tmp'),
# the ticket cache is gone!
/tmp after pam_end: []

As soon as the PAM transaction ends, all kerberos state is lost, both the in-memory environment vars set by authenticate and read by setcred and open_session, and the on-disk ticketing cache. This behavior makes sense considering the typical use case is for the parent to spawn a shell after opening the session and only close the session and end the transaction after the shell exits (https://www.freebsd.org/doc/en_US.ISO8859-1/articles/pam/pam-sample-appl.html).

The PAMAuthenticator currently uses one transaction to do the auth, a different transaction to open a session in pre_spawn, and another to close a session in post_spawn. This means any info stored in the handle during one of the transactions is not available in the others (e.g., the env var pointing to a ticket cache). In addition, some PAM modules default to cleaning up resources when transactions end, like the ticket cache in the example above.
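The lifetime problem can be illustrated with a toy model. This is not real PAM and makes no libpam calls: ToyTransaction is a hypothetical stand-in that only imitates the default pam_krb5 behavior observed above, where per-handle state and the on-disk ticket cache disappear when the transaction ends.

```python
import os
import tempfile


class ToyTransaction:
    """Toy stand-in for one PAM transaction (pam_start ... pam_end)."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        self.env = {}  # per-handle environment, lost when the handle ends

    def authenticate(self, uid):
        # Imitate pam_krb5 creating a ticket cache and recording its location
        fd, path = tempfile.mkstemp(prefix="krb5cc_{}_".format(uid),
                                    dir=self.cache_dir)
        os.close(fd)
        self.env["KRB5CCNAME"] = "FILE:" + path

    def open_session(self):
        # A session only sees a cache recorded on this same handle
        return self.env.get("KRB5CCNAME")

    def end(self):
        # Imitate the default cleanup: forget the env var, delete the cache
        ccname = self.env.pop("KRB5CCNAME", None)
        if ccname:
            os.remove(ccname[len("FILE:"):])


def demo():
    with tempfile.TemporaryDirectory() as cache_dir:
        # Transaction 1: authenticate, then end (as the auth step does)
        auth = ToyTransaction(cache_dir)
        auth.authenticate(1000)
        during = os.listdir(cache_dir)
        auth.end()
        after = os.listdir(cache_dir)

        # Transaction 2: open a session on a fresh handle (as pre_spawn does)
        session = ToyTransaction(cache_dir)
        seen = session.open_session()
        session.end()
    return during, after, seen


during, after, seen = demo()
print("cache files during auth transaction:", len(during))
print("cache files after it ends:", len(after))
print("cache visible to the later session:", seen)
```

The second transaction never sees a cache, which matches the behavior of separate auth and pre_spawn transactions described above.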

For reference, https://www.eyrie.org/~eagle/software/pam-krb5/pam-krb5.html says:

Normally, the user's ticket cache is destroyed when either pam_end() or pam_close_session() is called by the authenticating application so that ticket caches aren't left behind after the user logs out.

I don't know exactly what impact this should have on the design of the Hub, if any. Keeping PAM transactions open from the time a user logs in until he/she logs out doesn't make a lot of sense in an async world. I do know I can configure Kerberos to work around the lifetime problem with the default spawner already (e.g., force it to retain ticket caches beyond transaction lifetime at a cost of polluting /tmp). I suspect similar tricks can be played with others as needed.

At any rate, I have a better understanding of the behavior I'm seeing with JHub and a default Kerberos setup and mostly wanted to document it here for posterity.

parente on 17 Nov 2016

Awesome, thanks @parente! I'm also not quite sure what changes we should make. Is it okay for check-auth and the spawner session to be separate transactions, but keep a transaction open for the lifetime of the Spawner? That is:

auth: start/authenticate/end
pre_spawn_start: pam_start, open_session
post_spawn_stop: close_session, pam_end

It might require some tweaks to pamela to keep transactions open, but I imagine it should be doable.

minrk on 17 Nov 2016

auth: start/authenticate/end

For pam_krb5 at least, as soon as the transaction that does the auth ends, the env var pointing to the ticket cache is reset and the ticket cache gets cleaned up. Another ticket cache is not regenerated when open_session is called. You run into this case https://github.com/rra/pam-krb5/blob/6a46b475da73de32cc7c22dc8dfe62166087837f/setcred.c#L269.

So you can keep the transaction open through the lifetime of the notebook, but unless it's the same transaction that does the auth, it doesn't help with kerberos cred cache lifetime (at least in the default pam_krb5 configuration).

parente on 17 Nov 2016

Hm, that's tough, because auth and spawn really aren't tied together in that way in JupyterHub (multiple browsers need to be able to login). Perhaps the specific Kerberos flow should warrant a dedicated Authenticator that further restricts the order of events (i.e. forcing login on spawn, etc.) beyond regular PAM. For instance, it could use options_form to require re-entering their password as part of spawning.
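As a rough sketch of the options_form idea, assuming JupyterHub passes submitted form data to the spawner as a dict mapping field names to lists of strings: the names PASSWORD_FORM and options_from_form below are written as standalone illustrations rather than as methods on a real Spawner subclass, and nothing here performs the Kerberos login itself.

```python
# HTML fragment that a spawner could expose as its options_form (assumption)
PASSWORD_FORM = """
<label for="password">Password (needed to obtain a Kerberos ticket)</label>
<input type="password" name="password" id="password" required />
"""


def options_from_form(formdata):
    """Extract the re-entered password from submitted form data.

    formdata is assumed to map each field name to a list of strings.
    """
    values = formdata.get("password", [])
    if not values or not values[0]:
        raise ValueError("A password is required to spawn")
    return {"password": values[0]}


# Example of the parsing step with hand-made form data
print(sorted(options_from_form({"password": ["example"]}).keys()))
```

A real implementation would need to use the password immediately for the PAM transaction and avoid persisting it with the other user options.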

minrk on 17 Nov 2016

Maybe, though I am able to achieve the desired effect of getting a proper, persistent cred cache using the existing PAMAuthenticator and LocalProcessSpawner over in jupyterhub-example-kerberos just by configuring pam_krb5 slightly differently. So again, I'm not sure changing JHub or writing a custom auth+spawner is really warranted yet. We'll know more once I get sudospawner working in the example repo as well (plus any others people would like to contribute).

Really, this issue has turned into a journal plus forum post tracking the discoveries and discussion.

parente on 18 Nov 2016

👍1

Good point, and this is turning into a very useful reference, so thanks for recording everything!

minrk on 18 Nov 2016

👍2

I don't know enough about PAM to properly discuss the issue here, but I had a similar problem related to this:

For pam_krb5 at least, as soon as the transaction that does the auth ends, the env var pointing to the ticket cache is reset and the ticket cache gets cleaned up. Another ticket cache is not regenerated when open_session is called.

I must use pam_mount to mount the user's HOME. Using the default pamela, pam_mount could not succeed in the session phase because it did not have a valid session. The solution I came up with a year ago was to clone pamela, open a session before the transaction was closed with pam_end(), and extend the PAMAuthenticator to use my package. I guess this is only a band-aid solution for my case (I use an extended class of LocalProcessSpawner).

dsoares on 18 Nov 2016

@dsoares thanks! If there are changes you think should be made to pamela, feel free to open an Issue/PR.

minrk on 18 Nov 2016

The sudospawner is more problematic because the ticket cache first gets created with the permissions of the user owning the hub process (say, jupyter), and then has to be chown'ed to the uid:gid of the user that is authenticating. The chown is done by the setcreds function in pam_krb5, which means the jupyter user needs chown permissions at authentication time via some mechanism (e.g., setuid? cap_chown? ...). With that level of permission, I'm not sure there's much point in using the sudospawner.

Perhaps the specific Kerberos flow should warrant a dedicated Authenticator that further restricts the order of events (i.e. forcing login on spawn, etc.) beyond regular PAM. For instance, it could use options_form to require re-entering their password as part of spawning.

I think this might fix the sudospawner case at a cost of a double login. If it's the specific user executing the authenticate method then the chown on the ccache should work. I'll try to prove that out.

parente on 21 Nov 2016

@parente Do you want to leave this issue open or close it here and iterate on it in the kerberos example repo?