What are best practices for managing SSH keys in a team?

At my company we use LDAP to keep a consistent set of accounts across all machines, and a configuration management tool (currently cfengine) to distribute each user's authorized_keys file to every server. The key files are kept, along with other system configuration, in a git repository, so we can see when keys come and go. cfengine also distributes a sudoers file that controls who may run what as root on each host, using the users and groups from the LDAP directory.

Password authentication is completely disabled on our production servers, so SSH key auth is mandatory. Policy encourages using a separate key for each laptop/desktop/whatever and using a passphrase on all keys to reduce the impact of a lost/stolen laptop.
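
As a rough illustration of what "password authentication is completely disabled" can look like, here is a minimal sshd_config sketch. It is an assumption about a typical OpenSSH server, not a copy of our actual configuration, and the file path may differ on your distribution.

```sshconfig
# /etc/ssh/sshd_config (fragment; illustrative, not our real file)

# Allow public key authentication.
PubkeyAuthentication yes

# Refuse plain password logins.
PasswordAuthentication no

# Also refuse keyboard-interactive logins, which can otherwise prompt for
# a password through PAM. Older OpenSSH releases call this directive
# ChallengeResponseAuthentication.
KbdInteractiveAuthentication no
```

sshd has to be reloaded for the change to take effect, and it is worth confirming that key-based login works in a second session before closing the first one.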

We also have a bastion host that is used to access hosts on the production network, allowing us to have very restrictive firewall rules around that network. Most engineers have some special SSH config to make this transparent:

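# Reach production hosts through the bastion, and forward the local
# SSH agent so its keys can be used from the production host.
Host prod-*.example.com
     User jsmith
     ForwardAgent yes
     ProxyCommand ssh -q bastion.example.com "nc %h %p"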
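
For comparison, newer OpenSSH clients (7.3 and later) can express the same hop with the ProxyJump directive, which does not need nc on the bastion. This is a sketch of an equivalent config using the same example hostnames, not something the setup described here requires.

```sshconfig
# Same bastion hop as the ProxyCommand version, using ProxyJump.
# Requires an OpenSSH 7.3+ client.
Host prod-*.example.com
    User jsmith
    ForwardAgent yes
    ProxyJump bastion.example.com
```

On clients too old for ProxyJump, `ProxyCommand ssh -W %h:%p bastion.example.com` is another common way to avoid depending on nc.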

Adding a new key or removing an old one requires a bit of ceremony in this setup. For adding a key, I’d argue that’s desirable: the operation leaves an audit trail and is visible to everyone. The overhead does mean that people sometimes neglect to remove an old key when it is no longer needed, and we have no real way to track that other than cleaning up when an employee leaves the company. It also adds friction when onboarding a new engineer, since they need to generate a new key and have it pushed out to all hosts before they can be fully productive.

The biggest benefit, though, is having a separate username for each user. That makes it easy to apply more granular access control when we need it, and it gives each user an identity that shows up in audit logs, which is really useful when tracing a production issue back to a sysadmin action.

Automated systems that act on production hosts are awkward under this setup, since their “well-known” SSH keys can serve as an alternative access path. So far we’ve just given the accounts for these systems only the minimal access they need to do their jobs, and accepted that a malicious user (who must already be an engineer with production access) could perform those same actions semi-anonymously using the application’s key.
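
One way to narrow what an automation key can do, beyond limiting the account itself, is to put restrictions on the key's entry in authorized_keys. The sketch below is hypothetical: the deploybot account, the source address, the forced command path, and the truncated key are all placeholders, and this is not necessarily how our own automation accounts are configured.

```text
# ~deploybot/.ssh/authorized_keys (hypothetical account and values)
# restrict  : turn off port, agent and X11 forwarding and PTY allocation
#             (OpenSSH 7.2+)
# from=     : accept this key only from the named source address
# command=  : always run this command, whatever the client asks for
restrict,from="192.0.2.10",command="/usr/local/bin/deploy-app" ssh-ed25519 AAAA... deploybot@ci
```

This does not stop an engineer who can read the private key from using it, but it limits the key to one action from one source host.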
