IPA and Hardware Security Modules (HSMs)

Introduction

A hardware security module (HSM) is a device that provides physical and tamper-proof security for sensitive data. It is used to provision cryptographic keys, and these keys cannot leave the device. HSMs generally communicate using the PKCS#11 standard.

Because they use PKCS#11 it is possible to simulate them in software. One way is using the SoftHSM2 library. This does not provide the same level of protection as a hardware device but it is useful for development and prototyping.

An effort was made in 2019, in conjunction with some interested users, to implement support for HSMs in IPA using SoftHSM2. The underlying CA, dogtagpki, supports several HSMs so much of the heavy lifting is already done. This effort brought IPA about 90% of the way towards supporting HSMs but some blocking bugs prevented full support. A user provided instructions on doing a single server installation, https://magnus-k-karlsson.blogspot.com/2019/08/installing-dogtag-on-fedora-30-with.html

This effort was reconstituted in the summer of 2022 to build on the work done in 2019.

NSS and tokens

Before we can dive into implementation we need to start with some terminology. Taken from the PKCS#11 spec:

Slot: A logical reader that potentially contains a token.

Token: The logical view of a cryptographic device defined by Cryptoki.

I like to think about these literally. Think of an ATM. It has a slot. The token is your card. On a real HSM a token is probably not removable. It’s harder to steal an entire device than a small card. But internally one or more slots are exposed containing one or more tokens.

Each slot and each token will have a unique name. The token name will be referenced all throughout this document.

NSS was built around the PKCS#11 specification so that all certificates and keys are stored in a slot. By default it is the internal NSS slot, the so-called soft token (also called softokn). You don’t have to specify this slot when accessing keys and certificates on the soft token. It is implicit.

Any other PKCS#11 module will require you to specify the token it’s on and provide a PIN to access it.

Access to these slots and keys using PKCS#11 is provided by a shared library. So you need this library installed and the token to use initialized with a PIN before you can even start.

Installing

Installation is still a bit rough. These instructions use unreleased code in my private repository. It is not production ready. There are only builds for Fedora 36.

The design of how it will be implemented in the IPA installers is under review. The below instructions use a workaround created by the 2019 team. This is not a long-term solution as it is too easy to mess up IMHO.

These instructions are going to revolve around using the SoftHSM2 library with software tokens. Using a hardware HSM instead should require just replacing the shared library and token name(s).

Start by enabling my custom repo with the necessary builds.

# dnf copr enable rcritten/freeipa

# dnf -y install freeipa-server-dns

Next we need to do some system setup so that the PKI subsystem can access the softhsm token. This is from Granting Permission to PKI System User:

# usermod pkiuser -a -G ods

Next we prepare our token. I’ve named it softhsm_token. You can name it whatever you’d like. The reason for runuser is so the token files are readable and owned by pkiuser.

# softhsm2-util --delete-token --token softhsm_token

# runuser -u pkiuser -- /usr/bin/softhsm2-util --init-token --free --pin password --so-pin password --label "softhsm_token"

Create the ini file to pass into the IPA installer.

[DEFAULT]

pki_hsm_enable=True

pki_hsm_libfile=/usr/lib64/pkcs11/libsofthsm2.so

pki_hsm_modulename=softhsm2

pki_token_name=softhsm_token

pki_token_password=password

pki_sslserver_token=internal
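
Since it is easy to mess up this override file, a quick sanity check before running the installer may help. The following is a minimal sketch in Python, not part of IPA or PKI: the function name missing_hsm_options and the HSM_OPTIONS list are hypothetical, and the list only contains the options used in this post, not an authoritative set of what the installer requires.

```python
import configparser

# The HSM-related options used in this post. This is an assumption for the
# sake of illustration, not an authoritative list of required options.
HSM_OPTIONS = (
    "pki_hsm_enable",
    "pki_hsm_libfile",
    "pki_hsm_modulename",
    "pki_token_name",
    "pki_token_password",
)


def missing_hsm_options(ini_text):
    """Return the HSM options that are absent from the [DEFAULT] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    defaults = parser.defaults()
    return [name for name in HSM_OPTIONS if name not in defaults]


PKI_INI = """\
[DEFAULT]
pki_hsm_enable=True
pki_hsm_libfile=/usr/lib64/pkcs11/libsofthsm2.so
pki_hsm_modulename=softhsm2
pki_token_name=softhsm_token
pki_token_password=password
pki_sslserver_token=internal
"""

# An empty list means every option from this post is present.
print(missing_hsm_options(PKI_INI))
```

To check the real file, read /root/pki.ini and pass its contents to the function.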

Install IPA with this HSM

# hostname ipa.example.test

# ipa-server-install -a password -p dmpassword -r EXAMPLE.TEST -U --setup-dns --allow-zone-overlap --no-forwarders -N --auto-reverse --pki-config-override=/root/pki.ini

Assuming everything went as planned you should now have an IPA installation that uses an HSM as its CA key storage.

Implementation Notes

The CA/pki-tomcat Server-Cert is stored in the database. This only serves requests so doesn’t need the security of the HSM. It also doesn’t work if you try to force it into the HSM because at some point it is exported as a PKCS#12 which won’t work because you can’t extract private key material from an HSM.

certutil output is going to look a bit strange.

# certutil -L -d /etc/pki/pki-tomcat/alias

Certificate Nickname                            Trust Attributes
                                                SSL,S/MIME,JAR/XPI

caSigningCert cert-pki-ca                       CT,C,C
ocspSigningCert cert-pki-ca                     ,,
Server-Cert cert-pki-ca                         u,u,u
subsystemCert cert-pki-ca                       ,,
auditSigningCert cert-pki-ca                    ,,P

Where are the u’s you might ask? The u trust attribute indicates the presence of the private key. Since the key is on the HSM, it doesn’t show (except for the aforementioned Server-Cert).

That these show at all is a side-effect of the way NSS handles trust flags. There is no PKCS#11 equivalent so they are stored in the NSS soft token instead.

Looking at the HSM token:

# certutil -L -d /etc/pki/pki-tomcat/alias -h softhsm_token

Certificate Nickname                            Trust Attributes
                                                SSL,S/MIME,JAR/XPI

Enter Password or Pin for "softhsm_token":
softhsm_token:ocspSigningCert cert-pki-ca       u,u,u
softhsm_token:caSigningCert cert-pki-ca         CTu,Cu,Cu
softhsm_token:subsystemCert cert-pki-ca         u,u,u
softhsm_token:auditSigningCert cert-pki-ca      u,u,Pu

Here NSS melds the private key existence with the trust attributes from the soft token to give a more typical view of things.
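
To make the role of the u flag concrete, here is a small Python sketch that parses a certutil -L style listing and reports which nicknames show a private key. The function names are hypothetical and the parsing is deliberately naive: it assumes the trust flags are the last whitespace-separated field on each line, as in the listings above.

```python
def parse_trust_listing(text):
    """Map nickname -> trust flags from `certutil -L` style output."""
    certs = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 1)
        # Real entries end in three comma-separated flag groups, e.g. "CT,C,C".
        # Header and prompt lines do not, so they are skipped.
        if len(parts) == 2 and parts[1].count(",") == 2:
            certs[parts[0].strip()] = parts[1]
    return certs


def has_private_key(flags):
    """The u flag indicates that the private key is present."""
    return "u" in flags


SOFT_TOKEN_LISTING = """\
Certificate Nickname Trust Attributes
SSL,S/MIME,JAR/XPI

caSigningCert cert-pki-ca CT,C,C
ocspSigningCert cert-pki-ca ,,
Server-Cert cert-pki-ca u,u,u
subsystemCert cert-pki-ca ,,
auditSigningCert cert-pki-ca ,,P
"""

certs = parse_trust_listing(SOFT_TOKEN_LISTING)
with_keys = [name for name, flags in certs.items() if has_private_key(flags)]
print(with_keys)
```

Run against the soft token listing, only the Server-Cert should be reported as having a key. Run against the HSM token listing, every entry should.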

You can’t mix tokens. Only one token per server is allowed. Technically PKI could use a different one for each key but I haven’t tested that and preventing regressions would likely be a nightmare. If you need it, I’d need a pretty strong use-case.

My personal repo

You might wonder what unreleased patches lurk in my personal repo. Here is a general overview.

freeipa

IPA got some understanding of tokens in the 2019 effort but it was not complete. The certmonger tracking had to be updated to store the token. Similarly certmonger needs to know the user to run as so that when a certificate is issued the ownership is correct in softhsm.

When using an HSM we can’t use Custodia to retrieve keys for replica install.

I added some extended certificate handling so that freeipa-healthcheck can validate the installation.

certmonger

When using SoftHSM2 any files created need to be owned by the pkiuser otherwise the CA will not be able to read them. certmonger runs as root. As a workaround when saving certificate data it will do a setuid/setgid so that permissions are correct.

freeipa-healthcheck

healthcheck had no understanding of tokens at all. It gets the list of expected certificates from IPA. That exchange needed to be extended so that both the soft token and the HSM token were retrieved so that tracking and certificate trust attributes are as expected. This is not yet in my repo but I have a PR upstream.

NSS

When modifying certificate trust certutil needs to authenticate to the token holding the certificate. The problem is that the trust is stored in the soft token so that also needs to be authenticated. The patch has certutil fall back to the soft token when adding or modifying trust.

pki

The installer code needed a fair bit of work due to permissions of the SoftHSM token files. A lot of calls need to be wrapped in runuser so that the resulting file permissions are ok. It is TBD whether this is problematic in a physical HSM installation. Most of these changes are already merged. If you want a detailed look just look at any recent merges by me.

Replica installation

I need to get a patch merged with the CA in order for replica installation to work. I’ll either update this post or create a new one when that is done.

KRA installation

Installing a KRA during initial install, --setup-kra, or afterward, ipa-kra-install, works for me.

ipa-healthcheck combined configuration and command-line options

Introduction

ipa-healthcheck runs automatically daily by default with some default options like where the log file is written and the output format. We had a request that this be easily configurable without having to modify the systemd services directly. The request was to add the command-line (CLI) options to the configuration file.

This was done upstream in the 0.11 release. There are a few gotchas that one needs to be aware of.

The first option used wins. A merge happens between the configuration file and the command-line, with the configuration file loaded first. So the command-line will not override the config.

To specify options that contain a dash (-) replace it with an underscore (_) in the file.

For example, --output-type=human becomes output_type=human in the configuration file.
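
The two rules above can be modeled in a few lines of Python. This is only a sketch of the behavior as described here, not the actual ipa-healthcheck code; the function names are hypothetical.

```python
def cli_to_config_key(option):
    """Turn a command-line option name into its configuration file name."""
    return option.lstrip("-").replace("-", "_")


def merge_options(config, cli):
    """Merge options where the first one seen wins.

    The configuration file is loaded first, so a value from the
    command-line is only used when the file did not set that option.
    """
    merged = dict(config)
    for key, value in cli.items():
        merged.setdefault(key, value)
    return merged


config_file = {"output_type": "human"}
command_line = {"output_type": "json", "failures_only": True}

print(cli_to_config_key("--output-type"))
print(merge_options(config_file, command_line))
```

In this model output_type stays human because the configuration file set it first, while failures_only comes from the command-line because the file did not mention it.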

Options that make no sense in configuration

I’m a great believer in giving users lots of flexibility. A number of options are allowed but make no sense stored in a configuration file. This includes the --source and --check options. Sure, you can add them, but they won’t do a lot of good since they are limited to a single source and a single check and there is zero validation that the source/check are valid. This may be addressed in the future but it is what it is for now.

--list-sources and --input-file are others that make no sense. Yes, you can set them, but it’ll do nothing useful. I may eventually add a “not-for-config” flag or something but that’s a nice-to-have, not a must-have.

--debug and --verbose are equally not useful as the output will be on stderr and will be suppressed by systemd when run in automation. Otherwise you’re looking at a firehose of output.

Options that do make sense

The original intention was to make --output-type and --output-file configurable; I just went a bit overboard. In for a penny, in for a pound. The idea is that you set these options in /etc/ipahealthcheck/ipahealthcheck.conf and when automation runs it honors the requested formatting and output location. Some users really prefer the human output format.

Configuration file(s)

Per the requested use-case, if you aren’t running from the command-line much then go ahead and update the default configuration as needed. There is a new option, --config, that lets one pass in a different set of defaults. This can be helpful if you want control when running manually. You may want a different output type, for example.

# ipa-healthcheck --config /etc/ipahealthcheck/manual.conf

Expired LDAP password handling

In order that only an end user knows their password, whenever a password is administratively set, it is marked as expired. This is the nexus of a long standing request to deny LDAP binds for expired passwords.

This is a bit of a chicken-and-egg problem. All password changes eventually pass through LDAP so if it denies a BIND on expired password there is no way to reset it.

This is not to say that without denying expired LDAP passwords that the server is completely open to brute-force attacks. There are password policies that can limit the number of failed authentications and lock the account for a time period, or permanently.

But we recognized that it is unexpected behavior. Several attempts were made over the years to invent a mechanism to allow some operations for expired passwords and deny others but we would eventually run into corner cases and the changes were abandoned.

A new approach was taken based on an expired LDAP draft, https://tools.ietf.org/id/draft-behera-ldap-password-policy-10.html

This draft includes a proposal to limit the number of LDAP authentications based on a maximum number, a time limit, or both. I chose to implement a count-based approach which we’re calling “grace limit.”

The basic idea is that a password policy can contain a maximum count for use of the LDAP password. This can vary by policy and by default is -1, which is disabled in order to maintain backwards compatibility. A value of 0 disables all grace and any LDAP bind with an expired password is immediately denied.

For values above 0, that many authentications are allowed, during which any and all operations are permitted: searches, adds, deletes, modifies, etc. per the permissions that the user has. Once the number of BINDs has exceeded the grace limit the user is no longer able to BIND.

To set a grace limit in a password policy on the command-line (not yet supported in the web UI):

ipa pwpolicy-mod --gracelimit=[-1 to MAXINT]

A password policy control, if requested, is returned including the remaining number of BIND attempts. It will look something like this when the grace limit is set to 5 on the first BIND attempt:

$ ldapsearch -LLL -x -D 'uid=tuser,cn=users,cn=accounts,dc=example,dc=test' -W -e ppolicy -b uid=tuser,cn=users,cn=accounts,dc=example,dc=test dn
# PasswordExpired control
ldap_bind: Success (0) (Password expired, 4 grace logins remain)
dn: uid=tuser,cn=users,cn=accounts,dc=example,dc=test

Any password change on the account, by an administrator or the user, will reset the grace period count back to 0.

In summary, to restrict LDAP binds post expiration the password policy needs to be updated to include a grace limit. The possible values are:

Grace Period Value    Description
-1                    Grace limit handling is disabled (default)
0                     All LDAP BIND on expired passwords are denied
1-MAXINT              The number of LDAP binds allowed post-expiration
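
The grace limit rules can be written down as a small decision function. This is a Python sketch of the behavior described in this post, not the server plugin code; the function names and the grace_used counter are hypothetical.

```python
GRACE_DISABLED = -1


def expired_bind_allowed(grace_limit, grace_used):
    """Decide whether a BIND with an expired password is allowed.

    grace_limit is the value from the password policy and grace_used is
    the number of grace BINDs already consumed since the last password
    change.
    """
    if grace_limit == GRACE_DISABLED:
        # Grace handling disabled: the pre-existing behavior, always allowed.
        return True
    # A limit of 0 denies immediately since 0 < 0 is never true.
    return grace_used < grace_limit


def remaining_after_bind(grace_limit, grace_used):
    """Grace BINDs left once the current, allowed BIND has been counted."""
    return grace_limit - (grace_used + 1)


for used in range(7):
    print(used, expired_bind_allowed(5, used))
```

With a limit of 5 the first BIND is allowed and leaves 4 grace logins, which matches the ldapsearch output above. The sixth attempt is denied.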

Random Serial Numbers for the IPA CA

Allowing Random Serial Numbers (RSN) within IPA has been requested by users for a decade. One of the major reasons for it being initially requested is handling the trusted CA problem. It goes like this:

You install IPA, have your browser trust its CA and play for a while. Then you want to get serious and do it for real from scratch so you do an uninstall then re-install. Trying to load the new CA into the browser can be a pain resulting in a message like “You are attempting to import a certificate with the same issuer and serial number.” Super annoying.

The reason is a stock CA installation begins with serial number 0 and allocates along a range in order with a subject based on the REALM. A re-install will begin again at 0 and if the same REALM is used, instant conflict.

Ranges are allocated using the 389-ds Distributed Numeric Assignment plugin. It starts with a range and if a replica (clone) is created the range is split between the two. And so on as more replicas are added. This is largely invisible to IPA. It treats the CA post-installation more or less as a black box.

The CA software that IPA uses, dogtagpki, recently gained support for a new RSN schema, version 3, in dogtagpki version 11.2.0 (as of this writing still in pre-release).

IPA 4.9.10 adds RSNv3 support if the installed PKI version supports it.

There is a pretty big catch though: only new installs are supported. It is not possible to upgrade an existing ranged IPA installation to an RSNv3 installation. The reason for this is there is not yet a good way to handle conversion from RSN back to static ranges. If for some reason you found that RSN isn’t working for you, you’re stuck with it.

And before you ask, no, there is not yet an easy way to migrate all of IPA from one server to another beyond users and groups. It’s being worked on but there is no ETA. Ideally this will easily allow substituting a new CA with the existing one. But again, there could be dragons because now there is a CA with potentially the same name but a different signing key and you could be back at square one. We’ll work hard to avoid that.

It’s important to note that request and KRA ids are also randomized in an RSNv3 installation. It probably won’t affect you operationally but the ids and serial numbers can be huge, 128 bits (~40 digits). Not all software can handle them.
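
The size is easy to check with a little arithmetic. The Python sketch below computes how many decimal digits the largest 128-bit value needs and shows why software that stores serial numbers in a signed 64-bit integer would be in trouble. The helper name fits_signed_64 is hypothetical.

```python
MAX_128_BIT = 2**128 - 1


def fits_signed_64(value):
    """True if the value can be stored in a signed 64-bit integer."""
    return -(2**63) <= value < 2**63


digits = len(str(MAX_128_BIT))
print(digits)
print(fits_signed_64(MAX_128_BIT))
```

The largest 128-bit value has 39 decimal digits, and it is far outside the signed 64-bit range.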

The next important step is allowing for pruning which will make technologies like ACME a lot easier to manage. Currently there is no supported way that I know of to clean up a certificate database of expired certificates. With RSNv3 pruning of them should be possible.

Change in Firefox related to host names and its impact on IPA

Heads up about a change in Firefox v101.0 that can affect some deployments of freeIPA.

https://www.mozilla.org/en-US/firefox/101.0/releasenotes reads:

“Removed “subject common name” fallback support from certificate
validation. This fallback mode was previously enabled only for manually
installed certificates. The CA Browser Forum Baseline Requirements have
required the presence of the “subjectAltName” extension since 2012, and
use of the subject common name was deprecated in RFC 2818.”

This has been a long time coming. RFC2818 contains this:

https://datatracker.ietf.org/doc/html/rfc2818#section-3.1

If a subjectAltName extension of type dNSName is present, that MUST
be used as the identity. Otherwise, the (most specific) Common Name
field in the Subject field of the certificate MUST be used. Although
the use of the Common Name is existing practice, it is deprecated and
Certification Authorities are encouraged to use the dNSName instead.

It is probably a safe assumption that other browsers will soon follow suit.

If you don’t use the IPA CA then you need to verify that the
certificates, from Let’s Encrypt for example, contain a DNS Subject
Alternative Name (SAN) (LE should already). If not then you need to work with the provider(s) to reissue new ones.

Installations with an IPA CA have enabled a DNS SAN for the Apache and
389 certificates since 4.5.1 so newer deployments should be unaffected
by this.

To confirm that the current IPA-issued certificates, including an IPA CA
signed as a subordinate by an external CA, contain a SAN:

For IPA 4.6 and earlier:

# getcert list -d /etc/httpd/alias -n Server-Cert

# getcert list -d /etc/dirsrv/slapd- -n Server-Cert

For IPA 4.7 and later:

# getcert list -f /var/lib/ipa/certs/httpd.crt

# getcert list -d /etc/dirsrv/slapd- -n Server-Cert

Included in the output for each cert should be a line like:

dns: ipa.example.test

Where ipa.example.test is the hostname of the machine.
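
If there are many servers to check, the test can be scripted. The Python sketch below looks for the dns: line in captured getcert list output. It is only an illustration: the function name is hypothetical, the sample text is an abbreviated excerpt rather than full getcert output, and splitting multiple names on commas is an assumption about the format.

```python
def dns_sans(getcert_output):
    """Collect the names found on "dns:" lines of getcert list output."""
    names = []
    for line in getcert_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "dns":
            # Assumption: multiple names are separated by commas.
            names.extend(n.strip() for n in value.split(",") if n.strip())
    return names


# Abbreviated, hypothetical excerpt of `getcert list` output.
SAMPLE = """\
\tsubject: CN=ipa.example.test,O=EXAMPLE.TEST
\tdns: ipa.example.test
"""

hostname = "ipa.example.test"
print(hostname in dns_sans(SAMPLE))
```

On a real system the text would come from running one of the getcert list commands above, and hostname from the machine itself.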

If it isn’t you can use certmonger to add a DNS SAN and reissue an
existing certificate with:

# getcert resubmit -i -D $(hostname)

If you aren’t using an IPA CA then it is still possible to verify but it
is slightly more complicated because the certificate nickname(s) may be
different.

For IPA 4.6 and earlier:

# grep NSSNickname /etc/httpd/conf.d/nss.conf

# certutil -L -d /etc/httpd/alias -n "<value from above>"

# grep nsSSLPersonalitySSL /etc/dirsrv/slapd-REALM/dse.ldif

# certutil -L -d /etc/dirsrv/slapd-REALM -n "<value from above>"

The output for each should contain something like:

Name: Certificate Subject Alt Name
DNS name: "ipa.example.test"

Where ipa.example.test is the hostname of the machine.

For IPA 4.7 and later:

# grep SSLCertificateFile /etc/httpd/conf.d/ssl.conf

# openssl x509 -noout -text -in "<value from above>"

The output should contain something like:

X509v3 Subject Alternative Name:
    DNS:ipa.example.test

# grep nsSSLPersonalitySSL /etc/dirsrv/slapd-REALM/dse.ldif

# certutil -L -d /etc/dirsrv/slapd-REALM -n "<value from above>"

The output for each should contain something like:

Name: Certificate Subject Alt Name
DNS name: "ipa.example.test"

Where ipa.example.test is the hostname of the machine.

If not you’ll need to contact the issuing CA to get a replacement with a
DNS SAN.

Overview of certificate issuance in IPA

Certificate issuance in IPA can seem complicated so here is a basic overview of how it works and what is allowed. This article is mostly about issuing host and service certificates.

There are a few basic rules:

  1. IPA will only issue certificates to entities it owns and can verify, including Subject Alternative Names (SAN).
  2. A host in IPA can issue certificates for itself and the services running on it.
  3. IPA allows certificate delegation so that one entity can issue certificate(s) for another.
  4. Certificates are revoked when the host or service is removed.
  5. certmonger always uses the host principal when requesting a certificate (ok not a rule but this is frequently overlooked).

So let’s break these down one at a time.

IPA will only issue certificates to entities it owns and can verify

There is nothing in the CA itself to prevent issuance of any certificate subject or SAN, for the most part. IPA layers on additional access control to prevent someone from issuing certificates outside of known objects. This is a layer of protection to prevent bad actors from issuing certificates for any name, which the backend CA itself will generally allow with the right credentials.

This includes any IPAddr SAN. The address must point to a hostname that the requestor is allowed to issue a certificate for. No bare IPAddr are allowed.

The mail SAN type is not allowed for host and service certificates. Only for user certificates.

IPA requires a subject in the CSR that at least has the form cn=<hostname of target host>

A host in IPA can issue certificates for itself and the services running on it.


A host can request a certificate for itself, represented as the “host” service principal. The certificate is stored in the host entry itself.

A host can also request certificates for any service principals associated with the host (see rule #1). So for example, if you enroll a client and install Apache on it, you can create an HTTP service principal and the host will be allowed to request a certificate for that service.

This is related to rule #5. A very convenient way to request IPA certificates is to use the ipa-getcert command provided by certmonger. certmonger always authenticates to IPA using the host principal in /etc/krb5.keytab, regardless of the principal for the user executing the command.

IPA allows certificate delegation so that one entity can issue certificate(s) for another.

It is also possible to allow host A to request certificates for services on host B. In IPA parlance “host A manages service B”.

To delegate a host to issue a certificate for another host:

ipa host-add-managedby hostB --hosts hostA

To delegate a host to issue a web service principal certificate:

ipa service-add-host HTTP/hostB --hosts hostA

The difference here is in the storage. A host stores its own Kerberos principal (host/hostname) and certificates in the host entry. A service has a separate entry for each service principal and the certificate is stored there.

Certificates are revoked when the host or service is removed.

If a certificate is attached to an object then part of the removal process of that object is to revoke it if it was issued by the IPA CA. This is mandatory and even --force will not work around it. Trying to remove a host or service is often the canary in the coal mine where users discover that the CA is not working as it should.

IPA cannot ensure that the private key of the certificate is properly cleaned up so the best it can do is invalidate it so that a user using OCSP or a CRL will see that it is no longer valid.

certmonger always uses the host principal when requesting a certificate

As mentioned earlier, certmonger always uses the host principal when requesting certificates. This is why host delegation is so important and is also the source of most of the confusion. Each host and service must be delegated separately, providing for fine-grained control over what a host is allowed to do to another host or service.
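
The delegation rules can be pictured as a simple lookup. The Python sketch below is a conceptual model only, not how IPA implements its access control; the function names and the managed_by mapping are hypothetical.

```python
def principal_host(principal):
    """'HTTP/hostB' -> 'hostB'. A realm suffix, if any, is dropped."""
    _service, _sep, rest = principal.partition("/")
    return rest.partition("@")[0]


def may_request(requesting_host, principal, managed_by):
    """Can this host request a certificate for the given principal?

    A host may always request for itself and for services running on it.
    Anything else needs an explicit delegation for that exact principal.
    """
    if principal_host(principal) == requesting_host:
        return True
    return requesting_host in managed_by.get(principal, set())


# Equivalent of: ipa service-add-host HTTP/hostB --hosts hostA
managed_by = {"HTTP/hostB": {"hostA"}}

print(may_request("hostB", "HTTP/hostB", managed_by))
print(may_request("hostA", "HTTP/hostB", managed_by))
print(may_request("hostA", "host/hostB", managed_by))
```

The last call returns False: delegating the HTTP service to hostA says nothing about the host principal of hostB, which would need its own delegation.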

Simple 389-ds plugin logging in ipa

IPA created a number of 389-ds plugins to do various things like enforce failed login counters, handle password sync, etc.

389-ds provides quite a robust debug logging and tracing system but if you just want to see your plugin messages without tweaking the 389-ds error log level and spamming the log with things you don’t care about, a simple and silly way to crank up the logging is to replace the IPA logging macros LOG() with LOG_FATAL() in the plugin you’re investigating.

You’ll be limited to just that plugin/file but FATAL messages always log so you’ll see them immediately without all the other cruft that may not be interesting.

This does require a re-compile and you need to take care to undo the changes before submitting any changes, but it is a dead simple change and can make troubleshooting some problems a lot simpler.
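
If there are a lot of calls to change, the substitution can be scripted. This is a minimal Python sketch, assuming the calls are written as LOG( with no space before the parenthesis; the function name is hypothetical.

```python
import re


def force_fatal_logging(source):
    """Replace calls to LOG() with LOG_FATAL() in C source text.

    The word boundary keeps other macros that merely end in LOG, and
    existing LOG_FATAL() calls, untouched.
    """
    return re.sub(r"\bLOG\(", "LOG_FATAL(", source)


before = 'LOG("checking policy\\n");\nLOG_FATAL("out of memory\\n");\n'
print(force_fatal_logging(before))
```

Running it a second time changes nothing, and since the edit is purely textual it is easy to revert with git before submitting anything.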

Installing the updated plugin is straightforward: copy it to /usr/lib64/dirsrv/plugins/ and restart dirsrv.target.

Running mixed versions of IPA

When migrating between major versions of IPA or an operating system, say RHEL 7 to RHEL 8, this is generally done by creating a new master using the latest version of IPA on the latest distro release. The old servers are then slowly migrated to new ones, eventually ending up with all new.

We often get asked what the downside of moving slowly is. Generally we give an answer like “objects created in the newer server(s) should still work fine on the older ones, but new objects created in the older servers will lack new features.”

Here is a specific answer.

I’ve been looking at extending password policy. This is going to require extending the schema in some way with new attributes to hold the new configuration values. This isn’t a problem with mixing old and new versions as the schema is replicated to all servers.

But what would be missing would be policy enforcement! For argument’s sake let’s say the new policy has cracklib integration, so passwords are checked against the dictionary among other checks.

What this means in practice is that only those passwords changed on the newer servers with this integration will actually have the policy applied. Not good.

The moral of the story is: yes, there is a window of opportunity for these types of issues in the middle of a migration between versions. Understand it and plan around it. If it is unacceptable then migrate faster. If the risk is acceptable, migrate more slowly. Either way still only migrate one server at a time to avoid replication issues as the new changes get rolled out.

Getting the cert and chain in one file in certmonger

Some servers want the server cert and CA chain all in one file. There isn’t an option in certmonger to do this but it can be completed using the post-save command. This is a command specified in the request that executes after a certificate has been issued and saved to disk.

The option does not accept bash syntax. It executes a single command. Generally speaking for complex operations your best bet is to put it into a separate bash script that is executed, which we’ll do here.

I created /usr/local/bin/catcerts.sh with the contents:

#!/bin/bash
#
# concatenate a server cert and the chain into a single file

cert=$1
chain=$2
target=$3

# quote the paths so file names containing spaces still work
cat "$cert" "$chain" > "$target"

IMPORTANT: Add your own error checking.
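
As one idea of what that error checking could look like, here is the same job sketched in Python. It is an alternative to the bash script, not something certmonger provides: the function name concat_pem_files is hypothetical, and the check for a BEGIN CERTIFICATE marker is only a cheap sanity test, not real certificate validation.

```python
import os
import sys
import tempfile

PEM_MARKER = b"-----BEGIN CERTIFICATE-----"


def concat_pem_files(sources, target):
    """Concatenate PEM files into target, replacing it only on success."""
    data = b""
    for path in sources:
        with open(path, "rb") as f:
            chunk = f.read()
        if PEM_MARKER not in chunk:
            raise ValueError(f"{path} does not look like a PEM certificate")
        # Make sure each file ends with a newline before appending the next.
        data += chunk if chunk.endswith(b"\n") else chunk + b"\n"

    # Write to a temporary file in the same directory and rename it into
    # place so a failure never leaves a half-written target behind.
    # Note: mkstemp creates the file with mode 0600; adjust as needed.
    directory = os.path.dirname(os.path.abspath(target))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(data)
        os.replace(tmp, target)
    except BaseException:
        os.unlink(tmp)
        raise


# Same argument order as catcerts.sh: cert, chain, target.
if __name__ == "__main__" and len(sys.argv) == 4:
    concat_pem_files(sys.argv[1:3], sys.argv[3])
```

Saved as an executable script with a python3 shebang line, it could be passed to -C in place of catcerts.sh.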

Use certmonger to request a cert with this as the command:

ipa-getcert request -f /etc/pki/tls/certs/test.pem \
-k /etc/pki/tls/private/test.key \
-C "/usr/local/bin/catcerts.sh /etc/pki/tls/certs/test.pem /etc/ipa/ca.crt /etc/pki/tls/certs/whole.pem"

This is an example on an IPA-enrolled machine where the chain already exists in /etc/ipa/ca.crt. If you need the chain as well you can add -F /etc/pki/tls/certs/chain.pem and use that in the concatenation.

Is my IPA install ok?

This is a common and complex question.

IPA is not a single daemon. It is a collection of services configured to work together. It provides a great deal of customization, and therefore a lot of room for mis-configuration post-installation. Some changes won’t generate immediately visible problems, which can make diagnosis difficult. It is particularly difficult when a new administrator takes over a running system and lacks the background of the installation.

IPA puts a great deal of effort prior to installation to verify that a system is properly configured and ready to accept an IPA master installation. What it is missing is a way post-installation to ensure that the system, or a set of masters, is still running as expected.

One method to check this is the ipa-consistency tool written by Peter Pakos. It counts a number of different IPA entry types per master and alerts if the counts differ. It is a pretty useful tool to look for replication conflicts and to generally see if replication is working as expected.
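
The underlying idea is simple enough to sketch. The Python below is not code from ipa-consistency, just an illustration of comparing per-master entry counts; the function name, the master names and the numbers are all made up.

```python
def inconsistent_counts(counts_by_master):
    """Return the entry types whose counts differ between masters.

    counts_by_master maps master name -> {entry type: count}.
    The result maps entry type -> {master name: count} for each mismatch.
    """
    entry_types = set()
    for counts in counts_by_master.values():
        entry_types.update(counts)

    mismatches = {}
    for entry_type in sorted(entry_types):
        per_master = {
            master: counts.get(entry_type)
            for master, counts in counts_by_master.items()
        }
        if len(set(per_master.values())) > 1:
            mismatches[entry_type] = per_master
    return mismatches


# Made-up counts for two masters.
counts = {
    "ipa1.example.test": {"users": 120, "hosts": 45, "services": 60},
    "ipa2.example.test": {"users": 120, "hosts": 44, "services": 60},
}
print(inconsistent_counts(counts))
```

Here only the hosts count differs, which would be the hint to go look at replication between the two masters.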

The IPA team is working on an additional tool, freeipa-healthcheck, to check on the types of configuration problems we’ve seen come up again and again.

You might ask why IPA itself can’t do these checks automatically to prevent mis-configuration. It’s a fair question, and it goes back to the architecture where IPA is not a single daemon. The primary system authentication methods in 2007 when IPA started, and largely true today, were LDAP, Kerberos, NIS and flat files. The architecture of IPA was to leverage existing services together and wrap it in a usable user-interface, hence 389-ds as the LDAP store, MIT Kerberos as the authentication provider and a plugin service to emulate the important parts of NIS. Bolt on a CA and DNS and there are quite a number of independent moving parts.

Will these checks ever be integrated directly into IPA? Probably, but it is safer to run them outside for now while we ensure both coverage and that we haven’t been over-eager in some areas and provide false positives. We would really like to run this automatically prior to upgrades to warn a user that they have problems and that upgrading is likely to be problematic.

So how can I ensure I haven’t done something to break my installation? Let’s start with the kinds of things that commonly go wrong, and why:

  1. Certificates don’t renew. IPA uses certmonger to automatically renew certificates. This works fine as long as one IPA master is defined as the renewal master but many times we’ve seen this master removed. This leaves no master to do the renewal and things go badly from there.
  2. The certmonger certificate tracking was removed or modified.
  3. Filesystems running out of space. Databases don’t like this.
  4. File permissions. IPA writes to a ton of files across the filesystem. Sometimes, in the interest of security, these are tweaked in ways that prevent IPA services from having access to the files they need.
  5. Replication. Network hiccups, misconfiguration, unfortunate timing. A lot can go into creating replication issues. Peter’s tool is quite good at rooting these out.
  6. A master has no DNA range and therefore can’t create new entries, as a result of removing the one master that had the ranges.

freeipa-healthcheck is intended to look at a single running master to decide if it is configured as expected, and report when it finds a possible inconsistency.  The goals of the project were:

  1. Provide some basic level of assurance that the system is configured properly.
  2. Always provide an answer for every question so you know that everything has been checked (the downside is a firehose of output).
  3. Machine readable output.
  4. In case of anything questionable, warn. This may generate false positives but better to ignore a non-issue than to miss a real one.
  5. Look only on one master at a time. We don’t want to require connectivity between all masters. This means that some future tool is going to need to take some of the data and do further analysis. DNA is an example of this. It is perfectly fine for a master to not have any DNA ranges configured, but the whole installation not having one is not good. freeipa-healthcheck today only reports what the current range is on a master.

How can you try it? It is shipping in Fedora 29+ and will ship in future versions of RHEL 8. It is a pretty simple command-line tool:

# ipa-healthcheck

This will execute all known checks and write JSON to stdout with an entry for each specific check and the outcome. Each check assigns a severity to the outcome: SUCCESS, WARNING, ERROR or CRITICAL. For results that are not SUCCESS, additional details are provided which are hopefully enough to point a user in the right direction to address it.

For a contrived error, let’s say I messed up the certmonger tracking of the KDC cert. It would produce the following output:

[
  {
    "source": "ipahealthcheck.ipa.certs", 
    "kw": {
      "msg": "Unable to open cert file '/var/kerberos/krb5kdc/kdc2.crt': [Errno 2] No such file or directory: '/var/kerberos/krb5kdc/kdc2.crt'", 
      "certfile": "/var/kerberos/krb5kdc/kdc2.crt", 
      "key": "20191018192938", 
      "error": "[Errno 2] No such file or directory: '/var/kerberos/krb5kdc/kdc2.crt'"
    }, 
    "uuid": "9894e56a-83cc-42e6-9a77-b9066924fe73", 
    "duration": "0.296820", 
    "when": "20191021201731Z", 
    "check": "IPACertfileExpirationCheck", 
    "result": "ERROR"
  }, 
  {
    "source": "ipahealthcheck.ipa.certs", 
    "kw": {
      "msg": "Missing tracking for ca-name=None, cert-file=/var/kerberos/krb5kdc/kdc.crt, cert-postsave-command=/usr/libexec/ipa/certmonger/renew_kdc_cert, key-file=/var/kerberos/krb5kdc/kdc.key", 
      "key": "ca-name=None, cert-file=/var/kerberos/krb5kdc/kdc.crt, cert-postsave-command=/usr/libexec/ipa/certmonger/renew_kdc_cert, key-file=/var/kerberos/krb5kdc/kdc.key"
    }, 
    "uuid": "2fd7e3f4-d966-4a89-ae5f-c0071cacf33e", 
    "duration": "0.470736", 
    "when": "20191021201731Z", 
    "check": "IPACertTracking", 
    "result": "ERROR"
  }, 
  {
    "source": "ipahealthcheck.ipa.certs", 
    "kw": {
      "msg": "Unknown certmonger id 20191018192938", 
      "key": "20191018192938"
    }, 
    "uuid": "f8ab75c5-1055-4565-9973-f4e9b0a662d8", 
    "duration": "0.470772", 
    "when": "20191021201731Z", 
    "check": "IPACertTracking", 
    "result": "WARNING"
  }

]

Three separate errors for the same single issue? Yes, there is some overlap. This breaks down to:

  1. The first error reports that the certificate file pointed to in the tracking doesn’t exist so it can’t be verified as not-expired.
  2. The second reports that the expected tracking for the KDC is not set up properly, which could mean that renewal won’t work properly.
  3. The third follows from the second: the tracking that does exist is effectively unknown, which may or may not be fine, but the tool defaults to warning when it doesn’t know for sure. Unknown tracking is often test certs or aborted efforts to correct tracking.

Remember that I said that every check reports in? This means that by default the output is rather voluminous. To narrow things down you can look for only output that isn’t at the SUCCESS level:

# ipa-healthcheck --failures-only

Or you can log to a file and re-parse that:

# ipa-healthcheck --output-file /var/log/ipa/healthcheck/healthcheck.log

# ipa-healthcheck --input-file /var/log/ipa/healthcheck/healthcheck.log --failures-only

The hope is that this output can be incorporated into system tracking systems like Zabbix so stability over time can be easily tracked. It is probably not super useful to run healthcheck more than once a day but there is no harm in it. It only writes to stdout or the provided log file. It makes no other changes.
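
As a starting point for that kind of integration, here is a Python sketch that reads healthcheck JSON and boils it down to something a monitoring system can consume. The function names are hypothetical, and the sample is a heavily abbreviated version of the output shown earlier with one SUCCESS entry added for illustration (its check name is made up).

```python
import json
from collections import Counter

# Severities in increasing order, as listed in this post.
SEVERITY = ["SUCCESS", "WARNING", "ERROR", "CRITICAL"]


def failures(results):
    """Keep only the results that are not SUCCESS."""
    return [r for r in results if r["result"] != "SUCCESS"]


def worst_severity(results):
    """Return the most severe result, or SUCCESS if there are none."""
    return max(
        (r["result"] for r in results), key=SEVERITY.index, default="SUCCESS"
    )


SAMPLE = """
[
  {"source": "ipahealthcheck.ipa.certs", "check": "IPACertfileExpirationCheck", "result": "ERROR"},
  {"source": "ipahealthcheck.ipa.certs", "check": "IPACertTracking", "result": "ERROR"},
  {"source": "ipahealthcheck.ipa.certs", "check": "IPACertTracking", "result": "WARNING"},
  {"source": "example.source", "check": "ExampleCheck", "result": "SUCCESS"}
]
"""

results = json.loads(SAMPLE)
summary = Counter(r["result"] for r in results)
print(dict(summary))
print(worst_severity(results))
for r in failures(results):
    print(r["result"], r["source"], r["check"])
```

In practice the JSON would be read from the file written with --output-file, and the worst severity could be mapped to whatever alert levels the monitoring system uses.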

You can get a basic list of checks performed with:

# ipa-healthcheck --list-sources

Hopefully the names are descriptive enough to tell what it is looking for. The upstream git repo README.md contains a fuller description of each one.

Frustrated rantings of a developer