This is a common and complex question.
IPA is not a single daemon. It is a collection of services configured to work together. It provides a great deal of customization, and therefore a lot of room for misconfiguration after installation. Some changes won’t generate immediately visible problems, which can make diagnosis difficult. It is particularly difficult when a new administrator takes over a running system and lacks the background of the installation.
IPA puts a great deal of effort into verifying, prior to installation, that a system is properly configured and ready to accept an IPA master installation. What it is missing is a way to ensure, post-installation, that the system, or a set of masters, is still running as expected.
One method to check this is the ipa-consistency tool written by Peter Pakos. It counts a number of different IPA entry types per master and alerts if the counts differ. It is a pretty useful tool to look for replication conflicts and to generally see if replication is working as expected.
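The count-and-compare idea can be sketched in a few lines of Python. This is not ipa-consistency’s actual code: the function name `find_count_mismatches`, the master hostnames and the counts are all made up for illustration.

```python
from typing import Dict, List


def find_count_mismatches(counts: Dict[str, Dict[str, int]]) -> List[str]:
    """Return the entry types whose counts are not identical on every master.

    `counts` maps a master hostname to {entry type: number of entries}.
    An entry type that is missing on some master counts as a mismatch.
    """
    # Collect every entry type reported by any master.
    entry_types = set()
    for per_master in counts.values():
        entry_types.update(per_master)

    mismatched = []
    for entry_type in sorted(entry_types):
        # A set collapses identical counts; more than one value means a difference.
        seen = {per_master.get(entry_type) for per_master in counts.values()}
        if len(seen) > 1:
            mismatched.append(entry_type)
    return mismatched


if __name__ == "__main__":
    # Hypothetical sample data, not real output from any tool.
    sample = {
        "ipa1.example.test": {"users": 120, "hosts": 45, "conflicts": 0},
        "ipa2.example.test": {"users": 120, "hosts": 44, "conflicts": 1},
    }
    print(find_count_mismatches(sample))
```

A differing count does not say which master is right, only that the masters disagree and replication deserves a closer look.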
The IPA team is working on an additional tool, freeipa-healthcheck, to check on the types of configuration problems we’ve seen come up again and again.
You might ask why IPA itself can’t do these checks automatically to prevent misconfiguration. It’s a fair question, and it goes back to the architecture where IPA is not a single daemon. The primary system authentication methods in 2007, when IPA started, were LDAP, Kerberos, NIS and flat files, and that is largely still true today. The architecture of IPA was to leverage existing services together and wrap them in a usable user interface, hence 389-ds as the LDAP store, MIT Kerberos as the authentication provider and a plugin service to emulate the important parts of NIS. Bolt on a CA and DNS and there are quite a number of independent moving parts.
Will these checks ever be integrated directly into IPA? Probably, but it is safer to run them outside for now while we ensure both coverage and that we haven’t been over-eager in some areas and provide false positives. We would really like to run this automatically prior to upgrades to warn a user that they have problems and that upgrading is likely to be problematic.
So how can I ensure I haven’t done something to break my installation? Let’s start with the kinds of things that commonly go wrong, and why:
- Certificates don’t renew. IPA uses certmonger to automatically renew certificates. This works fine as long as one IPA master is defined as the renewal master, but many times we’ve seen this master removed. This leaves no master to do the renewal and things go badly from there.
- The certmonger certificate tracking was removed or modified.
- Filesystems running out of space. Databases don’t like this.
- File permissions. IPA writes to a ton of files across the filesystem. Sometimes, in the interest of security, these are tweaked in ways that prevent IPA services from having access to files they need.
- Replication. Network hiccups, misconfiguration, unfortunate timing. A lot can go into creating replication issues. Peter’s tool is quite good at rooting these out.
- A master has no DNA range, and therefore can’t create new entries, because the one master that held the ranges was removed.
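Some of these failure modes are easy to probe by hand. As one example, here is a minimal sketch of a free-space check. It is not how freeipa-healthcheck does it; the function name `low_space_paths` and the 10% threshold are assumptions chosen for illustration, and you would pass in whichever directories hold your databases and logs.

```python
import shutil
from typing import Iterable, List, Tuple


def low_space_paths(paths: Iterable[str],
                    min_free_fraction: float = 0.10) -> List[Tuple[str, float]]:
    """Return (path, free fraction) for each path whose filesystem is nearly full.

    The default threshold of 10% free is an arbitrary, hypothetical choice.
    """
    low = []
    for path in paths:
        usage = shutil.disk_usage(path)
        if usage.total == 0:
            # Some pseudo-filesystems report a size of zero; skip them.
            continue
        free_fraction = usage.free / usage.total
        if free_fraction < min_free_fraction:
            low.append((path, free_fraction))
    return low


if __name__ == "__main__":
    # "." is only a placeholder; substitute the directories you care about.
    for path, fraction in low_space_paths(["."]):
        print(f"{path}: only {fraction:.1%} free")
```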
freeipa-healthcheck is intended to look at a single running master to decide if it is configured as expected, and report when it finds a possible inconsistency. The goals of the project were:
- Provide some basic level of assurance that the system is configured properly.
- Always provide an answer for every question so you know that everything was checked (the downside is a firehose of output).
- Machine readable output.
- In case of anything questionable, warn. This may generate false positives but better to ignore a non-issue than to miss a real one.
- Look at only one master at a time. We don’t want to require connectivity between all masters. This means that some future tool is going to need to take some of the data and do further analysis. DNA is an example of this. It is perfectly fine for a master to not have any DNA ranges configured, but the whole installation not having one is not good. freeipa-healthcheck today only reports what the current range is on a master.
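The DNA example shows the kind of cross-master analysis a future tool would need to do. A rough sketch, assuming you have already collected the range each master reports (the data structure and function names below are hypothetical, not part of freeipa-healthcheck):

```python
from typing import Dict, List, Optional, Tuple

# (first id, last id) as reported by a master, or None if it has no range.
DnaRange = Optional[Tuple[int, int]]


def masters_with_dna_range(ranges: Dict[str, DnaRange]) -> List[str]:
    """Return the masters that report a DNA range, sorted by hostname."""
    return sorted(master for master, rng in ranges.items() if rng is not None)


def installation_has_dna_range(ranges: Dict[str, DnaRange]) -> bool:
    """A single master without a range is fine; all masters without one is not."""
    return bool(masters_with_dna_range(ranges))


if __name__ == "__main__":
    # Hypothetical per-master data gathered from separate healthcheck runs.
    collected = {
        "ipa1.example.test": None,
        "ipa2.example.test": (100000, 199999),
    }
    if installation_has_dna_range(collected):
        print("Ranges held by:", ", ".join(masters_with_dna_range(collected)))
    else:
        print("WARNING: no master has a DNA range")
```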
How can you try it? It is shipping in Fedora 29+ and will ship in future versions of RHEL 8. It is a pretty simple command-line tool:
# ipa-healthcheck
This will execute all known checks and write JSON to stdout with an entry for each specific check and its outcome. Each check assigns a severity to the outcome: SUCCESS, WARNING, ERROR or CRITICAL. Results that are not SUCCESS come with additional details, which are hopefully enough to point a user in the right direction to address the problem.
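Because the output is JSON, it is easy to post-process. Here is a minimal sketch that groups entries by severity. It assumes the output is a JSON list of objects, each with a `result` field holding the severity and a `kw` object holding details such as `msg`. Those field names are assumptions for this sketch, and the sample entries are invented.

```python
import json
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_result(raw_json: str) -> Dict[str, List[dict]]:
    """Group healthcheck entries by severity (assumed to be in the 'result' field)."""
    grouped = defaultdict(list)
    for entry in json.loads(raw_json):
        grouped[entry.get("result", "UNKNOWN")].append(entry)
    return dict(grouped)


def failures(raw_json: str) -> List[Tuple[str, str]]:
    """Return (severity, message) for every entry that is not SUCCESS."""
    found = []
    for entry in json.loads(raw_json):
        if entry.get("result") != "SUCCESS":
            # 'kw' and 'msg' are assumed field names; default to an empty message.
            found.append((entry.get("result", "UNKNOWN"),
                          entry.get("kw", {}).get("msg", "")))
    return found


if __name__ == "__main__":
    # Invented sample data shaped like the assumed output format.
    sample = json.dumps([
        {"result": "SUCCESS", "kw": {}},
        {"result": "WARNING", "kw": {"msg": "Unknown certmonger id 20191018192938"}},
    ])
    for severity, message in failures(sample):
        print(f"{severity}: {message}")
```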
For a contrived error, let’s say I messed up the certmonger tracking of the KDC cert. It would produce output like the following (only the message-related fields of each result are shown here):

"msg": "Unable to open cert file '/var/kerberos/krb5kdc/kdc2.crt': [Errno 2] No such file or directory: '/var/kerberos/krb5kdc/kdc2.crt'",
"error": "[Errno 2] No such file or directory: '/var/kerberos/krb5kdc/kdc2.crt'"

"msg": "Missing tracking for ca-name=None, cert-file=/var/kerberos/krb5kdc/kdc.crt, cert-postsave-command=/usr/libexec/ipa/certmonger/renew_kdc_cert, key-file=/var/kerberos/krb5kdc/kdc.key",
"key": "ca-name=None, cert-file=/var/kerberos/krb5kdc/kdc.crt, cert-postsave-command=/usr/libexec/ipa/certmonger/renew_kdc_cert, key-file=/var/kerberos/krb5kdc/kdc.key"

"msg": "Unknown certmonger id 20191018192938",
Three separate errors for the same single issue? Yes, there is some overlap. This breaks down to:
- The first error warns that the certificate file pointed to in the tracking doesn’t exist, so it can’t be verified as not expired.
- The second says that the expected tracking for the KDC is not set up properly, which could mean that renewal won’t work properly.
- The third follows from the second: the tracking that does exist doesn’t match anything expected, so it is effectively an unknown tracking. That may or may not be fine, but the tool defaults to warning when it doesn’t know for sure. Unknown trackings are often test certs or aborted efforts to correct tracking.
Remember that I said that every check reports in? This means that by default the output is rather voluminous. To narrow things down you can look for only output that isn’t at the SUCCESS level:
# ipa-healthcheck --failures-only
Or you can log to a file and re-parse that:
# ipa-healthcheck --output-file /var/log/ipa/healthcheck/healthcheck.log
# ipa-healthcheck --input-file /var/log/ipa/healthcheck/healthcheck.log --failures-only
The hope is that this output can be incorporated into monitoring systems like Zabbix so stability over time can be easily tracked. It is probably not super useful to run healthcheck more than once a day, but there is no harm in it. It only writes to stdout or the provided log file. It makes no other changes.
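As a rough idea of what such an integration could look like, here is a sketch that reduces a saved log to the single worst severity, which a monitoring agent could then poll. It assumes the log is a JSON list whose entries carry the severity in a `result` field (an assumption, as is every function name here). The numeric codes are simply positions in the severity list, a made-up convention rather than anything Zabbix or healthcheck defines.

```python
import json
from typing import Iterable, List

# Ordered from least to most severe, as listed in the text above.
SEVERITY_ORDER = ["SUCCESS", "WARNING", "ERROR", "CRITICAL"]


def load_results(path: str) -> List[dict]:
    """Load a healthcheck log previously written to a file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def worst_severity(results: Iterable[dict]) -> str:
    """Return the most severe result found; an empty list yields SUCCESS."""
    worst = 0
    for entry in results:
        result = entry.get("result")
        if result in SEVERITY_ORDER:
            level = SEVERITY_ORDER.index(result)
        else:
            # When in doubt, warn: unrecognized values count as WARNING.
            level = SEVERITY_ORDER.index("WARNING")
        worst = max(worst, level)
    return SEVERITY_ORDER[worst]


def severity_code(results: Iterable[dict]) -> int:
    """Map the worst severity to a number (0 = SUCCESS ... 3 = CRITICAL)."""
    return SEVERITY_ORDER.index(worst_severity(results))


if __name__ == "__main__":
    # Invented sample entries; in practice use load_results() on your log file.
    sample = [{"result": "SUCCESS"}, {"result": "ERROR"}, {"result": "WARNING"}]
    print(worst_severity(sample), severity_code(sample))
```

Tracking one number per master per day is crude, but it is enough to notice when a previously clean master starts reporting problems.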
You can get a basic list of checks performed with:
# ipa-healthcheck --list-sources
Hopefully the names are descriptive enough to tell what each check is looking for. The upstream git repo README.md contains a fuller description of each one.