SSL/TLS Certificate Management: Automate Renewals and Prevent Outages
On February 3, 2020, Microsoft Teams went down for approximately three hours, locking out millions of users worldwide. The root cause was an expired authentication certificate: a renewal that depended on manual action had been overlooked, and the result was one of the most visible outages in collaboration software history. Microsoft is not alone - Spotify, Equifax, LinkedIn, Ericsson, and dozens of other major companies have suffered outages or security failures traced to expired certificates in recent years.
Certificate expiration outages are entirely preventable. They happen because organizations treat certificate management as a manual process in an era when the average enterprise manages 50,000 to 100,000 certificates across servers, load balancers, CDNs, APIs, IoT devices, and internal services. This guide covers the complete SSL/TLS certificate lifecycle and the automation tools that prevent expiration from becoming an incident.
Understanding the Certificate Lifecycle
Every SSL/TLS certificate moves through five phases: request, validation, issuance, deployment, and renewal. Failures at any phase cause outages, but the renewal phase accounts for approximately 80% of certificate-related incidents because it requires action at a specific future date - and humans are unreliable at remembering future tasks, especially when the interval is 90 days to a year.
Certificate Types and Validity Periods
- Domain Validated (DV) certificates verify that you control the domain. Issuance takes seconds to minutes. Let's Encrypt certificates are DV with a 90-day validity period. Commercial DV certificates are typically valid for 1 year.
- Organization Validated (OV) certificates verify both domain control and the legal existence of the organization. Issuance takes 1-3 business days due to manual verification. Validity period is 1 year.
- Extended Validation (EV) certificates require the most thorough verification including physical address confirmation and authorized signatory verification. Issuance takes 1-5 business days. EV certificates no longer display the green address bar in modern browsers, significantly reducing their visual differentiation from DV certificates.
- Wildcard certificates cover all subdomains of a single level (*.example.com covers app.example.com, api.example.com, etc. but not sub.app.example.com). Available in DV and OV types. Wildcard certificates are convenient but create a single point of failure - if the private key is compromised, all subdomains are affected.
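The single-level rule for wildcards is easy to get wrong, so here is a minimal Python sketch of it. It is a simplification for illustration, not the full hostname-matching logic a TLS library applies, and the function name wildcard_matches is hypothetical.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Return True if hostname is covered by a certificate name such as *.example.com.

    Simplified sketch: the wildcard must be the entire left-most label and
    stands in for exactly one label.
    """
    pattern_labels = pattern.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    # A wildcard never spans multiple labels, so the label counts must agree.
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        # The wildcard replaces one non-empty label; everything else must match exactly.
        return host_labels[0] != "" and pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels
```

With this rule, *.example.com covers app.example.com and api.example.com, but neither sub.app.example.com nor the bare example.com.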
ACME and Let's Encrypt: The Foundation of Automated Certificate Management
The ACME (Automatic Certificate Management Environment) protocol, standardized as RFC 8555, automates domain validation, issuance, and renewal without human intervention; ACME clients build installation and scheduling on top of it, which covers the entire certificate lifecycle. Let's Encrypt, the largest Certificate Authority by volume (with over 400 million active certificates), was built around ACME from the start.
How ACME Works
- Account registration. The ACME client generates a key pair and registers with the CA. This happens once.
- Order creation. The client requests a certificate for one or more domain names.
- Domain validation. The CA issues challenges to prove domain control. HTTP-01 requires placing a specific file at /.well-known/acme-challenge/ on the web server. DNS-01 requires creating a specific TXT record in DNS. TLS-ALPN-01 proves control by responding to a TLS handshake with a special certificate.
- Challenge fulfillment. The client automatically fulfills the challenge (creates the file, sets the DNS record, or configures the TLS response).
- Certificate issuance. After validation, the CA issues the certificate and the client downloads it.
- Installation. The client installs the certificate in the web server configuration and reloads the server.
- Renewal. The client runs on a schedule (typically twice daily via cron or systemd timer) and renews any certificate within 30 days of expiration.
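To make the validation and fulfillment steps concrete, the sketch below computes what a client has to publish for HTTP-01 and DNS-01. It follows the key authorization scheme described in RFC 8555 (token, a dot, then the RFC 7638 thumbprint of the account key), uses only the Python standard library, and assumes the account key is already available as a JWK dictionary. The function names are illustrative, and a real client such as Certbot does all of this for you.

```python
import base64
import hashlib
import json

# Members that RFC 7638 includes in the thumbprint input, per key type.
_REQUIRED_MEMBERS = {"RSA": ("e", "kty", "n"), "EC": ("crv", "kty", "x", "y")}


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as ACME requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """SHA-256 thumbprint of the account public key (RFC 7638)."""
    members = _REQUIRED_MEMBERS[jwk["kty"]]
    # Canonical form: required members only, sorted keys, no whitespace.
    canonical = json.dumps(
        {name: jwk[name] for name in members},
        sort_keys=True,
        separators=(",", ":"),
    )
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, jwk: dict) -> str:
    """Join the CA-supplied token and the account key thumbprint."""
    return f"{token}.{jwk_thumbprint(jwk)}"


def http01_response(token: str, jwk: dict) -> tuple:
    """Return (URL path, file body) to serve over plain HTTP."""
    return f"/.well-known/acme-challenge/{token}", key_authorization(token, jwk)


def dns01_record(domain: str, token: str, jwk: dict) -> tuple:
    """Return (record name, TXT value); pass the base domain for wildcard orders."""
    digest = hashlib.sha256(key_authorization(token, jwk).encode("ascii")).digest()
    return f"_acme-challenge.{domain}", b64url(digest)
```

Because the published value is derived from the account key, only the holder of that key can produce a response the CA will accept for a given token.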
ACME Clients
- Certbot is the reference ACME client maintained by the Electronic Frontier Foundation. It supports Apache, Nginx, and standalone modes. Installation is a single package manager command on most Linux distributions. Certbot handles the complete lifecycle, including automatic Nginx/Apache configuration and scheduled renewal via cron or a systemd timer.
- acme.sh is a pure shell script ACME client with no dependencies beyond curl and openssl. It supports over 150 DNS providers for DNS-01 challenges, making it ideal for wildcard certificates. It installs to the user home directory without root access.
- Caddy is a web server with ACME built into the core. If you serve your site through Caddy, certificate management is zero-configuration - Caddy automatically obtains and renews certificates for any domain in its configuration. This is the simplest path to automated HTTPS if you are willing to use Caddy as your reverse proxy or web server.
- cert-manager is the standard certificate management tool for Kubernetes. It runs as a controller in the cluster and automatically provisions certificates for Ingress resources, Istio gateways, and custom resources. It supports Let's Encrypt, Venafi, Vault, and other issuers.
Certificate Monitoring: Catching What Automation Misses
Automation handles the common case, but edge cases still cause outages. The ACME renewal cron job might fail silently because of a DNS provider API change. A manually managed certificate on a legacy load balancer might be outside your automation scope. A third-party CDN might serve a cached certificate that your tooling does not track. Monitoring is the safety net.
What to Monitor
- Certificate expiration date. Alert at 30 days, 14 days, 7 days, and 3 days before expiration. The 30-day alert leaves time for renewals that involve manual steps such as purchasing or approval. The 3-day alert is the emergency escalation.
- Certificate chain validity. A valid leaf certificate with an expired intermediate certificate causes the same browser error as an expired leaf certificate. Monitor the entire chain, not just the end-entity certificate.
- TLS configuration. Monitor for deprecated protocol versions (TLS 1.0, 1.1), weak cipher suites, and misconfigured HSTS headers. SSL Labs provides an API for automated grading.
- Certificate transparency logs. Monitor CT logs for certificates issued for your domains. An unexpected certificate can indicate a CA compromise, a domain hijacking attempt, or issuance by a team outside your inventory. Facebook's CT monitoring tool and Cert Spotter from SSLMate both provide free monitoring.
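The tiered expiration alerts described above come down to a small amount of date arithmetic. The following Python sketch maps the days remaining on a certificate to the tightest threshold it has crossed; the names ALERT_THRESHOLDS_DAYS, days_until, and alert_tier are illustrative, not part of any monitoring product.

```python
from datetime import datetime
from typing import Optional

# Thresholds taken from the list above: 30, 14, 7, and 3 days before expiration.
ALERT_THRESHOLDS_DAYS = (30, 14, 7, 3)


def days_until(not_after: datetime, now: datetime) -> int:
    """Whole days between now and the certificate's expiration (negative if expired)."""
    return (not_after - now).days


def alert_tier(days_remaining: int) -> Optional[int]:
    """Return the tightest threshold that has been crossed, or None if no alert is due."""
    crossed = [t for t in ALERT_THRESHOLDS_DAYS if days_remaining <= t]
    return min(crossed) if crossed else None
```

A certificate with 10 days left falls in the 14-day tier, and an already expired certificate stays in the 3-day emergency tier until it is replaced.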
Monitoring Tools
- Uptime monitoring services (UptimeRobot, Pingdom, Better Uptime). Most uptime monitoring services include SSL certificate expiration checks. UptimeRobot's free tier monitors 50 endpoints including SSL expiration alerts. This is the easiest way to get basic certificate monitoring running in under 5 minutes.
- Keychest. A dedicated certificate monitoring platform that discovers all certificates across your domains (including subdomains you may have forgotten), tracks expiration, and alerts on configuration issues. The free tier covers 2 servers.
- Nagios/Zabbix/Prometheus. For organizations running their own monitoring infrastructure, all three support SSL certificate checks. Prometheus with the blackbox_exporter probes endpoints and exports certificate expiration as a metric. Alertmanager fires alerts based on expiration thresholds.
- Custom scripts. A simple shell script using openssl s_client can check certificate expiration for any endpoint and integrate with your existing alerting via email, Slack webhook, or PagerDuty. Running this daily via cron for all your endpoints provides basic monitoring with zero cost.
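The custom-script approach does not have to be shell. As a rough sketch, the same expiration check can be written with Python's standard ssl module; the function names and the 30-day default are assumptions for illustration, and the alert delivery (email, Slack, PagerDuty) is left out.

```python
import socket
import ssl
from datetime import datetime, timezone


def parse_not_after(not_after: str) -> datetime:
    """Convert the notAfter string from getpeercert() into an aware UTC datetime."""
    return datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)


def fetch_expiry(host: str, port: int = 443, timeout: float = 10.0) -> datetime:
    """Connect to the live endpoint and read the expiration of the certificate it serves."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return parse_not_after(tls.getpeercert()["notAfter"])


def check_endpoints(hosts, warn_days: int = 30) -> list:
    """Return one human-readable finding per endpoint that needs attention."""
    findings = []
    now = datetime.now(timezone.utc)
    for host in hosts:
        try:
            remaining = (fetch_expiry(host) - now).days
        except OSError as exc:
            # Covers connection errors and TLS failures, including a certificate
            # that has already expired or does not validate.
            findings.append(f"{host}: check failed ({exc})")
            continue
        if remaining <= warn_days:
            findings.append(f"{host}: certificate expires in {remaining} days")
    return findings
```

Calling check_endpoints with your list of hostnames from a daily cron job and forwarding any findings to your alerting channel gives the same zero-cost coverage as the shell version.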
Common Certificate Failures and How to Prevent Them
1. Forgotten Manual Renewals
The most common failure. A certificate was purchased from a commercial CA a year ago, the person who set it up left the company, and nobody else knows the certificate exists until the site goes down. Prevention: automate with ACME. For certificates that cannot be automated (EV certificates, certificates on devices without ACME support), maintain a centralized certificate inventory with expiration dates and designated owners.
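A centralized inventory can start as something very small. The sketch below models one in Python and lists the manually managed certificates that are coming due, each with its designated owner. The field names and the 30-day window are assumptions for illustration; a real inventory would more likely live in a database or a certificate management platform.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class CertificateRecord:
    common_name: str
    issuer: str
    expires_on: date
    owner: str       # person or team accountable for renewal
    automated: bool  # True if an ACME client renews it without human action


def due_for_renewal(inventory, today: date, window_days: int = 30) -> list:
    """Return manually managed certificates expiring within the window, soonest first."""
    cutoff = today + timedelta(days=window_days)
    due = [
        record
        for record in inventory
        if not record.automated and record.expires_on <= cutoff
    ]
    return sorted(due, key=lambda record: record.expires_on)
```

Running this on a schedule and routing each result to its owner means a renewal no longer depends on whoever originally bought the certificate.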
2. Renewal Succeeded but Deployment Failed
The new certificate was obtained but the web server was not reloaded, the load balancer was not updated, or the CDN cached the old certificate. Prevention: test the deployed certificate (not just the file on disk) as part of the renewal process. After Certbot renews, use a deploy hook to reload Nginx and then verify the live endpoint serves the new certificate.
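One way to test the deployed certificate rather than the file on disk is to compare fingerprints. The Python sketch below hashes the first certificate in a PEM bundle (the leaf, in a file laid out like Certbot's fullchain.pem) and the certificate the live endpoint actually presents. The function names are hypothetical, and the idea of calling deployment_matches from a deploy hook is an assumption about your setup.

```python
import hashlib
import socket
import ssl

PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"


def first_pem_certificate(pem_bundle: str) -> str:
    """Extract the first certificate block, which is the leaf in a full-chain file."""
    start = pem_bundle.index(PEM_HEADER)
    end = pem_bundle.index(PEM_FOOTER, start) + len(PEM_FOOTER)
    return pem_bundle[start:end]


def fingerprint_from_pem(pem_bundle: str) -> str:
    """SHA-256 fingerprint of the leaf certificate stored on disk."""
    der = ssl.PEM_cert_to_DER_cert(first_pem_certificate(pem_bundle))
    return hashlib.sha256(der).hexdigest()


def fingerprint_from_endpoint(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """SHA-256 fingerprint of the certificate the live endpoint presents."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return hashlib.sha256(tls.getpeercert(binary_form=True)).hexdigest()


def deployment_matches(pem_path: str, host: str, port: int = 443) -> bool:
    """True when the endpoint serves the same leaf certificate as the renewed file."""
    with open(pem_path, encoding="ascii") as handle:
        on_disk = fingerprint_from_pem(handle.read())
    return on_disk == fingerprint_from_endpoint(host, port)
```

Behind a load balancer or CDN a single connection reaches only one node, so a check like this is most useful when repeated against each backend or edge location you can address directly.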
3. Intermediate Certificate Missing
The server sends the leaf certificate but not the intermediate certificates needed to chain to the root CA. Modern browsers may still connect (they can fetch intermediates), but API clients, mobile apps, and older systems fail with "unable to verify" errors. Prevention: always configure the full chain. Certbot provides fullchain.pem for exactly this purpose. Test with openssl s_client -showcerts to verify the complete chain is served.
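Two quick checks can catch a missing intermediate before clients do. The Python sketch below first counts the certificates in the file the server is configured to send, then probes the endpoint the way a strict API client would. Python's ssl module does not download missing intermediates, so it usually fails where a browser might succeed. The function names are illustrative.

```python
import re
import socket
import ssl

_CERT_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL
)


def count_certificates(pem_text: str) -> int:
    """Number of certificate blocks in a PEM file."""
    return len(_CERT_BLOCK.findall(pem_text))


def looks_like_leaf_only(pem_text: str) -> bool:
    """A chain file with fewer than two certificates probably lacks the intermediate."""
    return count_certificates(pem_text) < 2


def strict_client_accepts(host: str, port: int = 443, timeout: float = 10.0) -> bool:
    """Attempt a fully verified handshake, as a non-browser client would.

    Returns False on any certificate verification failure, which includes a
    missing intermediate but also expiry or a hostname mismatch.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLCertVerificationError:
        return False
```

The file check runs offline against your configuration, while the probe confirms what the server really sends after a reload.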
4. CAA Record Blocks Renewal
DNS CAA (Certificate Authority Authorization) records specify which CAs are permitted to issue certificates for your domain. If a CAA record is misconfigured or changed after initial certificate setup, renewal from the same CA may fail. Prevention: include CAA records for your automated CA (0 issue "letsencrypt.org" for Let's Encrypt). Check CAA records as part of any DNS change process.
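A CAA check can be folded into a DNS change review with a small helper. The Python sketch below evaluates an already-fetched CAA record set against a CA's issuer domain. It is deliberately simplified: it does not perform the DNS lookup, does not climb to parent domains when a name has no CAA records, and ignores critical flags. The function name and the tuple layout are assumptions for illustration.

```python
def ca_permitted(records, ca_domain: str, wildcard: bool = False) -> bool:
    """Decide whether ca_domain may issue, given CAA records for the relevant name.

    records is an iterable of (flags, tag, value) tuples,
    for example (0, "issue", "letsencrypt.org").
    """
    records = list(records)

    def values(tag: str) -> list:
        return [value for _flags, rtag, value in records if rtag.lower() == tag]

    # For wildcard requests, issuewild records take precedence when present.
    relevant = values("issuewild") if wildcard and values("issuewild") else values("issue")
    if not relevant:
        # No applicable property means issuance is not restricted.
        return True
    # Drop any parameters after ";". A bare ";" yields an empty issuer, which
    # matches no CA and therefore forbids issuance.
    issuers = {value.split(";", 1)[0].strip().lower() for value in relevant}
    return ca_domain.lower() in issuers
```

Running the proposed record set through a check like this before the DNS change ships makes a renewal-blocking edit visible while it is still easy to fix.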
5. Rate Limits and Quota Exhaustion
Let's Encrypt enforces rate limits: 50 certificates per registered domain per week, 5 duplicate certificates per week, 300 new orders per account per 3 hours. Deployments with many subdomains or environments can hit these limits, regardless of traffic volume. Prevention: use wildcard certificates where possible (one *.example.com instead of 50 individual subdomain certificates), keeping in mind that Let's Encrypt issues wildcards only through the DNS-01 challenge. Use the Let's Encrypt staging environment for testing.
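A little arithmetic shows why the per-domain limit matters for large rollouts. The Python sketch below estimates how many weeks it would take to issue one certificate per subdomain under the 50-per-week limit quoted above. It treats the limit as a fixed weekly budget, which is a simplification, and the function name is illustrative.

```python
import math

# Weekly limit per registered domain, as quoted in the text above.
CERTS_PER_REGISTERED_DOMAIN_PER_WEEK = 50


def weeks_needed(
    individual_certs: int,
    weekly_limit: int = CERTS_PER_REGISTERED_DOMAIN_PER_WEEK,
) -> int:
    """Minimum number of weeks to issue this many certificates under the limit."""
    return math.ceil(individual_certs / weekly_limit)
```

Issuing separate certificates for 120 subdomains would take at least three weeks under this limit, whereas a single wildcard certificate uses one slot.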
Enterprise Certificate Management Platforms
Organizations managing hundreds or thousands of certificates across multiple CAs, cloud providers, and on-premises infrastructure need centralized visibility and control that individual ACME clients cannot provide.
- Venafi TLS Protect. The enterprise standard for certificate lifecycle management. Discovers certificates across your entire infrastructure, enforces policy (approved CAs, key lengths, algorithms), automates renewal across multiple CAs, and provides a single pane of glass for certificate inventory. Integrates with F5, AWS, Azure, Kubernetes, and most load balancers and web servers.
- DigiCert CertCentral. Combines a Certificate Authority with a management platform. If you purchase certificates from DigiCert, CertCentral provides automated issuance, renewal, and deployment. The discovery tool finds certificates from any CA across your infrastructure. Most useful for organizations standardized on DigiCert as their primary CA.
- HashiCorp Vault PKI. An internal Certificate Authority for organizations that need to issue their own certificates for internal services, mTLS between microservices, and private infrastructure. Vault handles the full PKI lifecycle - root CA, intermediate CAs, certificate issuance, revocation, and rotation. Integrates natively with Kubernetes, Terraform, and Consul.
- AWS Certificate Manager / Google Certificate Manager / Azure Key Vault. Cloud-native certificate management for resources within the respective cloud. ACM is free for certificates used with AWS services (ALB, CloudFront, API Gateway). These are the right choice for certificates that only need to exist within a single cloud provider.