Outage - Production APIs
Incident Report for Geoscape
Postmortem

What happened?

  • The Production API services experienced an outage from 3:11 PM to 4:32 PM AEST due to the expiry of the api.psma.com.au SSL certificate.
  • All API services were unavailable to all API customers.

3:40 PM

  • Customers began reporting an issue with the Predictive address API. Reports suggested, and it was later confirmed, that the issues started shortly after 3:00 PM.
  • Service monitoring did not indicate any errors.
  • PSMA API developers identified and reported that the certificate had expired.

4:15 PM

  • A new certificate was obtained and installation commenced.

4:32 PM (end of outage)

  • Testing confirmed that the services had returned to normal.

What did we learn?

  • We need to improve our certificate management practices.
  • The service monitoring was operating in a way that masked notifications for SSL certificate expiry.

What are we going to do (or have already done)?

  • The service monitoring now checks for approaching certificate expiry and sends notifications.
  • We are creating an accessible register of all certificates in use across our organisation (not just for APIs), noting their expiry dates and team ownership.
  • We are implementing recurring tasks and calendar events to support the renewal of certificates.
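
The expiry check and certificate register described above could look roughly like the following. This is a minimal sketch in Python, not the monitoring actually deployed: the CertificateRecord fields, the function names, and the 30-day warning window are all assumptions made for illustration.

```python
import socket
import ssl
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class CertificateRecord:
    """One row of a hypothetical certificate register."""
    host: str
    owner: str      # team responsible for renewal (assumed field)
    not_after: str  # expiry, in the format of getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days from `now` (timezone-aware) until the certificate expires.

    A negative result means the certificate has already expired.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def expiring_soon(register, now, warn_days=30):
    """Return the records that expire within `warn_days` or have expired."""
    return [
        record for record in register
        if days_until_expiry(record.not_after, now) <= warn_days
    ]


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Read the notAfter field from the certificate a live host presents.

    An already expired certificate fails verification and raises
    ssl.SSLCertVerificationError, which a monitor should treat as an alert.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A scheduled job could refresh each record with fetch_not_after and notify the owning team for every record returned by expiring_soon. Keeping the date arithmetic separate from the network call means the alert threshold can be tested without contacting a server.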
Posted Jul 30, 2020 - 07:37 AEST

Resolved
This incident has been resolved.
Posted Jul 27, 2020 - 17:42 AEST
Monitoring
The new certificate has been deployed and the service has been restored. We are monitoring the results.
Posted Jul 27, 2020 - 16:31 AEST
Identified
The issue has been identified as a certificate expiry on api.psma.com.au.
A certificate renewal is being implemented.
Posted Jul 27, 2020 - 16:11 AEST
This incident affected: APIs (Predictive API, Addresses API, Buildings API) and Beta APIs.