Our service provider, AWS, experienced a failure of a datastore that affected all customers of that service. Unfortunately, PSMA was an affected customer.
More information on AWS outages can be obtained from the AWS Service Health Dashboard.
10:07am - PSMA received advice from AWS of a failure in a subsystem affecting services that use VPC (Virtual Private Cloud) in all availability zones in the ap-southeast-2 (Sydney) region. PSMA services were unaffected at this time.
12:34pm - PSMA monitoring detected intermittent failures and some increased latency on the Addresses API, Predictive API and Beta APIs. Occasional API call timeouts or failures were being experienced.
3:49pm - AWS had disabled writes to the datastore to allow restoration to a previously good state. This increased the rate of failed API calls to PSMA services.
4:20pm - Almost all API calls were now failing on the Addresses API, Predictive API, Buildings API and Beta APIs.
4:40pm - AWS had successfully restored the datastore and re-enabled writes, leading to an increase in successful API calls on PSMA services.
4:55pm - All PSMA services fully recovered.
5:55pm - AWS advised that the restore was successful and all services were operating across the region again.
PSMA was unable to take any action as the issue was entirely within the domain of AWS. PSMA continued to monitor services and the activities of AWS throughout the period.
While an AWS failure across an entire region is rare, it is a genuine risk.
PSMA will investigate options to allow the continuation of services in the event of AWS-wide incidents in the future.
We would appreciate feedback on your expectations and any concerns you may have regarding this event.