German Google Cloud region unreachable for nearly 13 hours

Google Cloud has apologized for a nearly 13-hour outage in its europe-west3 region, located in Frankfurt, Germany.

The outage in Google Cloud's europe-west3 region began on Thursday, Oct. 24 at 2:30 a.m. and was resolved at 3:09 p.m. The cause was a power failure combined with a cooling problem. This caused some of the data centers in one of the region's zones to fail, resulting in disrupted services.
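The reported start and end times can be checked with a little arithmetic. The sketch below is only an illustration: the year 2024 is an assumption (the article gives only the weekday and date), the article states no time zone so both timestamps are treated as being in the same one, and the helper name `outage_duration` is hypothetical.

```python
from datetime import datetime

# Times as reported in the article. The year is an assumption and does not
# affect the result; no time zone is given, so both values are naive
# timestamps assumed to share the same zone.
START = datetime(2024, 10, 24, 2, 30)   # 2:30 a.m.
END = datetime(2024, 10, 24, 15, 9)     # 3:09 p.m.


def outage_duration(start, end):
    """Return the elapsed time between start and end as (hours, minutes)."""
    total_minutes = int((end - start).total_seconds() // 60)
    return divmod(total_minutes, 60)


hours, minutes = outage_duration(START, END)
print(f"{hours} h {minutes} min")  # prints "12 h 39 min"
```

At 12 hours and 39 minutes, the outage is closer to 13 hours than to 12.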

Google Cloud reports that engineers have implemented a solution to restore the data center to full operation.

Several services were affected by the outage: Cloud Build, Cloud Developer Tools, Cloud Machine Learning, Google Cloud Dataflow, Google Cloud Dataproc, Google Cloud Pub/Sub, Google Compute Engine, Google Kubernetes Engine, Persistent Disk and Vertex AI Batch Prediction.

Users reported problems creating virtual machines (VMs) and found certain disk storage inaccessible, among other issues. In Google Kubernetes Engine, nodes in the affected zone were inaccessible, and creating new nodes sometimes failed. In Cloud Dataflow, scaling of batch workers slowed down, while some streaming tasks could not run properly.

User notifications and responses

Although most problems were limited to one zone, there was also limited impact at the regional level. Google emphasizes that less than one percent of operations that touched resources in the other zones experienced internal errors.

Google Cloud notified users 26 minutes after the outage began, but did not issue a solution until three hours later. Users were advised to migrate workloads to other zones or regions and to take regular snapshots of degraded regional disks.
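The advice to move workloads elsewhere amounts to a simple selection rule: use the preferred location unless it is degraded, otherwise take the first healthy fallback. The sketch below illustrates that rule only. It calls no Google Cloud API, the function name `choose_target` is hypothetical, and the location names are placeholders, since the article does not say which zone was affected.

```python
def choose_target(preferred, fallbacks, degraded):
    """Return the first location not marked degraded, or None if all are."""
    for location in [preferred, *fallbacks]:
        if location not in degraded:
            return location
    return None


# Placeholder names, not real zone identifiers.
degraded = {"zone-1"}
target = choose_target("zone-1", ["zone-2", "other-region"], degraded)
print(target)  # prints "zone-2"
```

Listing another region after the zones of the same region, as in the example, falls back to it only when every local zone is degraded.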

The search giant says it has learned lessons from this incident that it will use to improve reliability in the future.