Limited access to labs

Incident Report for Strigo

Postmortem

Today, May 4th (may the 4th be with you), at 13:15 UTC, some users experienced downtime in our lab provisioning that lasted 3h 5m.

We noticed that the service responsible for preparing lab connectivity for newly created labs could not run for several lab instances.

After a thorough investigation, we concluded that this was related to an unsuccessful deployment of the service around the time the issue started. We have been monitoring the system since then, and everything appears to be operational.

We apologize for the inconvenience.

Posted May 04, 2023 - 16:32 IDT

Resolved

The issue is now resolved.

Posted May 04, 2023 - 16:25 IDT

Monitoring

The fix appears to resolve the issue. It has been deployed to all labs, and we are monitoring.

Posted May 04, 2023 - 16:21 IDT

Identified

The root cause was identified and a patch was deployed. We're validating that the issue is fixed before rolling out the fix to all labs.

Posted May 04, 2023 - 16:18 IDT

Investigating

New labs created between 10:50 and 13:15 UTC had limited accessibility and lag.

Posted May 04, 2023 - 16:17 IDT

This incident affected: Strigo service.