API is degraded
Resolved
Aug 13, 2025 at 11:37pm UTC
The API has recovered.
We restarted the workers to get them unstuck again, then revisited the core issue. Many jobs were finishing at the same time, hitting the same sorted set in Redis simultaneously, and running into the same race conditions. To break up these clumps of requests, a worker now waits for a random length of time before retrying whenever the anti-race mechanism activates. This spreads the workers' retries out over time.
After applying the fix and monitoring the workers, we consider the issue resolved.
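The randomized retry delay described above can be illustrated with a minimal sketch. This is not our production code: Python is used only for illustration, and `retry_with_jitter`, `attempt`, and the delay bounds are hypothetical names and values chosen for the example. `attempt` stands in for whatever operation can lose a race, such as an update to the shared sorted set.

```python
import random
import time
from typing import Callable, Optional


def retry_with_jitter(
    attempt: Callable[[], bool],
    max_retries: int = 50,
    min_delay: float = 0.001,
    max_delay: float = 0.01,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> int:
    """Call attempt() until it returns True and return the number of retries.

    Every name and default here is an assumption made for this sketch.
    """
    rng = rng or random.Random()
    for retries in range(max_retries + 1):
        if attempt():
            return retries
        if retries == max_retries:
            break
        # Wait a random amount of time so that workers which collided
        # on this attempt do not all retry at the same instant.
        sleep(rng.uniform(min_delay, max_delay))
    raise RuntimeError(f"gave up after {max_retries} retries")
```

Because each worker draws its own delay, workers that collided once are unlikely to collide again on the next attempt. The `max_retries` cap is an extra safeguard in this sketch so that a worker cannot retry forever.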
Affected services
Updated
Aug 13, 2025 at 11:30pm UTC
The issue has recurred. We are continuing to work towards a solution.
Updated
Aug 13, 2025 at 11:26pm UTC
The API has recovered.
The concurrency limit queue was used more heavily than we had expected, so race conditions occurred more often than anticipated. The anti-race mechanism ended up retrying in a de facto infinite loop, which exhausted the event loop and stopped workers from handling new scrape jobs. We restarted the workers and quickly patched the code responsible for the event loop exhaustion.
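To illustrate the failure mode described above, here is a minimal sketch of a retry loop that cannot spin forever or starve other work. It does not show the actual patch, whose details are not given in this update. Python's asyncio is used only for illustration, and `retry_bounded`, `try_once`, and `RetryLimitExceeded` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable


class RetryLimitExceeded(Exception):
    """Raised when the operation keeps losing the race (hypothetical name)."""


async def retry_bounded(
    try_once: Callable[[], Awaitable[bool]],
    max_attempts: int = 100,
) -> int:
    """Retry try_once() up to max_attempts times; return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        if await try_once():
            return attempt
        # Hand control back to the event loop between attempts so that
        # other tasks (for example, picking up new jobs) keep running.
        await asyncio.sleep(0)
    # A hard cap turns a would-be infinite loop into a visible error.
    raise RetryLimitExceeded(f"no success after {max_attempts} attempts")
```

The two safeguards are independent: the attempt cap bounds the loop, and the explicit yield keeps the event loop responsive while retries are in progress.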
Created
Aug 13, 2025 at 11:10pm UTC
We are investigating.