Myphoner - Heavy load – Incident details

Heavy load

Resolved
Partial outage
Started almost 5 years ago · Lasted about 1 hour

Affected

Web Application

Operational from 7:12 AM to 7:12 AM, Partial outage from 7:12 AM to 8:05 AM

Updates
  • Update

    At about 8:54 CEST this morning, the response times of our caching service suddenly grew to a level where cache requests started to time out, falling back to requesting the main database. This in turn put a heavy load on the database, which caused longer response times for database queries in general. Combined, this meant every request to the app server had to:

    1. wait for the cache service timeout
    2. request the cached content from the database instead
    3. wait for the longer database response times

    Eventually this led to requests queueing up, and some requests started timing out (requests time out after 30 seconds of sitting in the queue).

    As we were working to find solutions, eliminating bottlenecks and optimising database performance on long-running requests, the caching service started responding again at around 9:40 CEST, and everything returned to normal shortly after.

    When key components like our caching layer or the database become unresponsive, it is hard to work around them, as those components are both hard to replace and hard to scale. However, there are always takeaways from incidents, and this one was no exception:

    * We gathered a lot of useful data identifying bottlenecks at certain API endpoints and other pages in the app, which we can use to optimise performance in general going forward.
    * We learned how important the caching layer is, and regarding it as a single point of failure will help us plan and scale going forward.

    We apologise for the inconvenience to all who were affected. We know how important Myphoner is to our clients, and we work hard to ensure the stability of the platform. This incident included, our uptime over the past 30 days is 99.958%.

  • Resolved

    This incident has been resolved.

  • Monitoring

    Caching service response times are returning to normal. We are continuing to monitor the situation closely.

  • Identified

    We're having issues with our caching service, which is causing a heavy load on our databases. We are working to ease the load one step at a time.

  • Investigating

    We are currently investigating this issue.
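
The failure mode described in the first update (a slow cache forcing every request to pay the cache timeout plus a heavier database query) can be sketched as a cache-with-database-fallback read path. This is a minimal, hypothetical illustration; the function names, timeout values, and backends are assumptions for the sketch, not Myphoner's actual code.

```python
CACHE_TIMEOUT = 0.05  # hypothetical per-request cache timeout, in seconds


def fetch(key, cache_get, db_get):
    """Return the value for key, preferring the cache.

    If the cache call times out, fall back to the main database.
    When the cache is unresponsive, every request pays the full
    cache timeout *plus* the (now heavier) database query time --
    the compounding latency the incident report describes, which
    eventually queues requests past their 30-second limit.
    """
    try:
        return cache_get(key, timeout=CACHE_TIMEOUT)
    except TimeoutError:
        # Cache unresponsive: fall back to the main database.
        return db_get(key)


# Simulated backends for illustration:
def healthy_cache(key, timeout):
    return f"cached:{key}"


def stalled_cache(key, timeout):
    raise TimeoutError  # cache never answers within the timeout


def database(key):
    return f"db:{key}"
```

With a healthy cache, `fetch("lead", healthy_cache, database)` returns the cached value; with a stalled cache, the same call falls through to the database, which is exactly why the database load spiked during the incident.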