Makeswift's Outage on March 11, 2025

March 17th, 2025

Incident Report



On March 11th, 2025, the Makeswift API went down for 45 minutes. Everything that relied on the API, including the builder and all Makeswift sites, was unable to reach it. Sites without caching in place, such as Next.js Incremental Static Regeneration (ISR), were the most affected. This was a severe outage because it affected our customers' websites.



To our customers, we’re truly sorry. We know Makeswift is a crucial part of your website and needs to be dependable so you can serve your own customers. We are taking measures to prevent this type of incident from recurring.





Incident Response Timeline



  • 4:26 PM: one of our engineers noticed an increase in reported errors.
  • 4:27 PM: we declared an incident. Our engineers swarmed the issue and began debugging.
  • 4:34 PM: our automated builder-downtime alerts fired.
  • 4:55 PM: we began to narrow in on the problem: our API instances (Kubernetes pods) were starting up and then immediately crashing with an error.
  • Digging deeper, we realized the error was related to file processing, so we focused on the parts of our system that handle it: file uploads and the table exports feature.
  • We then found that using the table exports feature caused the process to crash in an unexpected way. Worse, failed export jobs were retried indefinitely rather than stopping after a set number of attempts, and each retry crashed our API instances as they tried to come back online.
  • 5:10 PM: we immediately purged the table exports job queue, and service was restored. Having reviewed the table exports that were not delivered as a result, we have reason to believe only internal testing sites were affected.



Service was down for approximately 45 minutes





What Happened?



After we upgraded to Node 22, one of our dependencies exhibited an unexpected bug that crashed the process whenever our table export feature triggered it.



The bug was triggered inside a retry loop that never gave up after a set number of attempts, which dramatically increased its blast radius: each retry crashed our API instances again as they came back online.





Short Term Resolution



  • We fixed the bug in question by upgrading the dependency.
  • We will add retry limits to the table exports feature and to all of our other queue processing. This work was already in progress (our revalidation processing does not have this issue), but we had not yet reached the table exports feature.
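To illustrate what a retry limit means in a case like this one, here is a minimal TypeScript sketch. It is not Makeswift's actual queue code; `Job`, `JobStore`, `runNext`, and `MAX_ATTEMPTS` are hypothetical names. Because the failing export crashed the whole process instead of throwing a catchable error, a limit enforced only inside a `try`/`catch` would never be reached. The sketch therefore records each attempt on the job itself, before the job runs, so an attempt that takes the process down is still counted. A `Map` stands in for what would, in practice, be a durable store.

```typescript
// Illustrative sketch only: Job, JobStore, runNext, and MAX_ATTEMPTS are
// hypothetical names, not Makeswift's actual queue implementation.

interface Job {
  id: string;
  attempts: number; // how many times a worker has claimed this job
  payload: unknown;
}

const MAX_ATTEMPTS = 3;

// Stands in for a durable job store (e.g. a database table). A Map keeps the
// example self-contained; unlike a real store, it would not survive a crash.
class JobStore {
  private readonly jobs = new Map<string, Job>();
  readonly deadLetter: Job[] = [];

  enqueue(id: string, payload: unknown): void {
    this.jobs.set(id, { id, attempts: 0, payload });
  }

  // Returns the next runnable job, recording the attempt *before* the job
  // runs, so an attempt that crashes the whole process is still counted.
  claim(): Job | undefined {
    for (const job of Array.from(this.jobs.values())) {
      if (job.attempts >= MAX_ATTEMPTS) {
        // Retry limit reached: park the job instead of retrying it forever.
        this.jobs.delete(job.id);
        this.deadLetter.push(job);
        continue;
      }
      job.attempts += 1;
      return job;
    }
    return undefined;
  }

  // Removes a job once it has completed successfully.
  ack(id: string): void {
    this.jobs.delete(id);
  }
}

async function runNext(
  store: JobStore,
  handler: (job: Job) => Promise<void>,
): Promise<"done" | "failed" | "empty"> {
  const job = store.claim();
  if (!job) return "empty";
  try {
    await handler(job);
    store.ack(job.id);
    return "done";
  } catch {
    // Leave the job in the store. Its attempt count is already incremented,
    // so claim() moves it to the dead-letter list once the limit is reached.
    return "failed";
  }
}
```

A worker that crashes mid-job simply never calls `ack()`. When a fresh instance starts and calls `claim()`, it sees the incremented count, so after `MAX_ATTEMPTS` crashes the job is set aside for inspection rather than taking down every new instance.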





Long Term Mitigation



  • We’re exploring processes to better keep our dependencies up to date.
  • We will increase our investment in observability and testing so that issues like this one are less likely to happen and are resolved faster when they do.
  • We are looking into more ways to decouple users' sites from non-critical parts of our APIs, such as the table export feature. Next.js ISR is one way we do that today, but our goal is to increase redundancy further.
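Caching is why sites using ISR were less affected. When regenerating a page fails, Next.js keeps serving the last successfully generated version. The minimal TypeScript sketch below shows the same "serve stale on error" idea outside Next.js. `StaleOnErrorCache` and its parameters are hypothetical, not a Next.js or Makeswift API. Unlike ISR, which serves the stale page immediately and regenerates in the background, this sketch refreshes inline to keep it short.

```typescript
// Illustrative sketch only: StaleOnErrorCache and its parameters are
// hypothetical, not Next.js or Makeswift APIs.

interface Entry<T> {
  value: T;
  fetchedAt: number; // timestamp (ms) of the last successful fetch
}

class StaleOnErrorCache<T> {
  private entry: Entry<T> | undefined;

  constructor(
    private readonly fetcher: () => Promise<T>, // e.g. a call to the upstream API
    private readonly maxAgeMs: number, // how long a value counts as fresh
    private readonly now: () => number = Date.now,
  ) {}

  async get(): Promise<T> {
    const cached = this.entry;
    // Fresh enough: serve from cache without touching the upstream API.
    if (cached && this.now() - cached.fetchedAt < this.maxAgeMs) {
      return cached.value;
    }
    try {
      const value = await this.fetcher();
      this.entry = { value, fetchedAt: this.now() };
      return value;
    } catch (err) {
      // Upstream is failing: fall back to the last successful value if any.
      if (cached) return cached.value;
      // Nothing cached yet, so there is nothing to fall back to.
      throw err;
    }
  }
}
```

The trade-off is that visitors may briefly see outdated content during an outage, which is usually far better than an error page. A site only fails outright if the API is down before anything has been cached.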



Finally, we do not currently have a status page. We know this was frustrating for those of you who noticed Makeswift degrading and had nowhere to check whether we were having issues. We will be adding one very soon.



Once again, we apologize for the impact this had on your ability to edit your site, as well as on your live website. If you have any questions or concerns, please reach out to us at support@makeswift.com.