Licensed by CC0 from https://www.pexels.com/photo/grey-metal-hammer-34520/

HTTP 404 for Missing API Resources

Should an API return an HTTP 404 status when the specified resource cannot be found? Of course. That’s exactly what 404 (Not Found) means. As RFC 2616 states pretty clearly, 404 Not Found means:

The server has not found anything matching the Request-URI.

However, if you think APIs are like web pages, you might be perplexed by such an interpretation. Maybe 404 feels like an error because it connotes that something went wrong with the web app. Oh no, the page wasn’t found! The world is coming to an end! Run for the hills! Save the women and children!

But APIs aren’t web apps, are they? They aren’t pages at all. APIs are cacheable, uniformly-addressable, resource-oriented interfaces. Given the address for a resource, if it’s not available, the API should return the HTTP 404 status in most cases.

Unfortunately, there’s so much momentum around the idea of 404 being an error condition that it’s really hard to get developers to think differently about how the web is supposed to work. Monitoring tools, log indexers and server health checks do little to change minds. So many of these tools dutifully treat 404 as an error condition that well-meaning API developers can’t see outside of the little boxes into which they’ve been coaxed.

One of the most troubling manifestations of this page-oriented bias is Microsoft Azure’s App Service server health check. Deploy a pure Web API to an Azure App Service and you’ll find that you’re subject to all sorts of page-specific constructs related to health. Most notably, if an instance returns HTTP 404 too often, the health checks will mark that instance as failing. Obviously, if pages are missing, something’s wrong with that one naughty instance out of ten. The sheer narrow-mindedness of that idea just boggles my mind, given that all the servers run the same code and connect to the same resources by definition.

If this incorrect behavior meant that Azure simply recycled instances more often than it needed to, that wouldn’t be too high a price to pay for their pedantic interpretation of the HTTP specification. Unfortunately, we don’t stop paying the price there. The standard load balancer for Azure App Service seems to be completely perplexed when specific server instances are marked as unhealthy. Performance degrades quickly as more and more instances are quarantined and taken out of rotation while new ones have to be created and spun up. What a mess!

If you’re using a managed API gateway like Azure API Management (APIM), there’s a fairly elegant way to deal with this particular problem. Have your API return 200-series statuses for these conditions, keeping the health checks ignorant and happy, then translate the outbound HTTP statuses at the edge to the ones you really want to convey. An outbound policy added to your API can pick up a special response header named RealHttpStatus and return that status instead.
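A minimal sketch of such an outbound policy, assuming the standard APIM policy elements `set-variable`, `set-header`, `choose` and `set-status`. The variable name `realHttpStatus` is an illustrative choice, and the sketch assumes the back end always writes the header as a plain integer.

```xml
<outbound>
    <base />
    <!-- Capture the status the back end really meant; 0 stands in when the header is absent. -->
    <!-- Assumption: the header value is a plain integer. int.Parse throws on anything else. -->
    <set-variable name="realHttpStatus" value="@(int.Parse(context.Response.Headers.GetValueOrDefault("RealHttpStatus", "0")))" />
    <!-- Strip the special header so clients never see it. -->
    <set-header name="RealHttpStatus" exists-action="delete" />
    <choose>
        <!-- Replace whatever status the back end sent with the one it intended. -->
        <when condition="@((int)context.Variables["realHttpStatus"] == 404)">
            <set-status code="404" reason="Not Found" />
        </when>
    </choose>
</outbound>
```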

The policy begins by fetching the special response header called RealHttpStatus if it exists. The API should inject this header whenever it means to return something that might be misinterpreted or mishandled by the application server’s management tools. Next, the policy removes the special header so clients won’t see how our chicanery was perpetrated. Lastly, if the integer value of the special header is 404, the status returned by APIM will be 404, regardless of the status code the back end actually sent.

Of course, this policy’s <choose> element can be extended to include as many other interesting HTTP status codes as you might require. Your API can go on respecting HTTP for all its beauty and prescience while keeping those fossilized ne’er-do-wells completely in the dark about your villainous plans.
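As a sketch of that kind of extension, the outbound section below adds branches for 409 and 410 alongside 404. The extra statuses are purely illustrative, the variable name `realHttpStatus` is a hypothetical choice, and the header value is again assumed to be a plain integer.

```xml
<outbound>
    <base />
    <!-- Capture the intended status (0 when the header is absent), then hide the header. -->
    <set-variable name="realHttpStatus" value="@(int.Parse(context.Response.Headers.GetValueOrDefault("RealHttpStatus", "0")))" />
    <set-header name="RealHttpStatus" exists-action="delete" />
    <choose>
        <!-- One branch per status the back end is allowed to request. -->
        <when condition="@((int)context.Variables["realHttpStatus"] == 404)">
            <set-status code="404" reason="Not Found" />
        </when>
        <when condition="@((int)context.Variables["realHttpStatus"] == 409)">
            <set-status code="409" reason="Conflict" />
        </when>
        <when condition="@((int)context.Variables["realHttpStatus"] == 410)">
            <set-status code="410" reason="Gone" />
        </when>
    </choose>
</outbound>
```

Listing each status explicitly keeps the gateway from forwarding an arbitrary value the back end might send by mistake; any status without a branch passes through unchanged.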