Shape CosmosDB Output with JMESPath

According to James Saryerwinie, his truly awesome query language for JSON is pronounced James Path. I trust him on the pronunciation of JMESPath. Besides, when I used to say “Jay-mezz Path,” it just sounded wrong to me.

I fell in love with JMESPath using the Azure CLI. Many of the Azure CLI commands support a –query parameter which can be used to filter and shape the output using JMESPath expressions. Adam Raffe has a good blog post on using JMESPath in the Azure CLI which you should read to learn more. But here’s a sample to illustrate how useful JMESPath is. Imagine that you want to list all the Redis Cache instances in a subscription. Maybe you start with this command:

$ az redis list

The output is large. Here’s the JSON for a single node:

[
  {
    "accessKeys": null,
    "enableNonSslPort": false,
    "hostName": "abc01.redis.cache.windows.net",
    "id": "/subscriptions/432a801f-3521-4752-9445-40610812027e/resourceGroups/ABC-WebApp/providers/Microsoft.Cache/Redis/abc01",
    "linkedServers": [],
    "location": "West US",
    "name": "abc01",
    "port": 6379,
    "provisioningState": "Succeeded",
    "redisConfiguration": {
      "maxclients": "256",
      "maxmemory-delta": "2",
      "maxmemory-reserved": "2"
    },
    "redisVersion": "3.2.7",
    "resourceGroup": "ABC-WebApp",
    "shardCount": null,
    "sku": {
      "capacity": 0,
      "family": "C",
      "name": "Basic"
    },
    "sslPort": 6380,
    "staticIp": null,
    "subnetId": null,
    "tags": {},
    "tenantSettings": null,
    "type": "Microsoft.Cache/Redis",
    "zones": null
  }
]

Imagine if you had many Redis Cache instances in your subscription. The output would be unwieldy. The chances are good that you don’t need all the information that’s returned. Perhaps you’re only interested in name of the server and the SSL port. And let’s suppose that you only want information on instances that have “abc” in the name. JMESPath to the rescue:

$ az redis list --query "[?contains(name,'abc')].{ server: hostName, port: sslPort }"

This query returns only those elements in the response that contain “abc” in the name. It then shapes the output to include only the hostName and sslPort, naming them server and port. The simplified output looks like this:

[
  {
    "port": 6380,
    "server": "abc01.redis.cache.windows.net"
  }
]

That’s both nice for reading and skinny on the wire. What if you could offer the callers of your Cosmos DB-based APIs the ability to filter and shape the output like this? If I were using the Cosmos DB Table API, OData and LINQ are available to accomplish that sort of thing. But with the SQL API, we’ll have to write a bit of code. Consider this simple Node.js function to fetch information about U.S. weather stations:

let documentClient = require('documentdb').DocumentClient;
let cosmos_uri = process.env["STATION_COSMOS_URI"];
let cosmos_key = process.env["STATION_COSMOS_READONLY_KEY"];
let databaseId = process.env["STATION_COSMOS_DATABASE_NAME"];
let collectionId = process.env["STATION_COSMOS_COLLECTION_NAME"];
let client = new documentClient(cosmos_uri, { 'masterKey': cosmos_key });
let collectionLink = "/dbs/" + databaseId + "/colls/" + collectionId + "/";

module.exports = function (context, req) {
  let filterQuery = `SELECT * FROM c WHERE c.State = "${req.params.state}"`;
  try {
    let queryIterator = client.queryDocuments(collectionLink, filterQuery);
    queryIterator.toArray(function (err, matchingDocuments) {
      if (err) {
        context.done(err);
        return;
      }
      context.done(null, { status: 200, body: matchingDocuments, headers: { 'Content-Type': 'application/json' }});
    });
  } catch (ex) {
    context.done(ex);
  }
};

The function requires that you pass the U.S. state to limit the results. Let’s fetch the weather stations in the state of Virginia:

https://weatherapi.azurewebsites.net/api/StationQuery/state/VA

This operation returns information about several dozen weather stations in Virginia. It’s a long list but here’s the first one to show you how the records are shaped:

[
    {
        "WBAN": "00120",
        "WMO": null,
        "CallSign": "JFZ",
        "ClimateDivisionCode": null,
        "ClimateDivisionStateCode": 44,
        "ClimateDivisionStationCode": null,
        "Name": "CLAYPOOL HILL",
        "State": "VA",
        "Location": "TAZEWELL COUNTY AIRPORT",
        "Latitude": 37.067,
        "Longitude": -81.8,
        "GroundHeight": 2651,
        "StationHeight": 0,
        "Barometer": 0,
        "TimeZone": -5,
        "id": "655785f7-92d5-74cb-c1db-52df225a2ee5"
    }
]

Now suppose we want to query only for the stations below 20 feet of ground elevation. Of course, we could modify the Node.js function to accept extra predicates, attaching them to the WHERE clause. With Cosmos DB, that’s a pretty safe thing to do since there are no INSERT, UPDATE, DELETE, ALTER and DROP statements in the language that could be injected to do harm. In fact, appending the U.S. state to the SELECT statement as shown above is quite safe in Cosmos DB. If a hacker tried to inject something malicious through the URL, at best they’d get back junk. More likely than not, the query would just fail since there’s no way to modify the data or the configuration of a Cosmos DB with a classic SQL injection attack.

In considering whether or not to allow API users to add new SQL predicates, we must ask ourselves:

  1. Why make our API users learn how to construct SQL predicates?
  2. What if we want to change the implementation from SQL API to something else?

Having an abstraction like OData or JMESPath that sits atop the API for filtering and projection makes a lot of sense. Using JMESPath, here’s the URL we’d like to be able to send to fetch all the weather stations in Virginia below 20 feet of ground elevation:

https://weatherapi.azurewebsites.net/api/StationQuery/state/VA?query=[?GroundHeight < `20`]

It’s surprisingly easy to add this functionality to the Node.js function. After importing JMESPath, check for the new query parameter in the iterator and execute it as a JMESPath search:

if (req.query && req.query.query) {
  try {
    matchingDocuments = jmespath.search(matchingDocuments, req.query.query);
  }
  catch (ex) {
    context.done(ex);
    return;
  }
}

Here’s the full source code of the updated function on GitHub. With that change in place, the URL that restricts the results to those weather stations below 20 feet of elevation now returns only six stations in Virginia, not dozens of them. However, we’re not quite done. Imagine that we want to trim down the JSON response to include only the location and the geographic coordinates of the matching records. Let’s try this URL and see how JMESPath handles it:

https://weatherapi.azurewebsites.net/api/StationQuery/state/VA?query=[?GroundHeight < `20`].{ description: Location, geo: { latitude: Latitude, longitude: Longitude } }

Here’s the much smaller output:

[
    {
        "description": "LANGLEY AFB AIRPORT",
        "geo": {
            "latitude": 37.0828,
            "longitude": -76.3603
        }
    },
    {
        "description": "RONALD REAGAN WASHINGTON NATL AP",
        "geo": {
            "latitude": 38.8472,
            "longitude": -77.0345
        }
    },
    {
        "description": "NORFOLK NAS",
        "geo": {
            "latitude": 36.9375,
            "longitude": -76.2893
        }
    },
    {
        "description": "NAVAL AUXILIARY LANDING FIELD",
        "geo": {
            "latitude": 36.695,
            "longitude": -76.1356
        }
    },
    {
        "description": "QUANTICO MCAF",
        "geo": {
            "latitude": 38.5036,
            "longitude": -77.305
        }
    },
    {
        "description": "FELKER ARMY AIRFIELD",
        "geo": {
            "latitude": 37.1333,
            "longitude": -76.6
        }
    }
]

Excellent. I’ll leave you with a few tips. First of all, executing an extra filtering type operation on result sets that have already been reduced with server-side query predicates can be wasteful. For this article, I deliberately chose to layer JMESPath functionality on top of a U.S. state-based weather station API because I know that the result sets from the original query will always be very small. Each U.S. state has a few dozen official weather stations, most of them positioned at medium- to large-sized airports or government buildings. Even California, which has the largest number of weather stations among all the states, has only about 100 of them. Querying for a few dozen records and filtering some of them out with JMESPath isn’t terribly wasteful in the grand scheme of things.

However, if your API is going to query against larger result sets or if the volume of activity on your filtering API is high enough, you should consider attaching the user’s predicates and/or their output-shaping projections onto the original database queries for efficiency. In parting, I’ll leave you with the Cosmos DB query that implements the filtering and projection from the example above:

SELECT VALUE { description: c.Location, geo: { latitude: c.Latitude, longitude: c.Longitude } } FROM c WHERE c.State = "VA" AND c.GroundHeight < 20

This produces an output that’s identical to our JMESPath powered API. If this query expression were passed through to Cosmos DB, we wouldn’t need JMESPath at all. But your application (or your users) would need to understand SQL syntax. And you would have some difficulty moving away from SQL-based queries in the future if you wanted to do so. Food for thought, as we say.

1 comment

Leave a comment

Your email address will not be published. Required fields are marked *