Cosmos DB Server-Side Primer – Episode 1

Stored procedures in Cosmos DB are JavaScript functions that run close to the data to ensure high performance. In addition to great performance, stored procedures provide multi-operation transaction support with automatic rollback when things don’t go as planned. In this episode, we’ll look at the server-side Collection and Response classes to learn how to create and update documents. Then we’ll practice deploying a stored procedure to Cosmos DB and executing it. Let’s start with a very simple example that demonstrates how to save a new document:

__.createDocument(
  __.getSelfLink(),
  { name: "Oppenheimer" },
  function (err, newDoc) {
    __.response.setBody(newDoc.id)
  });

This JavaScript creates a new, hard-coded document in the collection with “Oppenheimer” as the name value. The callback responds to the caller with the unique ID assigned to the document. If you’ve worked with JavaScript libraries like Lodash or Underscore.js, the double underscores in this example will feel familiar. However, if you haven’t worked libraries like those, the __ symbol might be off-putting. What is that?

Well, the __ symbol is just an alias for some important objects in the Cosmos DB server-side model. When you read other articles about authoring Cosmos DB stored procedures, you’ll sometimes see code expressed in a more verbose form. Here’s the same code shown above without using the __ alias:

var context = getContext();
var collection = context.getCollection();
collection.createDocument(
  collection.getSelfLink(),
  { name: "Oppenheimer" },
  function (err, newDoc) {
    context.getResponse().setBody(newDoc.id)
  });

Comparing the two versions side-by-side will help you understand how the __ alias works. There’s nothing right or wrong about either style. My advice is to use the one that gives you better comprehension. For me, the __ alias helps to clear away the visual clutter which improves my understanding when I’m composing server-side code in Cosmos DB so I use it all the time. In this example, the __ alias is being used for three operations:

  • Creating a new document – __.createDocument(),
  • Getting the address of the current collection – __.getSelfLink(), and
  • Sending a response back to the caller – __.response.setBody().

The intent of the code is to create a new database document and return the assigned unique identifier back to the caller. So calling __.createDocument() and __.response.setBody() to get those tasks accomplished makes sense. However, you may be wondering about that __.getSelfLink() call. What does that do? What is a self link, anyway? You can think of Cosmos DB links as unique addresses that refer to various objects in the ecosystem. Every object in Cosmos DB has a link. Databases have links. Collections have links. Documents have links. Attachments have links. Every linked object maintains a link to itself which are called a self-links, of course. As an example, here’s a simple document as it might be stored in Cosmos DB:

{
    "message": "Hello Cosmos DB",
    "timestamp": "2018-09-08T21:01:25.494Z",
    "id": "bbd8d3db-8655-a7da-286e-50586cd39289",
    "_rid": "KqgHAMaKS9QBAAAAAAAAAA==",
    "_self": "dbs/KqgHAA==/colls/KqgHAMaKS9Q=/docs/KqgHAMaKS9QBAAAAAAAAAA==/",
    "_etag": "\"0100b33c-0000-0000-0000-5b9438a60000\"",
    "_attachments": "attachments/",
    "_ts": 1536440486
}

Line 6 shows the self-link to the document, appropriately named _self. Let’s look at the shape of the document self-link:

dbs/{db_id}/colls/{collection_id}/docs/{doc_id}/

If you have a document link in hand, its self-link is the document link minus the docs portion:

dbs/{db_id}/colls/{collection_id}/

And similarly, the collection’s database link is its self-link minus the colls part:

dbs/{db_id}/

Now it’s time to make the code snippet deployable. To transform it into a valid stored procedure, first place it into a parameterized function:

function(doc) {
  __.createDocument(__.getSelfLink(),
    doc,
    function (err, newDoc) {
        __.response.setBody(newDoc.id)
    });
}

Instead of saving hard-coded documents about Oppenheimer, we can call this stored procedure to save any sort of JSON document to the collection. But how? The function is anonymous. How can we call the stored procedure from an external system if it has no name?

When you create stored procedures in Cosmos DB, the names by which they are published and invoked isn’t expressed in JavaScript. Instead, the names are bits of metadata that can be described in JSON like this:

var procedureDef = {
  id: "SaveADocument",
  body: function(doc) {
    __.createDocument(__.getSelfLink(),
      doc,
      function (err, newDoc) {
        __.response.setBody(newDoc.id)
      });
  }
}

See the source code in the body while the name of the procedure is the id property. In other languages that don’t speak JSON natively, the body (the source code) will be expressed as a string whenever you create or update a stored procedure. Now, let’s use Visual Studio Code and a small Node.js application to deploy the stored procedure and invoke it.

You might want to use Python, Java or C# to interact with Cosmos DB instead. All those languages are at home in Visual Studio Code and all of them have first-class SDKs. Check out GotCosmos.com for some great samples and tutorials for working with Cosmos DB from those languages.

With both Code and Node installed on your Windows, Mac or Linux workstation, issue the following commands in a terminal window to create a new project directory, initialize it, and start the Code editor:

mkdir UpsertCosmosSP
cd UpsertCosmosSP
touch index.js config.js
npm init
npm install @azure/cosmos
code .

If you’re running Windows which has no touch command, try these two statements to create the empty index.js and config.js files instead:

type nul >> index.js
type nul >> config.js

In Code, add these exports to the config.js file:

exports.connection = {
    endpoint: 'https://localhost:8081',
    authKey: 'C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw=='
};

exports.names = {
    database: 'DELETE_ME_SOON',
    collection: 'Playground',
};

This configuration assumes that you’re using the Cosmos DB Emulator running on port 8081 of the local machine and using the default master key. However, the emulator only runs on Windows 10 or Windows Server 2016 at the time of this writing.

If you want or need to use a real Cosmos DB account in the Azure cloud, you’ll have to replace the connection endpoint and authKey with your own. In that case, use the Cosmos DB blade in the Azure Portal to create a new account or locate an existing one where you have read-write access. From the account’s Keys page, fetch the read-write endpoint URI and the primary or secondary key, replacing the endpoint and authKey values in the config.js file.

You may also change the database name and the collection name in the configuration file if you like. If you leave them as shown, the application will automatically create a new database called DELETE_ME_SOON in your Cosmos DB account. In that new database, a collection named Playground will also be created as needed. That’s where the stored procedure will be deployed along with any documents you create during the test runs of the application below.

In the index.js file, add the following application code. The full source code for this example is also available on Github.

var CosmosClient = require('@azure/cosmos').CosmosClient;
var config = require('./config.js');
var client = new CosmosClient({
    endpoint: config.connection.endpoint,
    auth: { masterKey: config.connection.authKey }
});

async function upsertProcedureAndExecute(sprocDef, docToInsert) {
    const { database } = await client.databases
        .createIfNotExists({ id: config.names.database });
    const { container } = await database.containers
        .createIfNotExists({ id: config.names.collection });
    const { sproc } = await container.storedProcedures.upsert(sprocDef);
    const { body: results, headers } = await sproc.execute(docToInsert);
    if (headers && headers['x-ms-request-charge'])
        console.log(`Charge = ${headers['x-ms-request-charge']} RU`);
    if (results)
        console.log(`DocID = ${JSON.stringify(results)}`);
    // comment in the next line to delete the database
    // await database.delete();
}

var docToSave = {
    message: 'Hello Cosmos DB',
    timestamp: (new Date()).toISOString()
};

var procedureDef = {
    id: 'saveDocument',
    body: function (doc) {
        __.createDocument(__.getSelfLink(),
            doc,
            function (err, newDoc) {
                __.response.setBody(newDoc.id)
            });
    }
};

upsertProcedureAndExecute(procedureDef, docToSave)
    .catch((err) => { console.error(JSON.stringify(err)); });

Lines 1 – 6 load the configuration file and instantiate the Cosmos DB client. Lines 8  – 21 define the upsertProcedureAndExecute function which starts by creating the database named in the configuration file if it doesn’t already exist. Similarly, the collection will be created as necessary. Then the stored procedure is upserted to the collection. The term upsert in the context of deploying a stored procedure means that if the procedure doesn’t already exist by the provided id, it will be created. If it does exist, it will be overwritten with the source code you provide.

Lastly, the stored procedure is invoked with the document to be saved. The request charge (expressed in Cosmos DB Resource Units) will be fetched from a response header and logged to the console along with the unique ID assigned to the new document.

When I’m doing development in Cosmos DB, I find that it’s useful to watch the request charges. Knowing how expensive various operations are as you move along is a good way to stay connected to the performance of your database.

You can run the application from the command line by executing “node index.js” in a terminal window which should output something like this to the console:

Charge = 5.45 RU
DocID = "8c6ec318-ed0a-3c02-216b-bc5161bfc5ba"

You can also run the application in the Visual Studio Code debugger by pressing F5 or clicking Debug / Start Debugging from the menu. The Code debugger for Node.js is very rich, allowing you to set breakpoints, interactively inspect variables and the call stack, evaluate expressions and much more.

You’ve observed that if you save a document to a Cosmos DB collection without an id property, an id will be created for it, as a GUID in string form. However, while every document saved to a Cosmos DB collection must have a unique id, they aren’t required to be GUIDs or follow any other specific pattern for that matter. They only need to have unique values. Try saving a document with an id of your own by replacing the docToSave object in the sample app with something like this:

var docToSave = {
    message: 'Hello Cosmos DB',
    id: 'testing123',
    timestamp: (new Date()).toISOString()
};

The first time you save the document by invoking the SaveADocument stored procedure, it will work correctly because no document with the id ‘testing123’ already exists. However, if you run it again, you’ll get an exception saying something like:

TypeError: Unable to get property 'id' of undefined or null reference

The error message isn’t super helpful but it does indicate that there’s something wrong with the ‘id’ property. In this case, we’ve attempted to violate a uniqueness constraint within the collection during the second invocation of the  __.createDocument() method. If it’s your intent to allow callers to replace existing documents that way, simply change the stored procedure definition to call __.upsertDocument() instead of __.createDocument() like this:

var procedureDef = {
    id: 'saveDocument',
    body: function (doc) {
        __.upsertDocument(__.getSelfLink(),
            doc,
            function (err, newDoc) {
                __.response.setBody(newDoc.id)
            });
    }
};

Now you can call the stored procedure over and over with documents having the same id value without errors and watch the timestamps and other properties change each time you do so.

Before I finish this article, I need to address something I’ve been doing wrong from the beginning. As it turns out, things can go wrong when you use complex, networked computing systems. We need to account for the various error conditions that might occur. Moreover, as a multi-tenant database, Cosmos DB has a range of constraints that dictate how stored procedures must behave. Based on so-called bounded execution rules, Cosmos DB may decide that it cannot accept your request at the current time. If that happens, we should obey and convey that information back to the caller. Here’s an updated procedure definition that handles both the error and non-acceptance cases:

var procedureDef = {
    id: 'saveDocument',
    body: function (doc) {
        var accepted = __.upsertDocument(__.getSelfLink(),
            doc,
            function (err, newDoc) {
                if (err) throw err;
                __.response.setBody({ id: newDoc.id, completed: true })
            });
        if (!accepted) __.response.setBody({ completed: false });
    }
};

If you save this version of the stored procedure, you’ll see that a Boolean flag is returned in non-error cases that indicates whether or not the upsert operation was allowed to complete. Also, if errors occur, they’ll be thrown back to the server-side engine which will roll back the current transaction. Cosmos DB’s automatic rollback of transactions depends on our throwing exceptions when things don’t go as planned.

A gentle reminder: if you created a Cosmos DB collection in the cloud as part of this exercise, please be sure to delete it if you don’t want to incur any ongoing charges once you’ve completed the exercise.

In the next article in this series, we’ll round out our examination of the basic document-oriented functions in the Collection class like __.queryDocuments and __.deleteDocument. We’ll also dive into continuations, another key concept related to working nicely within Cosmos DB’s bounded execution rules.

1 comment

Leave a comment

Your email address will not be published. Required fields are marked *