Extract JWT Claims in Azure API Management Policy

JSON Web Tokens (JWT) are easy to validate in Azure API Management (APIM) using policy statements. This makes integration with Azure Active Directory and other OpenID providers nearly foolproof. For example, one might add the following directive to the <inbound> policy for an API to ensure that the caller has attached a bearer token with acceptable audience, issuer and application ID values in the signed JWT:
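
The markup might look something like the following sketch. The tenant, audience and client application ID values here are placeholders, so substitute your own:

<validate-jwt header-name="Authorization" failed-validation-httpcode="401" failed-validation-error-message="Unauthorized">
    <!-- Placeholder values; substitute your tenant, audience and client application ID. -->
    <openid-config url="https://login.microsoftonline.com/{tenant}/.well-known/openid-configuration" />
    <audiences>
        <audience>https://myapi.example.com</audience>
    </audiences>
    <issuers>
        <issuer>https://sts.windows.net/{tenant}/</issuer>
    </issuers>
    <required-claims>
        <claim name="appid">
            <value>00000000-0000-0000-0000-000000000000</value>
        </claim>
    </required-claims>
</validate-jwt>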

That’s nice. A little bit of markup and all that nasty security plumbing is handled outside the API. But what if we want to pass some individual claims named inside the token on to the API backend? Unfortunately, Azure APIM doesn’t have that built into the JWT validation policy. Ideally, we’d be able to extract claims during validation into variables and pass them in HTTP headers before the request is forwarded to the backing API. Until that feature is added, here’s how you can do that:
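
Here is a minimal sketch of the idea, placed in the <inbound> section after <validate-jwt>. The X-User-Id header and the sub claim are illustrative choices; any header and claim names will do:

<set-header name="X-User-Id" exists-action="override">
    <value>@{
        // Fetch the Authorization header, confirm it carries a Bearer token,
        // parse the token and pull out a single claim to forward to the backend.
        string authHeader = context.Request.Headers.GetValueOrDefault("Authorization", "");
        if (authHeader.StartsWith("Bearer "))
        {
            Jwt jwt;
            if (authHeader.Substring("Bearer ".Length).TryParseJwt(out jwt))
            {
                string[] claimValues;
                if (jwt.Claims.TryGetValue("sub", out claimValues) && claimValues.Length > 0)
                {
                    return claimValues[0];
                }
            }
        }
        return string.Empty;
    }</value>
</set-header>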

In this code, I’ve added some script inside the <set-header> policy statement to fetch the Authorization header from the request, check that it’s a Bearer type token, attempt to parse it (which checks the token’s signature), and finally extract the value of one specific claim. Most of that work already happens inside the <validate-jwt> policy, as you can imagine. Until there’s an easier way to extract JWT claims individually, the solution shown here works nicely. Enjoy.

If you agree with me that this feature should be built right into the <validate-jwt> policy, please upvote the feature request I wrote on the APIM feedback site.

.NET Back to Basics – Delegates to Expression Trees

I led a talk at the Richmond, Virginia .NET User Group on February 4, 2016 about how delegates have evolved in .NET since 2002. We had about 40 in attendance by my rough count, and the discussion was energetic. Thanks to everyone who attended. Below, you’ll find the links for the presentation and source code from the meeting. The slides are light on content, but at least they’ll help connect you to the ten examples in the attached source code.

Get the Source Code

Get the Slides

If you are a user group leader and would like me to deliver this presentation to your group, contact me on Twitter as KevinHazzard. Enjoy!

Schema Refactoring with Views

Database triggers are generally pretty awful. Anyone who has had to deal with a database heavily laden with DML triggers knows this and avoids using them. If anyone tells me that they must use a trigger, I can always prove that there’s a better way that doesn’t bury dependencies and business logic so deeply in the database. However, there’s one special case where I find triggers helpful.

Refactoring a schema to support new work is seemingly impossible when you must also serve legacy applications that cannot be easily changed. Views provide an interesting abstraction that can give you the flexibility you need in this case. Acting like façades over the table structures evolving beneath them, views may use instead-of triggers to translate the data back and forth between the old and new formats, making it seem to the legacy applications that the database hasn’t changed. Let me start with a simple Address table to demonstrate the idea.
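
The details of the original table don’t matter much; a minimal sketch with a surrogate key, a couple of typical address columns and separate latitude and longitude attributes will do:

CREATE TABLE dbo.Address
(
    AddressID  INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Address PRIMARY KEY,
    Street     NVARCHAR(100) NOT NULL,
    City       NVARCHAR(50)  NOT NULL,
    Latitude   FLOAT         NOT NULL,
    Longitude  FLOAT         NOT NULL
);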

Now suppose that we need to make two changes to this table without disrupting legacy applications that depend on the Address table remaining defined as it has been for some time. First of all, to separate the subject areas of the database better, we’ve decided that we must move the Address table from the [dbo] schema into a schema named [geo]. Next, the individual [Latitude] and [Longitude] attributes must be replaced with a single column of type GEOGRAPHY to support some enhanced geographic processing functions being added for new applications. The new Address table will be created like this:
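
Something like the following, reusing the column names from the sketch above and assuming a new [GeoPoint] column of type GEOGRAPHY:

CREATE SCHEMA geo;
GO

CREATE TABLE geo.Address
(
    AddressID INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_geo_Address PRIMARY KEY,
    Street    NVARCHAR(100) NOT NULL,
    City      NVARCHAR(50)  NOT NULL,
    GeoPoint  GEOGRAPHY     NOT NULL
);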

Moving the data into the new table is straightforward using the GEOGRAPHY::Point function:
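
For example, preserving the existing key values with IDENTITY_INSERT and assuming SRID 4326 (WGS 84) for the new points:

SET IDENTITY_INSERT geo.Address ON;

INSERT geo.Address (AddressID, Street, City, GeoPoint)
SELECT AddressID,
       Street,
       City,
       geography::Point(Latitude, Longitude, 4326)
FROM dbo.Address;

SET IDENTITY_INSERT geo.Address OFF;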

After fixing up any foreign key references from the old Address table to the new one, we’re ready to drop the old table and create a new view in its place:
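
Using the sketched column names, that might look like this, with the GEOGRAPHY column’s Lat and Long properties exposed under the old column names:

DROP TABLE dbo.Address;
GO

CREATE VIEW dbo.Address
AS
SELECT a.AddressID,
       a.Street,
       a.City,
       a.GeoPoint.Lat  AS Latitude,
       a.GeoPoint.Long AS Longitude
FROM geo.Address AS a;
GO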

The new view is named exactly like the dropped Address table so SELECT queries work just as they did before:
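
For example, a query written against the old table still runs unchanged:

SELECT AddressID, Street, City, Latitude, Longitude
FROM dbo.Address;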

Now let’s try to insert some data into the view:
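
Something like this, with illustrative values:

INSERT dbo.Address (Street, City, Latitude, Longitude)
VALUES (N'1 Example Way', N'Richmond', 37.5407, -77.4360);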

The insert fails with the message:

Update or insert of view or function ‘dbo.Address’ failed because it contains a derived or constant field.

The problem is that the view is based on some logic for parsing out latitude and longitude parts from a GEOGRAPHY type. The view doesn’t know how to convert the individual latitude and longitude components back into a GEOGRAPHY so let’s provide an instead-of trigger to do that:
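
A sketch of that trigger, again assuming the column names used above:

CREATE TRIGGER dbo.Address_IOI
ON dbo.Address
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Recombine the legacy latitude/longitude values into a GEOGRAPHY point.
    INSERT geo.Address (Street, City, GeoPoint)
    SELECT i.Street,
           i.City,
           geography::Point(i.Latitude, i.Longitude, 4326)
    FROM inserted AS i;
END;
GO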

With the INSTEAD OF INSERT (IOI) trigger in place, the insert statement tried before now works. We should add INSTEAD OF UPDATE (IOU) and INSTEAD OF DELETE (IOD) triggers to the view to make sure those operations continue to work for legacy applications, too:
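
Sketches of those two triggers might look like this:

CREATE TRIGGER dbo.Address_IOU
ON dbo.Address
INSTEAD OF UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Note: the surrogate key is deliberately not updatable here.
    UPDATE a
    SET a.Street   = i.Street,
        a.City     = i.City,
        a.GeoPoint = geography::Point(i.Latitude, i.Longitude, 4326)
    FROM geo.Address AS a
    JOIN inserted    AS i ON i.AddressID = a.AddressID;
END;
GO

CREATE TRIGGER dbo.Address_IOD
ON dbo.Address
INSTEAD OF DELETE
AS
BEGIN
    SET NOCOUNT ON;

    DELETE a
    FROM geo.Address AS a
    JOIN deleted     AS d ON d.AddressID = a.AddressID;
END;
GO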

With those triggers in place, the following statements work as we had hoped:
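
For example:

UPDATE dbo.Address
SET Latitude = 37.5538, Longitude = -77.4603
WHERE City = N'Richmond';

DELETE dbo.Address
WHERE City = N'Richmond';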

In closing, I’ll admit that this pattern has some potential problems that you may need to address. If you’re using an Object-Relational Mapping (ORM) tool that dynamically inspects and validates metadata in the database, it may get confused by the use of a view where it once found a real table. What the Microsoft Entity Framework (EF) refers to as Navigation Properties, representing the foreign key relationships between tables, may also break under this pattern. In addition, the UPDATE trigger as currently implemented offers no way to update the primary key value. That’s certainly possible using the [deleted] row set provided to the trigger but, since modifying surrogate primary keys isn’t commonly expected or allowed, I didn’t provide that more complex implementation. Lastly, the further your evolving database design drifts from what the legacy applications expect, the harder it will be to maintain the views and their triggers. My Schema Refactoring Pattern, as I call it, is best employed when you have a firm date in hand by which the old schema can be deprecated. Triggers are still evil, so you should have a solid plan from the start of the refactoring process to stop using them as soon as possible.

Generating Filenames Dynamically in SSIS

A file’s name and location are often used to express what’s inside it. Filenames are not required to be meaningful to human beings but they often follow some sort of pattern for categorizing and describing the data inside them. In this way, we can think of the name of a file as being somewhat like metadata. In this article, I’ll focus on a simple example that follows this idea: generating a filename in SQL Server Integration Services (SSIS) that contains the time and date when the file was created. The file creation time is important metadata that other systems can use to make decisions in downstream ETL processes.

In SSIS, the Script Task is good for this sort of thing because a small bit of C# code can add just the kind of flexibility and reuse we require. For example, imagine that a file with Employee information will be created from the HR schema. Those bits of metadata can easily be embedded in the filename because they’re somewhat static with respect to the package. However, adding something dynamic like the current date and time to the filename requires some code. To satisfy the requirements in a reusable way, imagine a simple, string-based template that looks like this:

HR_Employees_{yyyyMMdd}T{HHmmss}Z.csv

The text between the curly braces is what we need to parse out to be replaced with the current date and time values. To find the escaped date and time sequences, a regular expression will do nicely. In Figure 1, observe the template string being evaluated in a popular, web-based regular expression tester.

Figure 1 – Using a web-based regular expression tool to find escaped date and time sequences.

Regular expressions are weird, but with a bit of study and tools like the one shown in Figure 1, it’s easy to experiment and learn quickly. There are many such regular expression testing tools online as well as a few that can be run natively on your computer. The simple expression tested here is \{\w+\}, which looks rather cryptic if you’re unaccustomed to regular expression syntax. However, it’s really quite simple. Reading left to right, the expression means we’re looking for:

  1. A starting curly brace followed by
  2. One or more word characters followed by
  3. An ending curly brace

As you can see in the target string near the bottom of Figure 1, both of the sequences in the template have been found using this regular expression. That means the regular expression will work in the C# code. All that’s needed now is the code that will find those sequences in the template and replace them with their current date and time values.

Before we look at that, however, I must drag a new Script Task onto the control flow of my SSIS package. I also need to add two variables to the package that will be used to communicate with the script. Figure 2 shows the control flow with the new Script Task on the design surface, the two new variables that were added and the opened configuration dialog for the Script Task.

Figure 2 – Configuring a new Script Task in the SSIS package.

After dragging a Script Task object from the toolbox onto the control flow, double-clicking it shows the Script Task Editor dialog. To support the invocation of the C# code, two package-level variables called FilenameTemplate and GeneratedFilename were created. You can see them in the variables window near the bottom of Figure 2. Notice that the FilenameTemplate variable has the text with the escaped date and time parts tested earlier. In the Script Task Editor, the FilenameTemplate variable has been added to the ReadOnlyVariables collection and the GeneratedFilename variable to the ReadWriteVariables collection. That’s important: failing to add the variables to those collections means they won’t be visible inside the C# code, and exceptions will be thrown when trying to use them.

Now we’re ready to write some script code. Clicking the Edit Script button in the Script Task Editor dialog will start a new instance of Visual Studio with the standard scaffolding to support scripting in SSIS. Find the function called Main() and start working there. The first line of code must fetch the contents of the FilenameTemplate variable that was passed in. Here is the line of C# code to do that:

string template = Dts.Variables["FilenameTemplate"].Value.ToString();

With the template in hand, we can convert and save the escaped date and time sequences with the following line of code:

Dts.Variables["GeneratedFilename"].Value = ExpandTemplateDates(template);

Of course, to make that work, we need to implement the ExpandTemplateDates() function, so the following code should be added inside the same class where the Main() function is defined.
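
The full listing appears in Figure 3, but here is a sketch of what ExpandTemplateDates() might look like, using the \{\w+\} pattern tested earlier and UTC time to match the trailing Z in the template:

// Add these using directives at the top of the script file:
using System;
using System.Text.RegularExpressions;

// Replaces each {format} sequence in the template with the current UTC
// date/time rendered using that format string, e.g. {yyyyMMdd} or {HHmmss}.
private static string ExpandTemplateDates(string template)
{
    DateTime now = DateTime.UtcNow;
    return Regex.Replace(template, @"\{\w+\}",
        match => now.ToString(match.Value.Trim('{', '}')));
}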

This method creates the \{\w+\} regular expression tested earlier and uses it to replace the matching sequences in the template parameter. That’s simple to do with .NET’s DateTime class, which has a handy ToString() method that accepts the yyyyMMdd and HHmmss format strings found in the template. Figure 3 brings all the code together to help you understand.

Figure 3 – The script code to find and replace escaped date and time formatting sequences.

Before closing the C# code editor, it’s a good idea to run the Build command from the menu to make sure there are no syntax errors. To use the new dynamic filename generator, I’ll add one more variable to the package called Filepath. That will be concatenated with the GeneratedFilename to form the full path on disk where the output file from the package will be stored. The connection manager for that file needs to have its ConnectionString property modified at runtime so I’ll use the Expression Builder to do that.

Figure 4 – Using the Expression Builder dialog to modify the target ConnectionString.

From the properties for the connection manager, click the ellipsis (…) button next to the Expressions property and add an expression for the ConnectionString as shown in Figure 4. Once that expression is saved, the full path and name of the file to be saved will be assembled from the Filepath and the GeneratedFilename variables at runtime.
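
The expression itself is a simple concatenation of the two variables; assuming they live in the default User namespace, it looks something like this:

@[User::Filepath] + @[User::GeneratedFilename]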

Bringing it all together, Figure 5 shows the results of running the package with a source table, the target flat file bearing the new ConnectionString expression and a Data Flow Task that moves some data from the source to the target. The data flow itself isn’t relevant so it isn’t shown here. What’s important to demonstrate is that the C# code correctly fetched the template variable, processed the regular expression, matched the sequences, replaced them with the date and time values and saved the new filename. The connection manager’s ConnectionString expression also correctly applied the newly generated filename to the path when saving the file to disk.

Figure 5 – A test run showing the dynamic filename that was generated and the file on disk with that name.

I marked up the screenshot with a red arrow pointing to the package log output showing the filename that was generated by the C# code when it ran. The blue arrow points to the actual target file on disk, showing that the two match.

There are other ways to do what’s been demonstrated here. However, I find this solution to be both simple and extensible. The example shown here can be easily modified to include many types of dynamic metadata other than dates and times. Moreover, this is a highly reusable pattern given that you need only copy the Script Task into a new SSIS package and set up a couple of package variables to use it anywhere you like. In the next article in this series, I’ll focus on consuming files with dynamically assigned filenames.

On Primary Key Names

If you use frameworks like Microsoft Azure Mobile Services or Ruby on Rails, then you’re accustomed to complying with a host of development conventions. Frameworks are often said to be opinionated, forcing certain design decisions on the developers who use them. Given that very few software design choices are perfect in every situation, the value of having an opinion is often more about consistency than it is about correctness.

“A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines. With consistency a great soul has simply nothing to do.” — Ralph Waldo Emerson, Essay: First Series on Self-Reliance

Emerson’s famous quote about the potential perils of standardization is sometimes misapplied. For example, I once attended a seminar by a software vendor where the speaker referred to Ruby on Rails developers as hobgoblins because of their unswerving reliance on programming conventions. Yet those who understand the Ruby on Rails framework know that it is loaded with touchstones that lead us to exhibit good and helpful behaviors most of the time. The seminar speaker’s broad derision of the Rails framework was either based on his misunderstanding or an intent to misdirect the hapless audience for some commercial gain. In his famous essay, Emerson singles out for avoidance only those friendly yet troublesome creatures of habit that lead us into folly.

“One man’s justice is another’s injustice; one man’s beauty another’s ugliness; one man’s wisdom another’s folly.” — Ralph Waldo Emerson

Yet it is true now and again that the conventions expressed in Rails, Azure Mobile Services, Django, CakePHP and many other frameworks can lead to unfortunate consequences, whether through the tools’ misapplication or through circumstances that simply could not be foreseen by the frameworks’ designers. Nowhere is this more true than in the area of data access. A data pattern that the framework designer considers beautiful in many situations may repulse some database administrators in practice. Application frameworks are often developed and sold for so-called greenfield solutions, those not naturally constrained by prior design decisions in the database and elsewhere. However, many real-world implementations of application frameworks are of the brownfield variety, mired in the muck and the gnarled, organic growth of the thousands of messy technical decisions that came before. In environments like that, the framework designer’s attempt to enforce one kind of wisdom may prove to be quite foolish.

Primary keys are one of the concerns that application frameworks tend to be opinionated about. Ruby on Rails, Microsoft Azure Mobile Services and Django all name their surrogate primary keys id by default, meaning identifier or identity. In fact, the word identity is rooted in the Latin word id, which means it or that one. So in these frameworks, when you use the primary key to identify a specific record, you’re sort of saying “I mean that one.”

Database developers and administrators often argue that relational databases aren’t so-called object databases and that naming primary keys the same for all tables leads to confusion and errors in scripting. It’s true that when you read the SQL code in a database that uses id for all the primary key names, it can be a bit confusing. Developers must typically use longer, more meaningful table aliases in their queries to make them understood. Ironically, when the application frameworks that desire uniformity in primary key names generate database queries dynamically, they often emit short table aliases or ones that have little or no relationship to the names of the tables they represent. Have you ever tried to analyze a complex query at runtime that has been written by an Object-Relational Mapping (ORM) tool? It can be positively maddening precisely because the table aliases typically bear no resemblance to the names of the tables they express.

Another problem with having all the primary keys named the same is that it sometimes inhibits other kinds of useful conventions. For example, MySQL supports a highly expressive feature in its JOIN syntax that allows you to write code like this:

SELECT * FROM `Order` INNER JOIN LineItem USING (OrderID);

In this case, because the Order table’s primary key is named the same as the LineItem table’s foreign key to orders, the USING clause makes it really simple to connect the two tables. One has to admit that’s a very natural-feeling expression. The aforementioned application frameworks’ fondness for naming all primary keys id makes this sort of practice impossible.

Now that I’ve spent some time besmirching the popular application frameworks for the way they name primary keys, let me defend them a bit. As I said in the beginning, when it comes to frameworks, their opinions are oftentimes more about consistency than objective or even circumstantial correctness. For application developers working in languages like Ruby or C#, having all the primary keys named similarly gives the database a more object-oriented feel. When data models have members that are named consistently from one object to the next, it feels as though the backing database is somewhat object-oriented, with some sort of base table that has common elements in it. If such conventions are reliable, all sorts of time-saving and confusion-banishing practices can be established.

Having done a lot of data architecture work in my career and an equal amount of application development work, my opinion is that naming all database primary keys the same has more benefits than drawbacks across the ecosystem. My opinion is based on the belief that application developers tend to make more mistakes in their interpretations of data than database people do. I believe this is true because, as stewards of information, database developers and administrators live and breathe data as their core job function, while Ruby and C# developers use data as just one of many facets that they manage in building applications. Of course, this is the sort of argument where everyone is correct and no one is. So I’ll not try to claim that my opinion is authoritative. I’m interested in hearing your thoughts on the subject.

Earth Surface Distance in T-SQL

A few years ago, I was working on a project where I needed to calculate the distance between pairs of points on the Earth’s surface. With a bit of research, I found an implementation of the haversine formula written in Python on John D. Cook’s Standalone Numerical Code site. Given a pair of latitude and longitude values, John’s code produces a unit coefficient that can be multiplied by the radius of the sphere to yield the distance in whatever unit of measurement you might want.

Having the Python code was great but I needed a version of the algorithm in F#. After searching for some time and not finding an F# implementation, I decided to write one based on John’s Python version and put it back into the public domain. John published my F# code into his Standalone Numerical Code library here for everyone to use freely.

The exercise for today is to write the haversine formula in Transact-SQL. I’ll start by proposing a test. Using two well-known points and Google Earth as the standard, I define the following variables.
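
The original values aren’t reproduced here, so this sketch uses approximate coordinates for the two cities; if your coordinates differ, the calculated distances will differ slightly from the results shown later.

-- Approximate coordinates (illustrative values).
DECLARE @lat1 FLOAT = 37.5407;   -- Richmond, Virginia USA
DECLARE @lon1 FLOAT = -77.4360;
DECLARE @lat2 FLOAT = -23.5505;  -- São Paulo, Brazil
DECLARE @lon2 FLOAT = -46.6333;

-- Expected distances, with Google Earth as the standard.
DECLARE @expectedMiles      FLOAT = 4677.562;
DECLARE @expectedKilometers FLOAT = 7527.806;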

Google Earth estimates that these two points, in Richmond, Virginia USA and São Paulo, Brazil, are roughly 4,678 miles or 7,528 kilometers apart. Having no better standard handy, those will have to suffice for some testing later on. Next comes the heart of the haversine formula.
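
Here is a sketch of that calculation in Transact-SQL, producing the unit-sphere arc distance described below:

-- Haversine: a = sin^2(dLat/2) + cos(lat1) * cos(lat2) * sin^2(dLon/2)
DECLARE @dLat FLOAT = RADIANS(@lat2 - @lat1);
DECLARE @dLon FLOAT = RADIANS(@lon2 - @lon1);

DECLARE @a FLOAT =
    SIN(@dLat / 2) * SIN(@dLat / 2) +
    COS(RADIANS(@lat1)) * COS(RADIANS(@lat2)) *
    SIN(@dLon / 2) * SIN(@dLon / 2);

-- The arc distance between the two points on a sphere of radius one.
DECLARE @arcUnitDistance FLOAT = 2 * ASIN(SQRT(@a));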

I recommend saving this as a user-defined function in your SQL repository called something like ArcUnitDistance. The reason for that name is that the value this calculation produces is the distance between the two supplied points on a sphere with a radius of one, in whatever unit of measure you’re going to apply. Now it’s time to put the code to the test.

Commonly used values for the radius of the Earth when using the haversine formula are 3,960 miles or 6,373 kilometers. Of course, the Earth isn’t a sphere; it’s a spheroid, so you may find other radius values that you trust more. To find the distance between my test points in those units of measurement, I simply multiply @arcUnitDistance by those radius values. Then it’s easy to calculate the skew from the expectation and print it out.
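
A sketch of that test, building on the variables declared above:

DECLARE @earthRadiusMiles      FLOAT = 3960.0;
DECLARE @earthRadiusKilometers FLOAT = 6373.0;

DECLARE @calculatedMiles      FLOAT = @arcUnitDistance * @earthRadiusMiles;
DECLARE @calculatedKilometers FLOAT = @arcUnitDistance * @earthRadiusKilometers;

SELECT 'Kilometers' AS Units,
       @expectedKilometers AS Expected,
       @calculatedKilometers AS Calculated,
       ABS(@expectedKilometers - @calculatedKilometers) / @expectedKilometers AS Skew
UNION ALL
SELECT 'Miles',
       @expectedMiles,
       @calculatedMiles,
       ABS(@expectedMiles - @calculatedMiles) / @expectedMiles;

The test yields these results: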

Units        Expected    Calculated          Skew
Kilometers   7527.806    7521.74343697558    0.00080535590641083
Miles        4677.562    4673.79632989539    0.000805049746985879

It seems that for the two test points at least, the skew based on Google Earth as the standard is about 8/100ths of one percent. Over the years, I’ve successfully adapted this haversine code to C#, JavaScript and Java, too. The haversine formula performs better over short distances than a simple law of cosines-based formula, for all sorts of reasons that scientists understand. However, it has a margin of error of up to 0.3% in some cases, which can be too great for applications that require high precision.

If you want much greater accuracy when calculating distances on spheroids like planet Earth, you should check out the Vincenty formula instead. It’s more complex than haversine but typically yields better results. I hope you find this version of the haversine formula written in Transact-SQL useful for a variety of distance measurement applications.