Transforming Config Files and XML Documents

When you’re deploying an application to different servers representing different environments, manually updating configuration settings can get very messy. In .NET applications, one of the ways to keep configuration settings under control is by using configuration transforms.

These transforms are a set of rules that define how the App.config or Web.config should change (e.g. adding appSettings entries, replacing connection strings, and so on) for a particular environment.

Syntax

Typically, when you create a new web application in Visual Studio, you’ll get a little more than just a Web.config file. In fact, if you expand the Web.config node in Solution Explorer, you’ll find that there are two more files: Web.Debug.config and Web.Release.config.

[Screenshot: Solution Explorer showing Web.Debug.config and Web.Release.config nested under Web.config]

As shown in the screenshot above, the Web.Release.config has a similar structure to Web.config itself; however, it defines rules that operate on the Web.config to produce a different result. In this case, there is a transform that removes the debug attribute from the compilation node. You can see the result of this if you right-click on Web.Release.config and select Preview Transform:

[Screenshot: Preview Transform showing Web.config alongside the transformed Web.Release.config result]
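For reference, the rule responsible for this in the default Web.Release.config looks something like this (trimmed down; the generated file also contains explanatory comments, and may differ slightly between Visual Studio versions):

<?xml version="1.0"?>
<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
  <system.web>
    <!-- remove the debug attribute from the compilation element when deploying -->
    <compilation xdt:Transform="RemoveAttributes(debug)" />
  </system.web>
</configuration>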

One of the most common places where transforms come in handy is changing the value of an appSettings entry when deploying. Let’s say, for example, that we have the following setting in our App.config or Web.config file:

  <appSettings>
    <add key="SomeFilePath" value="myfile.txt" />
  </appSettings>

In our transform file (which so far has been Web.Release.config), we could define the following transform:

  <appSettings>
    <add key="SomeFilePath" value="C:\data\productionfile.txt"
         xdt:Transform="Replace"
         xdt:Locator="Match(key)" />
  </appSettings>

This tells the transform engine to look for the matching key (in this case “SomeFilePath”), and replace it with what is specified in the transform file:

[Screenshot: Preview Transform showing the SomeFilePath value replaced with C:\data\productionfile.txt]

Similarly, it is very common to replace database connection strings depending on the environment. In fact, the default Web.Debug.config and Web.Release.config (shown earlier) explain in comments how to do this.

Beyond removing attributes and replacing values, transforms support several other operations that allow nodes and attributes to be inserted, removed or modified. See Web.config Transformation Syntax for Web Project Deployment Using Visual Studio (the official documentation) for details.

Despite the title of that article, transforms may also be used with App.config files (for non-web projects) and just about any arbitrary XML file (although not quite in the same way as shown above). For web applications, Visual Studio provides some support for creating and previewing transforms. For other application types and XML documents, it is necessary to resort to other tools.

Tools

One of the simplest ways to apply config transforms (in particular, the Release transform) is to Publish your web application. This will compile your web application in Release mode, apply config transforms, and put it in the destination you specify.

Unfortunately, this approach has several drawbacks:

  • It is only suitable for web applications (not e.g. console applications).
  • It can’t be used for arbitrary XML files.
  • It can’t easily be extended to cover additional environments.
  • Being embedded in Visual Studio, it can’t be automated or integrated with a continuous integration system.

In recent years, Microsoft released an XML Document Transformation (XDT) library that enables a variety of tools to run these transforms. It is available as the Microsoft.Web.Xdt NuGet package, and the source code is available on CodePlex.
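Since XDT is just a library, you can also apply a transform programmatically from C#. Below is a minimal sketch of what that might look like; I’m assuming the Microsoft.Web.XmlTransform namespace that the package provides, and the file names are just placeholders:

using Microsoft.Web.XmlTransform;

class TransformRunner
{
    static void Main(string[] args)
    {
        // load the source config, apply the transform file to it, and save the result
        using (var document = new XmlTransformableDocument())
        using (var transform = new XmlTransformation("Web.Release.config"))
        {
            document.PreserveWhitespace = true;
            document.Load("Web.config");

            if (transform.Apply(document))
                document.Save("Web.Out.config");
        }
    }
}

This is essentially what the tools described below do under the hood.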

The Web.config Transformation Tester is a web application that is useful for testing transforms. It uses XDT internally (see its source code) and can transform web and application configs as well as arbitrary XML files.

Config Transformation Tool (ctt), also known as XDT Transformation Tool, is a command-line utility launched in 2010 and likewise based on Microsoft’s XDT library. Although its CodePlex site says it’s still in Alpha, I’ve used it to transform Web.config, App.config and arbitrary XML files.

Say we have this Web.config file:

<?xml version="1.0"?>
<configuration>
  <appSettings></appSettings>
  <connectionStrings>
    <add name="foo" connectionString="value"/>
  </connectionStrings>
  <system.web>
    <customErrors mode="Off"/>
  </system.web>
</configuration>

…and we have this transform file called Web.Debug.config:

<?xml version="1.0"?>
<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
  <appSettings>
    <add key="EnvironmentName" value="Debug"
        xdt:Transform="Insert" />
  </appSettings>
  <connectionStrings>
    <add name="foo" connectionString="differentValue"
        xdt:Transform="Replace" xdt:Locator="Match(name)" />
  </connectionStrings>
</configuration>

Then we can run ctt as follows:

ctt s:Web.config t:Web.Debug.config d:Web.Out.config i

I’m using the shorthand parameters here:

  • s: marks the source file.
  • t: marks the transform file.
  • d: marks the destination file.
  • i causes the output to be indented, because by default it’s minified and unreadable.

The output is as follows:

<?xml version="1.0"?><configuration>
    <appsettings>
        <add key="EnvironmentName" value="Debug" />
    </appsettings>
    <connectionStrings>
        <add name="foo" connectionString="differentValue" />
    </connectionStrings>
    <system.web>
        <customErrors mode="Off" />
    </system.web>
</configuration>

This also works on XML files. In fact, I’ve taken Microsoft’s sample books.xml file and applied the following transform file (books.transform.xml):

<?xml version="1.0"?>
<catalog xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
   <book id="bk113" xdt:Transform="Insert">
      <author>David Kushner</author>
      <title>Masters of Doom</title>
      <genre>Non-Fiction</genre>
      <price>10.95</price>
      <publish_date>2003-01-01</publish_date>
      <description>How two guys created an empire and transformed pop culture</description>
   </book>
</catalog>

This command applies the transformation:

ctt s:books.xml t:books.transform.xml d:books.out.xml i

As you can see, the insertion transform actually works:

[Screenshot: books.out.xml containing the inserted bk113 book element]

Additionally, being a command-line tool, ctt is great for automating transforms as part of a deployment script (e.g. in a batch file).

Aside from ctt, XDT’s open sourcing announcement lists a couple of other tools that use Microsoft’s XDT library. One is a Web Config Transform Runner, which I haven’t tried but which is presumably similar to ctt. The other is SlowCheetah, a Visual Studio extension which allows transforms to be previewed and added. The author discontinued development on SlowCheetah (see also the GitHub thread), frustrated that it would never become a first-class citizen in Visual Studio. Ironically, that is more or less what ended up happening: the preview shown at the beginning of this article was created with Visual Studio 2015, without SlowCheetah installed.

Finally, this article would not be complete without a mention of TeamCity. Automating transformation of configuration settings is only one step in what should be a complete continuous integration solution running automated builds. TeamCity is just that, and it has support for transforms.

Calculating String Similarity with Levenshtein Distance

The source code for this article is available at the Gigi Labs BitBucket repository.

Levenshtein distance or edit distance is a metric used to measure the similarity between two strings. It works by calculating the number of edits needed to go from the first string to the second. For example:

  • To go from kitten to mitten, we need to replace the first letter. That’s just one edit.
  • To go from barrel to bard, we need to remove the last two letters, and replace an ‘r’ with a ‘d’. That’s three edits.
  • To go from bled to bleeding, we need to insert an ‘e’ in the middle and “ing” at the end. That’s four edits.
  • Transpositions of adjacent characters are not a single operation in classic Levenshtein distance; a swap counts as two edits. (The variant that treats a transposition as one edit is known as Damerau-Levenshtein distance.)

To perform this calculation, there’s a simple dynamic programming algorithm. It starts off by initialising a matrix and setting initial values for the top row and left column. Let’s see how this looks in code.

We’ll start off with a simple prompt for the two strings to be compared:

            // input

            Console.Write("Input first string: ");
            string s = Console.ReadLine().ToLowerInvariant();
            Console.Write("Input second string: ");
            string t = Console.ReadLine().ToLowerInvariant();

Next, we create our matrix and set initial values for the top row and left column:

            // init

            var matrix = new int[s.Length + 1, t.Length + 1];

            for (int i = 0; i < s.Length + 1; i++)
                matrix[i, 0] = i;

            for (int j = 0; j < t.Length + 1; j++)
                matrix[0, j] = j;

Rather than explaining what this does in words, it’s better if we just draw the matrix in the console window so you can see for yourself. To do that, we’ll need a helper function. This might look like a lot of code, but it’s basically just drawing the matrix with one string at the top and the other at the side:

        static void DrawMatrix(string s, string t, int[,] matrix)
        {
            Console.WriteLine();
            Console.Write("   "); // leave room for the row label and the extra initial column

            // header row: the characters of the second string
            for (int i = 0; i < t.Length; i++)
                Console.Write("{0}", t[i]);
            Console.WriteLine();

            // one row per character of the first string, plus the initial row at the top
            for (int i = 0; i < s.Length + 1; i++)
            {
                Console.Write("{0} ", i > 0 ? s[i - 1] : ' ');

                for (int j = 0; j < t.Length + 1; j++)
                    Console.Write(matrix[i, j]);

                Console.WriteLine();
            }

            Console.WriteLine();
        }

Back in our Main() method, we can now call this helper function to draw the matrix:

            // visualise

            DrawMatrix(s, t, matrix);

Here’s the output:

[Screenshot: console output showing the initialised matrix]

As you can see, the initialisation code we wrote earlier is creating a matrix based on the length of the strings. There’s an extra row and column at the beginning, which we fill with incrementing values.

Now, the heart of the Levenshtein distance algorithm is a calculation that starts with those initial values and uses them to fill in the rest of the matrix. For each cell, we take the minimum of three candidates: the value above plus 1 (a deletion), the value to the left plus 1 (an insertion), and the value diagonally above-left plus a cost (a substitution, where the cost is 0 if the corresponding characters match and 1 otherwise). The resulting code looks something like this:

            // calculate

            for (int i = 1; i <= s.Length; i++)
            {
                for (int j = 1; j <= t.Length; j++)
                {
                    int cost = s[i - 1] == t[j - 1] ? 0 : 1;

                    int topPlus1 = matrix[i - 1, j] + 1;
                    int leftPlus1 = matrix[i, j - 1] + 1;
                    int topLeftPlusCost = matrix[i - 1, j - 1] + cost;

                    var min = Math.Min(topPlus1, leftPlus1);
                    min = Math.Min(min, topLeftPlusCost);
                    matrix[i, j] = min;
                }
            }

With that done, the Levenshtein distance is the value in the bottom right of the matrix:

            int levenshteinDistance = matrix[s.Length, t.Length];
            Console.WriteLine("Levenshtein distance = {0}", levenshteinDistance);
            Console.WriteLine();

Here’s the final output:

[Screenshot: console output showing the completed matrix and the resulting Levenshtein distance]

As you can see, Levenshtein Distance gives us a way to measure the similarity (or difference) between two strings. This lends itself well to various text-related applications such as validation, spell checking and error correction.
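To hint at how this might plug into something like spell checking, here is a rough sketch (not part of the original program) that wraps the calculation above in a reusable method and then uses it to pick the closest match from a list of candidate words. The method and variable names are purely illustrative, and the usual using directives (System, System.Collections.Generic, System.Linq) are assumed:

        static int LevenshteinDistance(string s, string t)
        {
            // same algorithm as above: initialise the first row and column,
            // then fill in the rest of the matrix
            var matrix = new int[s.Length + 1, t.Length + 1];

            for (int i = 0; i <= s.Length; i++)
                matrix[i, 0] = i;

            for (int j = 0; j <= t.Length; j++)
                matrix[0, j] = j;

            for (int i = 1; i <= s.Length; i++)
            {
                for (int j = 1; j <= t.Length; j++)
                {
                    int cost = s[i - 1] == t[j - 1] ? 0 : 1;

                    matrix[i, j] = Math.Min(
                        Math.Min(matrix[i - 1, j] + 1, matrix[i, j - 1] + 1),
                        matrix[i - 1, j - 1] + cost);
                }
            }

            return matrix[s.Length, t.Length];
        }

        // picks the candidate with the smallest edit distance to the input,
        // the way a naive spell checker might
        static string ClosestWord(string input, IEnumerable<string> candidates)
        {
            return candidates.OrderBy(word => LevenshteinDistance(input, word)).First();
        }

For example, ClosestWord("kiten", new[] { "kitten", "mitten", "bitten" }) would return "kitten", since it is only one edit away.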

RabbitMQ String Headers Received As Byte Arrays

I ran into a strange issue today with RabbitMQ. When adding custom headers, I found that things like integers would get through to the other end just fine. But when it came to strings, they ended up as byte arrays on the receiving end.

To illustrate this issue, I’ve written a very simple example. The publisher, below, sets two custom headers: a string and an integer:

        static void Main(string[] args)
        {
            Console.Title = "Publisher";

            var factory = new ConnectionFactory();

            using (var connection = factory.CreateConnection())
            {
                using (var channel = connection.CreateModel())
                {
                    // prepare payload and headers

                    var body = Encoding.UTF8.GetBytes("Hello");
                    var props = channel.CreateBasicProperties();
                    props.Headers = new Dictionary<string, object>();
                    props.Headers["Name"] = "Bob";
                    props.Headers["Age"] = 21;

                    // set up queue and exchange

                    channel.QueueDeclare("testqueue", true, false, false, null);
                    channel.ExchangeDeclare("testexchange", "direct");
                    channel.QueueBind("testqueue", "testexchange", "");

                    // publish message

                    channel.BasicPublish("testexchange", "", props, body);

                    Console.ReadLine();
                }
            }
        }

As you can see, this is set correctly in the Headers collection:

[Screenshot: debugger showing the Name and Age values set in the Headers dictionary]

Now that we’ve published a message, we can consume it using the following code (practically the same as that in “Getting Started with RabbitMQ with .NET”, except that it also writes out the two custom headers):

        static void Main(string[] args)
        {
            Console.Title = "Consumer";

            var factory = new ConnectionFactory();

            using (var connection = factory.CreateConnection())
            {
                using (var channel = connection.CreateModel())
                {
                    channel.QueueDeclare("testqueue", true, false, false, null);

                    var consumer = new EventingBasicConsumer(channel);
                    consumer.Received += Consumer_Received;
                    channel.BasicConsume("testqueue", true, consumer);

                    Console.ReadLine();
                }
            }
        }

        private static void Consumer_Received(object sender, BasicDeliverEventArgs e)
        {
            var body = e.Body;
            var content = Encoding.UTF8.GetString(body);
            var name = e.BasicProperties.Headers["Name"];
            var age = e.BasicProperties.Headers["Age"];

            Console.WriteLine("{0} {1} {2}", name, age, content);
        }

The output, however, is not quite what one would expect:

[Screenshot: consumer output showing the Name header printed as a byte array rather than the string]

In fact, what we received in the consumer is not a string but a byte array, even though the bytes correspond to what we actually sent:

[Screenshot: debugger showing the Name header received as a byte array]

The integer header, however, did not have this problem.

A quick search showed that I’m not the first person to encounter this, as someone pointed out this odd behaviour back in 2012, and it appears there’s a similar issue in the Python implementation.

All I can suggest based on these two links is: if you have a header which you know is a string, just do the byte-to-string conversion yourself:

        private static void Consumer_Received(object sender, BasicDeliverEventArgs e)
        {
            var body = e.Body;
            var content = Encoding.UTF8.GetString(body);
            var nameBytes = (byte[]) e.BasicProperties.Headers["Name"];
            var name = Encoding.UTF8.GetString(nameBytes);
            var age = e.BasicProperties.Headers["Age"];

            Console.WriteLine("{0} {1} {2}", name, age, content);
        }

As you would expect, this sorts out the problem:

[Screenshot: consumer output showing the Name header printed correctly as a string]
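If you have more than one string header, it may be worth centralising the conversion in a little helper. The method below is my own sketch (not part of the RabbitMQ client API): it decodes byte arrays into strings and passes any other value, such as an integer, through unchanged.

        private static object GetHeaderValue(IDictionary<string, object> headers, string key)
        {
            var value = headers[key];

            // string headers arrive as byte arrays, so decode those;
            // other types (e.g. integers) are returned as-is
            var bytes = value as byte[];
            if (bytes != null)
                return Encoding.UTF8.GetString(bytes);

            return value;
        }

The handler then needs just one line per header, e.g. var name = GetHeaderValue(e.BasicProperties.Headers, "Name");.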

How to use the C++ STL Priority Queue

This article was originally posted at Gigi’s Computer Corner on 19th January 2013.

Overview

A priority queue is a queue data structure with the particular property that its items come out in order of priority. You can decide what priority to give items depending on the data type – more on this in a minute.

The C++ Standard Template Library (STL) includes a convenient std::priority_queue class template in the queue header file.

A simple priority queue of integers

The following code sample illustrates how to implement a priority queue of integers. The integer value is used by default as a priority. The queue is sorted automatically as new entries are added.

#include <cstdlib> // for system("pause")
#include <iostream>
#include <queue>

int main(int argc, char ** argv)
{
	std::priority_queue<int> queue;

	queue.push(100);
	queue.push(300);
	queue.push(50);
	queue.push(150);

	while (!queue.empty())
	{
		std::cout << queue.top() << std::endl;
		queue.pop();
	}

	system("pause");

	return 0;
}

The output of the above program is as follows:

300
150
100
50
Press any key to continue . . .

A priority queue with a custom class

In practical situations, it is often not very useful to simply maintain a priority queue of just integers. We often have some particular class, and we want to give each instance a priority (computed based on some internal state).

Let’s say we have a class called Toast, composed of a certain amount of bread and butter:

class Toast
{
public:
	int bread;
	int butter;

	Toast(int bread, int butter)
		: bread(bread), butter(butter)
	{

	}
};

It is easy to sort integers, but how do you sort Toast? We need to offer C++ a way to compare one Toast instance to another. This is done by creating a struct that implements operator() and effectively performs a less-than comparison. A StackOverflow question and answer shows how it’s done, and the code needed for sorting our Toast is below.

struct ToastCompare
{
	bool operator()(const Toast &t1, const Toast &t2) const
	{
		int t1value = t1.bread * 1000 + t1.butter;
		int t2value = t2.bread * 1000 + t2.butter;
		return t1value < t2value;
	}
};

We can now pass the ToastCompare struct to the priority_queue to tell it how to sort the Toast instances. Sample code is below.

#include <cstdlib> // for system("pause")
#include <iostream>
#include <queue>
#include <vector>

#include "Toast.h"

using std::priority_queue;
using std::vector;
using std::cout;
using std::endl;

int main(int argc, char ** argv)
{
	Toast toast1(2, 200);
	Toast toast2(1, 30);
	Toast toast3(1, 10);
	Toast toast4(3, 1);
	
	//priority_queue<Toast> queue;
	priority_queue<Toast, vector<Toast>, ToastCompare> queue;

	queue.push(toast1);
	queue.push(toast2);
	queue.push(toast3);
	queue.push(toast4);

	while (!queue.empty())
	{
		Toast t = queue.top();
		cout << "bread " << t.bread << " butter " << t.butter << std::endl;
		queue.pop();
	}

	system("pause");

	return 0;
}

If we used the simple priority_queue declaration (the line that is commented out), we would end up with a bunch of compiler errors because C++ would not know how to compare the Toast instances.

Instead, we pass three template arguments: the Toast type itself, a vector of Toast, and the ToastCompare struct to tell C++ how to compare Toast instances. The second template argument (the vector) is there because the C++ STL priority queue is actually a container adapter – it uses an underlying data structure to store its elements, and the default is a vector (see the short sketch after the program output below).

The output for the above program is given below:

bread 3 butter 1
bread 2 butter 200
bread 1 butter 30
bread 1 butter 10
Press any key to continue . . .
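As an aside, the same three-argument form lets you change the ordering or swap the underlying container. Here’s a quick illustrative sketch (not from the original article) of a min-heap of integers backed by a std::deque instead of the default std::vector:

#include <deque>
#include <functional>
#include <iostream>
#include <queue>

int main(int argc, char ** argv)
{
	// std::greater<int> flips the comparison so the smallest element is on top,
	// and std::deque replaces std::vector as the underlying container
	std::priority_queue<int, std::deque<int>, std::greater<int>> minQueue;

	minQueue.push(300);
	minQueue.push(50);
	minQueue.push(150);

	while (!minQueue.empty())
	{
		std::cout << minQueue.top() << std::endl; // prints 50, 150, 300
		minQueue.pop();
	}

	return 0;
}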


Scope Bound Resource Management in C#

This article explains how we can use scope to localize resource lifetime as well as other arbitrary user code effects within a method. The article starts with background from C++, because that’s where this technique comes from.

Update 5th November 2016: Some of the utility classes described here have been incorporated in my Dandago.Utilities NuGet package.

The RAII Pattern

Despite their power, pointers in C/C++ have been the cause of much grief. Using them incorrectly typically leads to disastrous effects, from memory leaks to segmentation faults. While newer languages such as Java and C# limit the damage by taking control of the lifetime of heap-allocated objects (via garbage collection), there are techniques even in C++ that make pointers a lot less messy to use.

In fact, the problem here is not even specific to pointers. Pointers (or rather, the memory they point to) belong to a family of resources, along with file handles, sockets, and many other things. These resources must be acquired before use and released afterwards; otherwise bad things happen.

Scott Meyers’ excellent book Effective C++: 55 Specific Ways to Improve Your Programs and Designs has an entire section dedicated to safely working with resources. One of the first things he suggests in that section is to encapsulate a resource within an object. For example:

class MyResource
{
public:
    MyResource() // constructor
    {
        this->ptr = new int[1000]; // allocate memory on heap
    }

    ~MyResource() // destructor
    {
        delete[] this->ptr; // free memory
    }

private:
    int * ptr; // pointer encapsulated within class
};

This encapsulation is called Resource Acquisition Is Initialization (RAII), or Scope-Bound Resource Management. Apart from constructors, C++ classes can have destructors. These get called when an object is destroyed: automatically, when an object allocated on the stack goes out of scope, or when delete is called on one allocated on the heap. Let’s see how this works in practice:

int main(int argc, char ** argv)
{
    // create instance of MyResource on stack
    MyResource resource;

    return 0;
} // we went out of scope; destructor gets called

Just like declaring an int variable allocates it on the stack, the same thing happens with class types. When that object goes out of scope (i.e. execution reaches the closing } of the block it was declared in), its destructor gets called. This is great because you can’t really forget to dispose of the encapsulated resource. If you return early, throw an exception, etc., the destructor will still get called when control leaves the current execution block.

The IDisposable Pattern

In C#, objects allocated on the heap (via the new keyword) are typically reclaimed by the garbage collector when it determines that they are no longer in use. While C# code typically doesn’t face the problems C++ code has with pointers, managing resources is no less important. Just like C++, C# has to deal with file handles, databases, sockets, unmanaged libraries and other resources that must be disposed of as carefully as they are initialized.

For this reason, C# provides two tools to essentially do the work of C++ destructors: the IDisposable pattern, and finalizers. Using these correctly is non-trivial and depends on the situation, but they are also pretty standard and well-documented. Check out this CodeProject article for an in-depth treatment of the topic.
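As a rough sketch of the canonical shape (the class and field names here are illustrative, not from the article; see the linked article for the full details and caveats):

    // an illustrative sketch of the standard dispose pattern
    public class MyResourceHolder : IDisposable
    {
        private IntPtr handle; // e.g. some unmanaged handle acquired in the constructor
        private bool disposed;

        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this); // Dispose() ran, so the finalizer doesn't need to
        }

        protected virtual void Dispose(bool disposing)
        {
            if (disposed)
                return;

            if (disposing)
            {
                // dispose managed resources (other IDisposables) here
            }

            // release unmanaged resources (e.g. the handle) here
            disposed = true;
        }

        ~MyResourceHolder() // finalizer: a safety net in case Dispose() is never called
        {
            Dispose(false);
        }
    }

A finalizer is only needed when the class directly owns unmanaged resources; classes that merely wrap other IDisposables can implement Dispose() alone.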

For convenience, C# also provides an overloaded using keyword that works hand-in-hand with the IDisposable pattern:

            using (var fs = File.OpenWrite("file.txt"))
            using (var sw = new StreamWriter(fs))
            {
                sw.WriteLine("Hello!");
            } // automatically calls Dispose() for both

A using block, wrapping an object that implements IDisposable, will automatically call that object’s Dispose() method when it goes out of scope (and using blocks can be stacked, as above). This is essentially equivalent to:

            FileStream fs = null;
            StreamWriter sw = null;
            try
            {
                fs = File.OpenWrite("file.txt");
                sw = new StreamWriter(fs);

                sw.WriteLine("Hello!");
            }
            finally
            {
                if (sw != null)
                    sw.Dispose();

                if (fs != null)
                    fs.Dispose();
            }

Abusing the IDisposable Pattern

While IDisposable is meant to be used to safely dispose of resources, we can extend its usefulness to other scenarios that relate to a particular scope.

One example where I’ve used this idea before is to log the beginning and end of a method:

    public class ScopedLog : IDisposable
    {
        private string name;

        public ScopedLog(string name)
        {
            this.name = name;
            Console.WriteLine("Begin {0}", name);
        }

        public void Dispose()
        {
            Console.WriteLine("End {0}", this.name);
        }
    }

This pattern doesn’t let you do anything you couldn’t do already, but it gives you an elegant way to do scope-related work without it getting in the way of your application logic:

        static void Main(string[] args)
        {
            using (var log = new ScopedLog("Main"))
            {
                Console.WriteLine("Hello!");
            }

            Console.ReadLine();
        }

Here’s the output:

[Screenshot: console output showing Begin Main, Hello!, and End Main]

Another example is when you want to benchmark your code. Code can get really messy when you’re doing logging, benchmarking and other stuff in the same method. Instead, we can make a dedicated class:

    public class ScopedTimer : IDisposable
    {
        private string name;
        private DateTime startTime;

        public ScopedTimer(string name)
        {
            this.name = name;
            this.startTime = DateTime.Now;
        }

        public void Dispose()
        {
            var endTime = DateTime.Now;
            var elapsed = endTime - this.startTime;

            Console.WriteLine("{0} took {1}", this.name, elapsed);
        }
    }

…and then put it neatly in a using block:

        static void Main(string[] args)
        {
            using (var timer = new ScopedTimer("Main"))
            using (var log = new ScopedLog("Main"))
            {
                Console.WriteLine("Hello!");
            }

            Console.ReadLine();
        }

Here’s the output:

[Screenshot: console output including the time taken by Main]

A final example is changing the console colour. It’s very easy to save the old colour, set a new one, and then revert to the old colour when going out of scope:

    public class ScopedConsoleColour : IDisposable
    {
        private ConsoleColor oldColour;

        public ScopedConsoleColour(ConsoleColor newColour)
        {
            this.oldColour = Console.ForegroundColor;

            Console.ForegroundColor = newColour;
        }

        public void Dispose()
        {
            Console.ForegroundColor = this.oldColour;
        }
    }

Note: you could use Console.ResetColor(), but then you can’t nest colour changes.
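To see the difference, consider this little sketch: when the inner scope ends, ScopedConsoleColour restores the outer colour, whereas Console.ResetColor() would jump straight back to the console’s default colour.

            using (var outer = new ScopedConsoleColour(ConsoleColor.Yellow))
            {
                Console.WriteLine("Yellow");

                using (var inner = new ScopedConsoleColour(ConsoleColor.Red))
                {
                    Console.WriteLine("Red");
                }

                Console.WriteLine("Yellow again"); // restored to yellow, not to the default colour
            }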

Here’s the updated Main() code for this example:

        static void Main(string[] args)
        {
            using (var timer = new ScopedTimer("Main"))
            using (var log = new ScopedLog("Main"))
            {
                Console.WriteLine("Hello!");

                using (var colour = new ScopedConsoleColour(ConsoleColor.Yellow))
                {
                    Console.WriteLine("Howdy?");
                }

                Console.WriteLine("Bye!");
            }

            Console.ReadLine();
        }

Here’s the output:

[Screenshot: console output with “Howdy?” printed in yellow]

Other Applications in C++

The RAII pattern is awesome. Not only does it allow you to safely manage the lifetime of resources, but it enables a whole class of scope-based applications. Aside from the C# examples in the previous section, RAII enables things like smart pointers in C++ (e.g. unique_ptr) and scoped locks.
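For completeness, here is a small C++ sketch of those two (illustrative only, and assuming C++14 for std::make_unique):

#include <memory>
#include <mutex>

std::mutex m;

void example()
{
	std::lock_guard<std::mutex> lock(m);          // the mutex is released when 'lock' goes out of scope
	auto numbers = std::make_unique<int[]>(1000); // the memory is freed when 'numbers' goes out of scope

	// ... work with the lock held and the array allocated ...
} // both destructors run here, even if an exception is thrown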