ADO .NET Part 3: Useful Techniques

Most of what you need to know in order to query a database using ADO .NET has already been covered in Part 1 and Part 2. However, in the interest of keeping things concise, we’ve deferred covering some very important topics… until now. In Part 3, we’re going to talk about:

  • SQL transactions
  • Prepared statements
  • Different ADO .NET data providers and their commonalities

Travel Example

For this article, we’ll work with a new example revolving around travel destinations. We’ll ignore some best practices of database design to keep it simple.

We have a table of cities, such as Dublin, Ireland or Rome, Italy. Each of these can be assigned any number of tags such as Beach or History. The relationship between City and Tag is many-to-many, so we have a CityTag table in between.

After creating a new database called Travel, run the following script to create the tables and populate them with some data:

create table City (
    Id int not null primary key identity(1, 1),
    Name nvarchar(50) not null
);

create table Tag (
    Id int not null primary key identity(1, 1),
    Name nvarchar(50) not null
);

create table CityTag (
    Id int not null primary key identity(1, 1),
    CityId int not null,
    TagId int not null,
);

insert into City (Name) values
    ('Dublin, Ireland'),
    ('Rome, Italy'),
    ('Vienna, Austria')

insert into Tag (Name) values
    ('Beach'),
    ('Forest Trails'),
    ('History'),
    ('Mountains'),
    ('Great Public Transport');

insert into CityTag (CityId, TagId) values
    (1, 2),
    (1, 3),
    (2, 3),
    (3, 3),
    (3, 5);

Transactions

Imagine someone is adding a new city from a user interface:

They can enter the name of the city, select the tags, and hit the Add button to save the data they entered to the database. However, when this happens, we are actually doing two operations: adding a new row to the City table, and adding multiple rows to the CityTag table.

In ADO .NET, the code for this could look something like this:

        static async Task<bool> AddCityAsync()
        {
            const string connStr = @"Server=.\SQLEXPRESS;
                Initial Catalog=Travel; Integrated Security=True";

            var queries = new string[]
            {
                "insert into City (Name) values ('Rimini, Italy');",
                "insert into CityTag (CityId, TagId) values (SCOPE_IDENTITY(), 1)"
            };

            try
            {
                using (var conn = new SqlConnection(connStr))
                {
                    await conn.OpenAsync();

                    using (var command = new SqlCommand())
                    {
                        command.Connection = conn;

                        foreach (var query in queries)
                        {
                            command.CommandText = query;
                            await command.ExecuteNonQueryAsync();
                        }
                    }
                }

                return true; // indicates that it was successful
            }
            catch (Exception)
            {
                return false; // indicates that something went wrong
            }
        }

What are we doing here? Well…

  1. We’re opening a connection the same way we did before, only that we’re connecting to the Travel database.
  2. We prepare the two separate queries representing the two operations of adding a city and adding tags. Most of the values are hardcoded, except that we’re using SCOPE_IDENTITY() in one place. This takes the last ID that was inserted, so if the new city has an ID of 4, then the CityTag row will use 4 for the CityId column.
  3. The queries are executed in a loop. Note how we are reusing the same SqlCommand object for both queries. That’s perfectly fine, as long as we update the query every time we use it.
  4. A try/catch will cause the method to return false if an exception occurs, indicating a problem to the calling code. On the other hand, the method returns true if everything’s ok.

If the second operation (inserting the tags) were to fail, we’d end up with a city without tags, (or a set of tags without a city, if they were executed in reverse order). That’s a pretty embarassing situation for the database to end up in, not to mention the application itself… misleading the user into thinking that what they entered was successfully saved, when in fact data was lost in the process.

Fortunately, we can use transactions to ensure that the operations either all succeed (woohoo) or all fail (display an error to the user so that they can try again). We do that by calling BeginTransaction() on the SqlConnection object, which gives us back an SqlTransaction object:

        static async Task<bool> AddCityAsync()
        {
            const string connStr = @"Server=.\SQLEXPRESS;
                Initial Catalog=Travel; Integrated Security=True";

            var queries = new string[]
            {
                "insert into City (Name) values ('Amsterdam, Netherlands');",
                "insert into CityTag (CityId, TagId) values (SCOPE_IDENTITY(), 1)"
            };

            try
            {
                using (var conn = new SqlConnection(connStr))
                {
                    await conn.OpenAsync();
                    var tran = await conn.BeginTransactionAsync();

                    try
                    {
                        using (var command = new SqlCommand())
                        {
                            command.Connection = conn;
                            command.Transaction = tran as SqlTransaction;

                            foreach (var query in queries)
                            {
                                command.CommandText = query;
                                await command.ExecuteNonQueryAsync();
                            }
                        }

                        await tran.CommitAsync();
                        return true; // indicates that it was successful
                    }
                    catch (Exception)
                    {
                        await tran.RollbackAsync();
                        return false; // indicates that something went wrong
                    }
                    finally
                    {
                        tran.Dispose();
                    }
                }
            }
            catch (Exception)
            {
                return false; // indicates that something went wrong
            }
        }

An SqlTransaction implements IDisposable, which means it could go into a using block. However, given that we’re using a try/catch anyway to determine whether we should commit or rollback, I’ve opted to use a classic try/catch/finally instead, in order to limit the amount of nesting. (I could probably have done the same for the SqlConnection, by the way.) Note also that we need the inner try/catch because an exception could also be thrown when opening the connetion or creating a transaction, and these need to be handled by the outer try/catch.

While preparing this code, I noticed that all the Sql* classes now also have a DisposeAsync() method. I’m not sure at this stage why you would want to call this instead of just Dispose(), although presumably it could avoid a little blocking just like other methods that have sync and async versions.

Another thing to note is that we need to assign the transaction to the SqlCommand object, either via the latter’s constructor, or using its Transaction property. The Transaction property is of type DbTransaction (we’ll talk about this later), not SqlTransaction, so we had to cast the transaction object.

Multiple statements executed within a transaction will either collectively succeed or collectively fail, avoiding the issue of ending up with partial data.

Prepared Statements

Imagine we want to provide the ability to search for cities by name. A user enters part of the name of the city, and the matching results are returned. Using what we learned in Part 2, this could be implemented as follows:

        static async Task Main(string[] args)
        {
            Console.Write("Enter query: ");
            string searchQuery = Console.ReadLine();

            var results = await SearchCitiesAsync(searchQuery);

            Console.WriteLine("Results:");

            foreach (var result in results)
                Console.WriteLine(result);

            Console.WriteLine("Done. Press ENTER to exit...");
            Console.ReadLine();
        }

        static async Task<List<string>> SearchCitiesAsync(string searchQuery)
        {
            const string connStr = @"Server=.\SQLEXPRESS;
                Initial Catalog=Travel; Integrated Security=True";

            var results = new List<string>();

            using (var conn = new SqlConnection(connStr))
            {
                await conn.OpenAsync();

                string query = $@"select Name
                                  from City
                                  where Name like '%{searchQuery}%'";

                using (var command = new SqlCommand(query, conn))
                using (var reader = await command.ExecuteReaderAsync())
                {
                    while (await reader.ReadAsync())
                    {
                        string name = reader["Name"].ToString();
                        results.Add(name);
                    }
                }
            }

            return results;
        }

We are taking user input in Main(), passing it to the SearchCitiesAsync() method, collecting the results in a list, and then returning them back. This is closer to what a real repository method might look like, than the examples we saw in Part 2. Note also how we’re directly embedding the user input into the SQL query by means of string interpolation. An example run might have the following output:

Enter query: Ital
Results:
Rome, Italy
Rimini, Italy
Done. Press ENTER to exit...

That looks correct. However, a malicious user might try the following:

Enter query: ' OR 1=1 --
Results:
Dublin, Ireland
Rome, Italy
Vienna, Austria
Rimini, Italy
Amsterdam, Netherlands
Done. Press ENTER to exit...

By entering specially crafted input, we actually got the program to return all results in the table. That is called SQL injection, and takes advantage of unsanitised input to do things like bypass login systems (see “Bypassing a Login Form using SQL Injection“) or even drop tables. That’s pretty dangerous.

To counter this threat, virtually all languages that offer database access provide a feature called prepared statements. This is a way to create placeholder variables in the SQL query, and separately provide parameters that represent the input values. In ADO .NET, this can be done as follows:

            using (var conn = new SqlConnection(connStr))
            {
                await conn.OpenAsync();

                string query = $@"select Name
                                  from City
                                  where Name like @searchQuery";

                using (var command = new SqlCommand(query, conn))
                {
                    var param = new SqlParameter();
                    param.ParameterName = "@searchQuery";
                    param.SqlDbType = System.Data.SqlDbType.NVarChar;
                    param.Value = $"%{searchQuery}%";

                    command.Parameters.Add(param);

                    using (var reader = await command.ExecuteReaderAsync())
                    {
                        while (await reader.ReadAsync())
                        {
                            string name = reader["Name"].ToString();
                            results.Add(name);
                        }
                    }
                }
            }

Instead of using string concatenation directly, we’re putting a variable named @searchQuery in the SQL query. We then create an SqlParameter with the same name, the data type of the corresponding column, and the value we want to pass in (note that the percentages in the value are necessary only because we’re using an SQL LIKE clause, and are not otherwise normally present). ADO .NET is smart enough to structure the query correctly (e.g. adding apostrophes for strings) depending on the data type. A malicious user trying the same trick against this codebase will be sorely disappointed:

Enter query: ' OR 1=1 --
Results:
Done. Press ENTER to exit...

Now, doing all that preparation for each parameter looks tedious, and it is. Fortunately, there are ways to write the same code more concisely:

  1. Instead of setting properties, use SqlParameter‘s constructor to pass the required values.
  2. Use AddWithValue(), e.g. command.Parameters.AddWithValue("@searchQuery", $"%{searchQuery}%");. This is great because you don’t even need to supply the data type, which is inferred automatically. While this method has drawn criticism because the data type inference isn’t perfect, I don’t think it’s right to vilify a useful option just because there are edge cases, especially if these might be triggered by bad database design.
  3. Use Dapper. We’ll talk about this in Part 4.

In ADO .NET, using prepared statements instead of plain concatenation (which includes the more modern string interpolation, as we have seen) is essential for security. While it is nowadays considered a basic technique, you’ll be surprised how many codebases are still vunlerable to SQL injection. This is based not only on my own experience, but also on the OWASP Top Ten, where injection attacks are on top of the list year after year.

In Object-Relational Mappers (ORMs) such as Entity Framework, prepared statements are used under the hood and abstracted away, so you don’t need to worry about any of this.

Data Providers

So far, we’ve used a lot of classes that start with Sql, such as:

  • SqlConnection
  • SqlCommand
  • SqlDataReader
  • SqlTransaction
  • SqlParameter

This is, of course, not by coincidence. These are all ADO .NET classes relating to SQL Server. ADO .NET implementations for other database engines follow a similar convention but have a different prefix. For instance, if we were dealing with MySQL, then we would have MySqlConnection, MySqlCommand, etc. For Oracle they would instead be OracleConnection, OracleCommand, etc.

Aside from the naming convention, similar classes across different data providers share common base classes and interfaces, found in the System.Data namespace. For instance, SqlConnection and MySqlConnection both inherit from DbConnection, which in turn implements IDbConnection. If you were to write your own ADO .NET interface for some new database, you would have to subclass these base types and implement the relevant interfaces.

This abstraction serves a dual purpose. Aside from having a common interface to run database code across multiple engines, there is actually something called DbProviderFactory that you can use to create connections, commands, and other ADO .NET types based on configuration provided at runtime. This means you could be running queries against SQL Server, then change the configuration, run the application again, and run them against MySQL instead. Each abstract type also has ways of creating other types from it (e.g. DbConnection.CreateCommand()).

While DbProviderFactory offers a great deal of flexibility in terms of being able to use the same queries against different database engines, it is rarely used in practice. First, the need to swap the underlying database is something that occurs much less frequently than some people like to claim. Secondly, this flexibility comes at the cost of having to use the lowest common denominator of features across all the supported engines, and not being able to use database-specific features. Thus, the database repository layer is usually locked into a specific database vendor, but abstracted enough that it can be replaced with a different implementation if the need ever arises.

Having said that, there are cases where DbProviderFactory makes sense. For intance, if you build a product that allows customers to decide which database engine to use as a backing store (for instance, Microsoft Orleans), then you can either build a specific repository for each supported database, or use a single repository that works across all supported databases, and choose the ADO .NET provider at runtime. Both are valid options, and the best option depends on the needs of the application.

Summary and Next Steps

Part 2 taught us how to query a database using ADO .NET, but was not enough to allow us to write production-grade code. In this article, we used a travel-related example to talk about transactions and prepared statements, allowing us to do ACID-compliant updates and avoid SQL injection, respectively.

We also discussed the structure of ADO .NET itself, and how data providers for different database engines adhere to common conventions, inherit from the same base classes, and implement the same interfaces. The data provider itself could be seleted at runtime based on configuration, although it is useful to know the tradeoffs when doing so.

In the fourth and final part of this mini-series, we’ll explore Dapper, which simplifies how we work with ADO .NET and reduces much of the common boilerplate code.

2 thoughts on “ADO .NET Part 3: Useful Techniques”

Leave a Reply

Your email address will not be published. Required fields are marked *