Category Archives: Software development

Pitfalls of AutoMapper

AutoMapper is a very popular .NET library for automatically mapping objects (e.g. database model to DTO). The mapping is done without having to write code to explicitly copy values over (except in cases where the mapping isn’t obvious, such as where property names don’t match or where transformations are necessary).
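
For readers who haven't used it, here's a minimal sketch of typical AutoMapper usage; Order and OrderDto are hypothetical classes with identically named properties:

// A minimal sketch of typical AutoMapper usage. Order and OrderDto are
// hypothetical classes with identically named properties.
var config = new MapperConfiguration(cfg => cfg.CreateMap<Order, OrderDto>());
var mapper = config.CreateMapper();

var order = new Order { Id = 1, Total = 99.95m };
var dto = mapper.Map<OrderDto>(order); // properties are copied over by name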

Surely, a library with over 144 million downloads on NuGet that lets you write less code is nothing but awesome, right? And yet, AutoMapper is one of four libraries that have caused me nothing but grief, along with MediatR (by the same author), Entity Framework, and FluentValidation.

When writing code, there are four main things I value:

  • Validity: the code does what it’s supposed to do without defects.
  • Performance: the code isn’t inefficient in ways that impact the application.
  • Maintainability: the code is easy to understand, reason about and debug. This point is entirely pragmatic, as opposed to the “clean code” camp that idolises code for its own sake.
  • Timeliness: the chosen approach should enable the solution to be delivered in a timely manner.

Thus, as I dissect AutoMapper in this article, I will do so in terms of these points.

Compile-Time vs Runtime

In my 2017 article “The Adapter Design Patterns for DTOs in C#”, I briefly mentioned AutoMapper and listed three reasons why I wasn’t a fan. The first of these was:

“The mapping occurs dynamically at runtime. Thus, as the DTOs evolve, you will potentially have runtime errors in production. I prefer to catch such issues at compile-time.”

The Adapter Design Patterns for DTOs in C#

For me, this is still one of the most important reasons to avoid AutoMapper. The whole point of using a statically typed language is that you can avoid costly runtime problems by catching errors at compile-time. Yet, using AutoMapper bypasses the compiler entirely, matching source and destination members dynamically at runtime.

This inevitably leads to bugs as the software evolves. Software Development Sucks’ 2015 article “Why Automapping is bad for you” (hereinafter referred to as just “the SDS article”) has a very good example illustrating how simply renaming a property causes the mapping to break silently (i.e. a field goes missing) without the compiler ever complaining.
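
A sketch of the problem, using hypothetical classes: rename a source property and the project still compiles, but the unmatched destination property is silently left at its default value.

public class PersonModel
{
    public string FullName { get; set; } // renamed from Name; still compiles
}

public class PersonDto
{
    public string Name { get; set; } // no longer matches any source property
}

// CreateMap<PersonModel, PersonDto>() still compiles and runs, but after
// mapping, dto.Name will silently be null.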

Mapping as Logic

In recent years, declarative forms of programming have become a lot more popular, perhaps due to the growing influence of functional programming. In .NET libraries, this approach tends to manifest itself in the form of fluent syntax, which you can find in ASP .NET Core pipeline configuration, AutoMapper mapping configuration, and Fluent Validation among many other places.

However, mapping is actually logic, and you can’t treat it like declarative configuration. How do you debug it? How do you understand what it’s doing when complex object graphs are involved? That’s right, you can’t. Mapping needs to happen in code that you can reason about and debug, just like any other logic.

Also, teams that use AutoMapper normally end up with heaps of these declarative mapping rules, which defeats the purpose of not writing mapping code in the first place. While I can understand that there will be fewer properties to map manually, it’s not like it takes hours to write a few extra lines of code to map a few properties.

External Dependencies

The biggest problems I’ve seen with the use of AutoMapper relate to when certain properties are computed based on external dependencies, such as getting something from an API or database, or some expensive computation. To be fair, these problems are not intrinsic to AutoMapper at all, but they are still a valid part of this discussion.

Let’s say we have a UserId property that needs to be mapped to a full User (object) property in a destination object. This would require asynchronously requesting the User object from a database (or cache, or API, etc). However, the first obstacle we face is that AutoMapper doesn’t support async resolvers, and never will.

So how do we map something that requires an async call? Well, the first way I saw people do this is by running the async code synchronously, blocking on the .Result property. As I hope you’re aware, this is a terrible idea that can (and did) result in deadlocks (see “Common Mistakes in Asynchronous Programming with .NET”).
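
For illustration only, this is roughly what that anti-pattern looks like as an AutoMapper value resolver (the repository and surrounding types are hypothetical):

// Anti-pattern sketch: blocking on async code inside a value resolver.
public class UserResolver : IValueResolver<OrderModel, OrderDto, User>
{
    private readonly IUserRepository _userRepository;

    public UserResolver(IUserRepository userRepository)
        => _userRepository = userRepository;

    public User Resolve(OrderModel source, OrderDto destination,
        User destMember, ResolutionContext context)
        => _userRepository.GetUserAsync(source.UserId).Result; // can deadlock
}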

But there’s more to it than that. If you have any expensive call, async or otherwise, you should probably be doing it in batches rather than for each object you map (see “Avoid await in Foreach“), because it makes a huge difference in performance.

So, it’s actually a good thing that AutoMapper doesn’t support async mapping operations, because such work shouldn’t be done as part of the DTO mapping at all, whether you’re using AutoMapper or mapping manually. It violates both the Single Responsibility Principle (mapping code doing more than just mapping) and Dependency Inversion (you’re hiding dependencies within your mapping code, possibly by using a Service Locator… how do you write unit tests for that?), and it also has very serious stability and performance consequences, as mentioned above.
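
Here's a sketch of the alternative: resolve the external dependency up front, in one batched call, and keep the mapping itself pure (the repository and the ToDto() method are hypothetical):

// One batched call for all users, then synchronous, side-effect-free mapping.
var userIds = orders.Select(o => o.UserId).Distinct().ToList();
var usersById = (await userRepository.GetByIdsAsync(userIds))
    .ToDictionary(u => u.Id);

var dtos = orders.Select(o => o.ToDto(usersById[o.UserId])).ToList();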

Reflection

Libraries that do runtime magic like AutoMapper or inversion of control (IoC) containers use Reflection, which by its very nature is slow. While this is not likely to have an impact in most cases, and therefore we should not dismiss such tools before ascertaining that they have a measurable impact, it is still useful to be aware of this.

AutoMapper may result in performance costs both at startup (when mapping configurations are set up) and during processing (when actually mapping happens), and these costs might not be negligible for larger projects. As a result, it is good to measure these costs to ensure they don’t get out of hand.

So What Should We Use Instead?

As it turns out, the C# language is perfectly capable of doing DTO mapping, and no library is actually necessary (the same argument applies to FluentValidation, by the way).

The SDS article suggests simply using methods for mapping, and goes on to list the benefits in terms of refactoring, testing, performance and code quality. My own “The Adapter Design Patterns for DTOs in C#” takes this a step further and suggests using extension methods as an elegant, decoupled solution.
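
If you haven't read that article, the gist of the extension method approach is something like the following sketch (the classes are again hypothetical):

// Mapping as a plain extension method: compile-time checked, and trivial to
// read, debug and unit test.
public static class PersonMappingExtensions
{
    public static PersonDto ToDto(this PersonModel model) => new PersonDto
    {
        Id = model.Id,
        Name = model.Name,
        DateOfBirth = model.DateOfBirth
    };
}

// Usage:
// var dto = personModel.ToDto();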

Productivity

You might have noticed that I’ve talked a lot about Validity, Performance and Maintainability, but I haven’t really mentioned Timeliness at all. Surely AutoMapper is a time-saver, as we have to write a lot less code? Well, no, actually.

The SDS article compares manual mapping to AutoMapper, and concludes that the latter actually requires a lot more effort (it’s not all about lines of code). I also argued in “The Adapter Design Patterns for DTOs in C#” that “writing AutoMapper configuration can get very tedious and unreadable, often more so than it would take to map DTOs’ properties manually”.

But beyond that, as I mentioned earlier, it actually doesn’t take a lot of time to write a few lines of code to map properties from one object to another. In fact, in practice, I’ve spent way more time fixing problems resulting from AutoMapper, or in discussions about all the points I’ve written in this article.

The only time when automatic mapping can really save time is when you have hundreds of DTOs and you don’t want to manually map them one by one. This is a one-time job, and should be done by code generation, not runtime mapping. Code generation has been around for a long time, and is given particular prominence in “The Pragmatic Programmer” by Andrew Hunt and David Thomas.

In the .NET world, code generation was most famously used by Entity Framework, whose old Database First approach used T4 templates to generate model classes from database tables. While T4 templates never really became popular, and are no longer supported by .NET Core, it is not a big deal to write some code that reads classes from a DLL via Reflection and spits out mapping code based on their properties. In fact, I wouldn’t be surprised if something that does this already exists.

Update 22nd April 2021: It does! Cezary Piątek developed MappingGenerator, a Visual Studio plugin that quickly generates mapping logic from one DTO to another.

Code generation is preferable to runtime automapping because it’s done as a one-time, offline process. It does not affect build time or startup time, does not cause problems at runtime (the generated code still has to be compiled), and does not need additional libraries or dependencies within the project. The generated code can easily be read, debugged and tested. And if there are any properties that require custom logic that can’t be generated as part of this bulk process, you just take the generated code and manually add to it (which you would have done anyway with AutoMapper).
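
To make the idea concrete, here's a rough sketch of such a generator. It assumes the source and destination types have matching property names, and emits an extension method like the one shown earlier:

// Rough sketch: emit mapping code for properties present on both types.
static string GenerateMapper(Type source, Type destination)
{
    var destProps = destination.GetProperties().Select(p => p.Name).ToHashSet();
    var sb = new StringBuilder();

    sb.AppendLine($"public static {destination.Name} ToDto(this {source.Name} source) => new {destination.Name}");
    sb.AppendLine("{");

    foreach (var prop in source.GetProperties()
                               .Where(p => destProps.Contains(p.Name)))
    {
        sb.AppendLine($"    {prop.Name} = source.{prop.Name},");
    }

    sb.AppendLine("};");
    return sb.ToString();
}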

Conclusion

Laziness is often promoted as a good thing in software developers, because it makes them automate things. However, it’s not okay to automate things in a way that sacrifices validity, performance, maintainability or timeliness.


Gathering Net Salary Data with Puppeteer

Tax is one of those things that makes moving to a different country difficult, because it varies wildly between countries. How much do you need to earn in that country to maintain the same standard of living?

You can, of course, use an online salary calculator to see how much net salary you’re left with after deducting tax and social security contributions, but this only lets you sample specific salaries and doesn’t really give you enough information to assess how the tax burden changes as you earn more. Importantly, you can’t use these tools to draw a graph for each country and compare.

Malta Salary Calculator by Darren Scerri

Fortunately, however, these tools have already done the heavy lifting by taking care of the complex calculations. To build a graph, all we really need to do is to take samples at regular intervals, say, every 1,000 Euros. Since that is very tedious to do by hand, we’ll use a browser automation tool to do this for us.

Enter Puppeteer

Puppeteer, as the homepage says, “is a Node library which provides a high-level API to control Chrome or Chromium”, which is pretty much what we need for this job. Its homepage also provides starter code we can build on. In a new folder, run the following to install the puppeteer dependency:

npm i puppeteer

Then, create a new file (e.g. netsalary.js) and add the starter code from the Puppeteer homepage. We’ll use this as a starting point:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({ path: 'example.png' });

  await browser.close();
})();

Getting Salary Data for Malta

In this particular exercise, we’ll get the salary data for Malta using Darren Scerri’s Malta Salary Calculator, which is relatively easy to work with.

Before we write any code, we need to understand the dynamics of the calculator. We do this via the browser’s developer tools.

Whenever you change the value of the gross salary input field (that has the “salary” id in the HTML), a bunch of numbers get updated, including the yearly net salary (which has the “net-yearly-result” class) which is what we’re interested in.

Just by knowing how we can reach the relevant elements, we can write our first code to retrieve the input (gross salary) and output (yearly net salary) values to make sure we know what we’re doing:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://maltasalary.com/');
  
  // Gross salary
  const grossSalaryInput = await page.$("#salary");
  const grossSalary = await page.evaluate(element => element.value, grossSalaryInput);
  console.log('Gross salary: ', grossSalary);
  
  // Net salary
  const netSalaryElement = await page.$('.net-yearly-result');
  const netSalary = await page.evaluate(element => element.textContent, netSalaryElement);
  console.log('Net salary: ', netSalary);

  await browser.close();
})(); 

Here, we’re using the page.$() function to locate an element the same way we would using jQuery. Then we use the page.evaluate() function to get something from that element (in this case, the value of the input field). We do the same for the net salary, with the notable difference that in the page.evaluate() function, we get the textContent property of the element instead.

If we run this (node netsalary.js), we should get the same default values we see in the online salary calculator:

We managed to retrieve the gross and net salaries from the online calculator.

Text Entry

That was easy enough, but it used the default values that are present when the page is loaded. How do we manipulate the input field so that we can enter arbitrary gross salary values and later pick up the computed net salary?

The simplest way to do this is by simulating keyboard input as follows:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://maltasalary.com/');
  
  const grossSalary = 30000;
  
  // Gross salary - keyboard input
  await page.focus("#salary");
  
  for (var i = 0; i < 6; i++)
    await page.keyboard.press('Backspace');
  
  await page.keyboard.type(grossSalary.toString());
  
  // Net salary
  const netSalaryElement = await page.$('.net-yearly-result');
  const netSalary = await page.evaluate(element => element.textContent, netSalaryElement);
  console.log('Net salary: ', netSalary);

  await browser.close();
})(); 

Here, we:

  1. Focus the input field, so that whatever we type goes in there.
  2. Press backspace six times to erase any existing gross salary in the field (if you check the online calculator, you’ll see it can take up to six digits).
  3. Type in the string version of our gross salary, which is a hardcoded constant with a value of 30,000.

The result I get when I run this matches what the online calculator gives me. I guess I must be doing something right for once in my life.

Net salary:  22,805.44

Pulling Net Salary Data in a Range

So now we know how to enter a gross salary and read out the corresponding net salary. How do we do this at regular intervals within a range (e.g. every 1,000 Euros between 15,000 and 140,000)? Easy. We write a loop.

In practice, there’s a little timing issue between iterations, so I also needed to nick a very handy sleep function off Stack Overflow and put a very short delay after doing the keyboard input, to give it time to update the output values.

const puppeteer = require('puppeteer');

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://maltasalary.com/');
  
  console.log('Gross Net');
  
  for (var grossSalary = 15000; grossSalary <= 140000; grossSalary += 1000) {
    // Gross salary - keyboard input
    await page.focus("#salary");
  
    for (var i = 0; i < 6; i++)
      await page.keyboard.press('Backspace');
  
    await page.keyboard.type(grossSalary.toString());
    await sleep(10);
  
    // Net salary
    const netSalaryElement = await page.$('.net-yearly-result');
    const netSalary = await page.evaluate(element => element.textContent, netSalaryElement);

    console.log(grossSalary, netSalary);
  }

  await browser.close();
})(); 

This has the effect of outputting a pair of headings (“Gross Net”) followed by gross and net salary pairs:

Outputting the gross and net salaries in steps of 1,000 Euros (gross) at a time.

Making a Graph

Now that we have a program that spits out pairs of gross and net salaries, we can make a graph out of this data. First, we dump all this into a file.

node netsalary.js > malta.csv

Although this is technically not really CSV data, it’s still very easy to open in spreadsheet software. For instance, when you open this file using LibreOffice Calc, you get the Text Import screen where you can choose to use space as the separator. This makes things easier given that the net salaries contain commas.

Choose Space as the separator to load the data correctly.

Once the data is in a spreadsheet, producing a chart is a relatively simple matter:

Graph showing how net salary changes with gross salary in Malta.

Now, this graph might look a little lonely, but you can already glean interesting insight from its gradient and from the fact that it isn’t entirely straight: the bends correspond to changes in tax bands.

After doing this exercise for multiple countries, it’s fascinating to see how their lines compare when plotted on the same chart.

Aside from the allure of data analysis, I hope this article served to show how easy it is to use Puppeteer to perform simple browser automation, beyond the obvious UI automation testing.

Filter List As You Type with React

A common piece of functionality in many user interfaces is to allow users to filter a list interactively by typing into a text field. In fact, I wrote an article showing how to do this in WPF almost seven years ago.

I’m currently learning React, and I feel this is a good exercise to get the hang of several basic concepts. I am sharing this in case it helps anyone, but my React knowledge is quite limited so I don’t expect anyone to take this as some kind of best practice. I welcome feedback on any possible improvements.

Although this article is quite basic, it covers several topics including controlled components, state manipulation, and keys. I’m not getting into the details of one-way binding and JSX, and just assuming you’re already familiar with them.

Preparing the React Application

The first thing to do is create a new React application. Simply follow the instructions in “Getting Started with React”.

Remove everything from src/App.css, and remove the <header> element from src/App.js as well as the logo import so that you are left with just this:

import React from 'react';
import './App.css';

function App() {
  return (
    <div className="App">

    </div>
  );
}

export default App;

If you’re using Visual Studio Code, you can use Ctrl+` (Control backtick) to bring up an integrated terminal. Either way, run npm start from a terminal. You should see an empty page because we just removed everything from it.

Showing a List of Fruit

If we’re going to filter a list, the first thing we need is to show a list. This is easy enough to achieve:

function App() {

  const fruit = ['apple', 'banana', 'orange', 'grapefruit',
    'mango', 'strawberry', 'peach', 'apricot'];

  return (
    <div className="App">
      <ul>
      {fruit.map(f => <li>{f}</li>)}
      </ul>
    </div>
  );
}

We’ve just got an array of strings representing different fruit, and we’re using the JavaScript map() function to render each item within a list.

The list is rendered, but we get a warning about a missing key.

If you save the file, the browser should automatically reload and display the list of fruit as shown above. However, if you open the browser’s developer tools, you’ll notice a warning about some missing key.

When rendering a list of items, React needs each item to be given a unique key to keep track of changes and know when it needs to re-render. This is done by adding a key attribute and binding it to something, as shown below.

      {fruit.map(f => <li key={f}>{f}</li>)}

In our case, we can simply use the name of the fruit itself, but typically you will want to use a unique ID rather than the display string.

State and Controlled Components

The next thing we need is to take input from a text field. We can show a text field by simply adding it to the JSX:

    <div className="App">
      <p>
        Type to filter the list:
        <input id="filter"
          name="filter"
          type="text"
        />
      </p>
      <ul>
      {fruit.map(f => <li key={f}>{f}</li>)}
      </ul>
    </div>

If we want to use the value of the text field (i.e. whatever the user is typing), then we need to link it to the component state. To get to this point, we’ll first introduce the useState() hook as follows:

import React, { useState } from 'react';
import './App.css';

function App() {

  const fruit = ['apple', 'banana', 'orange', 'grapefruit',
    'mango', 'strawberry', 'peach', 'apricot'];

  const [filter, setFilter] = useState('');

  // ...

useState() is simply a function that helps us work with component state, which is where we store any view-related data such as the filter text in our particular example. Its purpose and functionality might be confusing at first glance, especially because the name is not particularly clear.

Basically, it takes an initial state as a parameter (an empty string in the case of the filter text), and returns an array of two items: the current state of a particular variable, and a function that can assign its value. These roughly correspond to a getter and a setter, except that the getter is the actual value rather than a function (whereas the setter is indeed a function).

We use destructuring to extract these two into separate variables. What’s interesting is that we don’t really need to implement anything more than what you see here: even the setFilter() function is given to us and we don’t need to define it.

Now that we have a way to get and set the filter text within the component’s state, we can update the input field to use this functionality:

        <input id="filter"
          name="filter"
          type="text"
          value={filter}
          onChange={event => setFilter(event.target.value)}
        />

Specifically, we use the current value of filter (from component state) to set the value attribute of the input field, and provide a handler for React’s onChange event (note the casing, which distinguishes it from the onchange DOM event) that updates the component state whenever the value in the input field changes.

In this way, the filter text value in the DOM (input field) is always in sync with the component state, meaning that we can use the value in component state without ever having to touch the DOM directly. This is called a controlled component.

If you’re using the React Developer Tools extension for Chrome, you can see the state value being updated even though we haven’t implemented the list filtering functionality yet:

The component state reflects the value of the input field in the DOM.

Filtering the List

Since it is now easy to retrieve and manipulate the value of the filter text, filtering the list simply becomes a matter of using the JavaScript filter() function when rendering the list:

      <ul>
      {fruit.filter(f => f.includes(filter) || filter === '')
            .map(f => <li key={f}>{f}</li>)}
      </ul>

Each time a user types in the input field, this changes the state of the component, which causes React to re-render it. The list is updated accordingly in real-time:

Note that this filtering is case sensitive, so it won’t work as expected if you type uppercase characters. I left this detail out to keep things as concise as possible, but it is easy to adapt the code to filter case-insensitively.
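
For instance, a minimal tweak (lowercasing both sides before comparing) would be:

      {fruit.filter(f => f.toLowerCase().includes(filter.toLowerCase()))
            .map(f => <li key={f}>{f}</li>)}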

Complete Code

If you followed the instructions so far, your src/App.js should look like this:

import React, { useState } from 'react';
import './App.css';

function App() {

  const fruit = ['apple', 'banana', 'orange', 'grapefruit',
    'mango', 'strawberry', 'peach', 'apricot'];

  const [filter, setFilter] = useState('');

  return (
    <div className="App">
      <p>
        Type to filter the list:
        <input id="filter"
          name="filter"
          type="text"
          value={filter}
          onChange={event => setFilter(event.target.value)}
        />
      </p>
      <ul>
      {fruit.filter(f => f.includes(filter) || filter === '')
            .map(f => <li key={f}>{f}</li>)}
      </ul>
    </div>
  );
}

export default App;

Summary

You should take away the following from this article:

  • When rendering lists of items, make sure to give each item a unique key.
  • The state of a component contains view-related data.
  • React hooks are simply functions providing access to state and lifecycle features.
  • useState() lets you get and set the value of a variable within state, and also lets you provide an initial value.
  • A controlled component manages input fields (DOM elements) by linking their value to the React component state. This is done by binding an input field’s value to the component state, while at the same time using events to update the value in the component state when the value in the DOM changes.
  • State changes cause a React component to re-render.

ADO .NET Part 4: Dapper

So far (Part 1, Part 2, Part 3), we’ve covered pretty much everything you need to know in order to use the ADO .NET connected layer for your day-to-day data access layer development needs. We’ve seen how to open a connection, run different types of queries, use transactions, and use prepared statements to parameterise our queries. We’ve also talked a bit about why ADO .NET is still useful today, and the general structure of ADO .NET data providers.

In practice, while ADO .NET is easy, useful and fast, it can also be quite tedious. You’ll find yourself having to write a lot of the same boilerplate over and over, and it’s quite common for different companies to have their own libraries with utility methods for some of the most repeated operations.

Some of the guys at Stack Exchange (the company behind Stack Overflow) have built one such library, called Dapper. It’s sometimes called a micro-ORM, but I prefer to distance it even more from the ORM idea and simply call it a thin layer over ADO .NET. As we shall see shortly, it simply provides extension methods over the usual ADO .NET connection objects.

You can learn more about Dapper from its GitHub repository and official documentation.

In this article, we’ll see how to use Dapper to do the same things we’ve been doing so far with ADO .NET, and how it makes life easier.

I’ll be using a .NET Core console application for this one, although it should work as-is for .NET Framework. I’m using SQL Server as the database engine. The current version of Dapper at the time of writing is 2.0.35.

A Musical Example

For this article, create a database called Music, and run the following script:

create table Song (
    Id int not null primary key identity(1, 1),
    Name nvarchar(50) not null,
    Year int not null,
    ArtistId int not null
);

create table Artist (
    Id int not null primary key identity(1, 1),
    Name nvarchar(50) not null
);

insert into Artist (Name) values
    ('Caravan Palace'),
    ('Red Hot Chili Peppers'),
    ('The Cure');

insert into Song (Name, Year, ArtistId) values
    ('Dramophone', 2012, 1),
    ('Can''t Stop', 2002, 2),
    ('Just Like Heaven', 1987, 3);

We’ll use this as a playground to explore Dapper’s querying capabilities.

Opening a Connection

When using Dapper, there is no strange new way to open a connection. You simply open an ADO .NET connection exactly as we did in Part 1. So if you’re using SQL Server, follow three simple steps.

First, ensure you have a reference to System.Data.SqlClient. If you’re using .NET Core, this means adding the System.Data.SqlClient NuGet package. Projects based on the older .NET Framework need only add a reference since this assembly is part of the framework.

Secondly, import this assembly with a using statement:

using System.Data.SqlClient;

Finally, write the basic code to create and open an SqlConnection. I’m using the async version since we’ve already covered this in Part 2.

            const string connStr = @"Server=.\SQLEXPRESS;
                Initial Catalog=Music; Integrated Security=True";

            using (var conn = new SqlConnection(connStr))
            {
                await conn.OpenAsync();
                Console.WriteLine("Connected!");
            }

Querying for Row Data

Once we have an ADO .NET connection object, we can use Dapper’s extension methods to execute queries. First, install the Dapper NuGet package. Then, make sure you import Dapper via a using statement:

using Dapper;

Finally, just use the QueryAsync() extension method to execute a query and retrieve the results:

            using (var conn = new SqlConnection(connStr))
            {
                await conn.OpenAsync();

                const string sql = "select Name from Artist";

                var names = await conn.QueryAsync<string>(sql);

                foreach (var name in names)
                    Console.WriteLine(name);
            }

Here’s the output for that:

Caravan Palace
Red Hot Chili Peppers
The Cure

“A-ha!” I hear you say, “It’s simple enough with strings. What if I want to retrieve a row and return it as a C# object/DTO?” Alright, so for Artist we have this class:

    public class Artist
    {
        public int Id { get; set; }
        public string Name { get; set; }

        public override string ToString() => $"{Id}: {Name}";
    }

Then, we make some small changes to the querying code to (a) retrieve the ID along with the name in the query, (b) have QueryAsync() use a generic argument of type Artist, and (c) rename all the variables so they make sense, given the updates.

            using (var conn = new SqlConnection(connStr))
            {
                await conn.OpenAsync();

                const string sql = "select Id, Name from Artist";

                var artists = await conn.QueryAsync<Artist>(sql);

                foreach (var artist in artists)
                    Console.WriteLine(artist);
            }

Dapper is intelligent enough to automatically map the column names to the properties of the specified class. That’s right… the entire SqlDataReader part of ADO .NET, including fetching and converting individual fields, is no longer necessary. Here’s the output:

1: Caravan Palace
2: Red Hot Chili Peppers
3: The Cure

Querying for Scalar Data

For single-value queries, you use ExecuteScalarAsync() instead of QueryAsync():

            using (var conn = new SqlConnection(connStr))
            {
                await conn.OpenAsync();

                const string sql = "select Name from Song where Id = 2";

                var songName = await conn.ExecuteScalarAsync<string>(sql);

                Console.WriteLine(songName);
            }

The output for this is:

Can't Stop

Executing Non-Queries

For inserts, updates and deletes, just use ExecuteAsync():

            using (var conn = new SqlConnection(connStr))
            {
                await conn.OpenAsync();

                const string sql = @"insert into Song(Name, Year, ArtistId)
                                     values('Snow', 2006, 2);";

                int rowsAffected = await conn.ExecuteAsync(sql);

                Console.WriteLine($"{rowsAffected} rows inserted.");
            }

The following is the output when running this program. If you query the Song table, you’ll find the new row in there.

1 rows inserted.

Prepared Statements

When you need to run queries based on user input, you need to use prepared statements and pass the inputs as parameters. With Dapper, you provide an anonymous object where the property names match the parameters in the query. Here’s a full example:

        static async Task Main(string[] args)
        {
            Console.Write("Search for artist: ");
            string searchQuery = Console.ReadLine();
            var artists = await FindArtistAsync(searchQuery);

            foreach (var artist in artists)
                Console.WriteLine(artist);
        }

        static async Task<IEnumerable<Artist>> FindArtistAsync(string searchQuery)
        {
            const string connStr = @"Server=.\SQLEXPRESS;
                    Initial Catalog=Music; Integrated Security=True";

            using (var conn = new SqlConnection(connStr))
            {
                await conn.OpenAsync();

                const string sql = @"select Id, Name
                                     from Artist
                                     where Name like @searchQuery;";

                var param = new { searchQuery = $"%{searchQuery}%" };
                var artists = await conn.QueryAsync<Artist>(sql, param);
                return artists;
            }
        }

An example run of this could be as follows:

Search for artist: re
2: Red Hot Chili Peppers
3: The Cure

Another one shows that, as expected, SQL injection doesn’t work with this code (i.e. no results were returned):

Search for artist: ' OR 1=1 --

Transactions

ADO .NET transactions are already based on ADO .NET connection objects, so there is no new extension method to start, commit and rollback transactions with Dapper. However, queries involved in a transaction need to specify the transaction. For this reason, Dapper methods such as QueryAsync(), ExecuteScalarAsync() and ExecuteAsync() take an optional parameter of type IDbTransaction to which you can pass the relevant transaction object.
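
As a quick sketch against our Music database, this is what passing the transaction to Dapper looks like:

            using (var conn = new SqlConnection(connStr))
            {
                await conn.OpenAsync();

                using (var tran = conn.BeginTransaction())
                {
                    try
                    {
                        await conn.ExecuteAsync(
                            @"insert into Artist (Name) values (@name);",
                            new { name = "Radiohead" }, transaction: tran);

                        tran.Commit();
                    }
                    catch (Exception)
                    {
                        tran.Rollback();
                        throw;
                    }
                }
            }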

Conclusion

As you can see, Dapper takes away the tedious parts of ADO .NET and provides methods that allow you to run queries with just a connection object. There’s no need to set up command objects, spend a lot of time preparing parameters, or map and convert returned fields one by one.

We haven’t covered everything you can do with Dapper, so feel free to check out the resources mentioned at the beginning of this article to learn more.

This is also the end of this mini-series on ADO .NET. Hopefully, by now you can see that it’s actually quite easy to run queries with this good old technology, and it doesn’t have to be tedious. You can still use ORMs like Entity Framework if you like, but you should be able to weigh the pros and cons of your choice.

ADO .NET Part 3: Useful Techniques

Most of what you need to know in order to query a database using ADO .NET has already been covered in Part 1 and Part 2. However, in the interest of keeping things concise, we’ve deferred covering some very important topics… until now. In Part 3, we’re going to talk about:

  • SQL transactions
  • Prepared statements
  • Different ADO .NET data providers and their commonalities

Travel Example

For this article, we’ll work with a new example revolving around travel destinations. We’ll ignore some best practices of database design to keep it simple.

We have a table of cities, such as Dublin, Ireland or Rome, Italy. Each of these can be assigned any number of tags such as Beach or History. The relationship between City and Tag is many-to-many, so we have a CityTag table in between.

After creating a new database called Travel, run the following script to create the tables and populate them with some data:

create table City (
    Id int not null primary key identity(1, 1),
    Name nvarchar(50) not null
);

create table Tag (
    Id int not null primary key identity(1, 1),
    Name nvarchar(50) not null
);

create table CityTag (
    Id int not null primary key identity(1, 1),
    CityId int not null,
    TagId int not null
);

insert into City (Name) values
    ('Dublin, Ireland'),
    ('Rome, Italy'),
    ('Vienna, Austria');

insert into Tag (Name) values
    ('Beach'),
    ('Forest Trails'),
    ('History'),
    ('Mountains'),
    ('Great Public Transport');

insert into CityTag (CityId, TagId) values
    (1, 2),
    (1, 3),
    (2, 3),
    (3, 3),
    (3, 5);

Transactions

Imagine someone is adding a new city from a user interface. They can enter the name of the city, select the tags, and hit the Add button to save the data they entered to the database. However, when this happens, we are actually doing two operations: adding a new row to the City table, and adding multiple rows to the CityTag table.

In ADO .NET, the code for this could look something like this:

        static async Task<bool> AddCityAsync()
        {
            const string connStr = @"Server=.\SQLEXPRESS;
                Initial Catalog=Travel; Integrated Security=True";

            var queries = new string[]
            {
                "insert into City (Name) values ('Rimini, Italy');",
                "insert into CityTag (CityId, TagId) values (SCOPE_IDENTITY(), 1)"
            };

            try
            {
                using (var conn = new SqlConnection(connStr))
                {
                    await conn.OpenAsync();

                    using (var command = new SqlCommand())
                    {
                        command.Connection = conn;

                        foreach (var query in queries)
                        {
                            command.CommandText = query;
                            await command.ExecuteNonQueryAsync();
                        }
                    }
                }

                return true; // indicates that it was successful
            }
            catch (Exception)
            {
                return false; // indicates that something went wrong
            }
        }

What are we doing here? Well…

  1. We’re opening a connection the same way we did before, except that we’re connecting to the Travel database.
  2. We prepare the two separate queries representing the two operations of adding a city and adding tags. Most of the values are hardcoded, except that we’re using @@IDENTITY in one place. This takes the last ID inserted on the current connection, so if the new city has an ID of 4, then the CityTag row will use 4 for the CityId column. (SCOPE_IDENTITY() would not work here, because its scope is limited to a single batch, and each command we execute runs as its own batch.)
  3. The queries are executed in a loop. Note how we are reusing the same SqlCommand object for both queries. That’s perfectly fine, as long as we update the query every time we use it.
  4. A try/catch will cause the method to return false if an exception occurs, indicating a problem to the calling code. On the other hand, the method returns true if everything’s ok.

If the second operation (inserting the tags) were to fail, we’d end up with a city without tags (or a set of tags without a city, if they were executed in reverse order). That’s a pretty embarrassing situation for the database to end up in, not to mention the application itself… misleading the user into thinking that what they entered was successfully saved, when in fact data was lost in the process.

Fortunately, we can use transactions to ensure that the operations either all succeed (woohoo) or all fail (display an error to the user so that they can try again). We do that by calling BeginTransaction() on the SqlConnection object, which gives us back an SqlTransaction object:

        static async Task<bool> AddCityAsync()
        {
            const string connStr = @"Server=.\SQLEXPRESS;
                Initial Catalog=Travel; Integrated Security=True";

            var queries = new string[]
            {
                "insert into City (Name) values ('Amsterdam, Netherlands');",
                "insert into CityTag (CityId, TagId) values (SCOPE_IDENTITY(), 1)"
            };

            try
            {
                using (var conn = new SqlConnection(connStr))
                {
                    await conn.OpenAsync();
                    var tran = await conn.BeginTransactionAsync();

                    try
                    {
                        using (var command = new SqlCommand())
                        {
                            command.Connection = conn;
                            command.Transaction = tran as SqlTransaction;

                            foreach (var query in queries)
                            {
                                command.CommandText = query;
                                await command.ExecuteNonQueryAsync();
                            }
                        }

                        await tran.CommitAsync();
                        return true; // indicates that it was successful
                    }
                    catch (Exception)
                    {
                        await tran.RollbackAsync();
                        return false; // indicates that something went wrong
                    }
                    finally
                    {
                        tran.Dispose();
                    }
                }
            }
            catch (Exception)
            {
                return false; // indicates that something went wrong
            }
        }

An SqlTransaction implements IDisposable, which means it could go into a using block. However, given that we’re using a try/catch anyway to determine whether we should commit or rollback, I’ve opted to use a classic try/catch/finally instead, in order to limit the amount of nesting. (I could probably have done the same for the SqlConnection, by the way.) Note also that we need the inner try/catch because an exception could also be thrown when opening the connection or creating the transaction, and these need to be handled by the outer try/catch.

While preparing this code, I noticed that all the Sql* classes now also have a DisposeAsync() method. I’m not sure at this stage why you would want to call this instead of just Dispose(), although presumably it could avoid a little blocking just like other methods that have sync and async versions.
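
If you’re on C# 8 or later, the await using syntax will call DisposeAsync() for you; a small sketch:

            // DisposeAsync() is awaited automatically at the end of the scope.
            await using var conn = new SqlConnection(connStr);
            await conn.OpenAsync();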

Another thing to note is that we need to assign the transaction to the SqlCommand object, either via the latter’s constructor, or using its Transaction property. The Transaction property is of type DbTransaction (we’ll talk about this later), not SqlTransaction, so we had to cast the transaction object.

Multiple statements executed within a transaction will either collectively succeed or collectively fail, avoiding the issue of ending up with partial data.

Prepared Statements

Imagine we want to provide the ability to search for cities by name. A user enters part of the name of the city, and the matching results are returned. Using what we learned in Part 2, this could be implemented as follows:

        static async Task Main(string[] args)
        {
            Console.Write("Enter query: ");
            string searchQuery = Console.ReadLine();

            var results = await SearchCitiesAsync(searchQuery);

            Console.WriteLine("Results:");

            foreach (var result in results)
                Console.WriteLine(result);

            Console.WriteLine("Done. Press ENTER to exit...");
            Console.ReadLine();
        }

        static async Task<List<string>> SearchCitiesAsync(string searchQuery)
        {
            const string connStr = @"Server=.\SQLEXPRESS;
                Initial Catalog=Travel; Integrated Security=True";

            var results = new List<string>();

            using (var conn = new SqlConnection(connStr))
            {
                await conn.OpenAsync();

                string query = $@"select Name
                                  from City
                                  where Name like '%{searchQuery}%'";

                using (var command = new SqlCommand(query, conn))
                using (var reader = await command.ExecuteReaderAsync())
                {
                    while (await reader.ReadAsync())
                    {
                        string name = reader["Name"].ToString();
                        results.Add(name);
                    }
                }
            }

            return results;
        }

We are taking user input in Main(), passing it to the SearchCitiesAsync() method, collecting the results in a list, and then returning them back. This is closer to what a real repository method might look like than the examples we saw in Part 2. Note also how we’re directly embedding the user input into the SQL query by means of string interpolation. An example run might have the following output:

Enter query: Ital
Results:
Rome, Italy
Rimini, Italy
Done. Press ENTER to exit...

That looks correct. However, a malicious user might try the following:

Enter query: ' OR 1=1 --
Results:
Dublin, Ireland
Rome, Italy
Vienna, Austria
Rimini, Italy
Amsterdam, Netherlands
Done. Press ENTER to exit...

By entering specially crafted input, we actually got the program to return all results in the table. That is called SQL injection, and it takes advantage of unsanitised input to do things like bypass login systems (see “Bypassing a Login Form using SQL Injection”) or even drop tables. That’s pretty dangerous.

To counter this threat, virtually all languages that offer database access provide a feature called prepared statements. This is a way to create placeholder variables in the SQL query, and separately provide parameters that represent the input values. In ADO .NET, this can be done as follows:

            using (var conn = new SqlConnection(connStr))
            {
                await conn.OpenAsync();

                string query = @"select Name
                                  from City
                                  where Name like @searchQuery";

                using (var command = new SqlCommand(query, conn))
                {
                    var param = new SqlParameter();
                    param.ParameterName = "@searchQuery";
                    param.SqlDbType = System.Data.SqlDbType.NVarChar;
                    param.Value = $"%{searchQuery}%";

                    command.Parameters.Add(param);

                    using (var reader = await command.ExecuteReaderAsync())
                    {
                        while (await reader.ReadAsync())
                        {
                            string name = reader["Name"].ToString();
                            results.Add(name);
                        }
                    }
                }
            }

Instead of embedding the input directly, we’re putting a placeholder named @searchQuery in the SQL query. We then create an SqlParameter with the same name, the data type of the corresponding column, and the value we want to pass in (note that the percent signs in the value are necessary only because we’re using an SQL LIKE clause, and are not otherwise normally present). ADO .NET sends parameter values separately from the query text, appropriately typed, so the input is never interpreted as part of the SQL. A malicious user trying the same trick against this codebase will be sorely disappointed:

Enter query: ' OR 1=1 --
Results:
Done. Press ENTER to exit...

Now, doing all that preparation for each parameter looks tedious, and it is. Fortunately, there are ways to write the same code more concisely:

  1. Instead of setting properties one by one, use SqlParameter’s constructor to pass the required values (see the sketch after this list).
  2. Use AddWithValue(), e.g. command.Parameters.AddWithValue("@searchQuery", $"%{searchQuery}%");. This is great because you don’t even need to supply the data type, which is inferred automatically. While this method has drawn criticism because the data type inference isn’t perfect, I don’t think it’s right to vilify a useful option just because there are edge cases, especially if these might be triggered by bad database design.
  3. Use Dapper. We’ll talk about this in Part 4.
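
Here’s what the first two options look like, based on the query above:

                    // Option 1: pass name and type via the constructor.
                    var param = new SqlParameter("@searchQuery", System.Data.SqlDbType.NVarChar)
                    {
                        Value = $"%{searchQuery}%"
                    };
                    command.Parameters.Add(param);

                    // Option 2: let ADO .NET infer the data type.
                    command.Parameters.AddWithValue("@searchQuery", $"%{searchQuery}%");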

In ADO .NET, using prepared statements instead of plain concatenation (which includes the more modern string interpolation, as we have seen) is essential for security. While it is nowadays considered a basic technique, you’ll be surprised how many codebases are still vulnerable to SQL injection. This is based not only on my own experience, but also on the OWASP Top Ten, where injection attacks are at the top of the list year after year.

In Object-Relational Mappers (ORMs) such as Entity Framework, prepared statements are used under the hood and abstracted away, so you don’t need to worry about any of this.

Data Providers

So far, we’ve used a lot of classes that start with Sql, such as:

  • SqlConnection
  • SqlCommand
  • SqlDataReader
  • SqlTransaction
  • SqlParameter

This is, of course, not by coincidence. These are all ADO .NET classes relating to SQL Server. ADO .NET implementations for other database engines follow a similar convention but have a different prefix. For instance, if we were dealing with MySQL, then we would have MySqlConnection, MySqlCommand, etc. For Oracle they would instead be OracleConnection, OracleCommand, etc.

Aside from the naming convention, similar classes across different data providers share common base classes and interfaces, found in the System.Data namespace. For instance, SqlConnection and MySqlConnection both inherit from DbConnection, which in turn implements IDbConnection. If you were to write your own ADO .NET interface for some new database, you would have to subclass these base types and implement the relevant interfaces.

This abstraction serves a dual purpose. Aside from having a common interface to run database code across multiple engines, there is actually something called DbProviderFactory that you can use to create connections, commands, and other ADO .NET types based on configuration provided at runtime. This means you could be running queries against SQL Server, then change the configuration, run the application again, and run them against MySQL instead. Each abstract type also has ways of creating other types from it (e.g. DbConnection.CreateCommand()).
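
Here’s a sketch of DbProviderFactory in action; the provider invariant name is hardcoded here, but would normally come from configuration (note that on .NET Core the factory must first be registered, whereas the old .NET Framework reads providers from machine.config):

            // .NET Core only: register the factory once at startup.
            DbProviderFactories.RegisterFactory("System.Data.SqlClient",
                SqlClientFactory.Instance);

            // The invariant name could come from configuration at runtime.
            var factory = DbProviderFactories.GetFactory("System.Data.SqlClient");

            using (var conn = factory.CreateConnection())
            {
                conn.ConnectionString = connStr;
                await conn.OpenAsync();

                using (var command = conn.CreateCommand())
                {
                    command.CommandText = "select count(*) from City";
                    var count = await command.ExecuteScalarAsync();
                    Console.WriteLine(count);
                }
            }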

While DbProviderFactory offers a great deal of flexibility in terms of being able to use the same queries against different database engines, it is rarely used in practice. First, the need to swap the underlying database is something that occurs much less frequently than some people like to claim. Secondly, this flexibility comes at the cost of having to use the lowest common denominator of features across all the supported engines, and not being able to use database-specific features. Thus, the database repository layer is usually locked into a specific database vendor, but abstracted enough that it can be replaced with a different implementation if the need ever arises.

Having said that, there are cases where DbProviderFactory makes sense. For instance, if you build a product that allows customers to decide which database engine to use as a backing store (as Microsoft Orleans does), then you can either build a specific repository for each supported database, or use a single repository that works across all supported databases, and choose the ADO .NET provider at runtime. Both are valid options, and the best option depends on the needs of the application.

Summary and Next Steps

Part 2 taught us how to query a database using ADO .NET, but was not enough to allow us to write production-grade code. In this article, we used a travel-related example to talk about transactions and prepared statements, allowing us to do ACID-compliant updates and avoid SQL injection, respectively.

We also discussed the structure of ADO .NET itself, and how data providers for different database engines adhere to common conventions, inherit from the same base classes, and implement the same interfaces. The data provider itself can be selected at runtime based on configuration, although it is useful to know the tradeoffs when doing so.

In the fourth and final part of this mini-series, we’ll explore Dapper, which simplifies how we work with ADO .NET and reduces much of the common boilerplate code.