Messing Around with Tesseract OCR in .NET

This article describes software I’m not really familiar with. Take this with a pinch of salt. For all I know, tomorrow I may realize the error of my ways and change my tune.

I recently found out that there’s this open-source OCR software called Tesseract, and decided to give it a try. I’m going to show you how you can set up something really quickly, and some initial results I’ve seen.

First, install the Tesseract package via NuGet (from the Package Manager Console, that’s Install-Package Tesseract):

tesseract-nuget

Second, to use Tesseract’s OCR facility, you need some language data, which Tesseract provides. Go to the tessdata project and download it. Technically, you only need the files starting with eng* if you’re going to OCR English text. If you download the whole repo, be patient – it’s a few hundred megabytes zipped. Make sure you put the files in a folder called tessdata, or it won’t work.

Third, get yourself some test images you can feed to the OCR. You can find some online, or scan something from a book.

Fourth, you’ll need to add a reference to System.Drawing, because the Tesseract package depends on the Bitmap class:

tesseract-system.drawing

Finally, we can get some code in. Let’s use this (needs using Tesseract;):

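        static void Main(string[] args)
        {
            Console.Title = "Trying Tesseract";

            // Folder containing the eng* language files; it must be named tessdata.
            const string tessDataDir = @"tessdata";
            // Path to the image file to OCR (relative to the working directory).
            const string imagePath = @"image.png";

            // The engine, image and page are all disposable, hence the using statements.
            using (var engine = new TesseractEngine(tessDataDir, "eng", EngineMode.Default))
            using (var image = Pix.LoadFromFile(imagePath))
            using (var page = engine.Process(image))
            {
                string text = page.GetText(); // the recognised text
                Console.WriteLine(text);
                Console.ReadLine(); // keep the console window open
            }
        }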

This is enough to set up Tesseract, load a file from disk, and OCR it (convert it from image to text). The processing may take a few seconds. Now, you may be wondering what the Pix class is, or what a page is. I’m afraid I can’t quite answer that, because there doesn’t seem to be any documentation available.

So, when trying this out, I first scanned a page from The Pragmatic Programmer and fed it to Tesseract. I can’t reproduce that for copyright reasons, but aside from the occasional incorrect character, the results were actually pretty good.

The next thing I did was feed it the Robertson image from this page. It looked okay at first glance, until I actually bothered to check the result:

tesseract-robertson

Good heavens. What on Earth is a “sriyialeeeurreneeseenu”? Shocked by these results, I read some tips about improving the quality of the output. Because it’s true, you can’t blame the OCR for mistaking a ‘c’ for an ‘e’ when they look very similar, and the image has some noise artifacts (see top of image, where there’s some faint print from another page).

To make sure I give it some nice, crisp text, I took a screenshot of the Emgu CV homepage (shown below), and fed it to the program.

tesseract-emgucv-source

See the results for yourself:

tesseract-emgucv

That’s quite an elaborate mess. It may be because I’m new to this software, but that doesn’t give me a very good impression. Maybe it’s my fault. But I can’t know that if there’s no documentation explaining how to use it.

Simulating Probabilistic Behaviour

The source code for this article is available at the Gigi Labs BitBucket repository.

Let’s say we have a game featuring in-game people (commonly referred to as NPCs). Any game worth its salt will have some form of artificial intelligence (AI) to bring those characters to life to some extent, even if they’re just aimlessly wandering around.

To do this in C# using plain ASCII art, we can start out with the following code, which shows a cutesy ASCII face in the middle of the console window:

            int x = 40;
            int y = 12;
            const char person = (char) 2;
            const int delay = 1000;

            Console.Title = "Probabilistic Behaviour";
            Console.CursorVisible = false;

            while (true)
            {
                Console.Clear();
                Console.SetCursorPosition(x, y);
                Console.Write(person); // show character

                // TODO movement code goes here

                Thread.Sleep(delay);
            }

This is what you should see after running the code:

probabilistic-behaviour-start

Now, to make our character wander around randomly is pretty easy. First, declare a Random instance near the top of the program:

            var random = new Random();

Then, just replace the “TODO” comment with the following:

                int direction = random.Next(0, 4); // 0, 1, 2 or 3 (upper bound is exclusive)

                switch(direction)
                {
                    case 0: x--; break; // left
                    case 1: x++; break; // right
                    case 2: y--; break; // up
                    case 3: y++; break; // down
                }

You should now see the ASCII guy going around randomly. That’s all well and good, but you should realise that this is a uniform distribution: each outcome is just as likely as any other.

Sometimes, that’s not what you want. For example, let’s say you want the following to happen:

  1. 20% of the time, the character will go left.
  2. 10% of the time, the character will go right.
  3. 20% of the time, the character will go up.
  4. 50% of the time, the character will go down.

We could represent the above with the following hardcoded logic:

                double direction = random.NextDouble(); // 0 <= direction < 1

                if (direction >= 0 && direction < 0.2)
                    x--;
                else if (direction >= 0.2 && direction < 0.3)
                    x++;
                else if (direction >= 0.3 && direction < 0.5)
                    y--;
                else
                    y++;

That works, and you’ll see the ASCII guy tend to move downwards more than any other direction. But how can we extend this into a generic utility that can accept various different configurations?

It helps if, rather than considering individual probabilities per action, we stack them on top of each other and consider a cumulative probability:

Action   Probability   Cumulative Probability
Left     0.2           < 0.2
Right    0.1           < 0.3
Up       0.2           < 0.5
Down     0.5           < 1

Stacked on top of each other by probability, the actions would look something like this:

probabilistic-behaviour-cumulative

In this case we only need to take a random sample between 0 and 1, and see where in the above stack it lands.

To facilitate this, let us first declare a ProbabilisticAction class, which represents the mapping between a probability and an action. I’m assuming we don’t need to return anything; if we do, it’s easy to turn this into a generic class.

    public class ProbabilisticAction
    {
        public double Probability { get; set; }
        public Action Action { get; set; }

        public ProbabilisticAction(double probability, Action action)
        {
            this.Probability = probability;
            this.Action = action;
        }
    }

Back in our Main(), we can declare a list of such mappings to represent the scenario we had earlier:

            var actions = new List<ProbabilisticAction>()
            {
                new ProbabilisticAction(0.2, new Action(() => x--)),
                new ProbabilisticAction(0.1, new Action(() => x++)),
                new ProbabilisticAction(0.2, new Action(() => y--)),
                new ProbabilisticAction(0.5, new Action(() => y++)),
            };

We then pass these mappings, along with our Random instance, into a new class we’ll declare next:

            var actor = new ProbabilisticActor(actions, random);

The ProbabilisticActor encapsulates the logic for determining the next action. First, we store the mappings and the Random passed in at the constructor:

    public class ProbabilisticActor
    {
        private List<ProbabilisticAction> probabilisticActions;
        private Random random;

        public ProbabilisticActor(List<ProbabilisticAction> probabilisticActions,
            Random random)
        {
            this.probabilisticActions = probabilisticActions;
            this.random = random;
        }
    }

To avoid confusion, we also want to ensure that the probabilities passed in actually add up to 1:

        public ProbabilisticActor(List<ProbabilisticAction> probabilisticActions,
            Random random)
        {
            if (probabilisticActions == null)
                throw new ArgumentNullException(nameof(probabilisticActions));
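            // Compare with a small tolerance, since sums of doubles are rarely exact (needs using System.Linq;)
            if (Math.Abs(probabilisticActions.Sum(mapping => mapping.Probability) - 1.0) > 1e-9)
                throw new ArgumentException("Probabilities must add up to 1!");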

            this.probabilisticActions = probabilisticActions;
            this.random = random;
        }

Now, we can actually add the logic that picks the next action to execute. This is done by taking a random sample and seeing where it falls in the probability space:

        public void DoNextAction()
        {
            double sample = random.NextDouble();

            foreach(var mapping in probabilisticActions)
            {
                double probability = mapping.Probability;
                sample -= probability;

                if (sample < 0) // strictly below zero, so a zero-probability action never runs
                {
                    var action = mapping.Action;
                    action();
                    break;
                }
            }
        }

All we have left is to replace the logic in Main() with a call to this new method:

            while (true)
            {
                Console.Clear();
                Console.SetCursorPosition(x, y);
                Console.Write(person); // show character

                actor.DoNextAction();

                Thread.Sleep(delay);
            }

If we run the program now, we can see our ASCII guy just as southbound as before:

probabilistic-behaviour-southbound

But unlike before, we now have an abstraction of the logic we originally hardcoded, and we can reuse it for all sorts of random behaviours.

In fact, this approach is not really about game AI at all. I’ve found it really useful when writing test harnesses that needed to simulate a user randomly interacting with an application, where different actions weren’t equally likely to occur. This is just a simple demonstration, but it is easy to build more sophisticated logic on top of this approach.

Block Selection and Column Editing

We’re all very much used to selecting text by clicking and dragging the mouse. But by pressing the Alt key while doing that, you can select a rectangular block. This feature has been around since Visual Studio 2010 – part of it even since Visual Studio 2008 – and it’s available in most modern text editors such as Notepad++. However, most people seem not to be aware of this, which is why I’m writing this article.

Let’s say you created a new Console Application in Visual Studio, and added a few variables within the Program class:

    class Program
    {
        int name;
        int age;
        int address;

        static void Main(string[] args)
        {
            
        }
    }

Oops. Main() can’t access them, because it is static, and they are not. We’re going to have to make them static as well.

Now we can add the static keyword to each variable, one by one. Or, we can place the cursor before the first int, press Alt, click and drag downwards to create a sort of blue cursor that spans multiple lines. It’s a bit hard to see, so I’ve zoomed in a bit here:

column-editing-1

With that, we’ve enabled column editing. This means that whatever you type will now be written in multiple lines:

column-editing-2

You can use this to comment lines in bulk (similar to the Ctrl+K+C or Ctrl+E+C shortcuts, depending on your editor settings):

column-editing-3

Now, column editing is actually a special case of block selection with a width of zero. To see how block editing works, let’s change our variable names to the following:

        static int personName;
        static int personAge;
        static int personAddress;

Now, due to changing requirements, we decided that these shouldn’t be called person*, but customer*. Given that the variable names are nicely aligned underneath each other, we can press Alt, click and drag around person on all three lines, and we’ve made a block selection:

block-selection-1

Press Backspace to remove person from all three lines. The block collapses to zero width, so we’re back to column editing, and we can now easily write customer on all three lines:

block-selection-2

So there you go. Block selection and column editing are nothing new, but they’re very handy and good to know about.

A Gentle Introduction to Gulp

We’re at the end of 2015, and web technology has changed quite a bit since I started in 2002. Nowadays, for the front end stuff, there is a whole family of tools based on the node.js package manager (npm) that you can use to streamline and automate your workflow.

In this article (based on Windows), we’ll learn to use Gulp to do routine tasks such as concatenating and minifying JavaScript files. There’s another tool called Grunt with a similar purpose, and you’ll find all sorts of discussions on the internet comparing Grunt vs Gulp. Basically, Grunt is the older of the two and has a bigger community – an important factor considering that these tools are plugin-driven. However, I’m covering Gulp here as I felt it was more intuitive. For this small demonstration it has all the plugins we need, and performance (a common point of comparison) isn’t even a factor.

Setting up Gulp

The first thing we need is to install node.js:

install-nodejs

There’s a chance you might already have node.js, if you installed it with Visual Studio 2015.

Once you have installed node.js, you should have npm in your path. Open a command prompt, and install Gulp using the following command:

npm install gulp -g

-g means globally, and thanks to this, gulp should now be in your path.

Next, we want to create a package.json file. This is a kind of project file for node.js-related stuff. We can use npm for this too:

npm init

npm will ask a bunch of questions in order to set up the package.json file, suggesting possible answers where it makes sense to do so. name and version are required, but you can leave the rest empty if you like:

npm-init

Next, we need to install Gulp locally in our project:

npm install gulp --save-dev

This installs Gulp; --save-dev updates the package.json with a devDependencies field:

{
  "name": "gulptest",
  "version": "1.0.0",
  "description": "Learning to use Gulp.",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "Daniel D'Agostino",
  "license": "ISC",
  "devDependencies": {
    "gulp": "^3.9.0"
  }
}
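The caret in "^3.9.0" denotes a version range rather than one exact version: under npm's usual rules it accepts any 3.x release from 3.9.0 upwards, but not 4.0.0. The sketch below illustrates that rule in plain JavaScript. The function names are hypothetical, it assumes plain x.y.z versions with a major number of 1 or more, and it is not npm's own implementation.

```javascript
// Simplified caret check. Assumptions: plain "x.y.z" strings, no pre-release
// tags, and a major version of 1 or more. Not npm's actual semver code.
function parseVersion(text) {
  return text.split('.').map(Number);
}

function satisfiesCaret(version, range) {
  var wanted = parseVersion(range.replace(/^\^/, ''));
  var actual = parseVersion(version);

  // The major version must match exactly.
  if (actual[0] !== wanted[0]) return false;
  // A different minor version is only acceptable if it is newer.
  if (actual[1] !== wanted[1]) return actual[1] > wanted[1];
  // Same minor version: the patch level must not be older.
  return actual[2] >= wanted[2];
}

console.log(satisfiesCaret('3.9.1', '^3.9.0'));
console.log(satisfiesCaret('4.0.0', '^3.9.0'));
```

In practice this means a later npm install may pick up a newer 3.x release of Gulp than the one recorded here.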

Plugins and the Gulp file

Gulp itself doesn’t do anything; it is just configured to run tasks. Its capabilities come from the plugins you install, and you configure it to do stuff using a Gulp file. For this simple example, we’re just going to use a few plugins:

npm install gulp-concat gulp-uglify --save-dev

Once again, --save-dev updates your devDependencies in package.json:

  "devDependencies": {
    "gulp": "^3.9.0",
    "gulp-concat": "^2.6.0",
    "gulp-uglify": "^1.5.1"
  }

Next, create a file called gulpfile.js, and put the following code in it:

var gulp = require('gulp'),
    uglify = require('gulp-uglify'),
    concat = require('gulp-concat');
    
gulp.task('default', function() {
  return gulp.src('js/*.js')
    .pipe(concat('all.js'))
    .pipe(gulp.dest('dist/'));
});

To test this out, I downloaded jQuery and jQuery UI, and put the uncompressed JavaScript files in a “js” folder. Having created the Gulp file above, all you need is to run Gulp:

gulp

You should find a folder called dist, with a file called all.js in it, containing the contents of the files originally in the js folder:

gulp-concat
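To get a feel for what the concatenation step produces, here is a minimal sketch in plain JavaScript. It is only an illustration: the real plugin works on streams of files, whereas this assumes the files are already in memory as hypothetical objects with name and contents fields, and the concatFiles function is made up for this example.

```javascript
// Join the contents of several files into one string, in the order given.
// This mimics the result of concatenation, not how gulp-concat is implemented.
function concatFiles(files, separator) {
  return files
    .map(function (file) { return file.contents; })
    .join(separator);
}

// Hypothetical stand-ins for the files matched by 'js/*.js'.
var files = [
  { name: 'a.js', contents: 'var a = 1;' },
  { name: 'b.js', contents: 'var b = a + 1;' }
];

var all = concatFiles(files, '\n');
console.log(all);
```

Note that order matters: b.js uses a variable declared in a.js, just as jQuery UI expects jQuery to have been loaded first, so it is worth checking that the combined file has them in the right order.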

Concatenating JavaScript is good for performance because the browser only needs to make a single request, rather than having to retrieve several small files. But we can do even better by minifying the JavaScript (using the gulp-uglify plugin). Just add the uglify() step to the pipeline, between concat() and gulp.dest():

var gulp = require('gulp'),
    uglify = require('gulp-uglify'),
    concat = require('gulp-concat');
    
gulp.task('default', function() {
  return gulp.src('js/*.js')
    .pipe(concat('all.js'))
    .pipe(uglify())
    .pipe(gulp.dest('dist/'));
});

Run Gulp again, and you’ll find that all.js has been updated. In fact, it’s much smaller now, and it’s completely illegible:

gulp-uglify

Conclusion and Further Reading

The purpose of this article was to get you set up with Gulp, and see something working with the least possible hassle. Mark Goodyear’s article (on which this article is partly based) covers a lot of other common operations to carry out with Gulp. If you need to do anything particular (linting your JavaScript files, minifying your CSS, using Less, etc.), there’s probably a plugin for it.

Beyond that, all you need to know is how to use Gulp effectively as part of your build process.

  • Running Gulp without arguments makes it look for the “default” task. You can pass the name of a task to run as an argument, allowing you to run a variety of operations.
  • How do you debug your minified JavaScript? You don’t. Use separate tasks for development and for release, and minify only in your release task.
  • Ideally these tasks should be run automatically as part of your continuous integration.
  • An ASP.NET 5 (formerly known as vNext) project in Visual Studio 2015 can easily integrate with npm tools, and you can configure it to run your tasks when you build.
  • Not using Windows? These command line tools are easy to use on other platforms (although installing npm will obviously be different).
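The first two points above (running a task by name, and keeping development and release tasks separate) can be pictured with a toy task runner in plain JavaScript. This is only a sketch of the idea and not how Gulp is implemented; the task and run functions, and the dev and release task names, are hypothetical.

```javascript
// Toy task registry illustrating name-based lookup with a 'default' fallback.
var tasks = new Map();

function task(name, fn) {
  tasks.set(name, fn);
}

// Run the named task; with no name, fall back to 'default'.
function run(name) {
  var taskName = name || 'default';
  var fn = tasks.get(taskName);
  if (typeof fn !== 'function') {
    throw new Error("Task '" + taskName + "' is not defined");
  }
  return fn();
}

// Each task returns the list of steps it would apply.
// 'dev' keeps the output readable; 'release' also minifies.
task('dev', function () { return ['concat']; });
task('release', function () { return ['concat', 'uglify']; });
task('default', function () { return run('dev'); });

console.log(run());          // like running "gulp" with no arguments
console.log(run('release')); // like running "gulp release"
```

In a real Gulp file, the equivalent would be to declare a second task with gulp.task() that includes the uglify() step, and leave that step out of the task you use during development.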

Update 8th January 2016: Check out “More Gulp in Practice”, the followup to this article.

Elements of Football Management in Software

My recent return to playing Sensible World of Soccer is not just fun. After all, team management has been happening in the football scene for far longer than the software industry has even existed. There is some serious stuff we can learn there.

So after three successful seasons that turned Hibernians (a local team that most people have never even heard of) into a winner of all the major football leagues, I decided to take up managing Partizani Tirana, the team that won the Albanian Premier Division, but that is similarly crap on an international level (at least in the game).

I could have remained managing Hibernians, now a stellar team, for the fourth season. But while it is enjoyable to win, the real challenge (and fun) is in watching people grow; learning their strengths and weaknesses, and putting them in the right formation so that they can work together with synergy. As for the old team, it is my aspiration in all aspects of life to leave things better than I found them, and my old boss certainly didn’t mind decorating his shelves with the trophies I won:

english_001

This funny guy is my new boss in the game:

english_002

And this is my new team:

english_003

I don’t know these guys, and I don’t know how they play. So how can I effectively manage them?

The first thing you need to do to manage a team is to learn their strengths and weaknesses, i.e. what they’re good at, and what they need to improve. So after playing a few matches with this team, I can learn about their skills: who are the fast guys, who have the best ball control, etc.

Everyone can be useful. If one of the attackers is not great at finishing, for instance, he can have a supporting role for the other attacker. The trick is in finding the right role for each team member so that they can be useful to the team.

While you can often make adjustments within the team to address weaknesses, sometimes this is not enough. For instance, with Hibernians, buying a fast midfielder with great ball control gave a great boost to the team. He would support attackers in offensive strategy, run back to help defenders by intercepting opponents, and basically go everywhere to support the functioning of the team.

This midfielder is an example of a playmaker. The playmaker is essentially a visionary, a strategist, and a catalyst for the team to achieve its goals. The playmaker is not necessarily in a leading role. But he is respected because he makes things happen; he brings the team towards success, and also helps it get through the tougher situations.

Software teams are not very different. Developers come from all sorts of backgrounds, and have different skills and comfort zones. The manager who takes the time to learn about their abilities will be in a strong position to allocate his resources where they are best focused.

Playmakers in software are sometimes called catalysts (as in the wonderful anecdote in the Peopleware book). Whether they have a leadership role or not, they play an important part in helping teams to gel, solving complex problems, working on essential infrastructure, and formulating the technological vision of the team.

Managing software teams may be something that appeared over the past few decades, but in many ways it’s not very different from team management in general. Looking at older disciplines such as football allows us to gain insight into management as a whole, and they can very well serve as analogies for what we do in software.