All posts by Gigi

The Sorry State of Tourism in Ireland

I first visited Ireland around this time eight years ago, for St. Patrick’s Day 2012. It did not take me long to fall in love with the place. Since then, I have revisited Ireland several times, lived there for about a year and a half, and travelled around most of the country. As a result, my Irish experience has been a mixture of thrills and disappointments.

Separate hot and cold water taps (when hot water is actually available) are a disease more prevalent in Ireland than the Coronavirus.

When I recently revisited Ireland around the same time that the Coronavirus outbreak started, I once again had mixed feelings. Many things were really nice, but I wasn’t spared any disappointments.

As part of the Sorry State of the Web series, in which I promote good web development practices by illustrating bad ones, I will focus on websites (and other technology services) I came across during my research for this trip. Other things that annoyed me, such as cafes charging you an extra 2 Euros just to toast your sandwich, will be out of scope.

Aran Islands

The Aran Islands may be beautiful, but their website could have been better.

In fact, they have since made it better by fixing a problem where ampersand HTML entities were showing up within the page.

Insecure WiFi at Penneys

Penneys, the chain of department stores that you might otherwise recognise as Primark in the UK, offers free WiFi to their customers.

Unfortunately, given that you need to join the WiFi via an endpoint that does not have a valid SSL certificate, it is not only useless but plain risky for customers to use.

Secret Valley Wildlife Park

The Secret Valley Wildlife Park website has a number of issues.

For starters, some of the links at the bottom (i.e. Terms & Conditions, Privacy, and Cookies) don’t work. The cursor doesn’t even turn into a pointer, and if you look at the HTML, it turns out they are anchor tags without href attributes.

On the Animals page, images take ages to load because huge images are embedded directly in the page without thumbnails (see also: The Shameful Web of April 2017 (Part 1)). If you’re including large images in a page, always embed small versions and link to the full-size ones.

There also seems to be a problem with HTTPS… we’ll get to that too.

Moving on to the online booking system (which is what we care about when it comes to HTTPS, since sensitive information is involved), we see that HTTPS looks okay so far. They also used to have a test ticket type that I’m happy to see has been removed. In fact, they recently updated this page with a plea for funds, since the Coronavirus is (understandably) messing up their business.

Unfortunately, when you proceed to the next step and are about to book a ticket, the connection suddenly isn’t secure any more. It’s a small mixed content issue caused by an image, but it undermines the trust that people place in such websites (when it comes to keeping their sensitive financial data secure), and can potentially have security-related consequences.

So while I sympathise with Secret Valley (and so many others affected by the Coronavirus), it’s also important to keep your data safe. By all means, send them money, but do it using alternative, secure means.

The M50 Toll

If you’re going to be renting a car in Dublin and using it to drive around the country, one of the things you’re going to have to do is pay the toll on the M50 motorway. The M50 uses a barrier-free toll system that must be paid online by 8pm on the following day.

While the tight deadline is a little annoying, being able to pay online is quite convenient… when it works.

In this case, the system just didn’t want to work, even though I tried several times. This can happen, but what is a little worrying here is that internal error details (the XML-like response) should not be disclosed to the customer.

Blackrock Castle Observatory

If you like science, then Blackrock Castle Observatory is a great place to visit. They have a lot of interactive exhibits that explain concepts from astronomy and science in general:

Wait… what’s that at the bottom-right, where the arrow is pointing? Let’s take a closer look:

Uh oh… someone didn’t activate Windows! That’s quite embarrassing, and can be seen on several of their exhibits.

Wrap Up

Although Ireland will always have a special place in my heart, it hasn’t spared me any disappointments, both in terms of the service I received in various places as a tourist and in terms of websites and other technology-related services.

This article, like others in the same series, is an educational exercise aimed at improving technology standards, especially on the web, which so many people come in contact with. The aim is to learn from this and provide a better service, so I hope that nobody is offended, particularly in this difficult time.

Instead, I hope that in such times, when we depend on technology so much more, we can overcome these obvious problems and use technology safely and reliably to reduce the burden of living in a difficult situation as much as possible.

With the Coronavirus currently devastating health, economy, tourism and peace of mind across the world, we need to be safe, help each other, and show empathy because so many people are affected in different ways.

The Sorry State of Buying a Mobile Phone in Malta

A few years ago, I ran the Sorry State of the Web series of articles to promote good web design/development practices by pinpointing shameful ones that should be avoided (an approach inspired by Web Pages That Suck).

Websites today are very different from when Vincent Flanders started Web Pages That Suck. Things like Mystery Meat Navigation are almost entirely gone, as modern websites embrace more minimal designs and are often built on foundations such as Bootstrap or Material Design.

However, after a series of very frustrating experiences today while trying to buy a mobile phone, I am convinced that the state of professionally-built websites has not really improved. Websites may have converged to similar designs that overall are less painful, but the user experience is still miserable because of a lack of professionalism.

As a result, although I would have preferred not to continue this series, I feel there is still value in doing so. In this article, we will focus on websites of companies that sell mobile phones in Malta, where the technology and customer service are both still very medieval.

Sound Machine

Let’s start with Sound Machine. When you first visit this site, you get one of those cookie notices at the bottom-left. That’s pretty normal, especially in the GDPR era.

However, part of this notice sticks around even after you close it. It’s particularly noticeable if you scroll down so that the background is uniformly dark:

This is pretty strange, and probably unintended. But wait… do you notice something in that dark footer area? That’s right — this website was made by none other than Cyberspace Solutions, to which I had dedicated an entire article 3 years ago. I guess this explains a lot.

Another little mistake can be found in their Cookie Policy, where someone has been a little careless with their HTML tags:

But the worst blunder of all is that the Contact form does not even work:

In fact, when you press the Send button, a spinner runs next to it and never stops. There is no indication of the failure, unless you open the Developer Console, which most people obviously will not know how to do.

The result of this is a poor user experience, because (a) the form does not work, (b) there is no indication that anything failed, and, to make matters worse, (c) there is no email address given as an alternative. A customer therefore has no option other than to give them a call or show up in person, which many prefer to avoid for various reasons.

The takeaway from this is that when you build a website, you should always double-check to make sure things look right and that things actually work. Customers aren’t very happy when they don’t.

Direct Vision

Direct Vision has a nice e-commerce website where you can look for products and eventually buy them online. Let’s say I’m interested in the Samsung Galaxy A40… I get a lot of options:

Let’s take a look at the black phone on the left:

Great! It seems to be in stock!

Except that… it isn’t! It turns out that this phone is not available at all in one of their shops, and in the other, it’s only available in a couple of colours (Coral and White). The black one, as it turns out, is not in stock. They need to order it.

So why do they say that it is in stock when it isn’t? The salesgirl tried to give a dumb explanation, and also suggested I go with one of the other colours and get a cover to hide the undesired colour. Naturally, I didn’t buy that (pun intended). It’s truly shameful to waste people’s time in this way.

Tablets and More

Tablets and More is another consumer electronics store. Browsing around, it’s easy to notice a few things out of place. For instance, the thing at the bottom left that fails to load:

…and which, after a few seconds, becomes something else but still fails to explain what it’s supposed to be:

Even the product descriptions seem to be a real mess…

…in what appears to be a copy & paste job from GSM Arena:

What shall we say, then, about the creepy practice of harvesting people’s email addresses via the live chat feature (something that is becoming increasingly common in live chat products nowadays), or of not displaying prices and expecting people to get in touch just to find out how much an item costs?

It’s almost as if this store is intentionally doing everything it can to keep customers away.

Phone Box

The minute you land on the Phone Box website, you can immediately tell that something is wrong:

If a site isn’t being served over HTTPS, then it’s possible for requests to be intercepted by a man in the middle and arbitrary responses served as a result, as Troy Hunt demonstrates in his article about HSTS. This is particularly risky for websites that require you to submit information, and Phone Box does indeed fall in this category:

As I’ve written ad nauseam throughout the Sorry State of the Web series, it is not okay to accept login credentials insecurely over HTTP. While other information sent insecurely may or may not fall under GDPR and Data Protection laws, I think we would all be a lot more comfortable if such details (such as one’s home address) were not leaked to the world.

At least this site does not take credit card details, since the only payment method available is cash on delivery. Let’s hope they don’t decide to accept credit cards as a new feature.

Conclusion

Even from a small sample of websites, we have seen a range of issues, from simple negligent oversights to serious security problems and broken features. In 2020, businesses are still paying a lot of money for web design agencies to do a half-assed job. They probably do not realise how much business they are losing as a result.

How can we make things better? I have a few ideas.

  • Web design agencies: test your website’s functionality and content thoroughly. Get up to speed with the latest security and data protection requirements, as there may be legal repercussions if you don’t.
  • Businesses: choose very carefully who to work with when building a website. Take a look at their past work, and get a second opinion if you don’t feel you can evaluate it. Make it easy for customers to reach you and give them a good service. Otherwise, don’t complain that you are losing business to online marketplaces such as Amazon.
  • Customers: do not buy from businesses that have insecure websites, shady practices, or salespeople who think you’re stupid. Things will only change when they notice that their behaviour is detrimental to their own survival.

Managing ASP .NET Core Settings in Multiple Environments

One of the many benefits offered by .NET Core over its predecessor (the older .NET Framework) is the way it handles configuration. I wrote about the capabilities of .NET Core configuration back when the new framework was still prerelease and using a different name (see ASP .NET 5 Application Configuration, Part 1 and Part 2), and although some APIs may have changed, most of it should still be relevant today.

It is easy to set up application settings with this recent configuration model. However, when it comes to actually deploying an application into different environments (e.g. development, staging, production, and possibly others), things become complicated. How do we maintain configurations for all these environments, and how do we save ourselves from the tedious and error-prone practice of manually tweaking individual settings on all these different servers? How do we make sure we don’t lose these settings outright if a server experiences a technical failure? These challenges have nothing to do with the configuration model itself, as they are a more general administrative burden.

One option is to use something like Octopus Deploy to store settings for different environments and transform a settings file (such as appsettings.json) at deployment time. However, not everybody has this luxury. In this article, we will see how we can manage configurations for multiple environments using features that .NET Core offers out of the box.

At the time of writing this article, .NET Core 3.1.1 is the latest version.

Stacking Configurations in .NET Core

The .NET Core configuration libraries allow you to combine application settings from different sources, even if these are of different types (JSON, XML, environment variables, etc.). Imagine I have these two JSON files, named appsettings1.json and appsettings2.json:

{
    "ApplicationName": "Some fancy app",
    "Timeout": 5000
}
{
    "Timeout": 3000,
    "ConnectionString": "some connection string"
}

In order to read these files, I’ll need to install the following package:

dotnet add package Microsoft.Extensions.Configuration.Json

We can then use the ConfigurationBuilder to read in both files, combine them, and give us back an IConfigurationRoot object that allows the application code to query the settings that were read:

using System;
using Microsoft.Extensions.Configuration;

namespace netcoreconf1
{
    class Program
    {
        static void Main(string[] args)
        {
            var config = new ConfigurationBuilder()
                .AddJsonFile("appsettings1.json")
                .AddJsonFile("appsettings2.json")
                .Build();

            Console.WriteLine("Application Name: " + config["ApplicationName"]);
            Console.WriteLine("Timeout:          " + config["Timeout"]);
            Console.WriteLine("Connection String " + config["ConnectionString"]);

            Console.ReadLine();
        }
    }
}

After ensuring that the two JSON files are set to copy to the output directory on build, we can run the simple application to see the result:

The Application Name setting comes from the first JSON file, while the Connection String setting comes from the second. The Timeout setting, on the other hand, exists in both files, but the value was obtained from the second JSON file. In fact, the order in which configuration sources are read is important, and by design, settings read from later sources will overwrite the same settings read from earlier sources.

It follows from this that if we have some variable that defines which environment (e.g. Production) we’re in, then we can do something like this:

            const string environment = "Production";

            var config = new ConfigurationBuilder()
                .AddJsonFile("appsettings.json")
                .AddJsonFile($"appsettings.{environment}.json", optional: true)
                .Build();

In this case we have a core JSON file with the settings that tend to be common across environments, and then we have one or more JSON files specific to the environment that we’re running in, such as appsettings.Development.json or appsettings.Production.json. The settings in the environment-specific JSON file will overwrite those in the core appsettings.json file.

You will notice that we have that optional: true parameter for the environment-specific JSON file. This means that if that file is not found, the ConfigurationBuilder will simply ignore it instead of throwing an exception. This is the default behaviour in ASP .NET Core, which we will explore in the next section. It is debatable whether this is a good idea, because it may be perfectly reasonable to prefer the application to crash rather than run with incorrect configuration settings.

Multiple Environments in ASP .NET Core Using Visual Studio

By default, ASP .NET Core web applications use this same mechanism to combine a core appsettings.json file with an environment-specific appsettings.Environment.json file.

In the previous section we used a constant to supply the name of the current environment. Instead, ASP .NET Core uses an environment variable named ASPNETCORE_ENVIRONMENT to determine the environment.

Let’s create an ASP .NET Core Web API using Visual Studio and run it to see this in action:

Somehow, ASP .NET Core figured out that we’re using the Development environment without us setting anything up. How does it know?

You’ll find the answer in the launchSettings.json file (under Properties in Solution Explorer), which defines the aforementioned environment variable when the application is run either directly or using IIS Express. You’ll also find that there are already separate appsettings.json and appsettings.Development.json files where you can put your settings.
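
For reference, the relevant part of launchSettings.json looks something like this (a trimmed-down sketch; the profile name varies by project, and the real file contains a few more settings):

{
  "profiles": {
    "MyWebApi": {
      "commandName": "Project",
      "environmentVariables": {
        "ASPNETCORE_ENVIRONMENT": "Development"
      }
    }
  }
}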

If you remove this environment variable and re-run the application, you’ll find that the default environment is Production.

On the other hand, if we add a different appsettings.Staging.json, and update the environment variable to Staging, then we can run locally while pointing to the Staging environment:

Naturally, connecting locally to different environments isn’t something you should take lightly. Make sure you know what you’re doing, as you can do some real damage to production environments. On the other hand, there are times when this may be necessary, so it is a simple and powerful technique. Just be careful.

“With great power comes great responsibility.”

— Uncle Ben, Spider-Man (2002)

Multiple Environments in Console Apps

While ASP .NET Core handles the configuration plumbing for us, we do not have this luxury in other types of applications. Console apps, for instance those built to run as Windows Services using Topshelf, will need to have this behaviour as part of their code.

In a new console application, we will first need to add the relevant NuGet package:

dotnet add package Microsoft.Extensions.Configuration.Json

Then we can set up a ConfigurationBuilder to read JSON configuration files using the same stacked approach described earlier:

            var config = new ConfigurationBuilder()
                .AddJsonFile("appsettings.json")
                .AddJsonFile($"appsettings.{environment}.json")
                .Build();

We can read the environment from the same ASPNETCORE_ENVIRONMENT environment variable that ASP .NET Core looks for. This way, if we have several applications on a server, they can all determine the environment from the same machine-wide setting.

            string environment = Environment.GetEnvironmentVariable("ASPNETCORE_ENVIRONMENT");

            if (environment != null)
            {
                var config = new ConfigurationBuilder()
                    .AddJsonFile("appsettings.json")
                    .AddJsonFile($"appsettings.{environment}.json")
                    .Build();

                // TODO application logic goes here
            }
            else
            {
                Console.WriteLine("Fatal error: environment not found!");
                Environment.Exit(-1);
            }

If we run the application now, we will get that fatal error. That’s because we haven’t actually set up the environment variable yet. See .NET Core Tools Telemetry for instructions on how to permanently set an environment variable on Windows or Linux. Avoid doing this via a terminal or command line window since that setting would only apply to that particular window. I’m doing this in the screenshot below only as a quick demonstration, since I don’t need to maintain this application.
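
For quick reference, these are the kinds of commands involved (the Staging value here is just an example). On Windows, setx persists the variable (the /M switch makes it machine-wide and requires an elevated prompt; note that only newly started processes will see it):

setx ASPNETCORE_ENVIRONMENT "Staging" /M

On Linux, you can append an export to a profile file so that it survives across sessions:

echo 'export ASPNETCORE_ENVIRONMENT=Staging' >> ~/.profile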

Deploying an ASP .NET Core Web Application to a Windows Server

When developing applications locally, we have a lot of tools that make our lives easy thanks to whichever IDE we use (e.g. Visual Studio or Visual Studio Code). Deploying to a server is different, because we need to set everything up ourselves.

The first thing to do is install .NET Core on the machine. Download the ASP .NET Core Hosting Bundle as shown in the screenshot below. This includes the Runtime (which allows you to run an .exe built with .NET Core) and the ASP .NET Core Module v2 for IIS (which enables you to host ASP .NET Core web applications in IIS). However, it does not include the SDK, so you will not be able to use any of the command-line dotnet tools, and even dotnet --version will not let you know whether it is set up correctly.

Next, we can set up a couple of system environment variables:

The first is ASPNETCORE_ENVIRONMENT which has already been explained ad nauseam earlier in this article. The second is DOTNET_CLI_TELEMETRY_OPTOUT (see .NET Core Tools Telemetry), which can optionally be used to avoid sending usage data to Microsoft since this behaviour is turned on by default.

Another optional preparation step that applies to web applications is to add health checks. This simply means exposing an unprotected endpoint which returns something like “OK”. It is useful to check whether you can reach the web application at a basic level (while eliminating complexities such as authentication), and it can also be used by load balancers to monitor the health of applications. This can be implemented either directly in code, or using ASP .NET Core’s own health checks feature.
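
As a sketch of the latter approach, the built-in health checks feature needs only a couple of lines in Startup.cs (the /health path is an arbitrary choice):

        public void ConfigureServices(IServiceCollection services)
        {
            // Register the health check services
            services.AddHealthChecks();
        }

        public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
        {
            app.UseRouting();

            app.UseEndpoints(endpoints =>
            {
                // Expose an unprotected endpoint that returns "Healthy" when all is well
                endpoints.MapHealthChecks("/health");
            });
        }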

Finally, you really should set up logging to a file, and log the environment as soon as the application starts. Since ASP .NET Core does not have a file logger out of the box, you can use third party libraries such as NLog or Serilog. That way, if the application picks up the wrong environment, you will realise it very quickly. The log files will also help you monitor the health of your application and troubleshoot issues. Use tools such as baretail to monitor logs locally on the server, or ship them to a central store where you can analyse them in more detail.
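
For example, with Serilog (assuming the Serilog and Serilog.Sinks.File NuGet packages are installed), a minimal file logger that records the environment at startup might look like this:

            // using Serilog;
            Log.Logger = new LoggerConfiguration()
                .WriteTo.File("logs/app-.log", rollingInterval: RollingInterval.Day)
                .CreateLogger();

            // Log the environment as soon as the application starts
            Log.Information("Starting up in {Environment} environment",
                Environment.GetEnvironmentVariable("ASPNETCORE_ENVIRONMENT"));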

With everything prepared, we can publish our web application:

dotnet publish -c Release -r win10-x64

All that is left is to copy the files over to the server (compressing and decompressing them in the process) and run the application.

The above screenshot shows the deployed ASP .NET Core web application running, serving requests, and picking up the correct configuration. All this works despite not having the .NET Core SDK installed, because it is not required simply to run applications.

Deploying an ASP .NET Core Web Application to IIS

In order to host an ASP .NET Core web application in IIS, the instructions in the previous section apply, but there are a few more things to do.

First, if the server does not already have IIS, then it needs to be installed. This can be done by going to:

  1. Server Manager
  2. Add roles and features
  3. Next
  4. Next
  5. Next
  6. Select Web Server (IIS) as shown in the screenshot below.
  7. Click Add Features in the modal that comes up.
  8. Next
  9. Next
  10. Next
  11. Next
  12. Install

In IIS, make sure you have the AspNetCoreModuleV2 module, by clicking on the machine node in the Connections panel (left) and then double-clicking Modules. If you installed IIS after having installed the ASP .NET Core Hosting Bundle, you will need to run the latter installer again (just hit Repair).

Next, go into IIS and set up a website, with the path pointing to the directory where you put the web application’s published files:

Start your website, and then visit the test endpoint. Since you don’t have a console window when running under IIS, the log files come in really handy. We can use them to check that we are loading configuration for the right environment just as before:

It’s working great, and it seems that from .NET Core 3 onwards, the hosting environment is even logged automatically, so you don’t need to do that yourself.

Troubleshooting

When running under IIS, an ASP .NET Core application needs a web.config file just like any other. While I’ve had to add this manually in the past, it seems that it is now created automatically when you publish. If, for any reason, you’re missing a web.config file, you can grab the example in the docs.

I ran into a problem with an IIS-hosted application under .NET Core 2.2 where the environment variable defining the hosting environment wasn’t being picked up correctly by ASP .NET Core. As a workaround, it is actually possible to set environment variables directly in web.config, and they will be passed by IIS to the hosted application.
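
For example, something along these lines inside web.config sets the environment for the hosted application (the processPath and arguments values are placeholders for your own application):

<aspNetCore processPath="dotnet" arguments=".\MyApp.dll" stdoutLogEnabled="false" hostingModel="inprocess">
  <environmentVariables>
    <environmentVariable name="ASPNETCORE_ENVIRONMENT" value="Staging" />
  </environmentVariables>
</aspNetCore>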

On the other hand, when running .NET Core applications under Linux, keep in mind that files are case sensitive. Andrew Lock has written about a problem he ran into because of this.

Summary

In this article, we have seen that the old way of transforming config files is no longer necessary. By stacking configuration files, we can have a core appsettings.json file whose settings are overwritten by other environment-specific JSON files.

This setup is done automatically in ASP .NET Core applications, using the ASPNETCORE_ENVIRONMENT environment variable to determine the current environment. In other types of apps, we can read the same environment variable manually to achieve the same effect. Under Visual Studio, this environment variable can easily be changed in launchSettings.json to work under different environments, as long as the necessary level of care is taken.

Deployment of ASP .NET Core applications requires the .NET Core Runtime to be installed on the target server. The ASP .NET Core Hosting Bundle includes this as well as support for hosting ASP .NET Core applications under IIS. The SDK is not required unless the dotnet command-line tools need to be used on the server.

Before deploying, the server should also have the right environment variables, and the application should be fitted with mechanisms to easily check that it is working properly (such as an open endpoint and log files).

Windows Server Sessions Consume Resources

If you work in a Windows environment, you have most likely had to log onto a Windows Server machine. Windows Server is used to host web applications (via IIS), manage a corporate network, and so much more.

Every time you log onto Windows Server, your profile is actually using a portion of the resources (e.g. CPU, RAM, disk and network utilisation) on that machine. It should not come as a surprise that while some resources are used just to keep you logged on, the more processes you are running, the more resources you are using. So keeping things like Firefox or SQL Server Management Studio open can consume a significant portion of the server’s memory.

While it is understandable to log onto a server and utilise system resources for maintenance, troubleshooting or deployment purposes, many people do not realise that these resources are not released once they disconnect from the server. In fact, when you try to close a Remote Desktop Connection from the blue bar at the top, you get a warning that tells you this:

We can confirm this by opening the Users tab in Task Manager, where we can see that logged-in users who have disconnected are still using up a lot of memory (and other resources):

It is interesting to note that Sherry and Smith each have just Firefox open, with 3 or 4 tabs. You can imagine what the impact would be if there were more users each with more applications running.

In order to free up resources when you’re done working on the server, simply Sign Out instead of just disconnecting:

Once users have signed out, you’ll see that those disconnected sessions have disappeared from the Users tab in Task Manager, and their respective resources have been freed:
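
Incidentally, if you prefer the command line, you can also list sessions and sign out a disconnected one without going through Task Manager (the session ID below is just an example; take it from the query output):

query session
logoff 2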

So there you go: this little tip can help you make the most of your server simply by not being wasteful of system resources.

Scripting Backups with bash on Linux

Over the years, Linux has gone from being something exclusively for the hardcore tech-savvy to something accessible to all. Modern GUIs provide a friendly means for anybody to interact with the operating system, without needing to use the terminal.

However, the terminal remains one of the areas where Linux shines, and is a great way for power users to automate routine tasks. Taking backups is one such mundane and repetitive activity which can very easily be scripted using shell scripts. In this article, we’ll see how to do this using the bash shell, which is found in the more popular distributions such as Ubuntu.

Simple Folder Backup with Timestamp

Picture this: we have a couple of documents in our Documents folder, and we’d like to back up the entire folder and append a timestamp in the name. The output filename would look something like:

Documents-2020.01.21-17.25.tar.gz

The .tar.gz format is a common way to compress a collection of files on Linux. gzip (the .gz part) is a powerful compression algorithm, but it can only compress a single file, and not an entire folder. Therefore, a folder is typically packaged into a single .tar file (historically standing for “tape archive”). This tarball, as it’s sometimes called, is then compressed using gzip to produce the resulting .tar.gz file.

The timestamp format above is something arbitrary, but I’ve been using this myself for many years and it has a number of advantages:

  1. It loosely follows the ISO 8601 format, starting with the year, and thus avoiding confusion between British (DD/MM) and American (MM/DD) date notation.
  2. For the same reason, it allows easy time-based sorting of backup files within a directory.
  3. The use of dots and dashes comes in handy when the filename is too long and is wrapped onto multiple lines. Related parts (e.g. the year, month and day of the date) stick together, but separate parts (e.g. the date and time) can be broken onto separate lines. Also, this takes up less (screen) space than other methods (e.g. using underscores).

To generate such a timestamp, we can use the Linux date command:

$ date
Tue 21 Jan 2020 05:35:34 PM UTC

That gives us the date, but it’s not in the format we want. For that, we need to give it a parameter with the format string:

$ date +"%Y.%m.%d-%H.%M.%S"
2020.01.21-17.37.14

Much better! We’ll use this shortly.

Let’s create a backups folder in the home directory as a destination folder for our backups:

$ mkdir ~/backups

We can then use the tar command to bundle our Documents folder into a single file (-c creates an archive, and -f specifies the archive filename):

$ tar -cf ~/backups/Documents.tar ~/Documents

This works well as long as you run it from the home directory. But if you run it from somewhere else, you will see an unexpected structure within the resulting .tar file:

Instead of putting the Documents folder at the top level of the .tar file, it seems to have replicated the folder structure leading up to it from the root folder. We can get around this problem by using the -C switch and specifying that it should run from the home folder:

$ tar -C ~ -cf ~/backups/Documents.tar Documents

This solves the problem, as you can see:

Now that we’ve packaged the Documents folder into a .tar file, let’s compress it. We could do this using the gzip command, but as it turns out, the tar command itself has a -z switch that produces a gzipped tarball. After adding that and a -v (verbose) switch, and changing the extension, we end up with:

$ tar -C ~ -zcvf ~/backups/Documents.tar.gz Documents
Documents/
Documents/some-document.odt
Documents/tasks.txt

All we have left is to add a timestamp to the filename, so that we can distinguish between different backup files and easily identify when those backups were taken.

We have already seen how we can produce a timestamp separately using the date command. Fortunately, bash supports something called command substitution, which allows you to embed the output of a command (in this case date) in another command (in this case tar). It’s done by enclosing the first command between $( and ):

$ tar -C ~ -zcvf ~/backups/Documents-$(date +"%Y.%m.%d-%H.%M.%S").tar.gz Documents

As you can see, this produces the intended effect:

Adding the Hostname

If you have multiple computers, you might want to back up the same folder (e.g. Desktop) on all of them, but still be able to distinguish between them. For this we can again use command substitution to insert the hostname in the output filename. This can be done either with the hostname command or using uname -n:

$ tar -C ~ -zcvf ~/backups/Desktop-$(uname -n)-$(date +"%Y.%m.%d-%H.%M.%S").tar.gz Desktop

Creating Shell Scripts

Although the backup commands we have seen are one-liners, chances are that you don’t want to type them in again every time you want to take a backup. For this, you can create a file with a .sh extension (e.g. backup-desktop.sh) and put the command in there.
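
For example, backup-desktop.sh might contain the following (the shebang line at the top, which tells the system to run the script with bash, is an addition worth making):

#!/bin/bash

# Back up the Desktop folder into a compressed tarball whose
# filename includes the hostname and a timestamp
tar -C ~ -zcvf ~/backups/Desktop-$(uname -n)-$(date +"%Y.%m.%d-%H.%M.%S").tar.gz Desktop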

However, if you try to run it (always preceded by a ./), it doesn’t actually work:

The directory listing in the above screenshot shows why this is happening. The script is missing execution permissions, which we can add using the chmod command:

$ chmod 755 backup-desktop.sh

In short, that 755 value is a representation of permissions on a file which grants execution rights to everybody (see Chmod Calculator or run man chmod for more information on how this works).
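
Incidentally, chmod +x backup-desktop.sh is a common shorthand that simply adds execute permission without spelling out the full octal value.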

A directory listing now shows that the file has execution (x) permissions, and the terminal actually highlights the file in bold green to show that it is executable:

Running Multiple Scripts

As we start to back up more folders, we gradually end up with more of these tar commands. While we could put them one after another in the same file, this would mean that we always back up everything, and we would thus lose the flexibility to back up individual folders independently (handy when the folders are large and backups take a while).

For this, it’s useful to keep separate scripts for each folder, but then have one script to run them all (see what I did there?).

There are many ways to run multiple commands in a bash script, including simply putting them on separate lines one after another, or using one of several operators designed to allow multiple commands to be run on the same line.

While the choice of which to use is mostly arbitrary, there are subtle differences in how they work. I tend to favour the && operator because it will not carry out further commands if one has failed, reducing the risk that a failure goes unnoticed:

$ ./backup-desktop.sh && ./backup-documents.sh

Putting this in a file called backup-all.sh and executing it has the effect of backing up both the Desktop and the Documents folders. If I only want to back up one of them, I can still do that using the corresponding shell script.

Conclusion

We have thus far seen how to:

  • Package a folder in a compressed file
  • Include a timestamp in the filename
  • Include the hostname in the filename
  • Create bash scripts for these commands
  • Combine multiple shell scripts into a single one

This helps greatly in automating the execution of backups, but you can always take it further. Here are some ideas:

  • Use cron to take periodic backups (e.g. daily or weekly) – see the example after this list
  • Transfer the backup files off your computer – this really depends what you are comfortable with, as it could be an external hard drive, or another Linux machine (e.g. using scp), or a cloud service like AWS S3.
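
As an example of the first idea, a crontab entry (added via crontab -e) to run the combined backup script every night at 2am might look like this (the path is hypothetical; use the location of your own script):

0 2 * * * /home/daniel/backup-all.sh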

Removing the Server Header in ASP .NET Core

There are many aspects to web security, but in this article we’ll focus on one in particular. Attackers can use any available information about a target web application to their advantage. Therefore, if your web application is sending out headers revealing the underlying infrastructure of your web application, attackers can use those details to narrow down their attack and attempt to exploit vulnerabilities in that particular software.

Let’s create a new ASP .NET Core web application to see what is returned in the headers by default:

mkdir dotnet-server-header
cd dotnet-server-header
dotnet new web
dotnet run

This creates a “Hello world” ASP .NET Core application using the “ASP .NET Core Empty” template, and runs it. By default it runs on Kestrel on port 5000. If we access it in a browser and check the response headers (e.g. using the Network tab of the Chrome Developer Tools), we see that there’s this Server header with a value of Kestrel. If it were running under IIS, this value might have been Microsoft-IIS/10.0 instead.
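
If you prefer the command line to the browser’s developer tools, you can see the same headers with curl (the port may differ on your machine); the response will include the Server: Kestrel header:

curl -i http://localhost:5000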

Honestly, this could be worse. Older versions of ASP .NET running on the old .NET Framework used to add X-Powered-By, X-AspNet-Version and X-AspNet-MvcVersion headers with very specific information about the underlying software. While this information can be really useful for statistical purposes (e.g. to identify the most popular web servers, or to identify how prevalent different versions of ASP .NET are), they are also very useful for malicious purposes (e.g. to look for known vulnerabilities in a specific ASP .NET version).

ASP .NET Core, on the other hand, only adds the Server header, which is quite broad. However, the less information we give a potential attacker, the better for us.

There is no harm in removing the Server header, and to do this in ASP .NET Core, we can take a tip from this Stack Overflow answer:

        public static IHostBuilder CreateHostBuilder(string[] args) =>
            Host.CreateDefaultBuilder(args)
                .ConfigureWebHostDefaults(webBuilder =>
                {
                    webBuilder.UseStartup<Startup>()
                              .UseKestrel(options => options.AddServerHeader = false);
                });

The highlighted line above, added to Program.cs, has the effect of getting rid of that Server header. In fact, if we dotnet run again now, we find that it is gone:

It is always a good idea to do a vulnerability assessment of your web application, and in doing so, remove any excess information that complete strangers do not need to know. What we have seen here is a very small change that can reduce the security risk at least by a little.

Getting Started with React

React is a modern JavaScript library for building UI components. In this article, we will go through the steps needed to set up and run a React project. While there is much to be said about React, we will not really delve into theory here as the intention is to get up and running quickly.

Creating a Project

Like other web frontend libraries and frameworks, React requires npm. Therefore, the first thing to do is make sure you have Node.js installed, and if that is the case, then you should already have npm. You can use the following commands to check the version of each (making sure they are installed):

node -v
npm -v

We can then use Create React App to create our React project. Simply run the following command in your terminal:

npx create-react-app my-first-react-app

This will download the latest version of create-react-app automatically if it’s not already available. It takes a couple of minutes to run, and at the end, you will have a directory called my-first-react-app with a basic React project template inside it.

The output at the end (shown above) tells you about the directory that was created, and gives you a few basic commands to get started. In fact, we’ll use the last of those to fire up our web application:

cd my-first-react-app/
npm start

This will open a browser window or tab at the endpoint where the web application is running, which is localhost:3000 by default, and you should see the example page generated by Create React App, with a spinning React logo in it:

Open the project folder using your favourite text editor (e.g. Visual Studio Code), and you can see the project structure, as well as the App.js file which represents the current page:

With the web application still running, replace the highlighted line above (or any other you prefer) with “Hello world”. When you save, the running web application will automatically reload to reflect your changes:
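
For instance, App.js can be trimmed down to something like this (a minimal sketch; the exact contents of the template change between create-react-app versions):

import React from 'react';
import './App.css';

function App() {
  return (
    <div className="App">
      <p>Hello world</p>
    </div>
  );
}

export default App;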

And that’s all! As you can see, it’s quite easy (even if perhaps time-consuming) to create a React project. It’s also easy to run it, and since the project files are being watched, the running application is reloaded every time you save, making it very fast and efficient to make quick development iterations.

You are probably still wondering what React is, what you can do with it, and how/why there is markup within JavaScript! Those, my friend, are topics for another day. 🙂

How to Communicate with Windows Machines from Linux

Transitioning from Windows to Linux is a pleasant experience, but not one for the faint-hearted. There are a lot of things that can take a while to learn: the different filesystem structure, new applications, and the terminal.

If you’ve been stuck with Windows for a long time, chances are that you are not going to switch to Linux entirely in one day and forget about Windows. You probably still want to access resources on that Windows machine, and that, for me, was one of the biggest hassles. Not because it is difficult, but because there are several steps along the way (on both the Windows and Linux sides), and it is really easy to miss one.

The commands in this article have been executed on Kubuntu, and are likely to work on any similar Debian-based distribution.

Ping

Let’s say you’re running an SVN server on your Windows machine, and you’d like to communicate with it from Linux. In order to find that Windows machine, you could try looking up its IP. However, home networks typically use DHCP, which means that a machine’s IP tends to change over time. So while using the IP could work right now, you will likely have to update your configuration again tomorrow.

You could allocate a static IP for this, but a much easier option is to simply look up the name of the machine instead of the IP. You can find out the machine’s name using the hostname command, which works on both Windows and Linux. Once we know the name of the Windows machine, we can try pinging it from Linux to see whether we can reach it:

daniel@orion:~$ ping windowspc
ping: windowspc: No address associated with hostname

That does not look very promising. Unfortunately, Linux machines can’t resolve Windows (NetBIOS) machine names out of the box. In order to get this working, we first need to install a couple of packages:

daniel@orion:~$ sudo apt install winbind libnss-winbind

After that, we need to edit the /etc/nsswitch.conf file, which on a fresh Kubuntu installation would look something like this:

# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.

passwd:         files systemd
group:          files systemd
shadow:         files
gshadow:        files

hosts:          files mdns4_minimal [NOTFOUND=return] dns
networks:       files

protocols:      db files
services:       db files
ethers:         db files
rpc:            db files

netgroup:       nis

Use whichever editor you prefer, to update the highlighted line above to this:

hosts:          files mdns4_minimal [NOTFOUND=return] dns wins mdns4

If you try pinging again, it should now work. No restart is necessary.

daniel@orion:~$ ping windowspc
PING windowspc (192.168.1.73) 56(84) bytes of data.
64 bytes from 192.168.1.73 (192.168.1.73): icmp_seq=1 ttl=128 time=614 ms
64 bytes from 192.168.1.73 (192.168.1.73): icmp_seq=2 ttl=128 time=519 ms
64 bytes from 192.168.1.73 (192.168.1.73): icmp_seq=3 ttl=128 time=441 ms
64 bytes from 192.168.1.73 (192.168.1.73): icmp_seq=4 ttl=128 time=55.2 ms
64 bytes from 192.168.1.73 (192.168.1.73): icmp_seq=5 ttl=128 time=2.67 ms
64 bytes from 192.168.1.73 (192.168.1.73): icmp_seq=6 ttl=128 time=510 ms
64 bytes from 192.168.1.73 (192.168.1.73): icmp_seq=7 ttl=128 time=430 ms
64 bytes from 192.168.1.73 (192.168.1.73): icmp_seq=8 ttl=128 time=48.9 ms
^C
--- windowspc ping statistics ---
9 packets transmitted, 8 received, 11.1111% packet loss, time 12630ms
rtt min/avg/max/mdev = 2.671/327.520/613.590/232.503 ms

Resolving the hostname can sometimes take time. If you’re using a client application that can’t seem to resolve the Windows machine name, give it a few seconds, or try pinging it again. It should work after that.

Update 5th December 2019: If this doesn’t work, there are a couple of things I’ve seen recommended. One is to move the wins entry in /etc/nsswitch.conf to right after the files entry. Another is to try restarting the winbind service and see whether it makes a difference: sudo systemctl restart winbind.

Windows File Share

Another important aspect of interoperability between Windows and Linux is how to pass files between them. Fortunately, Linux has software called Samba that allows it to see and work with Windows file shares.

Before we do this, we need to create a shared folder on Windows. To do this, create a new folder (e.g. named share) on your Windows machine, then right click on it and select Properties. In the Sharing tab, there’s a button that says Advanced Sharing:

Click on it, and in the next modal window, check the box that says Share this folder. You can then OK all the way out without making further changes.

Through Kubuntu’s file manager application, called Dolphin, you can navigate to any Windows file shares visible on the network, even if you haven’t done the setup in the previous section.

To do this, select Network from the left, then double-click Shared Folders (SMB):

Next, select Workgroup:

You should now be able to see any Windows or Linux machines. Select the icon with the name of your Windows machine.

You should be prompted for credentials, and at that stage enter the same username and password that you use to login on Windows.

We can now see the shared folders on the Windows machine, including the shared folder we created earlier:

If, on the Windows side, we drop a file into that share folder, we can see it from Linux, and we are perfectly able to copy it over:

Unfortunately however, the same is not yet true in reverse. If we try to copy a file from the Linux machine into share, we get a lousy Access denied error:

It seems to be a permissions issue, so let’s go back on Windows and see what we might have missed. If we right click the folder and select Properties, we notice that the folder appears to be read-only:

This in fact has nothing to do with the problem, and attempting to change it has no effect.

Instead, what we need to do is go back to that Advanced Sharing modal window (via the Advanced Sharing button in the Sharing tab of the Properties window). Click the Permissions button to see who has access to that folder. It seems that Everyone is listed, but only has Read access. Resist the temptation (and the advice you’ll find around the internet) to give full access to Everyone; instead, look up the user you normally use to log into Windows:

You can then give your user full control:

You can now drop a file into the share folder from Linux without any problems:
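
As a sidenote, if you ever need to do the same thing from the terminal rather than Dolphin, the smbclient tool (from the package of the same name) provides an FTP-like interface to a Windows share; the machine, share and user names below are the ones used in this article’s examples:

smbclient //windowspc/share -U daniel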

Summary

Talking to a Windows machine from Linux is possible, but slightly tricky to set up.

In order for client applications on Linux to talk to server applications on Windows, install the winbind and libnss-winbind packages, and edit /etc/nsswitch.conf to enable name resolution for Windows machines. Use ping to verify that the hostname is being resolved.

To share files between Windows and Linux, set up a shared folder on Windows. Add your Windows user to the list of people who can access the folder, giving it both read and write permissions. Then, from Linux, use the file manager application’s existing Samba integration to reach and work with the shared folder.

Family Tree with RedisGraph

In “First Steps with RedisGraph”, after getting up and running, we used a couple of simple graphs to understand what we can do with Cypher and RedisGraph.

This time, we will look at a third and more complex example: building and querying a family tree.

The ancient Family Tree 2.0 application for Windows 95.

For me, this is not just an interesting example, but a matter of personal interest and the reason why I am learning graph databases in the first place. In 2001, I came upon a Family Tree application from the Windows 95 era, and gradually built out my family tree. By the time I realised that it was getting harder to run with each new version of Windows, it was too big to easily and reliably migrate all the data to a new system. Fortunately, Linux is more capable of running this software than Windows.

This software, and others like it, allow you to do a number of things. The first and most obvious is data entry (manually or via an import function) in order to build the family tree. Other than that, they also allow you to query the structure of the family tree, bringing out visualisations (such as descendant trees, ancestor trees, chronological trees etc), statistics (e.g. average age at marriage, life expectancy, average number of children, etc), and answers to simple questions (e.g. who died in 1952?).

An Example Family Tree

In order to have something we can play with, we’ll use this family tree:

This is the example family tree that we will use throughout this article.

This data is entirely fictitious, and while it is a non-trivial structure, I would like to point out a priori several assumptions and design decisions that I have taken in order to keep the structure simple and avoid getting lost in the details of this already lengthy article:

  1. All children are the result of a marriage. Obviously, this is not necessarily the case in real life.
  2. All marriages are between a husband and a wife. This is also not necessarily the case in real life. Note that this does not exclude that a single person may be married multiple times.
  3. When representing dates, we are focusing only on the year in order to avoid complicating things with date arithmetic. In reality, family tree software should not just cater for full dates, but also for dates where some part is unknown (e.g. 1896-01-??).
  4. Parent-child relationships are represented as childOf arrows, from the child to each parent. This approach is quite different from others you might come across (such as those documented by Rik Van Bruggen). It allows us to maintain a simple structure while not duplicating any information (because the year of birth is stored with the child).
  5. A man marries a woman. In reality, it should be a bidirectional relationship, but we cannot have that in RedisGraph without having two relationships in opposite directions. Having the relationship go in a single direction turns out to be enough for the queries we need, so there is no need to duplicate that information. The direction was chosen arbitrarily and if anyone feels offended, you are more than welcome to reverse it.

Loading Data in RedisGraph

As we’re now dealing with larger examples, it is not very practical to interactively type or paste the RedisGraph commands into redis-cli to insert the data we need. Instead, we can prepare a file containing the commands we want to execute, and then pipe it into redis-cli as follows:

cat familytree.txt | redis-cli --pipe

In our case, you can get the commands to create the example family tree either from the Gigi Labs BitBucket Repository (look for RedisGraph-FamilyTree/familytree.txt) or in the code snippet below:

GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'John', gender: 'm', born: 1932, died: 1982})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Victoria', gender: 'f', born: 1934, died: 2006})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Joseph', gender: 'm', born: 1958})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Christina', gender: 'f', born: 1957, died: 2018})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Donald', gender: 'm', born: 1984})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Eleonora', gender: 'f', born: 1986, died: 2010})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Nancy', gender: 'f', born: 1982})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Anthony', gender: 'm', born: 2010})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'George', gender: 'm', born: 2012})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Antoinette', gender: 'f', born: 1967})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Alfred', gender: 'm', born: 1965})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Bernard', gender: 'm', born: 1997})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Fiona', gender: 'f', born: 2000})"

GRAPH.QUERY FamilyTree "MATCH (man:Person { name : 'John' }), (woman:Person { name : 'Victoria' }) CREATE (man)-[:married { year: 1956 }]->(woman)"
GRAPH.QUERY FamilyTree "MATCH (man:Person { name : 'Joseph' }), (woman:Person { name : 'Christina' }) CREATE (man)-[:married { year: 1981 }]->(woman)"
GRAPH.QUERY FamilyTree "MATCH (man:Person { name : 'Donald' }), (woman:Person { name : 'Eleonora' }) CREATE (man)-[:married { year: 2008 }]->(woman)"
GRAPH.QUERY FamilyTree "MATCH (man:Person { name : 'Donald' }), (woman:Person { name : 'Nancy' }) CREATE (man)-[:married { year: 2011 }]->(woman)"
GRAPH.QUERY FamilyTree "MATCH (man:Person { name : 'Alfred' }), (woman:Person { name : 'Antoinette' }) CREATE (man)-[:married { year: 1992 }]->(woman)"

GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Joseph' }), (parent:Person { name : 'John' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Joseph' }), (parent:Person { name : 'Victoria' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Donald' }), (parent:Person { name : 'Joseph' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Donald' }), (parent:Person { name : 'Christina' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Anthony' }), (parent:Person { name : 'Donald' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Anthony' }), (parent:Person { name : 'Eleonora' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'George' }), (parent:Person { name : 'Donald' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'George' }), (parent:Person { name : 'Nancy' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Antoinette' }), (parent:Person { name : 'John' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Antoinette' }), (parent:Person { name : 'Victoria' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Bernard' }), (parent:Person { name : 'Alfred' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Bernard' }), (parent:Person { name : 'Antoinette' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Fiona' }), (parent:Person { name : 'Alfred' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Fiona' }), (parent:Person { name : 'Antoinette' }) CREATE (child)-[:childOf]->(parent)"

There are certainly other ways in which the above commands could be rewritten to be more compact, but I wanted to focus more on keeping things readable in this case.

Sidenote: When creating the nodes (not the relationships), another option could be to keep only the JSON-like property structure in a file (see RedisGraph-FamilyTree/familytree-persons.txt), and then use awk to generate the beginning and end of each command:

awk '{print "GRAPH.QUERY FamilyTree \"CREATE (:Person " $0 ")\""}' familytree-persons.txt | redis-cli --pipe

Querying the Family Tree

Once the family tree data has been loaded, we can finally query it and get some meaningful information. You might want to keep the earlier family tree picture open in a separate window while you read on, to help you follow along.

First, let’s list all individuals:

GRAPH.QUERY FamilyTree "MATCH (x) RETURN x.name"
1) 1) "x.name"
2)  1) 1) "John"
    2) 1) "Victoria"
    3) 1) "Joseph"
    4) 1) "Christina"
    5) 1) "Donald"
    6) 1) "Eleonora"
    7) 1) "Nancy"
    8) 1) "Anthony"
    9) 1) "George"
   10) 1) "Antoinette"
   11) 1) "Alfred"
   12) 1) "Bernard"
   13) 1) "Fiona"
3) 1) "Query internal execution time: 0.631002 milliseconds"

Next, we’ll use the ORDER BY clause to get a chronological report based on the year people were born:

GRAPH.QUERY FamilyTree "MATCH (x) RETURN x.name, x.born ORDER BY x.born"
1) 1) "x.name"
   2) "x.born"
2)  1) 1) "John"
       2) (integer) 1932
    2) 1) "Victoria"
       2) (integer) 1934
    3) 1) "Christina"
       2) (integer) 1957
    4) 1) "Joseph"
       2) (integer) 1958
    5) 1) "Alfred"
       2) (integer) 1965
    6) 1) "Antoinette"
       2) (integer) 1967
    7) 1) "Nancy"
       2) (integer) 1982
    8) 1) "Donald"
       2) (integer) 1984
    9) 1) "Eleonora"
       2) (integer) 1986
   10) 1) "Bernard"
       2) (integer) 1997
   11) 1) "Fiona"
       2) (integer) 2000
   12) 1) "Anthony"
       2) (integer) 2010
   13) 1) "George"
       2) (integer) 2012
3) 1) "Query internal execution time: 0.895734 milliseconds"

By adding in a WHERE clause, we can retrieve all those born before 1969, and return them in order of year of birth as in the previous query:

GRAPH.QUERY FamilyTree "MATCH (x) WHERE x.born < 1969 RETURN x.name, x.born ORDER BY x.born"
1) 1) "x.name"
   2) "x.born"
2) 1) 1) "John"
      2) (integer) 1932
   2) 1) "Victoria"
      2) (integer) 1934
   3) 1) "Christina"
      2) (integer) 1957
   4) 1) "Joseph"
      2) (integer) 1958
   5) 1) "Alfred"
      2) (integer) 1965
   6) 1) "Antoinette"
      2) (integer) 1967
3) 1) "Query internal execution time: 1.097382 milliseconds"

EXISTS allows us to check whether a property is set. Using it with the died property, we can list all the people who died:

GRAPH.QUERY FamilyTree "MATCH (x) WHERE EXISTS(x.died) RETURN x.name"
1) 1) "x.name"
2) 1) 1) "John"
   2) 1) "Victoria"
   3) 1) "Christina"
   4) 1) "Eleonora"
3) 1) "Query internal execution time: 0.936778 milliseconds"

By changing that to NOT EXISTS, we can get the opposite, i.e. all the people who are still alive:

GRAPH.QUERY FamilyTree "MATCH (x) WHERE NOT EXISTS(x.died) RETURN x.name"
1) 1) "x.name"
2) 1) 1) "Joseph"
   2) 1) "Donald"
   3) 1) "Nancy"
   4) 1) "Anthony"
   5) 1) "George"
   6) 1) "Antoinette"
   7) 1) "Alfred"
   8) 1) "Bernard"
   9) 1) "Fiona"
3) 1) "Query internal execution time: 1.150569 milliseconds"

Next, let’s answer some questions about specific individuals.

When did Christina die?

GRAPH.QUERY FamilyTree "MATCH (x) WHERE x.name = 'Christina' RETURN x.died ORDER BY x.born"
1) 1) "x.died"
2) 1) 1) (integer) 2018
3) 1) "Query internal execution time: 0.948734 milliseconds"

Who is George’s mother?

GRAPH.QUERY FamilyTree "MATCH (c)-[:childOf]->(p) WHERE c.name = 'George' AND p.gender = 'f' RETURN p.name"
1) 1) "p.name"
2) 1) 1) "Nancy"
3) 1) "Query internal execution time: 1.859084 milliseconds"

At what age did Christina get married? Note here that we’re using the AS keyword to change the title of the returned field (just like in SQL):

GRAPH.QUERY FamilyTree "MATCH (m)-[r:married]->(f) WHERE f.name = 'Christina' RETURN r.year - f.born AS AgeAtMarriage"
1) 1) "AgeAtMarriage"
2) 1) 1) (integer) 24
3) 1) "Query internal execution time: 1.442386 milliseconds"

How many children did Alfred have? In this case, we use the COUNT() aggregate function. Again, it works just like in SQL:

GRAPH.QUERY FamilyTree "MATCH (c)-[:childOf]->(p) WHERE p.name = 'Alfred' RETURN COUNT(c)"
1) 1) "COUNT(c)"
2) 1) 1) (integer) 2
3) 1) "Query internal execution time: 1.305086 milliseconds"

Let’s get all of Anthony’s ancestors! Here we use the *1.. syntax to indicate that this is not a single relationship, but a path made up of one or more hops.

GRAPH.QUERY FamilyTree "MATCH (c)-[:childOf*1..]->(p) WHERE c.name = 'Anthony' RETURN p.name"
1) 1) "p.name"
2) 1) 1) "Eleonora"
   2) 1) "Donald"
   3) 1) "Christina"
   4) 1) "Joseph"
   5) 1) "Victoria"
   6) 1) "John"
3) 1) "Query internal execution time: 1.456897 milliseconds"

How about Victoria’s descendants? The MATCH clause is the same as in the ancestors query, but the WHERE and RETURN parts are swapped.

GRAPH.QUERY FamilyTree "MATCH (c)-[:childOf*1..]->(p) WHERE p.name = 'Victoria' RETURN c.name"
1) 1) "c.name"
2) 1) 1) "Antoinette"
   2) 1) "Fiona"
   3) 1) "Bernard"
   4) 1) "Joseph"
   5) 1) "Donald"
   6) 1) "George"
   7) 1) "Anthony"
3) 1) "Query internal execution time: 1.158366 milliseconds"

Can we get Donald’s ancestors and descendants using a single query? Yes! We can use the UNION operator to combine the ancestors and descendants queries. Note that in this case the AS keyword is required, because the subqueries of a UNION must have the same column names.

GRAPH.QUERY FamilyTree "MATCH (c)-[:childOf*1..]->(p) WHERE c.name = 'Donald' RETURN p.name AS name UNION MATCH (c)-[:childOf*1..]->(p) WHERE p.name = 'Donald' RETURN c.name AS name"
1) 1) "name"
2) 1) 1) "Christina"
   2) 1) "Joseph"
   3) 1) "Victoria"
   4) 1) "John"
   5) 1) "George"
   6) 1) "Anthony"
3) 1) "Query internal execution time: 78.088850 milliseconds"

Who are Donald’s cousins? This is a little more complicated, because we need two paths that meet at the same grandparent, exactly two hops away (one hop away would give us siblings). We also need to exclude Donald and any siblings he might have, because they would otherwise match the specified pattern; that’s what the p1 <> p2 condition is for.

GRAPH.QUERY FamilyTree "MATCH (c1:Person)-[:childOf]->(p1:Person)-[:childOf]->(:Person)<-[:childOf]-(p2:Person)<-[:childOf]-(c2:Person) WHERE p1 <> p2 AND c1.name = 'Donald' RETURN c2.name"
1) 1) "c2.name"
2) 1) 1) "Bernard"
   2) 1) "Fiona"
3) 1) "Query internal execution time: 2.133173 milliseconds"

Update 4th December 2019: The ancestors and descendants query has been added, and the cousins query improved, thanks to the contributions of people in this GitHub issue. Thank you!

Statistical Queries

The last two queries I’d like to show are statistical in nature, and since they’re not easy to visualise directly, I’ll build up to them in steps.

First, let’s calculate life expectancy. To understand how, let’s start with a query retrieving the year of birth and death of the people who have died:

GRAPH.QUERY FamilyTree "MATCH (x) WHERE EXISTS(x.died) RETURN x.born, x.died"
1) 1) "x.born"
   2) "x.died"
2) 1) 1) (integer) 1932
      2) (integer) 1982
   2) 1) (integer) 1934
      2) (integer) 2006
   3) 1) (integer) 1957
      2) (integer) 2018
   4) 1) (integer) 1986
      2) (integer) 2010
3) 1) "Query internal execution time: 1.066981 milliseconds"

Since life expectancy is the average age at which people die, for each of these born/died pairs we need to subtract born from died to get that person’s age at death, and then average the results. We can do this using the AVG() aggregate function, which like COUNT() may be reminiscent of SQL.

GRAPH.QUERY FamilyTree "MATCH (x) WHERE EXISTS(x.died) RETURN AVG( x.died - x.born )"
1) 1) "AVG( x.died - x.born )"
2) 1) 1) "51.75"
3) 1) "Query internal execution time: 1.208347 milliseconds"

The second statistic we’ll calculate is the average age at marriage. This is similar to life expectancy, except that in this case there are two people in each marriage, which complicates things slightly.

Once again, let’s visualise the situation first, by retrieving separately the ages of the female and the male when they got married:

GRAPH.QUERY FamilyTree "MATCH (m)-[r:married]->(f) RETURN r.year - f.born, r.year - m.born"
1) 1) "r.year - f.born"
   2) "r.year - m.born"
2) 1) 1) (integer) 22
      2) (integer) 24
   2) 1) (integer) 24
      2) (integer) 23
   3) 1) (integer) 22
      2) (integer) 24
   4) 1) (integer) 29
      2) (integer) 27
   5) 1) (integer) 25
      2) (integer) 27

So we have five marriages but ten ages at marriage, which makes working out an average a little confusing. We can still get the number we want, though: add up the two ages for each couple, take the average of those sums, and then divide by 2 at the end to make up for having two values per marriage:

GRAPH.QUERY FamilyTree "MATCH (m)-[r:married]->(f) RETURN AVG( (r.year - f.born) + (r.year - m.born) ) / 2"
1) 1) "AVG( (r.year - f.born) + (r.year - m.born) ) / 2"
2) 1) 1) "24.7"
3) 1) "Query internal execution time: 48.874147 milliseconds"

Wrapping Up

We’ve seen another example graph in this article: a family tree. We discussed the reasons behind the chosen representation, looked at how to create it quickly from a text file, and then ran a whole bunch of queries to answer different questions and analyse the data in the family tree.

One thing I’m still not sure about is whether it’s possible, given two people, to identify their relationship (e.g. cousin, sibling, parent) based on the path between them.

As I’m still learning all this, I’m more than happy to receive feedback on how to do things better, and to hear about other possibilities I’m not even aware of.

First Steps with RedisGraph

RedisGraph is a super-fast graph database, and like others of its kind (such as Neo4j), it is useful for representing networks of entities and their relationships. Examples include social networks, family trees, and organisation charts.

Getting Started

The easiest way to try RedisGraph is with Docker. The following command is based on what the Quickstart recommends, but uses the edge tag instead, which has the latest features and fixes:

sudo docker run -p 6379:6379 -it --rm redislabs/redisgraph:edge
Redis with RedisGraph running in Docker

You will also need the redis-cli tool to run the example queries. On Ubuntu (or similar), you can get this by installing the redis-tools package.
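
That is:

sudo apt-get install redis-tools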

Tom Loves Judy

We’ll start by representing something really simple: Tom Loves Judy.

Tom Loves Judy.

We can create this graph using a single command:

GRAPH.QUERY TomLovesJudy "CREATE (tom:Person {name: 'Tom'})-[:loves]->(judy:Person {name: 'Judy'})"

When using redis-cli, queries follow the format GRAPH.QUERY <key> "<cypher_query>". In RedisGraph, a graph is stored in a Redis key (in this case called “TomLovesJudy”) with the special type graphdata, so the key must always be specified in queries. The query itself is the part between double quotes, and is written in a language called Cypher. Cypher is also used by Neo4j among other software, and RedisGraph implements a subset of it.
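
You can see this special type for yourself with the standard TYPE command:

TYPE TomLovesJudy
"graphdata"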

Cypher represents nodes and relationships using a sort of ASCII art. Nodes are represented by round brackets (parentheses), and relationships are represented by square brackets. The arrow indicates the direction of the relationship. RedisGraph at present does not support undirected relationships. When you run the above command, Redis should provide some output indicating what happened:

2 nodes and one relationship. Makes sense.

Since our graph has been created, we can start running queries against it. For this, we use the MATCH keyword:

GRAPH.QUERY TomLovesJudy "MATCH (x) RETURN x"

Since round brackets represent a node, here we’re saying that we want the query to match any node, which we’ll call x, and then return it. The output for this is quite verbose:

1) 1) "x"
2) 1) 1) 1) 1) "id"
            2) (integer) 0
         2) 1) "labels"
            2) 1) "Person"
         3) 1) "properties"
            2) 1) 1) "name"
                  2) "Tom"
   2) 1) 1) 1) "id"
            2) (integer) 1
         2) 1) "labels"
            2) 1) "Person"
         3) 1) "properties"
            2) 1) 1) "name"
                  2) "Judy"
3) 1) "Query internal execution time: 61.509847 milliseconds"

As you can see, this has given us the whole structure of each node. If we just want to get something specific, such as the name, then we can specify it in the RETURN clause:

GRAPH.QUERY TomLovesJudy "MATCH (x) RETURN x.name"
1) 1) "x.name"
2) 1) 1) "Tom"
   2) 1) "Judy"
3) 1) "Query internal execution time: 0.638126 milliseconds"

We can also query based on relationships. Let’s see who loves who:

GRAPH.QUERY TomLovesJudy "MATCH (x)-[:loves]->(y) RETURN x.name, y.name"
1) 1) "x.name"
   2) "y.name"
2) 1) 1) "Tom"
      2) "Judy"
3) 1) "Query internal execution time: 54.642536 milliseconds"

It seems like Tom Loves Judy. Unfortunately, Judy does not love Tom back.
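
If we wanted to remedy that, a sketch using the MATCH-then-CREATE approach (which we’ll use properly in the next section) would do it:

GRAPH.QUERY TomLovesJudy "MATCH (tom:Person { name : 'Tom' }), (judy:Person { name : 'Judy' }) CREATE (judy)-[:loves]->(tom)"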

Company Shareholding

Let’s take a look at a slightly more interesting example.

Company A is owned by individuals X (85%) and Y (15%). Company B is owned by individuals Y (55%) and Z (45%).

In this graph, we have companies (blue nodes) which are owned by multiple individuals (red nodes). We can’t create all of this with a single command as we did before, and we also can’t simply issue a series of CREATE commands, because we might end up creating multiple nodes with the same name.

Instead, let’s create all the nodes separately first:

GRAPH.QUERY Companies "CREATE (:Individual {name: 'X'})"
GRAPH.QUERY Companies "CREATE (:Individual {name: 'Y'})"
GRAPH.QUERY Companies "CREATE (:Individual {name: 'Z'})"

GRAPH.QUERY Companies "CREATE (:Company {name: 'A'})"
GRAPH.QUERY Companies "CREATE (:Company {name: 'B'})"

You’ll notice here that the way we are defining nodes is a little different. A node follows the structure (alias:type {properties}). The alias is not much use in standalone CREATE commands like these, but the type (unlike in the earlier example, where every node was a Person) now gives us a way to distinguish between different kinds of nodes.

Now that we have the nodes, we can create the relationships:

GRAPH.QUERY Companies "MATCH (x:Individual { name : 'X' }), (c:Company { name : 'A' }) CREATE (x)-[:owns {percentage: 85}]->(c)"
GRAPH.QUERY Companies "MATCH (x:Individual { name : 'Y' }), (c:Company { name : 'A' }) CREATE (x)-[:owns {percentage: 15}]->(c)"
GRAPH.QUERY Companies "MATCH (x:Individual { name : 'Y' }), (c:Company { name : 'B' }) CREATE (x)-[:owns {percentage: 55}]->(c)"
GRAPH.QUERY Companies "MATCH (x:Individual { name : 'Z' }), (c:Company { name : 'B' }) CREATE (x)-[:owns {percentage: 45}]->(c)"

In order to make sure we apply the relationships to existing nodes (as opposed to creating new ones), we first find the nodes we want with a MATCH clause, and then CREATE the relationship between them. You’ll notice that our relationships now also have properties.

Now that our graph is set up, we can start querying it! Here are a few things we can do with it.

Return the names of all the nodes:

GRAPH.QUERY Companies "MATCH (x) RETURN x.name"
1) 1) "x.name"
2) 1) 1) "X"
   2) 1) "Y"
   3) 1) "Z"
   4) 1) "A"
   5) 1) "B"
3) 1) "Query internal execution time: 0.606600 milliseconds"

Return the names only of the companies:

GRAPH.QUERY Companies "MATCH (c:Company) RETURN c.name"
1) 1) "c.name"
2) 1) 1) "A"
   2) 1) "B"
3) 1) "Query internal execution time: 0.515959 milliseconds"

Return individual ownership in each company (separate fields):

GRAPH.QUERY Companies "MATCH (i)-[s]->(c) RETURN i.name, s.percentage, c.name"
1) 1) "i.name"
   2) "s.percentage"
   3) "c.name"
2) 1) 1) "X"
      2) (integer) 85
      3) "A"
   2) 1) "Y"
      2) (integer) 15
      3) "A"
   3) 1) "Y"
      2) (integer) 55
      3) "B"
   4) 1) "Z"
      2) (integer) 45
      3) "B"
3) 1) "Query internal execution time: 1.627741 milliseconds"

Return individual ownership in each company (concatenated strings):

GRAPH.QUERY Companies "MATCH (i)-[s]->(c) RETURN i.name + ' owns ' + round(s.percentage) + '% of ' + c.name"
1) 1) "i.name + ' owns ' + round(s.percentage) + '% of ' + c.name"
2) 1) 1) "X owns 85% of A"
   2) 1) "Y owns 15% of A"
   3) 1) "Y owns 55% of B"
   4) 1) "Z owns 45% of B"
3) 1) "Query internal execution time: 1.281184 milliseconds"

Find out who owns at least 50% of the shares in Company A:

GRAPH.QUERY Companies "MATCH (i)-[s]->(c) WHERE s.percentage >= 50 AND c.name = 'A' RETURN i.name"
1) 1) "i.name"
2) 1) 1) "X"
3) 1) "Query internal execution time: 1.321579 milliseconds"

Wrapping Up

In this article, we’ve seen how to:

  • get up and running with RedisGraph
  • create simple graphs
  • perform basic queries

We’ve obviously only scratched the surface of RedisGraph and Cypher, but hopefully these examples will help others who, like me, are new to this space.