# Family Tree with RedisGraph

In “First Steps with RedisGraph”, after getting up and running, we used a couple of simple graphs to understand what we can do with Cypher and RedisGraph.

This time, we will look at a third and more complex example: building and querying a family tree.

For me, this is not just an interesting example, but a matter of personal interest and the reason why I am learning graph databases in the first place. In 2001, I came upon a Family Tree application from the Windows 95 era, and gradually built out my family tree. By the time I realised that it was getting harder to run with each new version of Windows, it was too big to migrate all the data easily and reliably to a new system. Fortunately, Linux is more capable of running this old software than Windows is.

This software, and others like it, allow you to do a number of things. The first and most obvious is data entry (manually or via an import function) in order to build the family tree. Other than that, they also allow you to query the structure of the family tree, bringing out visualisations (such as descendant trees, ancestor trees, chronological trees etc), statistics (e.g. average age at marriage, life expectancy, average number of children, etc), and answers to simple questions (e.g. who died in 1952?).

## An Example Family Tree

In order to have something we can play with, we’ll use this family tree:

This data is entirely fictitious, and while it is a non-trivial structure, I would like to point out up front several assumptions and design decisions that I have made in order to keep the structure simple and avoid getting lost in the details of this already lengthy article:

1. All children are the result of a marriage. Obviously, this is not necessarily the case in real life.
2. All marriages are between a husband and a wife. This is also not necessarily the case in real life. Note that this does not exclude that a single person may be married multiple times.
3. When representing dates, we are focusing only on the year in order to avoid complicating things with date arithmetic. In reality, family tree software should not just cater for full dates, but also for dates where some part is unknown (e.g. 1896-01-??).
4. Parent-child relationships are represented as childOf arrows, from the child to each parent. This approach is quite different from others you might come across (such as those documented by Rik Van Bruggen). It allows us to maintain a simple structure while not duplicating any information (because the year of birth is stored with the child).
5. A man marries a woman. In reality, it should be a bidirectional relationship, but we cannot have that in RedisGraph without having two relationships in opposite directions. Having the relationship go in a single direction turns out to be enough for the queries we need, so there is no need to duplicate that information. The direction was chosen arbitrarily and if anyone feels offended, you are more than welcome to reverse it.

As we’re now dealing with larger examples, it is not very practical to interactively type or paste the RedisGraph commands into redis-cli to insert the data we need. Instead, we can prepare a file containing the commands we want to execute, and then pipe it into redis-cli as follows:

cat familytree.txt | redis-cli --pipe

In our case, you can get the commands to create the example family tree either from the Gigi Labs BitBucket Repository (look for RedisGraph-FamilyTree/familytree.txt) or in the code snippet below:

GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'John', gender: 'm', born: 1932, died: 1982})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Victoria', gender: 'f', born: 1934, died: 2006})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Joseph', gender: 'm', born: 1958})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Christina', gender: 'f', born: 1957, died: 2018})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Donald', gender: 'm', born: 1984})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Eleonora', gender: 'f', born: 1986, died: 2010})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Nancy', gender: 'f', born: 1982})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Anthony', gender: 'm', born: 2010})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'George', gender: 'm', born: 2012})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Antoinette', gender: 'f', born: 1967})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Alfred', gender: 'm', born: 1965})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Bernard', gender: 'm', born: 1997})"
GRAPH.QUERY FamilyTree "CREATE (:Person {name: 'Fiona', gender: 'f', born: 2000})"

GRAPH.QUERY FamilyTree "MATCH (man:Person { name : 'John' }), (woman:Person { name : 'Victoria' }) CREATE (man)-[:married { year: 1956 }]->(woman)"
GRAPH.QUERY FamilyTree "MATCH (man:Person { name : 'Joseph' }), (woman:Person { name : 'Christina' }) CREATE (man)-[:married { year: 1981 }]->(woman)"
GRAPH.QUERY FamilyTree "MATCH (man:Person { name : 'Donald' }), (woman:Person { name : 'Eleonora' }) CREATE (man)-[:married { year: 2008 }]->(woman)"
GRAPH.QUERY FamilyTree "MATCH (man:Person { name : 'Donald' }), (woman:Person { name : 'Nancy' }) CREATE (man)-[:married { year: 2011 }]->(woman)"
GRAPH.QUERY FamilyTree "MATCH (man:Person { name : 'Alfred' }), (woman:Person { name : 'Antoinette' }) CREATE (man)-[:married { year: 1992 }]->(woman)"

GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Joseph' }), (parent:Person { name : 'John' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Joseph' }), (parent:Person { name : 'Victoria' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Donald' }), (parent:Person { name : 'Joseph' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Donald' }), (parent:Person { name : 'Christina' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Anthony' }), (parent:Person { name : 'Donald' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Anthony' }), (parent:Person { name : 'Eleonora' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'George' }), (parent:Person { name : 'Donald' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'George' }), (parent:Person { name : 'Nancy' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Antoinette' }), (parent:Person { name : 'John' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Antoinette' }), (parent:Person { name : 'Victoria' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Bernard' }), (parent:Person { name : 'Alfred' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Bernard' }), (parent:Person { name : 'Antoinette' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Fiona' }), (parent:Person { name : 'Alfred' }) CREATE (child)-[:childOf]->(parent)"
GRAPH.QUERY FamilyTree "MATCH (child:Person { name : 'Fiona' }), (parent:Person { name : 'Antoinette' }) CREATE (child)-[:childOf]->(parent)"

There are certainly other ways in which the above commands could be rewritten to be more compact, but I wanted to focus more on keeping things readable in this case.

Sidenote: When creating the nodes (not the relationships), another option could be to keep only the JSON-like property structure in a file (see RedisGraph-FamilyTree/familytree-persons.txt), and then use awk to generate the beginning and end of each command:

awk '{print "GRAPH.QUERY FamilyTree \"CREATE (:Person " $0 ")\""}' familytree-persons.txt | redis-cli --pipe

## Querying the Family Tree

Once the family tree data has been loaded, we can finally query it and get some meaningful information. You might want to keep the earlier family tree picture open in a separate window while you read on, to help you follow along.

First, let’s list all individuals:

GRAPH.QUERY FamilyTree "MATCH (x) RETURN x.name"
1) 1) "x.name"
2)  1) 1) "John"
    2) 1) "Victoria"
    3) 1) "Joseph"
    4) 1) "Christina"
    5) 1) "Donald"
    6) 1) "Eleonora"
    7) 1) "Nancy"
    8) 1) "Anthony"
    9) 1) "George"
   10) 1) "Antoinette"
   11) 1) "Alfred"
   12) 1) "Bernard"
   13) 1) "Fiona"
3) 1) "Query internal execution time: 0.631002 milliseconds"

Next, we’ll use the ORDER BY clause to get a chronological report based on the year people were born:

GRAPH.QUERY FamilyTree "MATCH (x) RETURN x.name, x.born ORDER BY x.born"
1) 1) "x.name"
   2) "x.born"
2)  1) 1) "John"
       2) (integer) 1932
    2) 1) "Victoria"
       2) (integer) 1934
    3) 1) "Christina"
       2) (integer) 1957
    4) 1) "Joseph"
       2) (integer) 1958
    5) 1) "Alfred"
       2) (integer) 1965
    6) 1) "Antoinette"
       2) (integer) 1967
    7) 1) "Nancy"
       2) (integer) 1982
    8) 1) "Donald"
       2) (integer) 1984
    9) 1) "Eleonora"
       2) (integer) 1986
   10) 1) "Bernard"
       2) (integer) 1997
   11) 1) "Fiona"
       2) (integer) 2000
   12) 1) "Anthony"
       2) (integer) 2010
   13) 1) "George"
       2) (integer) 2012
3) 1) "Query internal execution time: 0.895734 milliseconds"

By adding in a WHERE clause, we can retrieve all those born before 1969, and return them in order of year of birth as in the previous query:

GRAPH.QUERY FamilyTree "MATCH (x) WHERE x.born < 1969 RETURN x.name, x.born ORDER BY x.born"
1) 1) "x.name"
   2) "x.born"
2) 1) 1) "John"
      2) (integer) 1932
   2) 1) "Victoria"
      2) (integer) 1934
   3) 1) "Christina"
      2) (integer) 1957
   4) 1) "Joseph"
      2) (integer) 1958
   5) 1) "Alfred"
      2) (integer) 1965
   6) 1) "Antoinette"
      2) (integer) 1967
3) 1) "Query internal execution time: 1.097382 milliseconds"

EXISTS allows us to check whether a property is set. Using it with the died property, we can list all the people who have died:

GRAPH.QUERY FamilyTree "MATCH (x) WHERE EXISTS(x.died) RETURN x.name"
1) 1) "x.name"
2) 1) 1) "John"
   2) 1) "Victoria"
   3) 1) "Christina"
   4) 1) "Eleonora"
3) 1) "Query internal execution time: 0.936778 milliseconds"

By changing that to NOT EXISTS, we can get the opposite, i.e. all the people who are still alive:

GRAPH.QUERY FamilyTree "MATCH (x) WHERE NOT EXISTS(x.died) RETURN x.name"
1) 1) "x.name"
2) 1) 1) "Joseph"
   2) 1) "Donald"
   3) 1) "Nancy"
   4) 1) "Anthony"
   5) 1) "George"
   6) 1) "Antoinette"
   7) 1) "Alfred"
   8) 1) "Bernard"
   9) 1) "Fiona"
3) 1) "Query internal execution time: 1.150569 milliseconds"

When did Christina die?

GRAPH.QUERY FamilyTree "MATCH (x) WHERE x.name = 'Christina' RETURN x.died"
1) 1) "x.died"
2) 1) 1) (integer) 2018
3) 1) "Query internal execution time: 0.948734 milliseconds"

Who is George’s mother?

GRAPH.QUERY FamilyTree "MATCH (c)-[:childOf]->(p) WHERE c.name = 'George' AND p.gender = 'f' RETURN p.name"
1) 1) "p.name"
2) 1) 1) "Nancy"
3) 1) "Query internal execution time: 1.859084 milliseconds"

At what age did Christina get married? Note here that we’re using the AS keyword to change the title of the returned field (just like in SQL):

GRAPH.QUERY FamilyTree "MATCH (m)-[r:married]->(f) WHERE f.name = 'Christina' RETURN r.year - f.born AS AgeAtMarriage"
1) 1) "AgeAtMarriage"
2) 1) 1) (integer) 24
3) 1) "Query internal execution time: 1.442386 milliseconds"

How many children did Alfred have? In this case, we use the COUNT() aggregate function. Again, it works just like in SQL:

GRAPH.QUERY FamilyTree "MATCH (c)-[:childOf]->(p) WHERE p.name = 'Alfred' RETURN COUNT(c)"
1) 1) "COUNT(c)"
2) 1) 1) (integer) 2
3) 1) "Query internal execution time: 1.305086 milliseconds"

Let’s get all of Anthony’s ancestors! Here we use the *1.. syntax to indicate that this is not a single relationship, but indeed a path that is made up of one or more hops.

GRAPH.QUERY FamilyTree "MATCH (c)-[:childOf*1..]->(p) WHERE c.name = 'Anthony' RETURN p.name"
1) 1) "p.name"
2) 1) 1) "Eleonora"
   2) 1) "Donald"
   3) 1) "Christina"
   4) 1) "Joseph"
   5) 1) "Victoria"
   6) 1) "John"
3) 1) "Query internal execution time: 1.456897 milliseconds"

How about Victoria’s descendants? This is the same as the ancestors query in terms of the MATCH clause, but it’s got the WHERE and RETURN parts swapped.

GRAPH.QUERY FamilyTree "MATCH (c)-[:childOf*1..]->(p) WHERE p.name = 'Victoria' RETURN c.name"
1) 1) "c.name"
2) 1) 1) "Antoinette"
   2) 1) "Fiona"
   3) 1) "Bernard"
   4) 1) "Joseph"
   5) 1) "Donald"
   6) 1) "George"
   7) 1) "Anthony"
3) 1) "Query internal execution time: 1.158366 milliseconds"

Can we get Donald’s ancestors and descendants using a single query? Yes! We can use the UNION operator to combine the ancestors and descendants queries. Note that in this case the AS keyword is required, because the subqueries of a UNION must have the same column names.

GRAPH.QUERY FamilyTree "MATCH (c)-[:childOf*1..]->(p) WHERE c.name = 'Donald' RETURN p.name AS name UNION MATCH (c)-[:childOf*1..]->(p) WHERE p.name = 'Donald' RETURN c.name AS name"
1) 1) "name"
2) 1) 1) "Christina"
   2) 1) "Joseph"
   3) 1) "Victoria"
   4) 1) "John"
   5) 1) "George"
   6) 1) "Anthony"
3) 1) "Query internal execution time: 78.088850 milliseconds"

Who are Donald’s cousins? This is a little more complicated, because we need two paths that meet at the same node exactly two hops away, i.e. a shared grandparent (meeting just one hop away would give us siblings instead). We also need to exclude Donald and his siblings (if he had any), because they could otherwise match the specified pattern.

GRAPH.QUERY FamilyTree "MATCH (c1:Person)-[:childOf]->(p1:Person)-[:childOf]->(:Person)<-[:childOf]-(p2:Person)<-[:childOf]-(c2:Person) WHERE p1 <> p2 AND c1.name = 'Donald' RETURN c2.name"
1) 1) "c2.name"
2) 1) 1) "Bernard"
   2) 1) "Fiona"
3) 1) "Query internal execution time: 2.133173 milliseconds"
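Since this pattern is easy to get wrong, it can help to mirror the logic outside the database. Below is a small Python sketch (purely illustrative, with the childOf data hand-copied from the relationship commands above) applying the same rule: a shared grandparent reached through two different parents.

```python
# childOf edges from the example family tree: child -> [parents]
child_of = {
    "Joseph": ["John", "Victoria"],
    "Antoinette": ["John", "Victoria"],
    "Donald": ["Joseph", "Christina"],
    "Anthony": ["Donald", "Eleonora"],
    "George": ["Donald", "Nancy"],
    "Bernard": ["Alfred", "Antoinette"],
    "Fiona": ["Alfred", "Antoinette"],
}

def cousins(person):
    """Children of p2, where p2 shares a parent (a grandparent of person)
    with one of person's parents p1, and p2 <> p1 -- the same shape as
    the Cypher pattern in the query above."""
    result = set()
    for p1 in child_of.get(person, []):
        for grandparent in child_of.get(p1, []):
            for p2, p2_parents in child_of.items():
                if grandparent in p2_parents and p2 != p1:
                    for c2, c2_parents in child_of.items():
                        if p2 in c2_parents and c2 != person:
                            result.add(c2)
    return result

print(sorted(cousins("Donald")))  # ['Bernard', 'Fiona']
```

The `p2 != p1` check plays the same role as `WHERE p1 <> p2` in the Cypher query: it stops Donald’s own parents (and hence his siblings) from matching.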

Update 4th December 2019: The ancestors and descendants query has been added, and the cousins query improved, thanks to the contributions of people in this GitHub issue. Thank you!

## Statistical Queries

The last two queries I’d like to show are statistical in nature, and since they’re not easy to visualise directly, I’d like to get to them in steps.

First, let’s calculate life expectancy. In order to understand this, let’s first run a query retrieving the year of birth and death of those people who are already dead:

GRAPH.QUERY FamilyTree "MATCH (x) WHERE EXISTS(x.died) RETURN x.born, x.died"
1) 1) "x.born"
   2) "x.died"
2) 1) 1) (integer) 1932
      2) (integer) 1982
   2) 1) (integer) 1934
      2) (integer) 2006
   3) 1) (integer) 1957
      2) (integer) 2018
   4) 1) (integer) 1986
      2) (integer) 2010
3) 1) "Query internal execution time: 1.066981 milliseconds"

Since life expectancy is the average age at which people die, for each of those born/died pairs we need to subtract born from died to get each person’s age at death, and then average the results. We can do this using the AVG() aggregate function, which, like COUNT(), may be reminiscent of SQL.

GRAPH.QUERY FamilyTree "MATCH (x) WHERE EXISTS(x.died) RETURN AVG( x.died - x.born )"
1) 1) "AVG( x.died - x.born )"
2) 1) 1) "51.75"
3) 1) "Query internal execution time: 1.208347 milliseconds"

The second statistic we’ll calculate is the average age at marriage. This is similar to life expectancy, except that in this case there are two people in each marriage, which complicates things slightly.

Once again, let’s visualise the situation first, by retrieving separately the ages of the female and the male when they got married:

GRAPH.QUERY FamilyTree "MATCH (m)-[r:married]->(f) RETURN r.year - f.born, r.year - m.born"
1) 1) "r.year - f.born"
   2) "r.year - m.born"
2) 1) 1) (integer) 22
      2) (integer) 24
   2) 1) (integer) 24
      2) (integer) 23
   3) 1) (integer) 22
      2) (integer) 24
   4) 1) (integer) 29
      2) (integer) 27
   5) 1) (integer) 25
      2) (integer) 27

We therefore have five marriages but ten ages at marriage, which makes the average slightly awkward to work out. However, we can still get the number we want by adding up the two ages in each couple, averaging those sums, and then dividing by 2 at the end to make up for the difference in the number of values:

GRAPH.QUERY FamilyTree "MATCH (m)-[r:married]->(f) RETURN AVG( (r.year - f.born) + (r.year - m.born) ) / 2"
1) 1) "AVG( (r.year - f.born) + (r.year - m.born) ) / 2"
2) 1) 1) "24.7"
3) 1) "Query internal execution time: 48.874147 milliseconds"
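If the divide-by-2 step seems suspicious, we can verify the arithmetic in plain Python, using the five couple pairs from the previous output: averaging the per-couple sums and halving gives exactly the same number as averaging all ten individual ages.

```python
# (age of wife, age of husband) at marriage, from the previous query's output
couples = [(22, 24), (24, 23), (22, 24), (29, 27), (25, 27)]

# RedisGraph's computation: average the per-couple sums, then halve
avg_of_sums = sum(f + m for f, m in couples) / len(couples)
average_age = avg_of_sums / 2

# Direct computation: average over all ten individual ages
all_ages = [age for couple in couples for age in couple]
direct = sum(all_ages) / len(all_ages)

print(average_age, direct)  # 24.7 24.7
```

The two agree because every couple contributes exactly two values, so halving the average sum is the same as averaging twice as many values.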

## Wrapping Up

We’ve seen another example graph — a family tree — in this article. We discussed the reasons behind the chosen representation, delved into efficient ways to quickly create it from a text file, and then ran a whole bunch of queries to answer different questions and analyse the data in the family tree.

One thing I’m still not sure about is whether it’s possible, given two people, to identify their relationship (e.g. cousin, sibling, parent, etc.) based on the path between them.
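For what it’s worth, here is one possible approach, sketched outside RedisGraph (in Python, over a hand-copied subset of the childOf data), so treat it as an idea rather than a tested solution: walk up the childOf relationships from both people, find the nearest common ancestor, and map the pair of hop counts to a relationship name. It ignores marriages and half-relations, so it is deliberately simplistic.

```python
# hand-copied childOf subset of the example tree (child -> parents)
child_of = {
    "Joseph": ["John", "Victoria"],
    "Antoinette": ["John", "Victoria"],
    "Donald": ["Joseph", "Christina"],
    "Bernard": ["Alfred", "Antoinette"],
}

def ancestor_distances(person):
    """Map person and each ancestor to its distance in childOf hops (BFS)."""
    dist = {person: 0}
    frontier = [person]
    while frontier:
        next_frontier = []
        for p in frontier:
            for parent in child_of.get(p, []):
                if parent not in dist:
                    dist[parent] = dist[p] + 1
                    next_frontier.append(parent)
        frontier = next_frontier
    return dist

# sorted (hops up from a, hops up from b) to the nearest common ancestor
NAMES = {
    (0, 1): "parent/child",
    (0, 2): "grandparent/grandchild",
    (1, 1): "siblings",
    (1, 2): "uncle or aunt / nephew or niece",
    (2, 2): "first cousins",
}

def relationship(a, b):
    da, db = ancestor_distances(a), ancestor_distances(b)
    common = set(da) & set(db)
    if not common:
        return "unrelated"
    nearest = min(common, key=lambda x: da[x] + db[x])
    key = tuple(sorted((da[nearest], db[nearest])))
    return NAMES.get(key, "distant relatives")

print(relationship("Donald", "Bernard"))  # first cousins
```

In principle the same hop counts could be obtained in Cypher with two variable-length childOf paths meeting at a common node, but I have not worked that query out.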

As all this is something I’m still learning, I’m more than happy to receive feedback on how to do things better and perhaps other things you can do which I’m not even aware of.

# First Steps with RedisGraph

RedisGraph is a super-fast graph database, and like others of its kind (such as Neo4j), it is useful to represent networks of entities and their relationships. Examples include social networks, family trees, and organisation charts.

## Getting Started

The easiest way to try RedisGraph is using Docker. Use the following command, which is based on what the Quickstart recommends but instead uses the edge tag, which would have the latest features and fixes:

sudo docker run -p 6379:6379 -it --rm redislabs/redisgraph:edge

You will also need the redis-cli tool to run the example queries. On Ubuntu (or similar), you can get this by installing the redis-tools package.

## Tom Loves Judy

We’ll start by representing something really simple: Tom Loves Judy.

We can create this graph using a single command:

GRAPH.QUERY TomLovesJudy "CREATE (tom:Person {name: 'Tom'})-[:loves]->(judy:Person {name: 'Judy'})"

When using redis-cli, queries will also follow the format of GRAPH.QUERY <key> "<cypher_query>". In RedisGraph, a graph is stored in a Redis key (in this case called “TomLovesJudy”) with the special type graphdata, which is why the key must always be specified in queries. The query itself is the part between double quotes, and is written in a language called Cypher. Cypher is also used by Neo4j among other software, and RedisGraph implements a subset of it.

Cypher represents nodes and relationships using a sort of ASCII art. Nodes are represented by round brackets (parentheses), and relationships by square brackets. The arrow indicates the direction of the relationship; RedisGraph does not at present support undirected relationships. When you run the above command, Redis should provide some output indicating what happened.

Since our graph has been created, we can start running queries against it. For this, we use the MATCH keyword:

GRAPH.QUERY TomLovesJudy "MATCH (x) RETURN x"

Since round brackets represent a node, here we’re saying that we want the query to match any node, which we’ll call x, and then return it. The output for this is quite verbose:

1) 1) "x"
2) 1) 1) 1) 1) "id"
            2) (integer) 0
         2) 1) "labels"
            2) 1) "Person"
         3) 1) "properties"
            2) 1) 1) "name"
                  2) "Tom"
   2) 1) 1) 1) "id"
            2) (integer) 1
         2) 1) "labels"
            2) 1) "Person"
         3) 1) "properties"
            2) 1) 1) "name"
                  2) "Judy"
3) 1) "Query internal execution time: 61.509847 milliseconds"

As you can see, this has given us the whole structure of each node. If we just want to get something specific, such as the name, then we can specify it in the RETURN clause:

GRAPH.QUERY TomLovesJudy "MATCH (x) RETURN x.name"
1) 1) "x.name"
2) 1) 1) "Tom"
   2) 1) "Judy"
3) 1) "Query internal execution time: 0.638126 milliseconds"

We can also query based on relationships. Let’s see who loves who:

GRAPH.QUERY TomLovesJudy "MATCH (x)-[:loves]->(y) RETURN x.name, y.name"
1) 1) "x.name"
   2) "y.name"
2) 1) 1) "Tom"
      2) "Judy"
3) 1) "Query internal execution time: 54.642536 milliseconds"

It seems like Tom Loves Judy. Unfortunately, Judy does not love Tom back.

## Company Shareholding

Let’s take a look at a slightly more interesting example.

In this graph, we have companies (blue nodes) which are owned by multiple individuals (red nodes). We can’t create this as a single command as we did before. We also can’t simply issue a series of CREATE commands, because we may end up creating multiple nodes with the same name.

Instead, let’s create all the nodes separately first:

GRAPH.QUERY Companies "CREATE (:Individual {name: 'X'})"
GRAPH.QUERY Companies "CREATE (:Individual {name: 'Y'})"
GRAPH.QUERY Companies "CREATE (:Individual {name: 'Z'})"

GRAPH.QUERY Companies "CREATE (:Company {name: 'A'})"
GRAPH.QUERY Companies "CREATE (:Company {name: 'B'})"

You’ll notice here that the way we are defining nodes is a little different. A node follows the structure (alias:type {properties}). The alias is not much use in such CREATE commands, but on the other hand, the type now (unlike in the earlier example) gives us a way to distinguish between different kinds of nodes.

Now that we have the nodes, we can create the relationships:

GRAPH.QUERY Companies "MATCH (x:Individual { name : 'X' }), (c:Company { name : 'A' }) CREATE (x)-[:owns {percentage: 85}]->(c)"
GRAPH.QUERY Companies "MATCH (x:Individual { name : 'Y' }), (c:Company { name : 'A' }) CREATE (x)-[:owns {percentage: 15}]->(c)"
GRAPH.QUERY Companies "MATCH (x:Individual { name : 'Y' }), (c:Company { name : 'B' }) CREATE (x)-[:owns {percentage: 55}]->(c)"
GRAPH.QUERY Companies "MATCH (x:Individual { name : 'Z' }), (c:Company { name : 'B' }) CREATE (x)-[:owns {percentage: 45}]->(c)"

In order to make sure we apply the relationships to existing nodes (as opposed to creating new ones), we first find the nodes we want with a MATCH clause, and then CREATE the relationship between them. You’ll notice that our relationships now also have properties.

Now that our graph is set up, we can start querying it! Here are a few things we can do with it.

Return the names of all the nodes:

GRAPH.QUERY Companies "MATCH (x) RETURN x.name"
1) 1) "x.name"
2) 1) 1) "X"
   2) 1) "Y"
   3) 1) "Z"
   4) 1) "A"
   5) 1) "B"
3) 1) "Query internal execution time: 0.606600 milliseconds"

Return the names only of the companies:

GRAPH.QUERY Companies "MATCH (c:Company) RETURN c.name"
1) 1) "c.name"
2) 1) 1) "A"
   2) 1) "B"
3) 1) "Query internal execution time: 0.515959 milliseconds"

Return individual ownership in each company (separate fields):

GRAPH.QUERY Companies "MATCH (i)-[s]->(c) RETURN i.name, s.percentage, c.name"
1) 1) "i.name"
   2) "s.percentage"
   3) "c.name"
2) 1) 1) "X"
      2) (integer) 85
      3) "A"
   2) 1) "Y"
      2) (integer) 15
      3) "A"
   3) 1) "Y"
      2) (integer) 55
      3) "B"
   4) 1) "Z"
      2) (integer) 45
      3) "B"
3) 1) "Query internal execution time: 1.627741 milliseconds"

Return individual ownership in each company (concatenated strings):

GRAPH.QUERY Companies "MATCH (i)-[s]->(c) RETURN i.name + ' owns ' + round(s.percentage) + '% of ' + c.name"
1) 1) "i.name + ' owns ' + round(s.percentage) + '% of ' + c.name"
2) 1) 1) "X owns 85% of A"
   2) 1) "Y owns 15% of A"
   3) 1) "Y owns 55% of B"
   4) 1) "Z owns 45% of B"
3) 1) "Query internal execution time: 1.281184 milliseconds"

Find out who owns at least 50% of the shares in Company A:

GRAPH.QUERY Companies "MATCH (i)-[s]->(c) WHERE s.percentage >= 50 AND c.name = 'A' RETURN i.name"
1) 1) "i.name"
2) 1) 1) "X"
3) 1) "Query internal execution time: 1.321579 milliseconds"

## Wrapping Up

In this article, we saw how to:

- get up and running with RedisGraph
- create simple graphs
- perform basic queries

We’ve obviously scratched the surface of RedisGraph and Cypher, but hopefully these examples will help others who, like me, are new to this space.

# StackExchange.Redis Connection In Short

Last year I wrote an article about the right way to set up a Redis connection using StackExchange’s Redis client library. A lot of people found this useful, but at the same time the article went into a lot of detail in order to explain the dangers of doing this wrong. Also, there is a connection string format that’s a lot more concise.

So here’s how to set up a Redis connection using StackExchange.Redis, really quickly. If you just want to try this out, you can grab a copy of Redis for Windows (just remember that this is not supported for production environments).

First, install the NuGet package for the Redis client library:

Install-Package StackExchange.Redis

Then, figure out what connection you need, and build a lazy factory for it (ideally the connection string should come from a configuration file):

private static Lazy<ConnectionMultiplexer> conn
    = new Lazy<ConnectionMultiplexer>(
        () => ConnectionMultiplexer.Connect(
            "localhost:6379,abortConnect=false,syncTimeout=3000"));

My original article goes into detail on why this lazy construct is necessary, but it is mainly because it guarantees thread safety, so the ConnectionMultiplexer will be created only once when it is needed (which is how it’s intended to be used).

You build up a connection string using a comma-separated sequence of configuration parameters (as an alternative to ConfigurationOptions in code, which the original article used). This is more concise but also makes configuration a lot easier.

At the very least, you should have one or more endpoints that you’ll connect to (6379 is the default port in case you leave it out), abortConnect=false to automatically reconnect in case a disconnection occurs (see my original article for details on this), and a reasonable syncTimeout in case some of your Redis operations take long.

The default for syncTimeout is 1 second (i.e. 1000, because the value is in milliseconds), and operations against Redis should typically take a lot less than that. But we don’t work in an ideal world, and since Redis is single-threaded, expensive application commands or even internal Redis operations can at times push commands over this threshold and result in a timeout. In such cases, you don’t want an operation to fail because of a one-off spike, so just give it a little extra headroom (3 seconds should be reasonable). However, if you get lots of timeouts, you should review your code and look for bottlenecks or blocking operations.

Once you have the means to create a connection (as above), just get the lazy value, and from it get a handle on one of the 16 Redis databases (by default it’s database 0 if not specified):

var db = conn.Value.GetDatabase();

I’ve seen a lot of code in the past that just calls GetDatabase() all the time, for each operation. That’s fine, because the Basic Usage documentation states that:

“The object returned from GetDatabase is a cheap pass-thru object, and does not need to be stored.”

Despite this, I see no point in having an unnecessary extra level of indirection in my code, so I like to store this and work directly with it. Your mileage may vary.

Once you’ve got hold of your Redis database, you can perform your regular Redis operations on it.

db.StringSet("x", 1);
var x = db.StringGet("x");

# A Multilevel Cache Retrieval Design for .NET

Caching data is vital for high-performance web applications. However, cache retrieval code can get messy and hard to test without the proper abstractions. In this article, we’ll start with an ugly multilevel cache implementation and progressively refine it into something maintainable and testable.

## Naïve Multilevel Cache Retrieval

A multilevel cache is just a collection of separate caches, ordered by speed. We typically try to retrieve from the fastest cache first; failing that, we try the second fastest, and so on.

For the example in this article, we’ll use a simple two-level cache where the first level is the in-process MemoryCache, and the second level is Redis.

We’re going to build a Web API method that retrieves a list of supported languages. We’ll prepare this data in Redis (e.g. using the command SADD languages en mt) but will leave the MemoryCache empty (so it will have to fall back to the Redis cache).

A simple implementation looks something like this:

public class LanguagesController : ApiController
{
    // GET api/languages
    public async Task<List<string>> Get()
    {
        // retrieve from MemoryCache

        var valueObj = MemoryCache.Default.Get("languages");

        if (valueObj != null)
            return valueObj as List<string>;
        else
        {
            // retrieve from Redis

            var conn = await ConnectionMultiplexer.ConnectAsync("localhost:6379");
            var db = conn.GetDatabase(0);
            var redisSet = await db.SetMembersAsync("languages");

            if (redisSet == null)
                return null;
            else
                return redisSet.Select(item => item.ToString()).ToList();
        }
    }
}

Note: this is not the best way to create a Redis client connection, but is presented this way for the sake of simplicity.

## Data Access Repositories and Multilevel Cache Abstraction

The controller method in the previous section is having to deal with cache fallback logic as well as data access logic that isn’t really its job (see Single Responsibility Principle). This results in bloated controllers, especially if we add additional cache levels (e.g. fall back to database for third-level cache).

To alleviate this, the first thing we should do is move data access logic into repositories (this is called the Repository pattern). So for MemoryCache we do this:

public class MemoryCacheRepository : IMemoryCacheRepository
{
    public Task<List<string>> GetLanguagesAsync()
    {
        var valueObj = MemoryCache.Default.Get("languages");
        var value = valueObj as List<string>;
        return Task.FromResult(value);
    }
}

…and for Redis we have this instead:

public class RedisCacheRepository : IRedisCacheRepository
{
    public async Task<List<string>> GetLanguagesAsync()
    {
        var conn = await ConnectionMultiplexer.ConnectAsync("localhost:6379");
        var db = conn.GetDatabase(0);
        var redisSet = await db.SetMembersAsync("languages");

        if (redisSet == null)
            return null;
        else
            return redisSet.Select(item => item.ToString()).ToList();
    }
}

The repositories each implement their own interfaces, to prepare for dependency injection which is one of our end goals (we’ll get to that later):

public interface IMemoryCacheRepository
{
    Task<List<string>> GetLanguagesAsync();
}

public interface IRedisCacheRepository
{
    Task<List<string>> GetLanguagesAsync();
}

For this simple example, the interfaces look almost identical. If your caches are going to be identical then you can take this article further and simplify things even more. However, I’m not assuming that this is true in general; you might not want to have a multilevel cache everywhere.

Let’s also add a new class to abstract the fallback logic:

public class MultiLevelCache
{
    public async Task<T> GetAsync<T>(params Task<T>[] retrievalTasks)
        where T : class
    {
        foreach (var retrievalTask in retrievalTasks)
        {
            var retrievedValue = await retrievalTask;

            if (retrievedValue != null)
                return retrievedValue;
        }

        return null;
    }
}

Basically this allows us to pass in a number of tasks, each corresponding to a cache lookup. Whenever a cache lookup returns null, we know it’s a cache miss, which is why we need the where T : class restriction. In that case we try the next cache level, until we finally run out of options and just return null to the calling code.

This class is async-only to encourage asynchronous retrieval where possible. Synchronous retrieval can use Task.FromResult() (as the MemoryCache retrieval shown earlier does) to conform with this interface.
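The pattern itself is not C#-specific. As an illustration, here is a minimal Python (asyncio) sketch of the same fallback idea, where a lookup that returns immediately without really awaiting anything plays the role of Task.FromResult() (all names below are made up for this example):

```python
import asyncio

async def get_first_hit(*lookups):
    # Await each cache lookup in order; None signals a miss,
    # so fall through to the next (slower) level.
    for lookup in lookups:
        value = await lookup
        if value is not None:
            return value
    return None  # every level missed

async def sync_memory_lookup():
    # A synchronous lookup conforming to the async interface:
    # nothing is really awaited, mirroring Task.FromResult() in C#.
    return None  # simulate a MemoryCache miss

async def redis_lookup():
    return ["en", "mt"]  # simulate a Redis hit (genuinely async in real code)

languages = asyncio.run(get_first_hit(sync_memory_lookup(), redis_lookup()))
print(languages)  # ['en', 'mt']
```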

We can now refactor our controller method into something much simpler:

public async Task<List<string>> Get()
{
    var memoryCacheRepository = new MemoryCacheRepository();
    var redisCacheRepository = new RedisCacheRepository();
    var cache = new MultiLevelCache();

    var languages = await cache.GetAsync(
        memoryCacheRepository.GetLanguagesAsync(),
        redisCacheRepository.GetLanguagesAsync()
    );

    return languages;
}

The variable declarations will go away once we introduce dependency injection.

## Multilevel Cache Repository

The code looks a lot neater now, but it is still not testable. We’re still technically calling cache retrieval logic from the controller. A cache depends on external resources (e.g. databases) and also potentially on time (if expiry is used), and that’s not good for unit tests.

A cache is not very different from the more tangible data sources (such as Redis or a database). With them it shares the function of retrieving data and the nature of relying on resources external to the application, which makes it incompatible with unit testing. A multilevel cache has the additional property that it is an abstraction for the underlying data sources, and is thus itself a good candidate for the repository pattern:

We can now move all our cache retrieval logic into a new MultiLevelCacheRepository class:

public class MultiLevelCacheRepository : IMultiLevelCacheRepository
{
    public async Task<IEnumerable<string>> GetLanguagesAsync()
    {
        var memoryCacheRepository = new MemoryCacheRepository();
        var redisCacheRepository = new RedisCacheRepository();
        var cache = new MultiLevelCache();

        var languages = await cache.GetAsync(
            memoryCacheRepository.GetLanguagesAsync(),
            redisCacheRepository.GetLanguagesAsync()
        );

        return languages;
    }
}

Our controller now only needs to talk to this repository and carry out any necessary logic after retrieval (in this case we don’t have any):

public async Task<IEnumerable<string>> GetAsync()
{
    var repo = new MultiLevelCacheRepository();
    var languages = await repo.GetLanguagesAsync();
    return languages;
}

# Dependency Injection

Our end goal is to be able to write unit tests for our controller methods. A prerequisite for that is to introduce dependency injection.

Follow the instructions in “ASP .NET Web API Dependency Injection with Ninject” to set up Ninject, or use any other dependency injection framework you prefer.

In your dependency injection configuration class (NinjectWebCommon if you’re using Ninject), set up the classes and interfaces you need:

private static void RegisterServices(IKernel kernel)
{
    kernel.Bind<IMemoryCacheRepository>().To<MemoryCacheRepository>()
        .InSingletonScope();
    kernel.Bind<IRedisCacheRepository>().To<RedisCacheRepository>()
        .InSingletonScope();
    kernel.Bind<IMultiLevelCacheRepository>().To<MultiLevelCacheRepository>()
        .InSingletonScope();
    kernel.Bind<MultiLevelCache>().To<MultiLevelCache>()
        .InSingletonScope();
}

Note: you can also set up an interface for MultiLevelCache if you want. I didn’t do that out of pure laziness.

Next, refactor MultiLevelCacheRepository to get the classes it needs via constructor injection:

public class MultiLevelCacheRepository : IMultiLevelCacheRepository
{
    private IMemoryCacheRepository memoryCacheRepository;
    private IRedisCacheRepository redisCacheRepository;
    private MultiLevelCache cache;

    public MultiLevelCacheRepository(
        IMemoryCacheRepository memoryCacheRepository,
        IRedisCacheRepository redisCacheRepository,
        MultiLevelCache cache)
    {
        this.memoryCacheRepository = memoryCacheRepository;
        this.redisCacheRepository = redisCacheRepository;
        this.cache = cache;
    }

    public async Task<IEnumerable<string>> GetLanguagesAsync()
    {
        var languages = await cache.GetAsync(
            memoryCacheRepository.GetLanguagesAsync(),
            redisCacheRepository.GetLanguagesAsync()
        );

        return languages;
    }
}

Do the same with the controller:

public class LanguagesController : ApiController
{
    private IMultiLevelCacheRepository repo;

    public LanguagesController(IMultiLevelCacheRepository repo)
    {
        this.repo = repo;
    }

    // GET api/languages
    public async Task<IEnumerable<string>> GetAsync()
    {
        var languages = await repo.GetLanguagesAsync();
        return languages;
    }
}

…and make sure it actually works:

# Unit Test

Thanks to this design, we can now write unit tests. There is not much to test for this simple example, but we can write a simple (!) test to verify that the languages are indeed retrieved and returned:

[TestMethod]
public async Task GetLanguagesTestAsync() // test name reconstructed
{
    // arrange

    var languagesList = new List<string>() { "mt", "en" };

    var memCacheRepo = new Mock<MemoryCacheRepository>();
    var redisRepo = new Mock<RedisCacheRepository>();
    var cache = new MultiLevelCache();
    var multiLevelCacheRepo = new MultiLevelCacheRepository(
        memCacheRepo.Object, redisRepo.Object, cache);
    var controller = new LanguagesController(multiLevelCacheRepo);

    memCacheRepo.Setup(repo => repo.GetLanguagesAsync())
        .ReturnsAsync((List<string>)null);
    redisRepo.Setup(repo => repo.GetLanguagesAsync())
        .ReturnsAsync(languagesList);

    // act

    var languages = await controller.GetAsync();
    var actualLanguages = new List<string>(languages);

    // assert

    CollectionAssert.AreEqual(languagesList, actualLanguages);
}

Here we’re using Moq’s Mock objects to help us set up the unit test. For this to work, we need to make our GetLanguagesAsync() method virtual in the data repositories.

# Conclusion

Caching makes unit testing tricky. However, in this article we have seen how we can treat a cache just like any other repository and hide its retrieval implementation details in order to keep our code testable. We have also seen an abstraction for a multilevel cache, which makes cache fallback straightforward. Where cache levels are identical in terms of data, this approach can probably be simplified even further.

# Introduction

In “Getting Started with SignalR“, I provided a gentle introduction to SignalR with a few simple and practical examples. The overwhelming response showed that I’m not alone in thinking this is an awesome technology enabling real-time push notifications over the web.

Web applications often face the challenge of having to scale to handle large numbers of clients, and SignalR applications are no exception. In this article, we’ll see one way of scaling out SignalR applications using something called a backplane.

# Scaleout in SignalR

“Introduction to Scaleout in SignalR” (official documentation) describes how SignalR applications can use several servers to handle increasing numbers of clients. When a server needs to push an update, it first pushes it over a message bus called a backplane. This delivers it to the other servers, which can then forward the update to their respective clients.

According to the official documentation, scaleout is supported using Azure, Redis or SQL Server as backplanes. Third-party packages exist to support other channels, such as SignalR.RabbitMq.

# Scaleout Example using Redis

“Introduction to Scaleout in SignalR” (official documentation) also describes how to use Redis as a backplane. To demonstrate this, I’ll build on the Chat Example code from my “Getting Started with SignalR” article.

All we need to do to scale out using Redis is install the Microsoft.AspNet.SignalR.Redis NuGet package, and then set it up in the Startup class as follows:

public void Configuration(IAppBuilder app)
{
    GlobalHost.DependencyResolver.UseRedis("192.168.1.66", 6379, null, "SignalRChat");
    app.MapSignalR();
}

In the code above, I am specifying the host and port of the Redis server, the password (in this case null because I don’t have one), and the name of the pub/sub channel that SignalR will use to distribute messages.

To test this, you can get a Redis server from the Redis download page. Redis releases for Windows exist and are great for testing stuff, but remember they aren’t officially supported for production environments.

Now to actually test it, I’ve set up the same scaleout-enhanced chat application on two different machines, and subscribed to the Redis pub/sub channel:

Watching the pub/sub channel reveals what SignalR is doing under the hood. There are particular messages going through when the application initializes on each machine, and you can also see the actual data messages going through. So when you write a message in the chat, you can also see it in the pub/sub channel.

But even better than that, you’ll also see it on the client (browser) that’s hooked up to the other machine:

The magic you need to appreciate here is that these aren’t two browsers connected to the same server; they are actually communicating with different servers on different machines. And despite that, the messages manage to reach all clients thanks to the backplane, which in this case is Redis.

# Caveats

So I’ve shown how it’s really easy to scale out SignalR to multiple servers: you need to install a NuGet package and add a line of code. And I’ve actually tested it on two machines.

But that’s not really scaleout. I don’t have the resources to do large-scale testing; this article is only intended to show how scaleout is implemented. The actual benefits of scaleout depend on the application. As the official documentation warns, the addition of a backplane incurs overhead and can become a bottleneck in some scenarios, so you really need to study whether your application is a good fit for this kind of scaleout before going for it.

# Setting up a Connection with StackExchange.Redis

Update 25th October 2016: Just looking to quickly set up a Redis connection? Check out the followup article. Read on for a more detailed article on the topic.

StackExchange.Redis is a pretty good .NET client for Redis. Unfortunately, it can be a little bit tricky to use, and the existing documentation is far from comprehensive.

After installing StackExchange.Redis via NuGet, a Redis connection can be obtained via a special ConnectionMultiplexer object. Working with this is already tricky in itself, and many get this wrong. For instance, check out the implementation in this answer:

public static ConnectionMultiplexer RedisConnection;
public static IDatabase RedisCacheDb;

protected void Session_Start(object sender, EventArgs e)
{
    if (ConfigurationManager.ConnectionStrings["RedisCache"] != null)
    {
        if (RedisConnection == null || !RedisConnection.IsConnected)
        {
            RedisConnection = ConnectionMultiplexer.Connect(
                ConfigurationManager.ConnectionStrings["RedisCache"].ConnectionString);
        }
        RedisCacheDb = RedisConnection.GetDatabase();
    }
}

As I pointed out in my question, this is a bad form of lazy initialization because it lacks thread safety: multiple threads may get through the checks and initialize multiple connections, resulting in connection leaks.

It is not hard to prove that this code is leaky in multithreaded environments. First, let’s set up the ConfigurationOptions with a client name so that we can identify connections coming from our program:

private static Lazy<ConfigurationOptions> configOptions
    = new Lazy<ConfigurationOptions>(() =>
    {
        var configOptions = new ConfigurationOptions();
        configOptions.ClientName = "LeakyRedisConnection";
        configOptions.ConnectTimeout = 100000;
        configOptions.SyncTimeout = 100000;
        return configOptions;
    });

Then, we provide a property with the faulty lazy initialization:

private static ConnectionMultiplexer conn;

private static ConnectionMultiplexer LeakyConn
{
    get
    {
        if (conn == null || !conn.IsConnected)
            conn = ConnectionMultiplexer.Connect(configOptions.Value);
        return conn;
    }
}

Finally, we write some code that runs some Redis stuff in parallel:

static void Main(string[] args)
{
    for (int i = 0; i < 3; i++)
    {
        int copy = i; // capture loop variable for the lambda
        Task.Run(() =>
        {
            var db = LeakyConn.GetDatabase();
            Console.WriteLine(copy);

            string key = "key" + copy;

            db.StringSet(key, copy);
            string value = db.StringGet(key);
        });
    }

    Console.WriteLine("Done");
    Console.ReadLine(); // keep the process alive while the tasks run
}

When the program does its work, even with just 3 iterations, we get a total of six connections (when normally a single ConnectionMultiplexer should have at most 2 physical connections):

Another approach from this answer is to use an exclusive lock:

private static ConnectionMultiplexer _redis;
private static readonly Object _multiplexerLock = new Object();

private void ConnectRedis()
{
    try
    {
        _redis = ConnectionMultiplexer.Connect("...<connection string here>...");
    }
    catch (Exception ex)
    {
        //exception handling goes here
    }
}

private ConnectionMultiplexer RedisMultiplexer
{
    get
    {
        lock (_multiplexerLock)
        {
            if (_redis == null || !_redis.IsConnected)
            {
                ConnectRedis();
            }
            return _redis;
        }
    }
}

However, since Redis is often used as a cache in highly concurrent web applications, this approach essentially forces code to degrade into something sequential, and has obvious performance implications.

The correct approach to using ConnectionMultiplexer is described by this answer. It involves use of Lazy<T> for thread-safe lazy initialization (see Jon Skeet’s article on Singletons). Additionally:

• It sets “abortConnect=false”, which means if the initial connect attempt fails, the ConnectionMultiplexer will silently retry in the background rather than throw an exception.
• It does not check the IsConnected property, since ConnectionMultiplexer will automatically retry in the background if the connection is dropped.
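What Lazy&lt;T&gt; buys you is create-once semantics under concurrency. A rough sketch of the same idea in Python (double-checked locking; the class and names are illustrative, not StackExchange.Redis API):

```python
import threading

class Lazy:
    """Minimal analogue of .NET's Lazy<T>: thread-safe, create-once."""
    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._created = False
        self._value = None

    @property
    def value(self):
        if not self._created:               # fast path: no lock once created
            with self._lock:
                if not self._created:       # re-check inside the lock
                    self._value = self._factory()
                    self._created = True
        return self._value

creation_count = 0

def connect():
    global creation_count
    creation_count += 1
    return object()  # stands in for ConnectionMultiplexer.Connect(...)

conn = Lazy(connect)
threads = [threading.Thread(target=lambda: conn.value) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(creation_count)  # 1 -- all threads share one "connection"
```

Contrast this with the leaky property earlier: without the lock and the re-check, several threads can each run the factory and leak a connection.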

With this info, we can now fix our code:

private static Lazy<ConfigurationOptions> configOptions
    = new Lazy<ConfigurationOptions>(() =>
    {
        var configOptions = new ConfigurationOptions();
        configOptions.ClientName = "SafeRedisConnection";
        configOptions.ConnectTimeout = 100000;
        configOptions.SyncTimeout = 100000;
        configOptions.AbortOnConnectFail = false;
        return configOptions;
    });

private static Lazy<ConnectionMultiplexer> conn
    = new Lazy<ConnectionMultiplexer>(
        () => ConnectionMultiplexer.Connect(configOptions.Value));

private static ConnectionMultiplexer SafeConn
{
    get
    {
        return conn.Value;
    }
}

static void Main(string[] args)
{
    for (int i = 0; i < 3; i++)
    {
        int copy = i; // capture loop variable for the lambda
        Task.Run(() =>
        {
            var db = SafeConn.GetDatabase();
            Console.WriteLine(copy);

            string key = "key" + copy;

            db.StringSet(key, copy);
            string value = db.StringGet(key);
        });
    }

    Console.WriteLine("Done");
    Console.ReadLine(); // keep the process alive while the tasks run
}

If you run this, you’ll find that there are now only two physical connections generated by the application, which is normal.

# Double Buffering in Redis

This article deals with a specific situation. You have a program that is continuously building a data structure from scratch. This data structure takes long (i.e. several seconds) to build, but is continuously accessed by clients. While it is being built, clients will end up retrieving partial data. This article explains how a well-known technique from computer graphics can be used in Redis to address this problem. The full source code is available at BitBucket.

# The Problem

Like just about anything, this scenario is best demonstrated with an example. So we’ll first start with some code to set up a client connection to Redis:

private static Lazy<ConnectionMultiplexer> LazyConnection
    = new Lazy<ConnectionMultiplexer>(() =>
    {
        var config = new ConfigurationOptions();
        config.AbortOnConnectFail = false;

        return ConnectionMultiplexer.Connect(config);
    });

public static ConnectionMultiplexer Connection
{
    get
    {
        return LazyConnection.Value;
    }
}

The code above is based on the feedback I received from this StackOverflow question and is the proper way to get a connection set up using StackExchange.Redis. You’ll need to install the StackExchange.Redis package via NuGet before you can use this code.

Our actual program code is extremely simple. We establish a connection (via the property above) and then continuously call a method that populates our data structure:

static void Main(string[] args)
{
    var connection = Connection;
    var database = connection.GetDatabase(0);

    while (true)
        PopulateAwesomeCompanies(database);
}

We use this infinite loop because the nature of the data may be dynamic, for example items may be removed, and it may be hard to keep track of them. So the data structure is simply rebuilt regularly.

For this simple example, our data structure is going to be a Redis sorted set that stores the most awesome companies to work for, sorted by awesomeness. Here is the data we’re going to be using:

private static string[] companies = new string[]
{
    "Apple",
    "Amazon",
    "GFI",
    "Blizzard",
    "IBM"
};

private static int[] companyScores = new int[] { 95, 15, 80, 0, 100, 56 };

The method that builds the data structure is also very simple. It starts off by erasing the existing data, and then for each company, it calculates their awesomeness (simulated by a simple delay) and adds them to the sorted set:

public static void PopulateAwesomeCompanies(IDatabase database)
{
    var key = "awesomecompanies";

    Console.WriteLine("Starting afresh...");

    // erase the existing data before rebuilding
    database.KeyDelete(key);

    for (int i = 0; i < companies.Length; i++)
    {
        Console.WriteLine("Calculating {0}", companies[i]);

        // simulate expensive computation
        Thread.Sleep(1000);

        // add company with respective score
        database.SortedSetAdd(key, companies[i], companyScores[i]);
    }
}

The problem with this approach is that anytime a client accesses this data structure while it’s still being built, they will get partial data:

That sucks, because we want the data to be fresh, but we also want it to be complete when it is handed to clients.

# Double Buffering in Computer Graphics

There is a technique in computer graphics called double buffering which solves a similar problem. Here’s some background from Wikipedia:

“In computer graphics, double buffering is a technique for drawing graphics that shows no (or less) flicker, tearing, and other artifacts.

“It is difficult for a program to draw a display so that pixels do not change more than once. For instance to update a page of text it is much easier to clear the entire page and then draw the letters than to somehow erase all the pixels that are not in both the old and new letters. However, this intermediate image is seen by the user as flickering. In addition computer monitors constantly redraw the visible video page (at around 60 times a second), so even a perfect update may be visible momentarily as a horizontal divider between the “new” image and the un-redrawn “old” image, known as tearing.

“A software implementation of double buffering has all drawing operations store their results in some region of system RAM; any such region is often called a “back buffer”. When all drawing operations are considered complete, the whole region (or only the changed portion) is copied into the video RAM (the “front buffer”); this copying is usually synchronized with the monitor’s raster beam in order to avoid tearing. Double buffering necessarily requires more memory and CPU time than single buffering because of the system memory allocated for the back buffer, the time for the copy operation, and the time waiting for synchronization.”

In short, double buffering involves the following steps:

1. Draw the next frame (screen image) in a temporary location in memory that isn’t directly visible to the user (the back buffer)
2. Swap the front buffer (the actual image on the screen) with the back buffer

That way, the change between the old image and the new one is instantaneous and the user will never notice the difference.

# Double Buffering in Redis

We can apply a similar technique in Redis as follows:

1. Build the new sorted set using a separate, temporary key
2. Rename the temporary key to the existing key (implicitly deletes the temporary key)

This is how it works out in practice:

public static void PopulateAwesomeCompanies(IDatabase database)
{
    var tempKey = "awesomecompaniestemp";
    var key = "awesomecompanies";

    Console.WriteLine("Starting afresh...");

    for (int i = 0; i < companies.Length; i++)
    {
        Console.WriteLine("Calculating {0}", companies[i]);

        // simulate expensive computation
        Thread.Sleep(1000);

        // add company with respective score, building under the temporary key
        database.SortedSetAdd(tempKey, companies[i], companyScores[i]);
    }

    // replace actual data in key with fresh data from tempKey
    database.KeyRename(tempKey, key);
}

Here, we aren’t deleting any data structure before we begin, as we did before. Instead, we build our fresh sorted set under the temp key, and then rename it to the actual key (the one that clients will be requesting). We do this using the Redis RENAME command, which makes the data available under the actual key, overwriting the old data, and implicitly deletes the temp key. Since Redis processes commands on a single thread, the rename is atomic and there is no race condition risk while it is happening.
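The build-then-rename pattern is easy to model outside Redis too. Here’s a hedged sketch using a plain Python dict as the “store” (the keys mirror the article’s, the data is illustrative, and — unlike Redis — the dict operations here are not truly atomic):

```python
store = {"awesomecompanies": {("Apple", 95)}}  # stale but complete data

def rebuild(store):
    # build the fresh data under a temporary key; readers of
    # "awesomecompanies" still see the old, complete data meanwhile
    temp = set()
    for company, score in [("Apple", 95), ("GFI", 80)]:
        temp.add((company, score))
    store["awesomecompaniestemp"] = temp

    # the "RENAME": move the temp key's data onto the real key,
    # deleting the temp key (Redis does this atomically server-side)
    store["awesomecompanies"] = store.pop("awesomecompaniestemp")

rebuild(store)
print(sorted(store["awesomecompanies"]))  # [('Apple', 95), ('GFI', 80)]
```

The key point is that clients reading the real key never observe a half-built structure: they see either the old complete data or the new complete data.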

The result of this is that the client will always have access to the latest complete data, even while the data structure is being rebuilt:

# On Redis Desktop Manager and Redis Keys

Update 11th August 2015: As from version 0.8.0 beta 1, Redis Desktop Manager now supports the SCAN command (rather than KEYS) for Redis 2.8 onwards. Although this limits the applicability of this article to older servers, I am more than happy that this shortcoming has been addressed. The original article below remains both for historical purposes and to warn developers of incorrect use of the KEYS command.

There’s a tool called Redis Desktop Manager which can sometimes be useful to inspect the keys in a Redis database. Indeed one of its features is that it presents a treeview showing a structured representation of the keys in the database:

But how is this treeview built? That’s easy to find out, by using the Redis MONITOR command to see the incoming commands:

The first two commands are executed when Redis Desktop Manager connects to the Redis server, and the other two are executed when the database is expanded in the treeview, revealing the keys in that database.

You’ll notice that the last command is a KEYS command, which with its wildcard argument (*) is effectively retrieving every key in the database. We can see an example of what this gives us by running the KEYS command ourselves:

Now, in this case I only have a handful of keys in my Redis database, but it’s pretty normal for real Redis databases to have very large numbers of keys. Retrieving all that data places a large burden on the Redis server, which due to its single-threaded nature will not be able to serve other requests while it is stuck retrieving every key in the database.

In fact, the KEYS command documentation specifically warns against its use in production environments:

“Warning: consider KEYS as a command that should only be used in production environments with extreme care. It may ruin performance when it is executed against large databases. This command is intended for debugging and special operations, such as changing your keyspace layout. Don’t use KEYS in your regular application code. If you’re looking for a way to find keys in a subset of your keyspace, consider using SCAN or sets.”

I understand why Redis Desktop Manager uses the KEYS command: it needs to retrieve all the keys in the database in order to determine how the tree structure will be displayed (since each delimited part of the key is rendered as a node). That’s the whole point of having a treeview.

However, what seems to be a useful feature can actually be very dangerous, especially when used on production servers. So do take care when using Redis Desktop Manager in such environments.

My recommendation to developers using Redis is to keep good documentation of your keys, so you won’t need any Redis command to tell you what keys are in the database. That’s not what Redis is for.

But if you do ever need to inspect your Redis keys on the server, at least follow the advice in the documentation and use SCAN instead. While retrieving the entire set of keys this way may still be expensive, it can be done in small batches, allowing other requests to be serviced in between iterations.
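The difference between KEYS and SCAN is essentially one big scan versus many small ones driven by a cursor. A simplified Python sketch of that iteration shape (a real SCAN cursor is not a plain list offset, and the batch size is only a hint — this just illustrates the idea):

```python
def scan(keys, cursor=0, count=2):
    """Return one batch of keys plus the next cursor (0 = done).
    Simplified: uses a list offset, unlike Redis's real cursor."""
    batch = keys[cursor:cursor + count]
    next_cursor = cursor + count
    if next_cursor >= len(keys):
        next_cursor = 0  # iteration complete, like SCAN returning cursor 0
    return next_cursor, batch

all_keys = ["user:1", "user:2", "session:1", "counter", "user:3"]
cursor, seen = 0, []
while True:
    cursor, batch = scan(all_keys, cursor)
    seen.extend(batch)  # other requests could be serviced between batches
    if cursor == 0:
        break
print(seen)  # all five keys, gathered in small batches
```

Each call does a small amount of work, which is exactly what keeps a single-threaded server like Redis responsive between batches.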

# A Redis Analogy in .NET

In this article, we’re going to create a simple Console Application that feels a little bit like Redis. The intention is simply to illustrate what you can do with Redis, not to create anything serious. For this reason, the code will be greatly simplified: we’ll have only a subset of Redis commands, no multithreading, no client/server code, (almost) no parameter validation, and no architecture whatsoever.

# Introduction

Redis is a key-value store. You can think of it as a dictionary. In its simplest sense, it could be represented like this:

var store = new Dictionary<string, string>();

However, values in Redis can be more than just strings. In fact, Redis currently supports five main data types:

• Strings
• Lists
• Hashes
• Sets
• Sorted Sets

Since in .NET these data types don’t share a common interface, we will be representing our values as plain objects:

var store = new Dictionary<string, object>();

This will result in some unfortunate type checking code, but this is necessary in this case. Maintaining a separate dictionary per type is not feasible since keys must be globally unique (i.e. you can’t use the same key for a list and a hash).

Note also how the use of a dictionary as an in-memory store means we aren’t persisting anything to disk. This might seem to be an oversimplification, but Redis’ default configuration is actually to store data only in memory. It does support persistence if you need it, but we won’t go there.

Redis supports a simple protocol which feels a bit like issuing commands in a command line if you use something like IMAPTalk. Thus, in our Console Application, we’ll use the following code to simulate Redis’ command processing logic:

string input = string.Empty;

while (true)
{
    input = Console.ReadLine();

    if (!string.IsNullOrEmpty(input))
    {
        var tokens = input.Split(); // split the input into words or tokens

        if (tokens.Length >= 1)
        {
            string command = tokens.First().ToLowerInvariant();

            try
            {
                switch (command)
                {
                    // TODO command handling code goes here
                    default:
                        Console.WriteLine("Unrecognized command.");
                        break;
                }
            }
            catch (Exception)
            {
                Console.WriteLine("Error!");
            }
        }
    }
    else
        Console.WriteLine("Invalid command.");
}

The above code just accepts input, checks what the command is, and takes action accordingly. In the next sections, we’ll replace that TODO with code to actually handle some of the commands.

# Strings

The simplest data type in Redis is the string. You can easily assign a value to a key using the SET command, and then retrieve that value using the GET command, as shown above. The SET command is just a matter of assigning the value in the dictionary:

case "set":
    {
        string key = tokens[1];
        string value = tokens[2];
        store[key] = value;
        Console.WriteLine("Done.");
    }
    break;

The GET command similarly involves basic retrieval from the dictionary, but we also need to cater for non-existent keys, and values that are not strings:

case "get":
    {
        string key = tokens[1];
        if (store.ContainsKey(key))
        {
            var value = store[key] as string;
            if (value != null)
                Console.WriteLine(value);
            else
                Console.WriteLine("Invalid data type!");
        }
        else
            Console.WriteLine("No such key!");
    }
    break;

We can also support key removal using the DEL command. Again, this is a simple operation against the dictionary:

case "del":
    {
        string key = tokens[1];
        store.Remove(key);
        Console.WriteLine("Done.");
    }
    break;

We can now test our three basic string commands:

# Lists

Redis also supports Lists. You can use the LPUSH and RPUSH commands to add an item to the beginning or end of a list respectively – allowing you to easily use the list as a stack or a queue as well. These operations also automatically create the list if it doesn’t already exist.

Here’s how we can implement LPUSH:

case "lpush":
    {
        string key = tokens[1];
        string value = tokens[2];
        List<string> list = null;

        // create/get list

        if (!store.ContainsKey(key))
        {
            list = new List<string>();
            store[key] = list;
        }
        else
            list = store[key] as List<string>;

        // insert new value in list

        if (list != null)
        {
            list.Insert(0, value);
            Console.WriteLine("Done");
        }
        else
            Console.WriteLine("Invalid data type!");
    }
    break;

The first thing we do here is retrieve the list we’re going to work with. If it doesn’t exist, we create it. We can then proceed to insert the new item into the list – in this case using List<T>.Insert() to put the item at the beginning of the list.

Note that in Redis, LPUSH actually supports insertion of multiple items with the same command. I’m not doing that here to keep things as simple as possible.

RPUSH is pretty much the same thing, except that you call List<T>.Add() instead of .Insert(), so that the new item ends up at the end of the list.

To remove items from the list, we’ll implement LPOP and RPOP, which remove items from the beginning and end of the list respectively. Here’s LPOP:

case "lpop":
    {
        string key = tokens[1];

        if (store.ContainsKey(key))
        {
            var list = store[key] as List<string>;
            if (list != null)
            {
                if (list.Count > 0)
                {
                    Console.WriteLine(list.First());
                    list.RemoveAt(0);
                }
                else
                    Console.WriteLine("Empty!");
            }
            else
                Console.WriteLine("Invalid data type!");
        }
        else
            Console.WriteLine("No such key!");
    }
    break;

Aside from all the validity checks, all we’re doing here is returning the first item in the list, and then removing it. If there are no items in the list, we simply return “Empty!”.

For RPOP, the only difference is that we retrieve and remove the last item instead:

Console.WriteLine(list.Last());
list.RemoveAt(list.Count - 1);

Finally, we need something that can retrieve items in the list (without removing them). For this we have LRANGE, which retrieves a specified range of items (e.g. item #2 till item #4) from the beginning of the list. Indices may be out of range without causing errors; only the items at indices that actually exist will be returned.

Here is the implementation for LRANGE:

case "lrange":
    {
        string key = tokens[1];
        int start = Convert.ToInt32(tokens[2]);
        int stop = Convert.ToInt32(tokens[3]);

        if (start > stop)
            Console.WriteLine("Empty!");
        else if (store.ContainsKey(key))
        {
            var list = store[key] as List<string>;
            if (list != null)
            {
                if (start < 0)
                    start = 0;

                if (stop > list.Count - 1)
                    stop = list.Count - 1;

                var items = list.GetRange(start, stop - start + 1);
                if (items.Any())
                {
                    foreach (var item in items)
                        Console.WriteLine(item);
                }
                else
                    Console.WriteLine("Empty!");
            }
            else
                Console.WriteLine("Invalid data type!");
        }
        else
            Console.WriteLine("No such key!");
    }
    break;

We can now test our list functionality:

# Hashes

Hashes are like dictionaries in themselves. Rather than mapping a value directly to a key, hashes map values onto a field of a key. This allows you to represent objects with a number of attributes (e.g. a customer having separate values for Name, Age, etc).

With HSET, we can assign a value to a key-field, as shown in the screenshot above. Like with Lists, this operation creates the Hash if it doesn’t already exist. Here’s the HSET implementation:

case "hset":
    {
        string key = tokens[1];
        string field = tokens[2];
        string value = tokens[3];
        Dictionary<string, string> hash = null;

        // create/get hash

        if (!store.ContainsKey(key))
        {
            hash = new Dictionary<string, string>();
            store[key] = hash;
        }
        else
            hash = store[key] as Dictionary<string, string>;

        // set field in hash

        if (hash != null)
        {
            hash[field] = value;
            Console.WriteLine("Done");
        }
        else
            Console.WriteLine("Invalid data type!");
    }
    break;

The code is very similar to that of LPUSH, with the difference that the key now maps to a dictionary (the hash), and the value is assigned to a field on that hash. Think of it as: key -> field -> value.

After using HSET to set a value, we can retrieve it with HGET:

case "hget":
    {
        string key = tokens[1];
        string field = tokens[2];

        if (store.ContainsKey(key))
        {
            var hash = store[key] as Dictionary<string, string>;

            if (hash != null)
            {
                if (hash.ContainsKey(field))
                    Console.WriteLine(hash[field]);
                else
                    Console.WriteLine("No such field!");
            }
            else
                Console.WriteLine("Invalid data type!");
        }
        else
            Console.WriteLine("No such key!");
    }
    break;

HKEYS can be used to retrieve all fields of a hash, given the key:

case "hkeys":
    {
        string key = tokens[1];

        if (store.ContainsKey(key))
        {
            var hash = store[key] as Dictionary<string, string>;

            if (hash != null)
            {
                foreach (var field in hash.Keys)
                    Console.WriteLine(field);
            }
            else
                Console.WriteLine("Invalid data type!");
        }
        else
            Console.WriteLine("No such key!");
    }
    break;

HGETALL, on the other hand, gets all fields of a hash and their corresponding values:

case "hgetall":
    {
        string key = tokens[1];

        if (store.ContainsKey(key))
        {
            var hash = store[key] as Dictionary<string, string>;

            if (hash != null)
            {
                foreach (var kvp in hash)
                {
                    Console.WriteLine(kvp.Key);
                    Console.WriteLine(kvp.Value);
                }
            }
            else
                Console.WriteLine("Invalid data type!");
        }
        else
            Console.WriteLine("No such key!");
    }
    break;

Let’s test out our hash functionality:

# Sets

Sets in Redis are the mathematical kind: all items in a set are distinct, and multiple sets may be combined via standard set operations (union, intersection and difference).

Use SADD to add one or more items to a set. If the set does not exist, it will be created automatically. Here’s the SADD implementation:

case "sadd":
    {
        string key = tokens[1];
        var members = tokens.Skip(2); // sadd key member [member ...]
        HashSet<string> set = null;

        // create/get set

        if (!store.ContainsKey(key))
        {
            set = new HashSet<string>();
            store[key] = set;
        }
        else
            set = store[key] as HashSet<string>;

        if (set != null)
        {
            foreach (var member in members)
                set.Add(member);

            Console.WriteLine("Done");
        }
        else
            Console.WriteLine("Invalid data type!");
    }
    break;

SMEMBERS gives you all the members of the set:

case "smembers":
    {
        string key = tokens[1];

        if (store.ContainsKey(key))
        {
            var set = store[key] as HashSet<string>;

            if (set != null)
            {
                foreach (var member in set)
                    Console.WriteLine(member);
            }
            else
                Console.WriteLine("Invalid data type!");
        }
        else
            Console.WriteLine("No such key!");
    }
    break;

Standard set operations (union, intersection and difference) require two sets. If either doesn’t exist, it is simply treated as an empty set rather than causing any kind of error. In Redis, these operations may work on multiple sets at once; but in these examples we’re going to limit the scenario to two sets at a time for the sake of simplicity.

Set union (SUNION) returns all distinct items in the specified sets:

case "sunion":
{
    var key1 = tokens[1];
    var key2 = tokens[2];

    // let's assume the keys exist and are the correct type

    var set1 = store[key1] as HashSet<string>;
    var set2 = store[key2] as HashSet<string>;

    // get union

    var union = set1.Union(set2);
    foreach (var member in union)
        Console.WriteLine(member);
}
break;

Set intersection (SINTER) returns those items which are common to both sets:

case "sinter":
{
    var key1 = tokens[1];
    var key2 = tokens[2];

    // let's assume the keys exist and are the correct type

    var set1 = store[key1] as HashSet<string>;
    var set2 = store[key2] as HashSet<string>;

    // get intersection

    var intersection = set1.Intersect(set2);
    foreach (var member in intersection)
        Console.WriteLine(member);
}
break;

Finally, set difference (SDIFF) returns those items which are in the first set but which are not in the second:

case "sdiff":
{
    var key1 = tokens[1];
    var key2 = tokens[2];

    // let's assume the keys exist and are the correct type

    var set1 = store[key1] as HashSet<string>;
    var set2 = store[key2] as HashSet<string>;

    // get difference

    var difference = set1.Except(set2);
    foreach (var member in difference)
        Console.WriteLine(member);
}
break;

Let’s test that out:

# Sorted Sets

Sorted Sets in Redis are similar to sets, but their members are sorted by a score. This may be used for any kind of ordering, from the latest 5 comments to the top 5 posts with the most likes.

We can represent a Sorted Set similarly to a hash, using the scheme key -> score -> value. However, there are two main differences from a hash:

• For each score, there may be multiple values; these are sorted lexicographically.
• The scores are ordered (sorted).

The implementation of a Sorted Set in .NET is not as trivial as the other data types, and requires a combination of collections. While a simple representation could be made by combining a SortedDictionary (score -> values) with a SortedSet (distinct and sorted values), this would not allow all Sorted Set commands to be supported (e.g. ZRANK, which finds the index of a given value, requires a reverse lookup).
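As a rough illustration of that combination of collections, here is a minimal sketch of my own (the `SimpleSortedSet` class and its method names are assumptions, not real Redis client code): a SortedDictionary keeps scores ordered, a SortedSet keeps the members at each score lexicographically ordered, and a reverse Dictionary lets us find a member's score without scanning.

```csharp
using System.Collections.Generic;

// A minimal Sorted Set sketch: scores ordered by SortedDictionary,
// members per score ordered lexicographically by SortedSet, plus a
// reverse index so a member's current score can be looked up directly.
class SimpleSortedSet
{
    private readonly SortedDictionary<double, SortedSet<string>> byScore
        = new SortedDictionary<double, SortedSet<string>>();
    private readonly Dictionary<string, double> scoreOf
        = new Dictionary<string, double>();

    // Rough equivalent of ZADD for a single member.
    public void Add(double score, string member)
    {
        // if the member already exists with another score, remove it first
        if (scoreOf.TryGetValue(member, out double old))
        {
            byScore[old].Remove(member);
            if (byScore[old].Count == 0)
                byScore.Remove(old);
        }

        if (!byScore.ContainsKey(score))
            byScore[score] = new SortedSet<string>();

        byScore[score].Add(member);
        scoreOf[member] = score;
    }

    // Rough equivalent of ZRANGE 0 -1: all members, lowest score first.
    public IEnumerable<string> Range()
    {
        foreach (var pair in byScore)
            foreach (var member in pair.Value)
                yield return member;
    }
}
```

Note that even this sketch cannot answer ZRANK efficiently (finding a member's index would still require walking the structure), which is exactly why a complete implementation needs more machinery.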

Sorted Sets are thus beyond the scope of this article, which is intended to provide a simple mapping between Redis data types and .NET collections.

# Summary

This article explained how the Redis data types work, and showed how some of their operations may be implemented using standard .NET collections. A typical mapping could be:

• Strings – strings
• Lists – Lists
• Hashes – Dictionaries or ConcurrentDictionaries
• Sets – HashSets
• Sorted Sets – a custom data structure