Indexing and Searching Geopolygons using Elasticsearch

Elasticsearch is great for indexing and searching text, but it also has a lot of functionality related to searching points and regions on the world map. In this article, we'll learn how to index polygons corresponding to territories in the world, and find out whether a point falls within any indexed polygon.

Building Polygons with Geocoordinates

Back in school, we (hopefully) learned that a point in 2D space can be represented as an (x, y) pair of coordinates. A point in the world can similarly be identified by a (latitude, longitude) pair of geocoordinates. We can obtain geocoordinates for a location by clicking on the map in Google Maps or similar tools.

The analogy is not perfect though: because of the curvature of the Earth, geocoordinates do not behave like linear (x, y) coordinates, and the distance covered by one degree varies depending on where you are. This doesn't really matter for our purposes; the point is that we can represent any given point on the Earth's surface by means of latitude and longitude.

Once we can identify points, it’s natural to extend the concept to 2D geometry. By taking several points, we can create polygons that mark the boundaries of a given territory, such as a country or state. Jeremy Hawes’ Google Maps Polygon Coordinates Tool is great for building such polygons.

Using this tool, we can very easily construct a rough polygon representing the state of Wyoming in the US. Wyoming is great to use as a simple example because it’s roughly rectangular, so we only need four points for a workable approximation.

Below the map in this polygon tool, you’ll get the coordinates of the points along with some extra JavaScript (which you could later paste directly into the code editor). In this case, we’ve got the following coordinates in (latitude, longitude) format:

45.01967,-104.04405
44.99904,-111.03084
41.011,-111.04131
41.00193,-104.03375

Once we have the points that make up the polygon, we can feed them into Elasticsearch.

Indexing Geopolygons in Elasticsearch

Before we can index anything, we need to create a mapping that defines the structure of an index, including any fields and their data types. The Mapping Geo Shapes page in the Elasticsearch documentation provides a starting point. However, the documentation is outdated, and if you follow the example in the docs closely, you'll get an error.

After a quick search, this Stack Overflow answer reveals the cause of the problem: Elasticsearch no longer accepts the string data type, and expects you to use text instead. This wouldn't have been a problem if the docs had been updated accordingly. Anyhow, our mapping request for this example will be as follows:

PUT /regions
{
  "mappings": {
    "region": {
      "properties": {
        "name": {
          "type": "text"
        },
        "location": {
          "type": "geo_shape"
        }
      }
    }
  }
}

This essentially means that each region item in the regions index will have a name and a location, the latter being the polygon itself. While we will be focusing exclusively on polygons in this article, it is worth noting that the geo_shape data type supports a lot of other geometric constructs – refer to the Geo-Shape documentation for more information.

Once our mapping is in place, we can proceed to index our polygons. The Indexing Geo Shapes documentation page shows how to do this. There's a catch though: Elasticsearch expects to receive coordinates in (longitude, latitude) format, which is the reverse of what we've been using so far. We can use a simple regular expression (e.g. in Notepad++) to swap our coordinates:

(\-?\d+\.?\d*),(\-?\d+\.?\d*)
\2,\1

The first line shows the regular expression used to match coordinates, and the second line shows what each match should be replaced with, i.e. the swapped coordinates.
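
If you'd rather do the swap from the command line, the same regular expression works with sed. Here's a minimal sketch, assuming your coordinate pairs live one per line in a file called coords.txt (a made-up name for this example):

sed -E 's/(-?[0-9]+\.?[0-9]*),(-?[0-9]+\.?[0-9]*)/\2,\1/' coords.txt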

Let’s use the following query to try to index our Wyoming polygon:

PUT /regions/region/wyoming
{
    "name" : "Wyoming",
    "location" : {
        "type" : "polygon", 
        "coordinates" : [[ 
        [ -104.04405,45.01967 ],
        [ -111.03084,44.99904 ],
        [ -111.04131,41.011   ],
        [ -104.03375,41.00193 ]
        ]]
    }
}

This actually fails with an error.

This is because Elasticsearch expects the polygon to be closed, i.e. it must return to the starting point. Another thing to watch out for is any polygons that have self-intersections, which Elasticsearch doesn’t allow either.

We can fix our error by simply repeating the first coordinate at the end:

PUT /regions/region/wyoming
{
    "name" : "Wyoming",
    "location" : {
        "type" : "polygon", 
        "coordinates" : [[ 
        [ -104.04405,45.01967 ],
        [ -111.03084,44.99904 ],
        [ -111.04131,41.011   ],
        [ -104.03375,41.00193 ],
        [ -104.04405,45.01967 ]
        ]]
    }
}

This time, the request succeeds. Great! Our Wyoming polygon is now in Elasticsearch.
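
If you want to double-check that it's there, you can fetch the document back by its ID:

GET /regions/region/wyoming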

Querying Geopolygons in Elasticsearch

We can again turn to the Elasticsearch documentation for examples of how to query our geopolygon. We can do this by taking a circle with a given radius and seeing whether it intersects the polygon, as shown in Querying Geo Shapes. Don't confuse this with the Geo Polygon Query documentation, which covers the opposite situation: there, points are indexed in Elasticsearch, and the polygon to test against is provided at query time.
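
Just to make the distinction concrete, here is a rough sketch of what a geo polygon query looks like. It runs against a geo_point field (the places index and its location field here are hypothetical), with the polygon supplied in the query itself, which is the opposite of our setup:

GET /places/place/_search
{
  "query": {
    "bool": {
      "filter": {
        "geo_polygon": {
          "location": {
            "points": [
              { "lat": 45.01967, "lon": -104.04405 },
              { "lat": 44.99904, "lon": -111.03084 },
              { "lat": 41.011,   "lon": -111.04131 },
              { "lat": 41.00193, "lon": -104.03375 }
            ]
          }
        }
      }
    }
  }
}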

To test our indexed polygon, we'll pick a point somewhere in Wyoming. I used Google Maps to pick a point within Yellowstone National Park, which for all we know might just be where Yogi Bear lives.

Having obtained the coordinates, we can hit Elasticsearch with a query:

GET /regions/region/_search
{
  "query": {
    "geo_shape": {
      "location": { 
        "shape": { 
          "type":   "circle", 
          "radius": "25m",
          "coordinates": [ 
            -109.874838, 44.439550
          ]
        }
      }
    }
  }
}

And you'll see that Wyoming is actually returned in the results.

You'll also notice that Elasticsearch gave us back all the coordinate data, which we don't really care about in this case. This can be pretty inefficient if you're using very large and detailed polygons. We can filter that out by specifying the _source property:

GET /regions/region/_search
{
  "_source": "name", 
  "query": {
    "geo_shape": {
      "location": { 
        "shape": { 
          "type":   "circle", 
          "radius": "25m",
          "coordinates": [ 
            -109.874838, 44.439550
          ]
        }
      }
    }
  }
}

The results are now nice and clean:
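
What comes back should look something like this (abridged; the took/shards metadata is omitted, and values will differ):

{
  "hits": {
    "total": 1,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "regions",
        "_type": "region",
        "_id": "wyoming",
        "_score": 1.0,
        "_source": {
          "name": "Wyoming"
        }
      }
    ]
  }
}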

Next, we’ll take a point in Texas and see that we don’t get results for that:
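
For example, here's the same kind of query using a point near Austin (the coordinates are approximate, picked off Google Maps, and swapped into (lon, lat)). Since that point falls outside every polygon we've indexed, the hits array comes back empty:

GET /regions/region/_search
{
  "_source": "name",
  "query": {
    "geo_shape": {
      "location": {
        "shape": {
          "type":   "circle",
          "radius": "25m",
          "coordinates": [
            -97.74306, 30.26715
          ]
        }
      }
    }
  }
}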

Geopolygons with Holes

Some territories aren’t simple polygons; they contain other territories inside them, and so the polygon has a hole. Examples include:

  • Rome (Vatican City is a hole within it)
  • New South Wales (Australian Capital Territory is a hole within it)
  • South Africa (Lesotho is a hole within it)

The Indexing Geo Shapes documentation page (which we’ve referred to earlier) explains how to account for holes in polygons you index. Let’s see how this works using a practical example.

In Google Maps, you can see how the Australian Capital Territory is completely enclosed within New South Wales. Using Jeremy Hawes' aforementioned polygon tool, we can draw a very rough polygon for New South Wales.

This gives us the following coordinates (lat, lon) for New South Wales:

-28.92704,141.04445
-33.97411,141.00841
-37.51381,149.94544
-34.98252,150.7789
-32.70393,152.18365
-28.24141,153.49901
-28.98426,148.87874 

We will also need a polygon for Australian Capital Territory. Again, this will be a really rough approximation just for the sake of example.

Our coordinates for Australian Capital Territory are:

-35.91185,149.05898
-35.36119,149.14473
-35.31932,149.40076
-35.11429,149.09984
-35.3126,148.80286
-35.71989,148.81557 

Next, we'll index Australian Capital Territory. This is nothing new, but remember that we must take care to swap the coordinates so that they become (lon, lat), and to close the polygon by repeating the first coordinate pair at the end.

PUT /regions/region/act
{
    "name" : "Australian Capital Territory",
    "location" : {
        "type" : "polygon", 
        "coordinates" : [[ 
            [ 149.05898,-35.91185 ],
            [ 149.14473,-35.36119 ],
            [ 149.40076,-35.31932 ],
            [ 149.09984,-35.11429 ],
            [ 148.80286,-35.3126  ],
            [ 148.81557,-35.71989 ],
            [ 149.05898,-35.91185 ]
        ]]
    }
}

For New South Wales, we do something special: we give its polygon two rings of coordinates.

PUT /regions/region/nsw
{
    "name" : "New South Wales",
    "location" : {
        "type" : "polygon", 
        "coordinates" : [
            [
                [ 141.04445,-28.92704 ],
                [ 141.00841,-33.97411 ],
                [ 149.94544,-37.51381 ],
                [ 150.7789, -34.98252 ],
                [ 152.18365,-32.70393 ],
                [ 153.49901,-28.24141 ],
                [ 148.87874,-28.98426 ],
                [ 141.04445,-28.92704 ]              
            ],
            [ 
                [ 149.05898,-35.91185 ],
                [ 149.14473,-35.36119 ],
                [ 149.40076,-35.31932 ],
                [ 149.09984,-35.11429 ],
                [ 148.80286,-35.3126  ],
                [ 148.81557,-35.71989 ],
                [ 149.05898,-35.91185 ]
            ]
        ]
    }
}

The first ring is the New South Wales boundary. The second is the one for Australian Capital Territory. The way Elasticsearch interprets this is that the first ring is the polygon's outer boundary; all subsequent rings are holes in it.

Once this has also been indexed, we can test it. Remember to swap your coordinates – Google Maps uses (lat, lon) whereas Elasticsearch uses (lon, lat). Let's take a point in New South Wales – somewhere in Sydney, for instance:
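
Here's the query, using an approximate point near the Sydney Opera House, with the coordinates already swapped into (lon, lat):

GET /regions/region/_search
{
  "_source": "name",
  "query": {
    "geo_shape": {
      "location": {
        "shape": {
          "type":   "circle",
          "radius": "25m",
          "coordinates": [
            151.21, -33.86
          ]
        }
      }
    }
  }
}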

Our point was correctly identified as being in New South Wales. Now, let’s take a point in Canberra so that we can test out Australian Capital Territory:
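
Once again, this is an approximate point in central Canberra, swapped into (lon, lat):

GET /regions/region/_search
{
  "_source": "name",
  "query": {
    "geo_shape": {
      "location": {
        "shape": {
          "type":   "circle",
          "radius": "25m",
          "coordinates": [
            149.13, -35.28
          ]
        }
      }
    }
  }
}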

Elasticsearch correctly returned Australian Capital Territory in the results. What is even more significant is that it did not return New South Wales, which it would otherwise have done had we not specified the hole when we indexed it.

Summary

After a brief introduction to geocoordinates and geopolygons, we saw how we can index geopolygons in Elasticsearch and then run queries to find out in which polygon(s) a point belongs. In a slightly more advanced scenario, we saw how to deal with polygons that have holes.

Setting Up Elasticsearch on Linux Ubuntu

Elasticsearch is a lightning-fast and highly scalable search engine built on top of Apache Lucene. In this article, we’re going to see how we can quickly set it up on an Ubuntu Linux environment (using Ubuntu 16.10 here) to be able to play around with it. We do not cover configuring Elasticsearch or setting up a cluster. To set up Elasticsearch on Windows, see “Setting Up Elasticsearch and Kibana on Windows” instead.

Before we can set up Elasticsearch itself, we need Java. We can follow these instructions to set up Java on Ubuntu. Before proceeding, verify that the JAVA_HOME environment variable is set:

echo $JAVA_HOME

It is likely that you won't see anything as a result of this command. That's because while the Java setup instructions do set this environment variable, it does not get applied to your current session. Try opening a new terminal window or rebooting the machine, and chances are that your JAVA_HOME will be set correctly. If not, you may have to set JAVA_HOME manually, as shown below.
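
Setting it manually is a one-liner. The path below is just an example; it depends on how and where Java was installed on your machine. Add the line to your ~/.bashrc to make it permanent:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle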

Once Java is correctly set up (complete with the JAVA_HOME environment variable), we can proceed to set up Elasticsearch. By going to the Elasticsearch downloads page, we can download (among other things) the Debian package containing Elasticsearch.
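
Alternatively, you can grab the package directly from the terminal with wget. The URL below follows Elastic's artifact naming at the time of writing; adjust the version number as needed:

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.4.0.deb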

We can now install the Debian package using dpkg. At the time of writing this article, the latest version of Elasticsearch is 5.4, so after opening a Terminal window in the Downloads folder, we can use the following command to install Elasticsearch:

sudo dpkg -i elasticsearch-5.4.0.deb

Elasticsearch is now installed, but it is not yet running! So first, we’ll enable the Elasticsearch service so that it will start automatically when the machine is rebooted:

sudo systemctl enable elasticsearch.service

We can now start the Elasticsearch service:

sudo systemctl start elasticsearch.service
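
At any point, we can check what the service is doing with:

sudo systemctl status elasticsearch.service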

The Elasticsearch HTTP endpoint will need a few seconds before it is reachable. After that, we can verify that Elasticsearch is running either by going to localhost:9200 from a web browser, or by hitting that same endpoint using curl in the command line:

curl -X GET http://localhost:9200/

In either case, you should get a response with some JSON data about the Elasticsearch instance you’re running:
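
The exact values depend on your machine and version, but the response has this general shape (machine-specific values elided):

{
  "name" : "...",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "...",
  "version" : {
    "number" : "5.4.0",
    "build_hash" : "...",
    "build_date" : "...",
    "build_snapshot" : false,
    "lucene_version" : "..."
  },
  "tagline" : "You Know, for Search"
}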

We are now all set up to play around with Elasticsearch! Since we didn’t configure anything, we have a single instance with all default settings. If you’re planning to use Elasticsearch in a production environment, you will of course want to read up on configuring it properly and setting up a cluster to ensure that it can handle the use cases you need and that it can survive failure scenarios.

Setting Up Elasticsearch and Kibana on Windows

Elasticsearch is fantastic for indexing your data so that it can be searched by its lightning-fast search engine. With Kibana, you also get the ability to analyse and visualise that data. Both of these products are provided for free by Elastic.

Installing Java Runtime Environment

Elastic products are developed in Java, so you’ll need the Java Runtime Environment (JRE) to run them. Get the latest JRE from the relevant ugly Oracle downloads page. Either use the .exe installer, or download the .zip file and then extract the folder inside.

Either way, take note of the JRE folder location and add it as an environment variable. To do this, hit the start menu, type "environment variables", and open the matching settings entry.

In the window that comes up, click Environment Variables….

You will now see the user and system environment variables. Hit New… under the System variables.

Name it JAVA_HOME, and as the value put in the path to the JRE folder (not its bin folder).

You can now OK out of the various dialog windows.

Setting up Elasticsearch

Go to the Elasticsearch product page and hit Download. On the next page, download the ZIP file, and extract the folder inside it somewhere.

You can now run elasticsearch.bat. If you get “The syntax of the command is incorrect”, you probably didn’t set the JAVA_HOME environment variable as explained in the previous section.

elasticsearch.bat

Running this command, you should see a bunch of initialisation output, and if you browse to localhost:9200, you should see some JSON returned.

Now that we know it’s working, we can install it as a Windows service. So press Ctrl+C to kill the instance of Elasticsearch you just ran, and instead run:

elasticsearch-service.bat install

This should install it as a service. It is installed with the Manual startup type, and is not started. You probably want to change that to Automatic (Delayed Start) from the Services window in Microsoft Windows (or from the command line, as shown below), and also Start it. Once you have done that, give it a few seconds to start, and then verify again that you get a response from localhost:9200.
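
If you'd rather skip the GUI, the same can be done from an elevated command prompt. This sketch assumes the default service id, elasticsearch-service-x64; check the output of the install command for the actual id on your machine:

sc config elasticsearch-service-x64 start= delayed-auto
elasticsearch-service.bat start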

Setting up Kibana

Next, we will set up Kibana. Grab the Windows .zip file from the Kibana downloads page.

Extract it wherever your heart desires.

Make sure Elasticsearch is running. Then, in Kibana’s bin folder, run kibana.bat:

kibana.bat

Some text will be written to the console as Kibana is initialised, and then you should be able to go to localhost:5601 and actually get a webpage.

Now we know that it works. Let’s set it up as a service. Kill the instance we just ran using Ctrl+C first.

Oh crap, Kibana does not come with a service installer! What are we gonna do?

Enter NSSM, the Non-Sucking Service Manager, which we can use to install just about any application as a Windows service, using either the command line or an interactive GUI. After downloading NSSM, we can install Kibana as a Windows service with a command like the following from NSSM’s win64 folder:

nssm install "Kibana 5.2.2" C:\[...]\kibana-5.2.2-windows-x86\bin\kibana.bat

From an elevated command prompt, we can also configure the Windows service, for example setting the startup type and the description:

nssm set "Kibana 5.2.2" Start "SERVICE_DELAYED_AUTO_START"
nssm set "Kibana 5.2.2" Description "Kibana lets you visualize your Elasticsearch data"

Finally, we start the service:

nssm start "Kibana 5.2.2"

If all goes well, we can go back to localhost:5601 and verify that it's really running.

With that, it’s all set up. All that’s needed is an index with some data that you can use Kibana to visualise, but that’s beyond the scope of this article.