Introduction to Using MongoDB with Spatial Data

This tutorial was originally created for Spatial Database Design in Temple University’s Professional Science Masters in GIS. Team members included Claude Schrader, Josh Beauchamp, Anna Wegbreit, Terra Luke, and a few others.

Intro

NoSQL databases are unlike many traditional SQL databases. In this tutorial we will review the differences between NoSQL and relational databases such as PostgreSQL, the strengths and weaknesses of a NoSQL database, and the reasons you might choose to use or avoid a NoSQL database. We will also demonstrate executing both spatial and non-spatial queries in MongoDB.

A NoSQL database does not necessarily completely replace a traditional relational database. However, for many scenarios, MongoDB may be sufficient on its own.

The main difference is that NoSQL doesn’t rely on relational database structures. There are many types of NoSQL database structures including key-value, wide-column, graph, and document-based storage. In this tutorial, we will be using MongoDB, a document-based NoSQL database.

Strengths of MongoDB

  • No defined schema - individual records are not required to have the same structure.
  • Flexible - no need to include fields that will be largely unused for the benefit of a few records.
  • Scalable - the database can scale vertically (a more powerful server) or horizontally (data split across several servers).
  • Open Source
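
As a small illustration of the first two points, here is a sketch of two documents that could sit side by side in one collection. The restaurant names and fields are made up for this example and do not come from the tutorial's datasets.

```javascript
// Two hypothetical documents that could live in the same collection.
var restaurantA = {
  name: 'Example Deli',
  cuisine: 'Deli'
};

var restaurantB = {
  name: 'Example Cafe',
  // Extra fields that restaurantA simply does not carry.
  outdoorSeating: true,
  hours: { open: '08:00', close: '17:00' }
};

// A relational table would need outdoorSeating and hours columns on every
// row. Here each document lists only the fields it actually uses.
var fieldsA = Object.keys(restaurantA); // ['name', 'cuisine']
var fieldsB = Object.keys(restaurantB); // ['name', 'outdoorSeating', 'hours']
```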

Weaknesses of MongoDB

  • No concept of relationships between documents. This is especially problematic as the data becomes more complex.
  • Requires multiple queries to retrieve records from different collections.
  • Writing related data can require updating multiple collections.
  • Lack of schema means that defining the structure of data can be difficult.
  • The query syntax can be intimidating.
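
To make the "multiple queries" point concrete, the sketch below uses two in-memory arrays as stand-ins for two collections. All names, the neighborhoodId field, and the data are hypothetical; the point is only that following a reference between documents is a second lookup done by hand.

```javascript
// Two arrays standing in for two collections (hypothetical data).
var neighborhoods = [
  { _id: 1, name: 'Example Heights' },
  { _id: 2, name: 'Sample Village' }
];
var restaurants = [
  { name: 'Example Deli', neighborhoodId: 1 },
  { name: 'Sample Cafe', neighborhoodId: 2 }
];

// "Query" 1: find the restaurant.
var deli = restaurants.filter(function (r) {
  return r.name === 'Example Deli';
})[0];

// "Query" 2: follow the reference by hand to fetch the neighborhood.
var hood = neighborhoods.filter(function (n) {
  return n._id === deli.neighborhoodId;
})[0];

// The application, not the database, stitches the two results together.
var joined = { restaurant: deli.name, neighborhood: hood.name };
```

In a relational database the same result would come from a single JOIN.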

What is so different about MongoDB

MongoDB excels at different problems than a relational database.

Mongo is a great place to store documents, but if the relationships between those documents are important, it can get in your way very quickly.1

Another helpful way to think about whether or not your data will be a good use-case for Mongo was explained in an article by Sarah Mei. Mei describes how her team originally thought their data would work in Mongo, but after dealing with several “messy” joining issues they realized why they were having these problems. She explains the situation as such:

​ It was a sign that our data was actually relational, that there was value to that structure, and that we were going against the basic concept of a document data store.

​ Whether you’re duplicating critical data (ugh), or using references and doing joins in your application code (double ugh), when you have links between documents, you’ve outgrown MongoDB. When the MongoDB folks say “documents,” in many ways, they mean things you can print out on a piece of paper and hold. A document may have internal structure — headings and subheadings and paragraphs and footers — but it doesn’t link to other documents. It’s a self-contained piece of semi-structured data.

​ If your data looks like that, you’ve got documents. Congratulations! It’s a good use case for Mongo. But if there’s value in the links between documents, then you don’t actually have documents. MongoDB is not the right solution for you.2

Terminology

Relational Database    MongoDB
Database               Database
Table                  Collection
Row                    Document
Column                 Field
Table Join             Embedded Documents, $lookup
Primary Key            Primary Key (default _id field provided)

Data Syntax and Javascript Structure

Remember that JavaScript is CASE SENSITIVE

  • The mongo shell is an interactive JavaScript interface for working with the database.
  • Define a variable with var to reuse it in other calculations, operations, or functions

    {} denotes an object literal: the names and values of an object

    // carTypes() and sales are assumed to be defined elsewhere
    var car = { myCar: 'Saturn', getCar: carTypes('Honda'), special: sales }
    

    [] denotes an array literal: a list of elements (useful for coordinate pairs):

    var coffees = ['French Roast', 'Colombian', 'Kona'];
    
  • All objects in MongoDB have a key:value relationship except objectID

  • Values can be nested objects

  • All string data is a string literal (i.e., it must appear in quotes)

  • Commands are strung together with dot (.) notation:

    db.neighborhoods.find( {name: /East/}, {name:1}).sort( {name:-1})
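
The points above can be tried out with a small sketch. The document below is hypothetical; its shape only loosely follows the restaurants collection used later in this tutorial.

```javascript
// A hypothetical restaurant document with a nested object and an array.
var restaurant = {
  name: 'Example Deli',
  location: {
    type: 'Point',
    coordinates: [-73.88, 40.76] // array literal: [longitude, latitude]
  }
};

// Dot notation walks into nested objects; [] indexes into arrays.
var lon = restaurant.location.coordinates[0]; // -73.88
var lat = restaurant.location.coordinates[1]; // 40.76

// Field names are case sensitive: 'name' exists, 'Name' does not.
var hasLowercaseName = 'name' in restaurant; // true
var hasUppercaseName = 'Name' in restaurant; // false
```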
    

Installing and using MongoDB for Spatial and Non-Spatial Data

Installing MongoDB in OSGeoLive

This tutorial explains installing and using MongoDB in an OSGeoLive virtual machine running in VirtualBox. This assumes you have already installed and configured OSGeoLive.

The install differs depending on whether you are running Version 11 or 12 of OSGeoLive. Version 12 requires one more package than Version 11.

  1. Use the Synaptic Package Manager, located in the System Tools menu, to install the MongoDB packages.

  2. For Version 11, install the following packages:
    mongodb
    mongodb-clients
    mongodb-server
    mongodb-server-core
    
  3. For Version 12 also install:

    mongodb-tools
    
  4. To verify the installation, open LXTerminal and type
    user@osgeolive:~$ systemctl status mongodb
    

Congratulations, you have successfully installed MongoDB. You are now ready to load data and learn the basics of this NoSQL spatial database.

Verify Config Files

The default configuration of MongoDB should be fine for this intro, but if you wish to read and understand it, enter the following at a terminal window:

user@osgeolive:~$ nano /etc/mongodb.conf

Which version of mongo are you running?

If you have already installed MongoDB but are now unsure which version you are running you can check by entering

user@osgeolive:~$ apt-cache show mongodb

Your terminal should show one of the following outcomes:

Installed Version for OSGeoLive 11

user@osgeolive:~$ apt-cache show mongodb | grep Version
Version: 1:2.6.10-0ubuntu1

Installed Version for OSGeoLive 12

user@osgeolive:~$ apt-cache show mongodb | grep Version
Version: 1:3.6.3-0ubuntu1

Downloading & Importing Data

You will download and import the data from the terminal, outside of the MongoDB shell. We are using the restaurants and neighborhoods JSON files linked from MongoDB’s geospatial tutorial.

  1. Find your Current Directory:
    user@osgeolive:~$ pwd  
    /home/user
    
  2. Copy the URL of the dataset you are working with.

  3. Download data with the wget command. These two commands download the data to your current directory:

    user@osgeolive:~$ wget https://raw.githubusercontent.com/mongodb/docs-assets/geospatial/neighborhoods.json 
    
    user@osgeolive:~$ wget https://raw.githubusercontent.com/mongodb/docs-assets/geospatial/restaurants.json    
    
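  4. Import the files with mongoimport <file location> -c <collection name>. Because no database is named, the data goes into the default test database: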

    user@osgeolive:~$ mongoimport neighborhoods.json -c neighborhoods
    
    user@osgeolive:~$ mongoimport restaurants.json -c restaurants
    
  5. View your collections in the Mongo shell:

    user@osgeolive:~$ mongo
    
    > show collections
    neighborhoods
    restaurants
    

    Note: To exit the Mongo shell type exit or enter CTRL + D
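
By default mongoimport expects one JSON document per line. The sketch below shows that layout using two made-up lines (not the real file contents) and parses them in plain JavaScript, which is roughly what an importer has to do before inserting each document.

```javascript
// Two made-up lines in the one-document-per-line layout.
var fileContents =
  '{"name": "Example Deli", "location": {"type": "Point", "coordinates": [-73.88, 40.76]}}\n' +
  '{"name": "Sample Cafe", "location": {"type": "Point", "coordinates": [-73.95, 40.70]}}\n';

// Split on newlines, skip blank lines, and parse each line on its own.
var documents = fileContents
  .split('\n')
  .filter(function (line) { return line.trim() !== ''; })
  .map(function (line) { return JSON.parse(line); });

var count = documents.length;      // 2
var firstName = documents[0].name; // 'Example Deli'
```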

Basic Queries

SQL vs MongoDB

SQL                   MongoDB
SELECT *              db.restaurants.find({ })
WHERE                 db.restaurants.find({"name": "The Movable Feast"})
LIKE '%Movable%'      db.restaurants.find( {"name": /Movable/ } )

How to view all data in a collection:

Equivalent to SELECT * FROM restaurants in SQL

db.restaurants.find({ })

Note: It is not recommended to run this for the neighborhoods collection. Because it is a large dataset, it will return an enormous amount of data to your terminal.

How to view data based on a condition:

Equivalent to WHERE clause in SQL. Note: This must be an exact match.

db.restaurants.find({"name": "The Movable Feast"})

How to view data based on partial condition:

Equivalent to a LIKE pattern in SQL such as '%Movable%'. The pattern between the slashes is a regular expression, and the match is case sensitive.

db.restaurants.find( {"name": /Movable/ } )
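
The pattern between the slashes is an ordinary JavaScript regular expression, so its matching rules can be explored without a database. This sketch uses JavaScript's own regex engine as a stand-in for MongoDB's; for simple patterns like these the behavior should be the same.

```javascript
var pattern = /Movable/;

// An unanchored pattern matches anywhere in the string, like LIKE '%Movable%'.
var matchesInside = pattern.test('The Movable Feast'); // true

// Matching is case sensitive by default.
var matchesLowercase = pattern.test('the movable feast'); // false

// The i flag makes the match case insensitive.
var matchesIgnoringCase = /movable/i.test('The Movable Feast'); // true

// ^ anchors the pattern to the start of the string, like LIKE 'The%'.
var startsWithThe = /^The/.test('The Movable Feast');         // true
var startsWithMovable = /^Movable/.test('The Movable Feast'); // false
```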

Additional Queries

Wildcard WHERE clause returning only specific fields (the second argument lists the fields to return; _id is included unless you exclude it):

db.neighborhoods.find( {"name": /East/}, {"name":1})

Same query as above, but ordered alphabetically:

db.neighborhoods.find( {"name": /East/}, {"name":1}).sort( {"name":1})

Again, now in descending order:

db.neighborhoods.find( {"name": /East/}, {"name":1}).sort( {"name":-1})
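
To see what the three queries above do, here is a rough plain-JavaScript imitation over an in-memory array. The findSorted helper and the sample data are hypothetical, and unlike MongoDB this sketch drops _id from the result.

```javascript
// Hypothetical stand-in for a few neighborhood documents.
var sample = [
  { name: 'East Village', borough: 'Manhattan' },
  { name: 'Upper East Side', borough: 'Manhattan' },
  { name: 'East Harlem', borough: 'Manhattan' },
  { name: 'Chelsea', borough: 'Manhattan' }
];

// Imitates find({name: pattern}, {name: 1}).sort({name: direction}).
function findSorted(docs, pattern, direction) {
  return docs
    .filter(function (d) { return pattern.test(d.name); }) // the filter
    .map(function (d) { return { name: d.name }; })        // the projection
    .sort(function (a, b) {                                // 1 = A-Z, -1 = Z-A
      if (a.name < b.name) return -1 * direction;
      if (a.name > b.name) return 1 * direction;
      return 0;
    });
}

var ascending = findSorted(sample, /East/, 1);
var descending = findSorted(sample, /East/, -1);
// ascending:  East Harlem, East Village, Upper East Side
// descending: Upper East Side, East Village, East Harlem
```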

Additional Mongo Commands:

Logical Operators: $and, $not, $nor, $or. ($orderby is a sorting modifier rather than a logical operator; use .sort() as shown above.)

  • These logical operators work the same way SQL logical operators work but in a very bracket heavy syntax.
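
As a sketch of that bracket-heavy syntax, the filters below are built as plain JavaScript objects next to the SQL they roughly correspond to. The search terms are made up for illustration.

```javascript
// SQL: WHERE name LIKE '%Deli%' OR name LIKE '%Pizza%'
var deliOrPizza = {
  $or: [
    { name: /Deli/ },
    { name: /Pizza/ }
  ]
};

// SQL: WHERE (name LIKE '%Deli%' OR name LIKE '%Pizza%')
//        AND name NOT LIKE '%Airport%'
var deliOrPizzaNotAirport = {
  $and: [
    deliOrPizza,
    { name: { $not: /Airport/ } }
  ]
};

// In the mongo shell the filter would be passed to find(), for example:
//   db.restaurants.find(deliOrPizzaNotAirport, { name: 1 })
```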

Help: db.help() - Gives you a command list

Spatial Queries

Geospatial Data and Indexing in MongoDB

MongoDB can store data as GeoJSON objects or legacy coordinate pairs. Either way, location data MUST be specified in longitude, latitude order. Geospatial queries on GeoJSON objects in MongoDB are calculated on a sphere using WGS 84. The recommendation is to create a 2dsphere index to enable a greater variety of geospatial queries, although other indexes are possible.

The command to create a 2dsphere index is as follows. Replace collection with the name of your collection and <location field> with the field that holds the location data, either as a GeoJSON object or a legacy coordinate pair:

db.collection.createIndex( { <location field> : "2dsphere" } )

Spatial indexes for the restaurant and neighborhood collections are created as follows, specifying the field in the document that stores location information.

db.restaurants.createIndex({ location: "2dsphere" })

db.neighborhoods.createIndex({ geometry: "2dsphere" })
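
Because location data must be in longitude, latitude order, it can help to build GeoJSON points through a small helper. The makePoint function below is a hypothetical sketch, not part of MongoDB. Note that a range check catches only some swapped pairs.

```javascript
// makePoint is a hypothetical helper, not a MongoDB function.
function makePoint(longitude, latitude) {
  if (typeof longitude !== 'number' || !(longitude >= -180 && longitude <= 180)) {
    throw new RangeError('longitude must be between -180 and 180');
  }
  if (typeof latitude !== 'number' || !(latitude >= -90 && latitude <= 90)) {
    throw new RangeError('latitude must be between -90 and 90');
  }
  // GeoJSON order: longitude first, then latitude.
  return { type: 'Point', coordinates: [longitude, latitude] };
}

var here = makePoint(-73.8803827, 40.7643124);

// A swapped pair is caught when the longitude lands in the latitude slot
// and exceeds 90 (made-up coordinates: longitude 151.21, latitude -33.87).
var swapCaught = false;
try {
  makePoint(-33.87, 151.21);
} catch (err) {
  swapCaught = err instanceof RangeError;
}

// A swapped New York pair stays within both ranges, so it passes silently
// and describes a very different place on the globe.
var swappedNewYork = makePoint(40.7643124, -73.8803827);
```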

Restaurant Exercise

Scenario: We’re dropped off by a spaceship at an unknown point in New York City. We only know our coordinates. And we’re hungry!

Can we discover what neighborhood we are in based on our coordinates?

db.neighborhoods.findOne({ geometry: { $geoIntersects: { $geometry: { type: "Point", coordinates: [ -73.8803827, 40.7643124] } } } })

Turns out we’re in Jackson Heights, Queens.

Now, what are all the restaurants within a quarter mile of our location? We only need the names.

db.restaurants.find({ location: { $geoWithin: { $centerSphere: [[ -73.8803827, 40.7643124 ], .25 / 3963.2 ] } } } , {"name":1})

That’s too many choices. I’ve heard a lot about NYC delis, and I’d love to eat at one close to here. The $and operator combines the distance condition with a name match.

db.restaurants.find({ $and: [{location: { $geoWithin: { $centerSphere: [[ -73.8803827, 40.7643124 ], .25 / 3963.2 ] } } }, {"name": /Deli/}] } )

So it looks like we’re eating at the Airport Deli in Queens!
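
The $and in the deli query is one way to combine conditions. MongoDB also treats several fields listed in the same filter object as an implicit AND, so the filter can be written more compactly. The sketch below builds both forms as plain objects; the variable names are illustrative.

```javascript
var center = [-73.8803827, 40.7643124]; // [longitude, latitude]
var radius = 0.25 / 3963.2;             // a quarter mile, in radians

var nearby = { $geoWithin: { $centerSphere: [center, radius] } };

// Explicit form, as in the deli query above.
var explicitFilter = {
  $and: [
    { location: nearby },
    { name: /Deli/ }
  ]
};

// Implicit form: conditions on different fields in one object are ANDed.
var implicitFilter = { location: nearby, name: /Deli/ };

// In the mongo shell either filter can be passed to find(), for example:
//   db.restaurants.find(implicitFilter, { name: 1 })
```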

Final Notes about Spatial Queries and Spatial Data

  • MongoDB takes coordinates in longitude, latitude order, the reverse of the latitude, longitude order used by many other formats.

  • Acceptable ranges:
    • Longitude: -180, 180 (inclusive)
    • Latitude: -90, 90 (inclusive)
  • The $centerSphere operator used above expects its radius in radians. Other spherical queries do not: $geoWithin with a GeoJSON $geometry takes no distance at all, and $nearSphere with a GeoJSON point takes distances in meters. 3
  • For operators that expect radians, you must convert distances to radians, and convert from radians to the distance units used by your application.
    • To convert distance to radians: Divide the distance by the radius of the sphere (e.g. the Earth) in the same units as the distance measurement.
    • To convert radians to distance: Multiply the radian measure by the radius of the sphere (e.g. the Earth) in the units to which you want to convert the distance.
  • The equatorial radius of the Earth is approximately 3,963.2 miles or 6,378.1 kilometers 4
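
The two conversions can be sketched as a pair of small helpers. The function and constant names are illustrative; the radius values are the ones quoted above.

```javascript
var EARTH_RADIUS_MILES = 3963.2;
var EARTH_RADIUS_KM = 6378.1;

// Distance to radians: divide by the sphere's radius in the same units.
function distanceToRadians(distance, radius) {
  return distance / radius;
}

// Radians to distance: multiply by the radius in the units you want back.
function radiansToDistance(radians, radius) {
  return radians * radius;
}

// The quarter-mile radius from the restaurant query, in radians.
var quarterMile = distanceToRadians(0.25, EARTH_RADIUS_MILES);

// The same angle expressed in kilometers (roughly 0.4 km).
var quarterMileInKm = radiansToDistance(quarterMile, EARTH_RADIUS_KM);
```

The value of quarterMile is the same number as the .25 / 3963.2 expression passed to $centerSphere in the restaurant exercise.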

Appendix 1: Basic Linux Commands

This tutorial was done in the Linux Terminal, a place that many beginners may not be familiar with. Here are a few basic commands to understand when operating through Linux Terminal.

  • ls - gives the user a list of files/folders in the current directory
  • grep - a very powerful, but complicated search tool
  • pwd - tells you your current working directory
  • cd - changes the current working directory of the user
  • ctrl-c - with some exceptions, this exits or cancels the current program or command
  • mv - move a file from one directory to another (in case you import files to the wrong place!). This is also used to rename files.
  • sudo - elevates the privileges of the user to admin level. Be very careful with this command.

Appendix 2: Further Reading

In researching and developing this tutorial we found several resources that were incredibly helpful in setting up MongoDB. If you are interested in learning more about MongoDB and NoSQL the following resources are highly recommended:

https://docs.mongodb.com/manual/reference/sql-comparison/ - A super useful resource! This documentation shows equivalent MongoDB syntax to SQL statements, including LIKE and comparison operators.

https://www.slideshare.net/mongodb/getting-started-with-geospatial-data-in-mongodb

https://docs.mongodb.com/manual/tutorial/geospatial-tutorial/ - Another great restaurant tutorial

https://datascienceplus.com/using-mongodb-with-r - We did not discuss using MongoDB with R but it seems like a very powerful combination of tools.

http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/ - This is actually about an instance where Mongo didn’t work out for one particular project. It’s a good use-case for a scenario when a traditional relational database may be preferable.