MongoDB for Developers

Copyright Notice

This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise.

title: MongoDB for Developers
author: Kamil Baran

Introduction

MongoDB is a fast document-oriented database
replaces the concept of a "row" with a more flexible model - the "document"
convenient data storage for modern object-oriented languages
no predefined schemas
no transactions
no SQL
supports indexes
easy to scale out horizontally

Getting Started

document is a basic unit of data (like row in RDBMS)
collection = table
multiple databases in single instance
_id in every document, unique within collection
JavaScript shell for administration and data manipulation

Documents

document is an ordered set of keys with associated values
representation of a document varies by programming language (map; hash - Perl, Ruby; dictionary - Python)
objects in JavaScript {key:value}
example {"company" : "NobleProg", "training" : "MongoDB for Developers"}
key is a string, any UTF-8 is allowed (except \0 . $)
type-sensitive {"age" : 3}, {"age" : "3"}
case-sensitive {"age" : 3}, {"Age" : 3}
documents cannot contain duplicated keys
key/value pairs are ordered {"x" : 1, "y" : 1} != {"y" : 1, "x" : 1}
- order does not usually matter, MongoDB can reorder keys

{
  "_id" : ObjectId("545a414c7907b2a255b156c5"),
  "Name" : "Sean Connery",
  "Nationality" : "Great Britain",
  "BirthDate" : ISODate("1930-08-25T00:00:00Z"),
  "BirthYear" : 1930,
  "Occupation" : [
    "Actor",
    "Director",
    "Producer"
  ],
  "Movie" : [
    {
      "_id" : ObjectId("545a5f167907b2a255b156c7"),
      "Title" : "Dr. No"
    },
    {
      "_id" : ObjectId("545a5f317907b2a255b156c8"),
      "Title" : "From Russia with Love"
    },
    {
      "_id" : ObjectId("545a5ed67907b2a255b156c6"),
      "Title" : "Never Say Never Again"
    }
  ],
  "BirthPlace" : {
    "Country" : "United Kingdom, Scotland",
    "City" : "Edinburgh"
  }
}

Collections

a group of documents
dynamic schemas
- {"company" : "NobleProg"}
- {"age" : 5}
why should we use more than one collection?
- nightmare for developers
- much faster to get a list of collections than extracting document types from collections
- grouping documents of the same kind
- indexes
collection name is a string, any UTF-8 is allowed except:
- empty string, start with "system." prefix, contain \0 character, $ character
subcollections separated by the . character
- example GridFS (fs.files, fs.chunks)

Databases

database is a group of collections
one database = one application
separated databases for different applications, users
database name is a alphanumeric string, case sensitive, max 64 bytes, empty string is not allowed
database name will end up as file on filesystem (this explains restrictions)
special databases: admin (root database), local (never replicated), config (when sharding)
namespace is a concatenation of database and collection name (fully qualified collection name)
- max 121 bytes, shold be less than 100

Getting and Starting MongoDB

Installation on Windows
- http://docs.mongodb.org/manual/tutorial/install-mongodb-on-windows/
Installation on Ubuntu
- http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/

CRUD

Create

use NobleProg
person = {"Name" : "Sean Connery", "Nationality" : "Great Britain"}
db.people.insert(person) // .insert() is depricated now, use .insertOne() instead

Read

db.people.find()
db.people.findOne()
db.people.find().pretty() // .pretty() is the default now, so can be omitted

Update

personUp = {"Occupation" : "Actor"}
db.people.update({"Name" : "Sean Connery"}, {$set: personUp}) // .update() is depricated now, use .updateOne() instead
db.people.findOne()

Delete

db.people.remove({"Name" : "Sean Connery"}) // .remove() is depricated now, use .deleteOne() instead
db.people.remove({})
db.people.findOne()

Data Types

JSON-like documents
- 6 data types: null, boolean, numeric, string, array, object
MongoDB adds support for other datatypes
- null {"x" : null}
- boolean {"x" : true}
- number (by default 64-bit floating point numbers) {"x" : 1.4142}
  - 4-byte integers {"x" : NumberInt(141)}
  - 8-byte integers {"x" : LongInt(141)}
- string (any UTF-8 character) {"x" : "NobleProg"}
- date (stored as milliseconds from Linux epoch) {"x" : new Date()}
- regular expressions (in queries) {"x" : /bob/i}
- array {"x" : [1.4142, true, "training"]}
- embedded documents {"x" : {"y" : 100}}
- ObjectId {"x" : ObjectId("54597591bb107f6ef5989771")}
- binary data (for non-UTF-8 strings)
- code {"x" : function() {/*...*/}}

_id and ObjectId

every document in MongoDB must have _id
can be any type but it defaults to ObjectId
unique values in a single collection
ObjectId is designed to lightweight and easy to generate
- 12-bytes of storage (24 hexidecimal digits)
  - timestamp - byte 0-3
  - machine - byte 4-6
  - PID - byte 7-8
  - increment - byte 9-11
- _id is generated automatically if not present in document

MongoDB Shell

$ mongo
MongoDB shell version: 4.0.6
connecting to: test
>

$ mongo HostName:PortNumber/DatabaseName
$ mongo localhost:27017/test

$ mongo --nodb
> conn = new Mongo("localhost:27017")
connection to localhost27017
> db = conn.getDB("NobleProg")
NobleProg

Using help

mongo is a JavaScript shell, help is available in JavaScript on-line documentation
use built-in help for MongoDB-specific functionality
type function name without parentheses to see what the function is doing

> help
    db.help()            help on db methods
    db.mycoll.help()     help on collections methods
    ...
    exit                 quit mongo shell
>
> db.NobleProg.stats
function ( scale ){
    return this._db.runCommand( { collstats : this._shortName , scale : scale } );
}
>
> db.NobleProg.stats()
{ "ok" : 0, "errmsg" : "Collection [test.NobleProg] not found." }
>

Running Scripts

mongo can execute JavaScript files
scripts have access to all global variables (e.g. "db")
shell helpers (e.g. "show collections") do not work from files; use valid JavaScript equivalents (e.g. "db.getCollectionNames()")
use --quiet to hide "MongoDB shell version..." when executing script
use load() to run script directly from Mongo Shell
use .mongorc.js for frequently-loaded scripts
- located in user home directory, run when starting up the shell, --norc to disable it
- useful also to customize prompt

$ mongo script.js
MongoDB shell version: 4.0.6
connecting to: test
script.js was executed successfully!
$
$ mongo --quiet script.js
script.js was executed successfully!
$
$ mongo
MongoDB shell version: 4.0.6
connecting to: test
> load("script.js")
script.js was executed successfully!
>

Editing Complex Variables

limited multiline support in the shell
external editors are allowed
EDITOR="/usr/bin/gedit"
EDITOR="c:\\windows\\notepad.exe"

$ mongo --quiet
> EDITOR="c:\\windows\\notepad.exe"
> use training
switched to db training
> person = db.people.findOne()
{ "_id" : ObjectId("54568445cfc7c83518fa5430"), "Name" : "Sean Connery" }
> edit person
> person
{ "_id" : ObjectId("54568445cfc7c83518fa5430"), "Name" : "Sean Connery", "Nationality" : "Great Britain" }
> db.people.save(person)

Querying ⌘

Select ⌘

find, findOne
parameters: query, fields, limit, skip, sort, batchSize, options

> db.people.find()
> db.people.find({"Nationality" : "Great Britain"})
> db.people.find({"Nationality" : "Great Britain", "occupation" : "actor"})
> db.people.find({"_id" : ObjectId("545a36577907b2a255b156c4")})
> db.people.findOne({}, {"Name" : 1, "Nationality" : 1})
> db.people.findOne({}, {"Name" : 1, "Nationality" : 1, "_id" : 0})
> db.people.find().limit(3).skip(5).sort({"BirthYear" : -1})

Query Criteria ⌘

$lt, $lte, $gt, $gte, $eq, $ne
$in, $nin, $or (whenever possible use $in instead of $or)
$not, $mod
handling null value, $exists
Perl Compatible Regular Expressions are allowed

> db.people.find({Nationality : "USA", BirthYear : {$lte : 1950}})
> db.people.find({Nationality : "USA", BirthYear : {$gte : 1940, $lte : 1950}})
> db.people.find({Nationality : {$in : ["Belgium", "Israel"]}})
> db.people.find({$or : [{Nationality : "Belgium"}, {Nationality : "Israel"}]})
> db.people.find({BirthYear : {$mod : [10, 0]}})
> db.people.find({BirthYear : {$not : {$mod : [10, 0]}}})
> db.people.find({Nationality : null})
> db.people.find({Nationality : {$exists : 1, $eq : null}})
> db.people.find({Name : /east/i})
> db.people.find({Name : /^A/})
> db.people.find({Occupation : /^actor$/i}, {Occupation:1, Name:1, _id:0})
> db.people.find({Name : /^cli.*.wood$/i}, {Name:1, _id:0})
> db.people.find({Name : /^clint eastw..d$/i}, {Name:1, _id:0})

Query Criteria for Arrays ⌘

behave in the same way as scalars
$all, $size (can not be combined with other criterias)
$slice, $-operator, * $elemMatch

> db.people.find({Occupation : "Actor"}, {Name : 1, Occupation : 1})
> db.people.find({Occupation : ["Actress"]})
> db.people.find({Occupation : {$all : ["Actress", "Producer"]}})
> db.people.find({"Occupation.1" : "Producer"})
> db.people.find({Occupation : {$size : 3}})
> db.people.find({Name: "Sean Connery"}, {Movie : {$slice : -1}})
> db.people.find({"Movie.Title" : "Never Say Never Again"})
> db.people.find({"Movie.Title" : "Never Say Never Again"}, {"Movie.$" : 1})

Query on Embedded Document ⌘

query for entire document
query for selected key/value pairs
$elemMatch
watch out for the dots (.) in keys names

> db.people.findOne({"BirthPlace" : {"Country" : "United Kingdom, Scotland", "City" : "Edinburgh"}})
> db.people.findOne({"BirthPlace" : {"City" : "Edinburgh", "Country" : "United Kingdom, Scotland"}})
> db.people.findOne({"BirthPlace.City" : "Edinburgh", "BirthPlace.Country" : "United Kingdom, Scotland"})
>
> db.people.find({"Movie._id" : ObjectId("545a5f167907b2a255b156c7"), "Movie.Title" : "Never Say Never Again"})
> db.people.find({"Movie" : {"$elemMatch" : {"_id" : ObjectId("545a5f167907b2a255b156c7"), "Title" : "Never Say Never Again"}}})

Using $where ⌘

$where allows to run arbitrary JavaScript as part of a query
powerful, slow, only for trusted users, can not use indexes

> db.test_where.insert([{"ordered" : 10, "sent" : 0}, {"ordered" : 10, "sent" : 5}, {"ordered" : 10, "sent" : 10}])
> db.test_where.find({"$where" : function() {
... for (var current in this) {
...   for (var other in this) {
...     if (current != other && this[current] == this[other]) {
...       return true;
...     }
...   }
... }
... return false;
... }});

Cursors ⌘

MongoDB returns results from find using a cursor
the lifetime of the cursor, immortal cursor
creating cursors in shell: var res = db.test.find()
use next to iterate through results
use hasNext to check if there is another document

> var cursor = db.people.find()
> while (cursor.hasNext()) { 
...   doc = cursor.next();
...   print(doc.Name+" was born in "+doc.BirthYear);
... }
Johnny Depp was born in 1963
Sean Connery was born in 1930

there is forEach as an alternative for while

> var cursor = db.people.find()
> cursor.forEach(function(doc) {
...   print(doc.Name+" was born in "+doc.BirthYear);
... })
Johnny Depp was born in 1963
Sean Connery was born in 1930

Query Options ⌘

.limit(), .skip(), .sort()
pagination and avoiding large skips
$maxScan, $showDiskLoc, $min, $max
.snapshot() for consistent results in large collections

> db.people.find().limit(3).skip(5).sort({"BirthYear" : -1})
> db.people.find({}, {"Nationality" : 1, "Name" : 1}).sort({"Nationality" : -1, "Name" : 1})
> db.people.find()._addSpecial("$maxScan", 5)
> db.people.find()._addSpecial("$showDiskLoc", true)
> db.people.find().min({"BirthYear" : 1975}).max({"BirthYear" : 1990})

Data Manipulation ⌘

Insert ⌘

insert is the basic method to add documents
_id will be created automatically if not present in document
batch insert (pass an array of documents, max 48MB at once, faster)
insert validation
- only basic structure, size (16MB), adds _id
- most drivers do additional tests (size, non-UTF-8 strings, unrecognised data types) before sending data to MongoDB
- use Object.bsonsize(doc) to see the BSON size of doc in bytes

> db.people.insert({"_id" : 123, "Name" : "Al Pacino"})
> db.people.insert({"Name" : "Sean Connery"})
> db.people.insert([{"Name" : "Meryl Streep"}, {"Name" : "Clint Eastwood"}, {"Name" : "Penélope Cruz"}])

Remove ⌘

remove expects a query document as a parameter
if query document is empty it will delete all documents from the collection but not entire collection
there is no way to recover removed documents
drop is a very fast alternative when removing all documents (require to recreate all indexes)

> db.people.stats()
> db.people.remove({"Nationality" : "Great Britain"})
> db.people.remove({})
> db.people.drop()

Update ⌘

update expects two parameters:
- a query document to locate documents to be updated
- a new document or document that describes how found documents will be changed
conflicting updates - the last update will "win"
using _id for update criteria will be faster and safer than other fields

> person = db.people.findOne({"Name" : "Sean Connery"})
> person.Occupation = "Actor"
> db.people.update({"Name" : "Sean Connery"}, person)
> db.people.findOne({"Name" : "Sean Connery"})

Update Modifiers ⌘

useful when changing only specified fields of a document
used for complex update operations (altering, adding, removing keys, manipulating arrays or embedded documents)
if key does not exist it will be created
_id field cannot be changed
$inc, $set, $unset, $push, $pull, $each, $slice, $sort, $pop, $ne, $addToSet, $ (positional array modificator)

> db.people.findOne({"Name" : "Sean Connery"})
> db.people.update({"Name" : "Sean Connery"}, {"$inc" : {"Age" : 1}})
> db.people.findOne({"Name" : "Sean Connery"})

Upsert ⌘

upsert is a special type of update
advantage: the same code for create and update
it will create a document if there is no document that matches the update criteria
new document will combine update data and criteria
$setOnInsert modifier
.save() shell helper

> db.websites.update({"url" : "nobleprog.pl"}, {"$inc" : {"pageviews" : 1}})
> db.websites.update({"url" : "nobleprog.pl"}, {"$inc" : {"pageviews" : 1}}, true)
> db.websites.update({"url" : "nobleprog.pl"}, {"$inc" : {"pageviews" : 1}}, {upsert : true})
> db.websites.update({"url" : "nobleprog.pl"}, {"$inc" : {"pageviews" : 1}, "$setOnInsert" : {"instertDate" : new Date()}}, {upsert : true})
> db.websites.find().pretty()

Updating Multiple Documents ⌘

by default only first document will be changed
default behaviour may change in the future
db.runCommand({getLastError : 1})

> db.people.insert([{"Name" : "Jack Nicholson", "Nationality" : "USA"}, {"Name" : "Bruce Willis", "Nationality" : "RFN"}, {"Name" : "Morgan Freeman", "Nationality" : "USA"}])
> db.people.update({"Nationality" : "USA"}, {"$set" : {"Nationality" : "United States of America"}})
> db.people.update({"Nationality" : "USA"}, {"$set" : {"Nationality" : "United States of America"}}, false, true)
> db.people.update({"Nationality" : "USA"}, {"$set" : {"Nationality" : "United States of America"}}, {multi : true})

Two Phase Commits ⌘

http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/

Indexes and Query Optimization

database index works like index in a book
indexes in MongoDB works almost identical to relational databases
because of indexes reads are faster but writes slower
64 indexes per collection

Collection Without Indexes

for (var i=1; i<=1000000; i++) {
  db.visitors.insert({
    "i" : i,
    "visitor" : "visitor_"+i,
    "score" : Math.floor(Math.random()*10+1),
    "date" : new Date()
  })
}

One Field Indexes

db.visitors.explain().find({"visitor" : "visitor_330"})
db.visitors.explain().find({"visitor" : "visitor_330"}).limit(1)
db.visitors.explain().find({"visitor" : "visitor_99999"}).limit(1)
db.visitors.createIndex({"visitor" : 1})
db.visitors.explain().find({"visitor" : "visitor_99999"})

Compound Indexes

db.visitors.find().sort({"score" : 1, "visitor" : 1})
db.visitors.createIndex({"visitor" : 1})
db.visitors.createIndex({"score" : 1, "visitor" : 1})
db.visitors.explain(true).find({"score" : 10}).sort({"visitor" : -1})
db.visitors.explain(true).find({"score" : {"$gte" : 10, "$lte" : 20}})
db.visitors.explain(true).find({"score" : {"$gte" : 10, "$lte" : 20}}).sort({"visitor" : 1})
db.visitors.createIndex({"visitor" : 1, "score" : 1})
db.visitors.explain(true).find({"score" : {"$gte" : 10, "$lte" : 20}}).sort({"visitor" : 1}).hint({"visitor" : 1, "score" : 1})
db.visitors.dropIndex("visitor_1")

Compound Indexes (2)

pattern {SortKey : 1, CriteriaKey : 1}
pattern {ExactMatch : 1, RangeCriteria : 1}
sorting directions {"score" : -1, "visitor" : 1}
{"score" : -1, "visitor" : 1} = {"score" : 1, "visitor" : -1}
covered indexes (indexOnly : true or totalDocsExamined = 0 when totalKeysExamined > 0)
{key1 : 1, key2 : 1, key3 : 1} eliminates the need of creating {key1 : 1} and {key1 : 1, key2 : 1}
$-operators and indexes ($where, $exists, $nin, $ne, $not)
queries with $or can use more than one index

Indexing Arrays and SubDocuments

usually it behaves like normal index
difference between indexing {"BirthPlace" : 1} and {"BirthPlace.Country" : 1}
only one array field per index
multikey indexes

db.people.createIndex({"Movie.Title" : 1})
db.people.createIndex({"BirthPlace.Country" : 1})

Indexes in Details

low and high cardinality fields
understanding .explain()
using .hint()
query optimizer (100 results, 1000 queries, index creation)
fields in index must be smaller than 1kB
options: unique, dropDups, sparse

db.visitors.createIndex({"visitor" : 1}, {"unique" : 1, "dropDups" : 1})

Sparse Index

only contain entries for documents that have the indexed field (even if the index field contains a null value)
useful with unique constraint

db.sparse_index.insert([{y:1, x:1}, {y:1, x:2}, {y:1, x:3}, {y:1}])
db.sparse_index.createIndex({x:1}, {unique:1})
db.sparse_index.insert({y:1})

db.sparse_index.dropIndexes()
db.sparse_index.createIndex({x:1}, {unique:1, sparse:1})
db.sparse_index.insert({y:1})

db.sparse_index.find({"x" : {"$ne" : 2}}).hint({"x" : 1})
db.sparse_index.find({"x" : {"$ne" : 2}}).hint({"$natural" : 1})

Index Administration

system.indexes - read only collection that stores info about indexes
.createIndex()
.dropIndex(), .dropIndexes()
.getIndexes()
options: background, name

Capped Collections

it has to be created before first insert occurs
fixed size or size and number of documents (circular queue)
forbidden operations on documents: removing, updating (if it will increase the size)
can not be sharded or changed
sorting: $natural : 1 (or -1)

db.createCollection("capped_collection", {"capped" : true, "size" : 100000})
db.createCollection("capped_collection", {"capped" : true, "size" : 100000, "max" : 20})

db.people.copyTo("capped_collection")
db.runCommand({"convertToCapped" : "capped_collection", "size" : 100000})

Tailable Cursors

inspired by the tail -f command
not closed when their results are exhausted
can be used only on capped collection
will die after 10 minutes

TTL Indexes (time-to-live)

TTL index allows you to set a timeout for each document
removing is performed every 60 secons
can be created only on single field (date field)

db.ttl_collection.insert({"User" : "user1", "LastUpdated" : new Date()})
db.ttl_collection.createIndex({"LastUpdated" : 1}, {"expireAfterSeconds" : 30})
db.ttl_collection.find()

Full-Text Indexes

quick text search with built-in multi-language support
very expensive, especially on busy collections

db.people.createIndex({"Name" : "text"})
db.people.createIndex({"Name" : "text", "Bio" : "text"}, {"weights" : {"Name": 2}})
db.people.createIndex({"$**" : "text"})
db.people.createIndex({"whatever" : "text"}, {"weights" : {"Name" : 5, "Movie.Name" : 2, "$**" : 1}})

db.runCommand({"text" : "people", "search" : "emma thompson"})
db.people.find({$text : {$search : "emma thompson"}})
db.people.find({$text : {$search : "\"emma thompson\""}})
db.people.find({$text : {$search : "-emma thompson"}})
db.people.find({$text : {$search : "emma thompson"}}, {score : {$meta : "textScore"}})
db.people.find({$text : {$search : "emma thompson"}}, {score : {$meta : "textScore"}}).sort({ score: { $meta: "textScore" } })

Geospatial Indexes

2d index

for data stored as points on a two-dimensional plane

db.dots.insert([{Name:"A", location:[10, 5]}, {Name:"B", location:[17, -5]}, {Name:"C", location:[0, 2]}, {Name:"D", location:[-3, -3]}])
db.dots.createIndex({location:"2d", type:1})
db.dots.find({location:{$near:[0,0]}})

2dsphere index

supports queries that calculate geometries on an earth-like sphere
data stored as GeoJSON objects
operators: http://docs.mongodb.org/master/reference/operator/query-geospatial/
calculating distance: http://docs.mongodb.org/master/tutorial/calculate-distances-using-spherical-geometry-with-2d-geospatial-indexes/

db.cities.insert({Name:"Rzeszów", "location": {"type":"Point", "coordinates":[22.008606,50.040264]}})
db.cities.insert({Name:"Warszawa", "location": {"type":"Point", "coordinates":[21.0123237,52.2328474]}})
db.cities.insert({Name:"Wrocław", "location": {"type":"Point", "coordinates":[17.0342894,51.1174725]}})
db.cities.insert({Name:"Kraków", "location": {"type":"Point", "coordinates":[19.9012826,50.0719423]}})
db.cities.insert({Name:"Kielce", "location": {"type":"Point", "coordinates":[20.6156414,50.85404]}})
db.cities.createIndex({location:"2dsphere"})

db.cities.find({location:{$near: 
    {$geometry: {type:"Point", coordinates:[20.6156414, 50.85404]}, $minDistance: 0, $maxDistance:200000}
}})

db.runCommand({geoNear: "cities", near: [20.6156414, 50.85404], spherical: true, distanceMultiplier: 6378.1}) // distance in kilometers

db.cities.find({location: {$geoWithin: 
    {$geometry: {type: "Polygon", coordinates: [[ [22.008606,50.040264], [21.0123237,52.2328474], [19.9012826,50.0719423], [22.008606,50.040264] ]]}}
}})

Aggregation ⌘

Single Purpose Aggregation ⌘

count, distinct, group

> db.people.count()
> db.people.count({"Occupation" : "Producer"})
>
> db.people.distinct("Movie.Title")
> db.runCommand({"distinct" : "people", "key" : "Movie.Title"})
>
> db.people.group({
...   "key" : {"Nationality" : 1},
...   //"$keyf" : function(curr) { return {"Nationality" : curr.Nationality.toLowerCase()}; },
...   "initial" : {"BirthYear" : 0},
...   "reduce" : function(curr, result) {
...     if (curr.BirthYear > result.BirthYear) {
...       result.BirthYear = curr.BirthYear;
...       result.Nationality = curr.Nationality;
...     }
...   },
...   "condition" : {"Nationality" : {"$ne" : null}},
...   "finalize" : function(result) {
...     result[result.Nationality] = result.BirthYear;
...     delete result.Nationality;
...     delete result.BirthYear;
...   }
... })

Aggregation Pipelines ⌘

aggregation is a pipeline
some pipeline operators: $match, $project, $group, $sort, $limit, $skip, $unwind, $lookup, $out
$group and $sort needs to collect all documents
results were limited to 16MB, now unlimited
memory usage limited to 100MB, allowDiskUse

> db.people.aggregate([{"$project" : {"Nationality" : 1}},
...   {"$group" : {"_id" : "$Nationality", "count" : {"$sum" : 1}}},
...   {"$sort" : {"count" : -1}},
...   {"$limit" : 3}
... ])
>
> db.people.aggregate([{"$project" : {"Country" : "$Nationality"}}])
> db.people.aggregate([{"$project" : {"Movie" : 1, "_id" :  0}}, {"$unwind" : "$Movie"}])
> db.people.aggregate([{"$project" : {"Movie" : 1, "_id" :  0}}, {"$unwind" : "$Movie"}, {$out:"aggregated"}])
> db.people.aggregate([...], {{allowDiskUse: true, explain: true}})

$project Expressions ⌘

mathematical: $add, $subtract, $multiply, $divide, $mod
date: $year, $month, $week, $dayOfMonth, $dayOfWeek, $dayOfYear, $hour, $minute, $second
string: $substr, $concat, $toLower, $toUpper
comparison: $cmp, $strcasecmd, $eq, $ne, $gt, $gte, $lt, $lte
logical: $and, $or, $not, $cond, $ifNull

> db.people.aggregate([{"$match" : {"BirthDate" : {"$gte" : new Date("1970-01-01")}}}, {"$project" : {"BirthDate" : {"$year" : "$BirthDate"}}}])
> db.people.aggregate([{"$project" : {"Nationality" : {"$ifNull" : ["$Nationality", "unknown"]}}}])

$group Expressions ⌘

arithmetical: $sum, $avg,
extreme: $min, $max, $first, $last
array: $addToSet, $push

> db.people.aggregate([{$group:{_id:"$Nationality", peopleCount:{$sum:1}}}])
> db.people.aggregate([{$group:{_id:"$BirthYear", people:{$addToSet:"$Name"}}}])

Views

new in version 3.4
views are always read-only
views are based on existing collections or other views

> db.createView(<view>, <source>, <pipeline>, <collation>)
> db.createView("peopleByBirthYear", "people", [{$group:{_id:"$BirthYear", people:{$addToSet:"$Name"}}}])

Map-Reduce ⌘

powerful and flexible tool for aggregation
MR uses JavaScript as a "query language"
not very fast
arguments: mapreduce, map, reduce, finalize, keeptemp, out, query, sort, limit, scope, verbose

> map = function () {
... for (var key in this) {
...   emit(key, {count : 1});
... }}
>
> reduce = function (key, emits) {
...   total = 0;
...   for (var i in emits) {
...     total += emits[i].count;
...   }
...   return {"count" : total};
... }
>
> db.runCommand({"mapreduce" : "people", "map" : map, "reduce" : reduce, "out" : "mapreduce_output_collection"})
> db.mapreduce_output_collection.find()

Data models ⌘

normalization vs. denormalization (referencing vs. embedding)
few things to consider:
- writes vs. reads
- immediate consistency vs. eventual consistency
- one-to-many vs. one-to-few
- document growth

Embedding is better for...	References are better for
small subdocuments	large subdocuments
data that does not change regularly	volatile data
when eventual consistency is acceptable	when immediate consistency is necessary
documents that grow by a small amount	documents that grow by a large amount
data that you'll often need to perform a second query to fetch	data that you'll often exclude from results
fast reads	fast writes

more resources:
- docs.mongodb.com/manual/core/data-modeling-introduction/
- developer.mongodb.com/article/mongodb-schema-design-best-practices/