MongoDB for Developers

From Training Material
Revision as of 22:45, 12 April 2021 by Kbaran (talk | contribs) (→‎Data models ⌘)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search


title
MongoDB for Developers
author
Kamil Baran

Introduction

  • MongoDB is a fast document-oriented database
  • replaces the concept of a "row" with a more flexible model - the "document"
  • convenient data storage for modern object-oriented languages
  • no predefined schemas
  • no transactions
  • no SQL
  • supports indexes
  • easy to scale out horizontally


Getting Started

  • document is a basic unit of data (like row in RDBMS)
  • collection = table
  • multiple databases in single instance
  • _id in every document, unique within collection
  • JavaScript shell for administration and data manipulation


Documents

  • document is an ordered set of keys with associated values
  • representation of a document varies by programming language (map; hash - Perl, Ruby; dictionary - Python)
  • objects in JavaScript {key:value}
  • example {"company" : "NobleProg", "training" : "MongoDB for Developers"}
  • key is a string, any UTF-8 is allowed (except \0 . $)
  • type-sensitive {"age" : 3}, {"age" : "3"}
  • case-sensitive {"age" : 3}, {"Age" : 3}
  • documents cannot contain duplicated keys
  • key/value pairs are ordered {"x" : 1, "y" : 1} != {"y" : 1, "x" : 1}
    • order does not usually matter, MongoDB can reorder keys


{
  "_id" : ObjectId("545a414c7907b2a255b156c5"),
  "Name" : "Sean Connery",
  "Nationality" : "Great Britain",
  "BirthDate" : ISODate("1930-08-25T00:00:00Z"),
  "BirthYear" : 1930,
  "Occupation" : [
    "Actor",
    "Director",
    "Producer"
  ],
  "Movie" : [
    {
      "_id" : ObjectId("545a5f167907b2a255b156c7"),
      "Title" : "Dr. No"
    },
    {
      "_id" : ObjectId("545a5f317907b2a255b156c8"),
      "Title" : "From Russia with Love"
    },
    {
      "_id" : ObjectId("545a5ed67907b2a255b156c6"),
      "Title" : "Never Say Never Again"
    }
  ],
  "BirthPlace" : {
    "Country" : "United Kingdom, Scotland",
    "City" : "Edinburgh"
  }
}


Collections

  • a group of documents
  • dynamic schemas
    • {"company" : "NobleProg"}
    • {"age" : 5}
  • why should we use more than one collection?
    • nightmare for developers
    • much faster to get a list of collections than extracting document types from collections
    • grouping documents of the same kind
    • indexes
  • collection name is a string, any UTF-8 is allowed except:
    • empty string, start with "system." prefix, contain \0 character, $ character
  • subcollections separated by the . character
    • example GridFS (fs.files, fs.chunks)


Databases

  • database is a group of collections
  • one database = one application
  • separated databases for different applications, users
  • database name is a alphanumeric string, case sensitive, max 64 bytes, empty string is not allowed
  • database name will end up as file on filesystem (this explains restrictions)
  • special databases: admin (root database), local (never replicated), config (when sharding)
  • namespace is a concatenation of database and collection name (fully qualified collection name)
    • max 121 bytes, shold be less than 100


Getting and Starting MongoDB

  1. Installation on Windows
  2. Installation on Ubuntu


CRUD

Create

use NobleProg
person = {"Name" : "Sean Connery", "Nationality" : "Great Britain"}
db.people.insert(person) // .insert() is depricated now, use .insertOne() instead

Read

db.people.find()
db.people.findOne()
db.people.find().pretty() // .pretty() is the default now, so can be omitted

Update

personUp = {"Occupation" : "Actor"}
db.people.update({"Name" : "Sean Connery"}, {$set: personUp}) // .update() is depricated now, use .updateOne() instead
db.people.findOne()

Delete

db.people.remove({"Name" : "Sean Connery"}) // .remove() is depricated now, use .deleteOne() instead
db.people.remove({})
db.people.findOne()


Data Types

  • JSON-like documents
    • 6 data types: null, boolean, numeric, string, array, object
  • MongoDB adds support for other datatypes
    • null {"x" : null}
    • boolean {"x" : true}
    • number (by default 64-bit floating point numbers) {"x" : 1.4142}
      • 4-byte integers {"x" : NumberInt(141)}
      • 8-byte integers {"x" : LongInt(141)}
    • string (any UTF-8 character) {"x" : "NobleProg"}
    • date (stored as milliseconds from Linux epoch) {"x" : new Date()}
    • regular expressions (in queries) {"x" : /bob/i}
    • array {"x" : [1.4142, true, "training"]}
    • embedded documents {"x" : {"y" : 100}}
    • ObjectId {"x" : ObjectId("54597591bb107f6ef5989771")}
    • binary data (for non-UTF-8 strings)
    • code {"x" : function() {/*...*/}}


_id and ObjectId

  • every document in MongoDB must have _id
  • can be any type but it defaults to ObjectId
  • unique values in a single collection
  • ObjectId is designed to lightweight and easy to generate
    • 12-bytes of storage (24 hexidecimal digits)
      • timestamp - byte 0-3
      • machine - byte 4-6
      • PID - byte 7-8
      • increment - byte 9-11
    • _id is generated automatically if not present in document


MongoDB Shell

$ mongo
MongoDB shell version: 4.0.6
connecting to: test
>

$ mongo HostName:PortNumber/DatabaseName
$ mongo localhost:27017/test

$ mongo --nodb
> conn = new Mongo("localhost:27017")
connection to localhost27017
> db = conn.getDB("NobleProg")
NobleProg


Using help

  • mongo is a JavaScript shell, help is available in JavaScript on-line documentation
  • use built-in help for MongoDB-specific functionality
  • type function name without parentheses to see what the function is doing
> help
    db.help()            help on db methods
    db.mycoll.help()     help on collections methods
    ...
    exit                 quit mongo shell
>
> db.NobleProg.stats
function ( scale ){
    return this._db.runCommand( { collstats : this._shortName , scale : scale } );
}
>
> db.NobleProg.stats()
{ "ok" : 0, "errmsg" : "Collection [test.NobleProg] not found." }
>


Running Scripts

  • mongo can execute JavaScript files
  • scripts have access to all global variables (e.g. "db")
  • shell helpers (e.g. "show collections") do not work from files; use valid JavaScript equivalents (e.g. "db.getCollectionNames()")
  • use --quiet to hide "MongoDB shell version..." when executing script
  • use load() to run script directly from Mongo Shell
  • use .mongorc.js for frequently-loaded scripts
    • located in user home directory, run when starting up the shell, --norc to disable it
    • useful also to customize prompt
$ mongo script.js
MongoDB shell version: 4.0.6
connecting to: test
script.js was executed successfully!
$
$ mongo --quiet script.js
script.js was executed successfully!
$
$ mongo
MongoDB shell version: 4.0.6
connecting to: test
> load("script.js")
script.js was executed successfully!
>


Editing Complex Variables

  • limited multiline support in the shell
  • external editors are allowed
  • EDITOR="/usr/bin/gedit"
  • EDITOR="c:\\windows\\notepad.exe"
$ mongo --quiet
> EDITOR="c:\\windows\\notepad.exe"
> use training
switched to db training
> person = db.people.findOne()
{ "_id" : ObjectId("54568445cfc7c83518fa5430"), "Name" : "Sean Connery" }
> edit person
> person
{ "_id" : ObjectId("54568445cfc7c83518fa5430"), "Name" : "Sean Connery", "Nationality" : "Great Britain" }
> db.people.save(person)


Querying ⌘


Select ⌘

  • find, findOne
  • parameters: query, fields, limit, skip, sort, batchSize, options
> db.people.find()
> db.people.find({"Nationality" : "Great Britain"})
> db.people.find({"Nationality" : "Great Britain", "occupation" : "actor"})
> db.people.find({"_id" : ObjectId("545a36577907b2a255b156c4")})
> db.people.findOne({}, {"Name" : 1, "Nationality" : 1})
> db.people.findOne({}, {"Name" : 1, "Nationality" : 1, "_id" : 0})
> db.people.find().limit(3).skip(5).sort({"BirthYear" : -1})


Query Criteria ⌘

  • $lt, $lte, $gt, $gte, $eq, $ne
  • $in, $nin, $or (whenever possible use $in instead of $or)
  • $not, $mod
  • handling null value, $exists
  • Perl Compatible Regular Expressions are allowed
> db.people.find({Nationality : "USA", BirthYear : {$lte : 1950}})
> db.people.find({Nationality : "USA", BirthYear : {$gte : 1940, $lte : 1950}})
> db.people.find({Nationality : {$in : ["Belgium", "Israel"]}})
> db.people.find({$or : [{Nationality : "Belgium"}, {Nationality : "Israel"}]})
> db.people.find({BirthYear : {$mod : [10, 0]}})
> db.people.find({BirthYear : {$not : {$mod : [10, 0]}}})
> db.people.find({Nationality : null})
> db.people.find({Nationality : {$exists : 1, $eq : null}})
> db.people.find({Name : /east/i})
> db.people.find({Name : /^A/})
> db.people.find({Occupation : /^actor$/i}, {Occupation:1, Name:1, _id:0})
> db.people.find({Name : /^cli.*.wood$/i}, {Name:1, _id:0})
> db.people.find({Name : /^clint eastw..d$/i}, {Name:1, _id:0})


Query Criteria for Arrays ⌘

  • behave in the same way as scalars
  • $all, $size (can not be combined with other criterias)
  • $slice, $-operator, * $elemMatch
> db.people.find({Occupation : "Actor"}, {Name : 1, Occupation : 1})
> db.people.find({Occupation : ["Actress"]})
> db.people.find({Occupation : {$all : ["Actress", "Producer"]}})
> db.people.find({"Occupation.1" : "Producer"})
> db.people.find({Occupation : {$size : 3}})
> db.people.find({Name: "Sean Connery"}, {Movie : {$slice : -1}})
> db.people.find({"Movie.Title" : "Never Say Never Again"})
> db.people.find({"Movie.Title" : "Never Say Never Again"}, {"Movie.$" : 1})


Query on Embedded Document ⌘

  • query for entire document
  • query for selected key/value pairs
  • $elemMatch
  • watch out for the dots (.) in keys names
> db.people.findOne({"BirthPlace" : {"Country" : "United Kingdom, Scotland", "City" : "Edinburgh"}})
> db.people.findOne({"BirthPlace" : {"City" : "Edinburgh", "Country" : "United Kingdom, Scotland"}})
> db.people.findOne({"BirthPlace.City" : "Edinburgh", "BirthPlace.Country" : "United Kingdom, Scotland"})
>
> db.people.find({"Movie._id" : ObjectId("545a5f167907b2a255b156c7"), "Movie.Title" : "Never Say Never Again"})
> db.people.find({"Movie" : {"$elemMatch" : {"_id" : ObjectId("545a5f167907b2a255b156c7"), "Title" : "Never Say Never Again"}}})


Using $where ⌘

  • $where allows to run arbitrary JavaScript as part of a query
  • powerful, slow, only for trusted users, can not use indexes
> db.test_where.insert([{"ordered" : 10, "sent" : 0}, {"ordered" : 10, "sent" : 5}, {"ordered" : 10, "sent" : 10}])
> db.test_where.find({"$where" : function() {
... for (var current in this) {
...   for (var other in this) {
...     if (current != other && this[current] == this[other]) {
...       return true;
...     }
...   }
... }
... return false;
... }});


Cursors ⌘

  • MongoDB returns results from find using a cursor
  • the lifetime of the cursor, immortal cursor
  • creating cursors in shell: var res = db.test.find()
  • use next to iterate through results
  • use hasNext to check if there is another document
> var cursor = db.people.find()
> while (cursor.hasNext()) { 
...   doc = cursor.next();
...   print(doc.Name+" was born in "+doc.BirthYear);
... }
Johnny Depp was born in 1963
Sean Connery was born in 1930


  • there is forEach as an alternative for while
> var cursor = db.people.find()
> cursor.forEach(function(doc) {
...   print(doc.Name+" was born in "+doc.BirthYear);
... })
Johnny Depp was born in 1963
Sean Connery was born in 1930


Query Options ⌘

  • .limit(), .skip(), .sort()
  • pagination and avoiding large skips
  • $maxScan, $showDiskLoc, $min, $max
  • .snapshot() for consistent results in large collections
> db.people.find().limit(3).skip(5).sort({"BirthYear" : -1})
> db.people.find({}, {"Nationality" : 1, "Name" : 1}).sort({"Nationality" : -1, "Name" : 1})
> db.people.find()._addSpecial("$maxScan", 5)
> db.people.find()._addSpecial("$showDiskLoc", true)
> db.people.find().min({"BirthYear" : 1975}).max({"BirthYear" : 1990})


Data Manipulation ⌘


Insert ⌘

  • insert is the basic method to add documents
  • _id will be created automatically if not present in document
  • batch insert (pass an array of documents, max 48MB at once, faster)
  • insert validation
    • only basic structure, size (16MB), adds _id
    • most drivers do additional tests (size, non-UTF-8 strings, unrecognised data types) before sending data to MongoDB
    • use Object.bsonsize(doc) to see the BSON size of doc in bytes
> db.people.insert({"_id" : 123, "Name" : "Al Pacino"})
> db.people.insert({"Name" : "Sean Connery"})
> db.people.insert([{"Name" : "Meryl Streep"}, {"Name" : "Clint Eastwood"}, {"Name" : "Penélope Cruz"}])


Remove ⌘

  • remove expects a query document as a parameter
  • if query document is empty it will delete all documents from the collection but not entire collection
  • there is no way to recover removed documents
  • drop is a very fast alternative when removing all documents (require to recreate all indexes)
> db.people.stats()
> db.people.remove({"Nationality" : "Great Britain"})
> db.people.remove({})
> db.people.drop()


Update ⌘

  • update expects two parameters:
    • a query document to locate documents to be updated
    • a new document or document that describes how found documents will be changed
  • conflicting updates - the last update will "win"
  • using _id for update criteria will be faster and safer than other fields
> person = db.people.findOne({"Name" : "Sean Connery"})
> person.Occupation = "Actor"
> db.people.update({"Name" : "Sean Connery"}, person)
> db.people.findOne({"Name" : "Sean Connery"})


Update Modifiers ⌘

  • useful when changing only specified fields of a document
  • used for complex update operations (altering, adding, removing keys, manipulating arrays or embedded documents)
  • if key does not exist it will be created
  • _id field cannot be changed
  • $inc, $set, $unset, $push, $pull, $each, $slice, $sort, $pop, $ne, $addToSet, $ (positional array modificator)
> db.people.findOne({"Name" : "Sean Connery"})
> db.people.update({"Name" : "Sean Connery"}, {"$inc" : {"Age" : 1}})
> db.people.findOne({"Name" : "Sean Connery"})


Upsert ⌘

  • upsert is a special type of update
  • advantage: the same code for create and update
  • it will create a document if there is no document that matches the update criteria
  • new document will combine update data and criteria
  • $setOnInsert modifier
  • .save() shell helper
> db.websites.update({"url" : "nobleprog.pl"}, {"$inc" : {"pageviews" : 1}})
> db.websites.update({"url" : "nobleprog.pl"}, {"$inc" : {"pageviews" : 1}}, true)
> db.websites.update({"url" : "nobleprog.pl"}, {"$inc" : {"pageviews" : 1}}, {upsert : true})
> db.websites.update({"url" : "nobleprog.pl"}, {"$inc" : {"pageviews" : 1}, "$setOnInsert" : {"instertDate" : new Date()}}, {upsert : true})
> db.websites.find().pretty()


Updating Multiple Documents ⌘

  • by default only first document will be changed
  • default behaviour may change in the future
  • db.runCommand({getLastError : 1})
> db.people.insert([{"Name" : "Jack Nicholson", "Nationality" : "USA"}, {"Name" : "Bruce Willis", "Nationality" : "RFN"}, {"Name" : "Morgan Freeman", "Nationality" : "USA"}])
> db.people.update({"Nationality" : "USA"}, {"$set" : {"Nationality" : "United States of America"}})
> db.people.update({"Nationality" : "USA"}, {"$set" : {"Nationality" : "United States of America"}}, false, true)
> db.people.update({"Nationality" : "USA"}, {"$set" : {"Nationality" : "United States of America"}}, {multi : true})


Two Phase Commits ⌘


Indexes and Query Optimization

  • database index works like index in a book
  • indexes in MongoDB works almost identical to relational databases
  • because of indexes reads are faster but writes slower
  • 64 indexes per collection


Collection Without Indexes

for (var i=1; i<=1000000; i++) {
  db.visitors.insert({
    "i" : i,
    "visitor" : "visitor_"+i,
    "score" : Math.floor(Math.random()*10+1),
    "date" : new Date()
  })
}


One Field Indexes

db.visitors.explain().find({"visitor" : "visitor_330"})
db.visitors.explain().find({"visitor" : "visitor_330"}).limit(1)
db.visitors.explain().find({"visitor" : "visitor_99999"}).limit(1)
db.visitors.createIndex({"visitor" : 1})
db.visitors.explain().find({"visitor" : "visitor_99999"})


Compound Indexes

db.visitors.find().sort({"score" : 1, "visitor" : 1})
db.visitors.createIndex({"visitor" : 1})
db.visitors.createIndex({"score" : 1, "visitor" : 1})
db.visitors.explain(true).find({"score" : 10}).sort({"visitor" : -1})
db.visitors.explain(true).find({"score" : {"$gte" : 10, "$lte" : 20}})
db.visitors.explain(true).find({"score" : {"$gte" : 10, "$lte" : 20}}).sort({"visitor" : 1})
db.visitors.createIndex({"visitor" : 1, "score" : 1})
db.visitors.explain(true).find({"score" : {"$gte" : 10, "$lte" : 20}}).sort({"visitor" : 1}).hint({"visitor" : 1, "score" : 1})
db.visitors.dropIndex("visitor_1")


Compound Indexes (2)

  • pattern {SortKey : 1, CriteriaKey : 1}
  • pattern {ExactMatch : 1, RangeCriteria : 1}
  • sorting directions {"score" : -1, "visitor" : 1}
  • {"score" : -1, "visitor" : 1} = {"score" : 1, "visitor" : -1}
  • covered indexes (indexOnly : true or totalDocsExamined = 0 when totalKeysExamined > 0)
  • {key1 : 1, key2 : 1, key3 : 1} eliminates the need of creating {key1 : 1} and {key1 : 1, key2 : 1}
  • $-operators and indexes ($where, $exists, $nin, $ne, $not)
  • queries with $or can use more than one index


Indexing Arrays and SubDocuments

  • usually it behaves like normal index
  • difference between indexing {"BirthPlace" : 1} and {"BirthPlace.Country" : 1}
  • only one array field per index
  • multikey indexes
db.people.createIndex({"Movie.Title" : 1})
db.people.createIndex({"BirthPlace.Country" : 1})


Indexes in Details

  • low and high cardinality fields
  • understanding .explain()
  • using .hint()
  • query optimizer (100 results, 1000 queries, index creation)
  • fields in index must be smaller than 1kB
  • options: unique, dropDups, sparse
db.visitors.createIndex({"visitor" : 1}, {"unique" : 1, "dropDups" : 1})


Sparse Index

  • only contain entries for documents that have the indexed field (even if the index field contains a null value)
  • useful with unique constraint
db.sparse_index.insert([{y:1, x:1}, {y:1, x:2}, {y:1, x:3}, {y:1}])
db.sparse_index.createIndex({x:1}, {unique:1})
db.sparse_index.insert({y:1})

db.sparse_index.dropIndexes()
db.sparse_index.createIndex({x:1}, {unique:1, sparse:1})
db.sparse_index.insert({y:1})

db.sparse_index.find({"x" : {"$ne" : 2}}).hint({"x" : 1})
db.sparse_index.find({"x" : {"$ne" : 2}}).hint({"$natural" : 1})


Index Administration

  • system.indexes - read only collection that stores info about indexes
  • .createIndex()
  • .dropIndex(), .dropIndexes()
  • .getIndexes()
  • options: background, name


Capped Collections

  • it has to be created before first insert occurs
  • fixed size or size and number of documents (circular queue)
  • forbidden operations on documents: removing, updating (if it will increase the size)
  • can not be sharded or changed
  • sorting: $natural : 1 (or -1)
db.createCollection("capped_collection", {"capped" : true, "size" : 100000})
db.createCollection("capped_collection", {"capped" : true, "size" : 100000, "max" : 20})

db.people.copyTo("capped_collection")
db.runCommand({"convertToCapped" : "capped_collection", "size" : 100000})


Tailable Cursors

  • inspired by the tail -f command
  • not closed when their results are exhausted
  • can be used only on capped collection
  • will die after 10 minutes


TTL Indexes (time-to-live)

  • TTL index allows you to set a timeout for each document
  • removing is performed every 60 secons
  • can be created only on single field (date field)
db.ttl_collection.insert({"User" : "user1", "LastUpdated" : new Date()})
db.ttl_collection.createIndex({"LastUpdated" : 1}, {"expireAfterSeconds" : 30})
db.ttl_collection.find()


Full-Text Indexes

  • quick text search with built-in multi-language support
  • very expensive, especially on busy collections
db.people.createIndex({"Name" : "text"})
db.people.createIndex({"Name" : "text", "Bio" : "text"}, {"weights" : {"Name": 2}})
db.people.createIndex({"$**" : "text"})
db.people.createIndex({"whatever" : "text"}, {"weights" : {"Name" : 5, "Movie.Name" : 2, "$**" : 1}})

db.runCommand({"text" : "people", "search" : "emma thompson"})
db.people.find({$text : {$search : "emma thompson"}})
db.people.find({$text : {$search : "\"emma thompson\""}})
db.people.find({$text : {$search : "-emma thompson"}})
db.people.find({$text : {$search : "emma thompson"}}, {score : {$meta : "textScore"}})
db.people.find({$text : {$search : "emma thompson"}}, {score : {$meta : "textScore"}}).sort({ score: { $meta: "textScore" } })


Geospatial Indexes

2d index

  • for data stored as points on a two-dimensional plane
db.dots.insert([{Name:"A", location:[10, 5]}, {Name:"B", location:[17, -5]}, {Name:"C", location:[0, 2]}, {Name:"D", location:[-3, -3]}])
db.dots.createIndex({location:"2d", type:1})
db.dots.find({location:{$near:[0,0]}})


2dsphere index

db.cities.insert({Name:"Rzeszów", "location": {"type":"Point", "coordinates":[22.008606,50.040264]}})
db.cities.insert({Name:"Warszawa", "location": {"type":"Point", "coordinates":[21.0123237,52.2328474]}})
db.cities.insert({Name:"Wrocław", "location": {"type":"Point", "coordinates":[17.0342894,51.1174725]}})
db.cities.insert({Name:"Kraków", "location": {"type":"Point", "coordinates":[19.9012826,50.0719423]}})
db.cities.insert({Name:"Kielce", "location": {"type":"Point", "coordinates":[20.6156414,50.85404]}})
db.cities.createIndex({location:"2dsphere"})

db.cities.find({location:{$near: 
    {$geometry: {type:"Point", coordinates:[20.6156414, 50.85404]}, $minDistance: 0, $maxDistance:200000}
}})

db.runCommand({geoNear: "cities", near: [20.6156414, 50.85404], spherical: true, distanceMultiplier: 6378.1}) // distance in kilometers

db.cities.find({location: {$geoWithin: 
    {$geometry: {type: "Polygon", coordinates: [[ [22.008606,50.040264], [21.0123237,52.2328474], [19.9012826,50.0719423], [22.008606,50.040264] ]]}}
}})

Aggregation ⌘


Single Purpose Aggregation ⌘

  • count, distinct, group
> db.people.count()
> db.people.count({"Occupation" : "Producer"})
>
> db.people.distinct("Movie.Title")
> db.runCommand({"distinct" : "people", "key" : "Movie.Title"})
>
> db.people.group({
...   "key" : {"Nationality" : 1},
...   //"$keyf" : function(curr) { return {"Nationality" : curr.Nationality.toLowerCase()}; },
...   "initial" : {"BirthYear" : 0},
...   "reduce" : function(curr, result) {
...     if (curr.BirthYear > result.BirthYear) {
...       result.BirthYear = curr.BirthYear;
...       result.Nationality = curr.Nationality;
...     }
...   },
...   "condition" : {"Nationality" : {"$ne" : null}},
...   "finalize" : function(result) {
...     result[result.Nationality] = result.BirthYear;
...     delete result.Nationality;
...     delete result.BirthYear;
...   }
... })


Aggregation Pipelines ⌘

  • aggregation is a pipeline
  • some pipeline operators: $match, $project, $group, $sort, $limit, $skip, $unwind, $lookup, $out
  • $group and $sort needs to collect all documents
  • results were limited to 16MB, now unlimited
  • memory usage limited to 100MB, allowDiskUse
> db.people.aggregate([{"$project" : {"Nationality" : 1}},
...   {"$group" : {"_id" : "$Nationality", "count" : {"$sum" : 1}}},
...   {"$sort" : {"count" : -1}},
...   {"$limit" : 3}
... ])
>
> db.people.aggregate([{"$project" : {"Country" : "$Nationality"}}])
> db.people.aggregate([{"$project" : {"Movie" : 1, "_id" :  0}}, {"$unwind" : "$Movie"}])
> db.people.aggregate([{"$project" : {"Movie" : 1, "_id" :  0}}, {"$unwind" : "$Movie"}, {$out:"aggregated"}])
> db.people.aggregate([...], {{allowDiskUse: true, explain: true}})


$project Expressions ⌘

  • mathematical: $add, $subtract, $multiply, $divide, $mod
  • date: $year, $month, $week, $dayOfMonth, $dayOfWeek, $dayOfYear, $hour, $minute, $second
  • string: $substr, $concat, $toLower, $toUpper
  • comparison: $cmp, $strcasecmd, $eq, $ne, $gt, $gte, $lt, $lte
  • logical: $and, $or, $not, $cond, $ifNull
> db.people.aggregate([{"$match" : {"BirthDate" : {"$gte" : new Date("1970-01-01")}}}, {"$project" : {"BirthDate" : {"$year" : "$BirthDate"}}}])
> db.people.aggregate([{"$project" : {"Nationality" : {"$ifNull" : ["$Nationality", "unknown"]}}}])


$group Expressions ⌘

  • arithmetical: $sum, $avg,
  • extreme: $min, $max, $first, $last
  • array: $addToSet, $push
> db.people.aggregate([{$group:{_id:"$Nationality", peopleCount:{$sum:1}}}])
> db.people.aggregate([{$group:{_id:"$BirthYear", people:{$addToSet:"$Name"}}}])


Views

  • new in version 3.4
  • views are always read-only
  • views are based on existing collections or other views
> db.createView(<view>, <source>, <pipeline>, <collation>)
> db.createView("peopleByBirthYear", "people", [{$group:{_id:"$BirthYear", people:{$addToSet:"$Name"}}}])


Map-Reduce ⌘

  • powerful and flexible tool for aggregation
  • MR uses JavaScript as a "query language"
  • not very fast
  • arguments: mapreduce, map, reduce, finalize, keeptemp, out, query, sort, limit, scope, verbose
> map = function () {
... for (var key in this) {
...   emit(key, {count : 1});
... }}
>
> reduce = function (key, emits) {
...   total = 0;
...   for (var i in emits) {
...     total += emits[i].count;
...   }
...   return {"count" : total};
... }
>
> db.runCommand({"mapreduce" : "people", "map" : map, "reduce" : reduce, "out" : "mapreduce_output_collection"})
> db.mapreduce_output_collection.find()


Data models ⌘

  • normalization vs. denormalization (referencing vs. embedding)
  • few things to consider:
    • writes vs. reads
    • immediate consistency vs. eventual consistency
    • one-to-many vs. one-to-few
    • document growth
Embedding is better for... References are better for
small subdocuments large subdocuments
data that does not change regularly volatile data
when eventual consistency is acceptable when immediate consistency is necessary
documents that grow by a small amount documents that grow by a large amount
data that you'll often need to perform a second query to fetch data that you'll often exclude from results
fast reads fast writes
  • more resources:
    • docs.mongodb.com/manual/core/data-modeling-introduction/
    • developer.mongodb.com/article/mongodb-schema-design-best-practices/