MongoDB for Developers
Copyright Notice
Copyright © 2004-2023 by NobleProg Limited All rights reserved.
This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise.
Introduction
- MongoDB is a fast document-oriented database
- replaces the concept of a "row" with a more flexible model - the "document"
- convenient data storage for modern object-oriented languages
- no predefined schemas
- no multi-document transactions before version 4.0 (operations on a single document are atomic)
- no SQL
- supports indexes
- easy to scale out horizontally
Getting Started
- document is a basic unit of data (like row in RDBMS)
- collection = table
- multiple databases in single instance
- _id in every document, unique within collection
- JavaScript shell for administration and data manipulation
Documents
- document is an ordered set of keys with associated values
- representation of a document varies by programming language (map; hash - Perl, Ruby; dictionary - Python)
- objects in JavaScript {key:value}
- example {"company" : "NobleProg", "training" : "MongoDB for Developers"}
- key is a string, any UTF-8 is allowed (except \0 . $)
- type-sensitive {"age" : 3}, {"age" : "3"}
- case-sensitive {"age" : 3}, {"Age" : 3}
- documents cannot contain duplicated keys
- key/value pairs are ordered {"x" : 1, "y" : 1} != {"y" : 1, "x" : 1}
- order does not usually matter, MongoDB can reorder keys
{
"_id" : ObjectId("545a414c7907b2a255b156c5"),
"Name" : "Sean Connery",
"Nationality" : "Great Britain",
"BirthDate" : ISODate("1930-08-25T00:00:00Z"),
"BirthYear" : 1930,
"Occupation" : [
"Actor",
"Director",
"Producer"
],
"Movie" : [
{
"_id" : ObjectId("545a5f167907b2a255b156c7"),
"Title" : "Dr. No"
},
{
"_id" : ObjectId("545a5f317907b2a255b156c8"),
"Title" : "From Russia with Love"
},
{
"_id" : ObjectId("545a5ed67907b2a255b156c6"),
"Title" : "Never Say Never Again"
}
],
"BirthPlace" : {
"Country" : "United Kingdom, Scotland",
"City" : "Edinburgh"
}
}
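The key rules listed above can be tried with plain JavaScript objects, without a server. This is only a sketch: JSON.stringify is used here as a stand-in for a serialized document, which is an assumption for illustration and not how MongoDB compares BSON.

```javascript
// Plain JavaScript objects standing in for documents (no server needed).
const a = {"age" : 3};
const b = {"age" : "3"};   // same key, different value type
const c = {"Age" : 3};     // different key: keys are case-sensitive

// Serializing keeps insertion order, so differently ordered keys differ.
const xy = JSON.stringify({"x" : 1, "y" : 1});
const yx = JSON.stringify({"y" : 1, "x" : 1});

const typeSensitive = a.age !== b.age;               // number vs. string
const caseSensitive = !("age" in c) && ("Age" in c); // "age" and "Age" are distinct
const orderSensitive = xy !== yx;

console.log(typeSensitive, caseSensitive, orderSensitive);
```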
Collections
- a group of documents
- dynamic schemas
- {"company" : "NobleProg"}
- {"age" : 5}
- why should we use more than one collection?
- nightmare for developers
- much faster to get a list of collections than extracting document types from collections
- grouping documents of the same kind
- indexes
- collection name is a string, any UTF-8 is allowed except:
- the empty string, names starting with the "system." prefix, names containing the \0 or $ character
- subcollections separated by the . character
- example GridFS (fs.files, fs.chunks)
Databases
- database is a group of collections
- one database = one application
- separated databases for different applications, users
- database name is an alphanumeric string, case sensitive, max 64 bytes, empty string is not allowed
- database name will end up as file on filesystem (this explains restrictions)
- special databases: admin (root database), local (never replicated), config (when sharding)
- namespace is a concatenation of database and collection name (fully qualified collection name)
- max 121 bytes, should be less than 100
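A namespace length can be checked before a collection is created. The sketch below is plain JavaScript; the helper names namespaceOf, utf8Length and fitsLimit are hypothetical, and the limit of 121 bytes is the one quoted above.

```javascript
// Build a namespace: database name + "." + collection name.
function namespaceOf(dbName, collectionName) {
  return dbName + "." + collectionName;
}

// Length of the UTF-8 encoding in bytes, not the number of characters.
function utf8Length(text) {
  return new TextEncoder().encode(text).length;
}

// True when the namespace fits into the given number of bytes.
function fitsLimit(namespace, maxBytes) {
  return utf8Length(namespace) <= maxBytes;
}

const ns = namespaceOf("NobleProg", "fs.files");
console.log(ns, utf8Length(ns), fitsLimit(ns, 121));
```

Counting bytes rather than characters matters because any UTF-8 character is allowed in a collection name and some characters take more than one byte.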
Getting and Starting MongoDB
- Installation on Windows
- Installation on Ubuntu
CRUD
Create
use NobleProg
person = {"Name" : "Sean Connery", "Nationality" : "Great Britain"}
db.people.insert(person) // .insert() is deprecated now, use .insertOne() instead
Read
db.people.find()
db.people.findOne()
db.people.find().pretty() // .pretty() is the default now, so can be omitted
Update
personUp = {"Occupation" : "Actor"}
db.people.update({"Name" : "Sean Connery"}, {$set: personUp}) // .update() is deprecated now, use .updateOne() instead
db.people.findOne()
Delete
db.people.remove({"Name" : "Sean Connery"}) // .remove() is deprecated now, use .deleteOne() instead
db.people.remove({})
db.people.findOne()
Data Types
- JSON-like documents
- JSON has 6 data types: null, boolean, numeric, string, array, object
- MongoDB adds support for other datatypes
- null {"x" : null}
- boolean {"x" : true}
- number (by default 64-bit floating point numbers) {"x" : 1.4142}
- 4-byte integers {"x" : NumberInt(141)}
- 8-byte integers {"x" : NumberLong(141)}
- string (any UTF-8 character) {"x" : "NobleProg"}
- date (stored as milliseconds since the Unix epoch) {"x" : new Date()}
- regular expressions (in queries) {"x" : /bob/i}
- array {"x" : [1.4142, true, "training"]}
- embedded documents {"x" : {"y" : 100}}
- ObjectId {"x" : ObjectId("54597591bb107f6ef5989771")}
- binary data (for non-UTF-8 strings)
- code {"x" : function() {/*...*/}}
_id and ObjectId
- every document in MongoDB must have _id
- can be any type but it defaults to ObjectId
- unique values in a single collection
- ObjectId is designed to be lightweight and easy to generate
- 12 bytes of storage (24 hexadecimal digits)
- timestamp - byte 0-3
- machine - byte 4-6
- PID - byte 7-8
- increment - byte 9-11
- _id is generated automatically if not present in document
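The byte layout listed above can be read directly from the hexadecimal form of an ObjectId. The sketch below is plain JavaScript and the function name parseObjectId is hypothetical. It assumes the layout exactly as listed here; newer MongoDB versions replace the machine and PID fields with a random value, so only the timestamp part should be relied on.

```javascript
// Split a 24-digit hexadecimal ObjectId string into the fields listed above.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hexadecimal digits");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);   // bytes 0-3: seconds since the Unix epoch
  return {
    timestamp : new Date(seconds * 1000),
    machine : hex.slice(8, 14),                    // bytes 4-6
    pid : parseInt(hex.slice(14, 18), 16),         // bytes 7-8
    increment : parseInt(hex.slice(18, 24), 16)    // bytes 9-11
  };
}

// The _id of the sample document shown in the Documents section.
const parts = parseObjectId("545a414c7907b2a255b156c5");
console.log(parts.timestamp.toISOString());
```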
MongoDB Shell
$ mongo
MongoDB shell version: 4.0.6
connecting to: test
>
$ mongo HostName:PortNumber/DatabaseName
$ mongo localhost:27017/test
$ mongo --nodb
> conn = new Mongo("localhost:27017")
connection to localhost:27017
> db = conn.getDB("NobleProg")
NobleProg
Using help
- mongo is a JavaScript shell, help is available in JavaScript on-line documentation
- use built-in help for MongoDB-specific functionality
- type function name without parentheses to see what the function is doing
> help
db.help() help on db methods
db.mycoll.help() help on collections methods
...
exit quit mongo shell
>
> db.NobleProg.stats
function ( scale ){
return this._db.runCommand( { collstats : this._shortName , scale : scale } );
}
>
> db.NobleProg.stats()
{ "ok" : 0, "errmsg" : "Collection [test.NobleProg] not found." }
>
Running Scripts
- mongo can execute JavaScript files
- scripts have access to all global variables (e.g. "db")
- shell helpers (e.g. "show collections") do not work from files; use valid JavaScript equivalents (e.g. "db.getCollectionNames()")
- use --quiet to hide "MongoDB shell version..." when executing script
- use load() to run script directly from Mongo Shell
- use .mongorc.js for frequently-loaded scripts
- located in user home directory, run when starting up the shell, --norc to disable it
- useful also to customize prompt
$ mongo script.js
MongoDB shell version: 4.0.6
connecting to: test
script.js was executed successfully!
$
$ mongo --quiet script.js
script.js was executed successfully!
$
$ mongo
MongoDB shell version: 4.0.6
connecting to: test
> load("script.js")
script.js was executed successfully!
>
Editing Complex Variables
- limited multiline support in the shell
- external editors are allowed
- EDITOR="/usr/bin/gedit"
- EDITOR="c:\\windows\\notepad.exe"
$ mongo --quiet
> EDITOR="c:\\windows\\notepad.exe"
> use training
switched to db training
> person = db.people.findOne()
{ "_id" : ObjectId("54568445cfc7c83518fa5430"), "Name" : "Sean Connery" }
> edit person
> person
{ "_id" : ObjectId("54568445cfc7c83518fa5430"), "Name" : "Sean Connery", "Nationality" : "Great Britain" }
> db.people.save(person)
Querying
Select
- find, findOne
- parameters: query, fields, limit, skip, sort, batchSize, options
> db.people.find()
> db.people.find({"Nationality" : "Great Britain"})
> db.people.find({"Nationality" : "Great Britain", "Occupation" : "Actor"})
> db.people.find({"_id" : ObjectId("545a36577907b2a255b156c4")})
> db.people.findOne({}, {"Name" : 1, "Nationality" : 1})
> db.people.findOne({}, {"Name" : 1, "Nationality" : 1, "_id" : 0})
> db.people.find().limit(3).skip(5).sort({"BirthYear" : -1})
Query Criteria
- $lt, $lte, $gt, $gte, $eq, $ne
- $in, $nin, $or (whenever possible use $in instead of $or)
- $not, $mod
- handling null value, $exists
- Perl Compatible Regular Expressions are allowed
> db.people.find({Nationality : "USA", BirthYear : {$lte : 1950}})
> db.people.find({Nationality : "USA", BirthYear : {$gte : 1940, $lte : 1950}})
> db.people.find({Nationality : {$in : ["Belgium", "Israel"]}})
> db.people.find({$or : [{Nationality : "Belgium"}, {Nationality : "Israel"}]})
> db.people.find({BirthYear : {$mod : [10, 0]}})
> db.people.find({BirthYear : {$not : {$mod : [10, 0]}}})
> db.people.find({Nationality : null})
> db.people.find({Nationality : {$exists : 1, $eq : null}})
> db.people.find({Name : /east/i})
> db.people.find({Name : /^A/})
> db.people.find({Occupation : /^actor$/i}, {Occupation:1, Name:1, _id:0})
> db.people.find({Name : /^cli.*.wood$/i}, {Name:1, _id:0})
> db.people.find({Name : /^clint eastw..d$/i}, {Name:1, _id:0})
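The regular expressions in the queries above are simple enough to behave the same way in an ordinary JavaScript engine, so they can be tried without a server. A small sketch, assuming the three sample names below are present in the collection:

```javascript
// Sample names; the patterns are copied from the queries above.
const names = ["Clint Eastwood", "Al Pacino", "Sean Connery"];

// Return the names matched by a pattern.
function matching(pattern) {
  return names.filter(function (name) { return pattern.test(name); });
}

const containsEast = matching(/east/i);        // substring, case-insensitive
const startsWithA = matching(/^A/);            // anchored at the start, case-sensitive
const cliWood = matching(/^cli.*.wood$/i);     // anchored at both ends

console.log(containsEast, startsWithA, cliWood);
```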
Query Criteria for Arrays
- behave in the same way as scalars
- $all, $size (cannot be combined with other criteria)
- $slice, $ (positional operator), $elemMatch
> db.people.find({Occupation : "Actor"}, {Name : 1, Occupation : 1})
> db.people.find({Occupation : ["Actress"]})
> db.people.find({Occupation : {$all : ["Actress", "Producer"]}})
> db.people.find({"Occupation.1" : "Producer"})
> db.people.find({Occupation : {$size : 3}})
> db.people.find({Name: "Sean Connery"}, {Movie : {$slice : -1}})
> db.people.find({"Movie.Title" : "Never Say Never Again"})
> db.people.find({"Movie.Title" : "Never Say Never Again"}, {"Movie.$" : 1})
Query on Embedded Document
- query for entire document
- query for selected key/value pairs
- $elemMatch
- watch out for the dots (.) in key names
> db.people.findOne({"BirthPlace" : {"Country" : "United Kingdom, Scotland", "City" : "Edinburgh"}})
> db.people.findOne({"BirthPlace" : {"City" : "Edinburgh", "Country" : "United Kingdom, Scotland"}})
> db.people.findOne({"BirthPlace.City" : "Edinburgh", "BirthPlace.Country" : "United Kingdom, Scotland"})
>
> db.people.find({"Movie._id" : ObjectId("545a5f167907b2a255b156c7"), "Movie.Title" : "Never Say Never Again"})
> db.people.find({"Movie" : {"$elemMatch" : {"_id" : ObjectId("545a5f167907b2a255b156c7"), "Title" : "Never Say Never Again"}}})
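The difference between the last two queries can be modelled on the Movie array of the sample document. The sketch below is plain JavaScript, with ObjectIds shortened to strings; it mimics the matching rules and is not the server implementation.

```javascript
// The Movie array from the sample document (ObjectIds written as strings).
const movies = [
  {"_id" : "545a5f167907b2a255b156c7", "Title" : "Dr. No"},
  {"_id" : "545a5f317907b2a255b156c8", "Title" : "From Russia with Love"},
  {"_id" : "545a5ed67907b2a255b156c6", "Title" : "Never Say Never Again"}
];
const wantedId = "545a5f167907b2a255b156c7";
const wantedTitle = "Never Say Never Again";

// Dot notation: each condition may be satisfied by a different array element.
const dotNotationMatches =
  movies.some(function (m) { return m._id === wantedId; }) &&
  movies.some(function (m) { return m.Title === wantedTitle; });

// $elemMatch: one single element has to satisfy all conditions.
const elemMatchMatches = movies.some(function (m) {
  return m._id === wantedId && m.Title === wantedTitle;
});

console.log(dotNotationMatches, elemMatchMatches);
```

The dot-notation form matches the document although no single movie has both the given _id and the given title; the $elemMatch form does not.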
Using $where
- $where allows running arbitrary JavaScript as part of a query
- powerful, slow, only for trusted users, cannot use indexes
> db.test_where.insert([{"ordered" : 10, "sent" : 0}, {"ordered" : 10, "sent" : 5}, {"ordered" : 10, "sent" : 10}])
> db.test_where.find({"$where" : function() {
... for (var current in this) {
... for (var other in this) {
... if (current != other && this[current] == this[other]) {
... return true;
... }
... }
... }
... return false;
... }});
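The function passed to $where can be tested outside the database, because it only inspects the document bound to "this". A sketch in plain JavaScript: the name hasTwoEqualValues is hypothetical, and documents stored in a real collection would also carry an _id key.

```javascript
// The function from the $where query above, unchanged except for its name.
function hasTwoEqualValues() {
  for (var current in this) {
    for (var other in this) {
      if (current != other && this[current] == this[other]) {
        return true;
      }
    }
  }
  return false;
}

const docs = [
  {"ordered" : 10, "sent" : 0},
  {"ordered" : 10, "sent" : 5},
  {"ordered" : 10, "sent" : 10}
];

// call() binds "this" to the document, as the server does for $where.
const matching = docs.filter(function (doc) {
  return hasTwoEqualValues.call(doc);
});
console.log(matching);
```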
Cursors
- MongoDB returns results from find using a cursor
- the lifetime of the cursor, immortal cursor
- creating cursors in shell: var res = db.test.find()
- use next to iterate through results
- use hasNext to check if there is another document
> var cursor = db.people.find()
> while (cursor.hasNext()) {
... doc = cursor.next();
... print(doc.Name+" was born in "+doc.BirthYear);
... }
Johnny Depp was born in 1963
Sean Connery was born in 1930
- there is forEach as an alternative for while
> var cursor = db.people.find()
> cursor.forEach(function(doc) {
... print(doc.Name+" was born in "+doc.BirthYear);
... })
Johnny Depp was born in 1963
Sean Connery was born in 1930
Query Options
- .limit(), .skip(), .sort()
- pagination and avoiding large skips
- $maxScan, $showDiskLoc, $min, $max
- .snapshot() for consistent results in large collections
> db.people.find().limit(3).skip(5).sort({"BirthYear" : -1})
> db.people.find({}, {"Nationality" : 1, "Name" : 1}).sort({"Nationality" : -1, "Name" : 1})
> db.people.find()._addSpecial("$maxScan", 5)
> db.people.find()._addSpecial("$showDiskLoc", true)
> db.people.find().min({"BirthYear" : 1975}).max({"BirthYear" : 1990})
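One common way to avoid large skips is to continue from the last value already shown instead of counting documents from the start. The sketch below models both approaches on an in-memory array in plain JavaScript. The function names and the sample years are made up for illustration, and it assumes the sort key is unique; with duplicate values a second key would be needed.

```javascript
// Sample data sorted by BirthYear descending; values are made up for illustration.
const people = [1990, 1985, 1975, 1963, 1950, 1930].map(function (year) {
  return {"BirthYear" : year};
});

// Skip-based paging: every call has to walk past all skipped documents first.
function pageBySkip(sorted, pageNumber, pageSize) {
  return sorted.slice(pageNumber * pageSize, (pageNumber + 1) * pageSize);
}

// Range-based paging: continue after the last value already seen.
// In the shell this corresponds to a query such as
// db.people.find({"BirthYear" : {$lt : last}}).sort({"BirthYear" : -1}).limit(pageSize)
function pageAfter(sorted, last, pageSize) {
  return sorted
    .filter(function (doc) { return last === null || doc.BirthYear < last; })
    .slice(0, pageSize);
}

const first = pageAfter(people, null, 2);
const second = pageAfter(people, first[first.length - 1].BirthYear, 2);
console.log(second);
```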
Data Manipulation
Insert
- insert is the basic method to add documents
- _id will be created automatically if not present in document
- batch insert (pass an array of documents, max 48MB at once, faster)
- insert validation
- only basic structure, size (16MB), adds _id
- most drivers do additional tests (size, non-UTF-8 strings, unrecognised data types) before sending data to MongoDB
- use Object.bsonsize(doc) to see the BSON size of doc in bytes
> db.people.insert({"_id" : 123, "Name" : "Al Pacino"})
> db.people.insert({"Name" : "Sean Connery"})
> db.people.insert([{"Name" : "Meryl Streep"}, {"Name" : "Clint Eastwood"}, {"Name" : "Penélope Cruz"}])
Remove
- remove expects a query document as a parameter
- if query document is empty it will delete all documents from the collection but not entire collection
- there is no way to recover removed documents
- drop is a very fast alternative when removing all documents (requires recreating all indexes)
> db.people.stats()
> db.people.remove({"Nationality" : "Great Britain"})
> db.people.remove({})
> db.people.drop()
Update
- update expects two parameters:
- a query document to locate documents to be updated
- a new document or document that describes how found documents will be changed
- conflicting updates - the last update will "win"
- using _id for update criteria will be faster and safer than other fields
> person = db.people.findOne({"Name" : "Sean Connery"})
> person.Occupation = "Actor"
> db.people.update({"Name" : "Sean Connery"}, person)
> db.people.findOne({"Name" : "Sean Connery"})
Update Modifiers
- useful when changing only specified fields of a document
- used for complex update operations (altering, adding, removing keys, manipulating arrays or embedded documents)
- if key does not exist it will be created
- _id field cannot be changed
- $inc, $set, $unset, $push, $pull, $each, $slice, $sort, $pop, $ne, $addToSet, $ (positional array operator)
> db.people.findOne({"Name" : "Sean Connery"})
> db.people.update({"Name" : "Sean Connery"}, {"$inc" : {"Age" : 1}})
> db.people.findOne({"Name" : "Sean Connery"})
Upsert
- upsert is a special type of update
- advantage: the same code for create and update
- it will create a document if there is no document that matches the update criteria
- new document will combine update data and criteria
- $setOnInsert modifier
- .save() shell helper
> db.websites.update({"url" : "nobleprog.pl"}, {"$inc" : {"pageviews" : 1}})
> db.websites.update({"url" : "nobleprog.pl"}, {"$inc" : {"pageviews" : 1}}, true)
> db.websites.update({"url" : "nobleprog.pl"}, {"$inc" : {"pageviews" : 1}}, {upsert : true})
> db.websites.update({"url" : "nobleprog.pl"}, {"$inc" : {"pageviews" : 1}, "$setOnInsert" : {"insertDate" : new Date()}}, {upsert : true})
> db.websites.find().pretty()
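How an upsert combines the criteria with the update data can be modelled in a few lines of plain JavaScript. This is an in-memory sketch, not the MongoDB implementation: the upsert function is hypothetical and handles only equality criteria, $inc and $setOnInsert.

```javascript
// In-memory stand-in for a collection.
const websites = [];

// Update the first document matching the criteria, or create it when none matches.
function upsert(collection, criteria, update) {
  let doc = collection.find(function (d) {
    return Object.keys(criteria).every(function (k) { return d[k] === criteria[k]; });
  });
  if (doc === undefined) {
    // No match: the new document starts from the criteria plus $setOnInsert.
    doc = Object.assign({}, criteria, update.$setOnInsert || {});
    collection.push(doc);
  }
  const increments = update.$inc || {};
  Object.keys(increments).forEach(function (k) {
    doc[k] = (doc[k] || 0) + increments[k];   // a missing key is created
  });
  return doc;
}

const change = {"$inc" : {"pageviews" : 1}, "$setOnInsert" : {"insertDate" : new Date()}};
upsert(websites, {"url" : "nobleprog.pl"}, change);   // creates the document
upsert(websites, {"url" : "nobleprog.pl"}, change);   // updates the same document
console.log(websites.length, websites[0].pageviews);
```

The second call finds the document created by the first one, so $setOnInsert is applied only once while $inc is applied both times.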
Updating Multiple Documents
- by default only first document will be changed
- default behaviour may change in the future
- db.runCommand({getLastError : 1})
> db.people.insert([{"Name" : "Jack Nicholson", "Nationality" : "USA"}, {"Name" : "Bruce Willis", "Nationality" : "RFN"}, {"Name" : "Morgan Freeman", "Nationality" : "USA"}])
> db.people.update({"Nationality" : "USA"}, {"$set" : {"Nationality" : "United States of America"}})
> db.people.update({"Nationality" : "USA"}, {"$set" : {"Nationality" : "United States of America"}}, false, true)
> db.people.update({"Nationality" : "USA"}, {"$set" : {"Nationality" : "United States of America"}}, {multi : true})
Two Phase Commits
Indexes and Query Optimization
- database index works like index in a book
- indexes in MongoDB work almost identically to indexes in relational databases
- because of indexes reads are faster but writes slower
- 64 indexes per collection
Collection Without Indexes
for (var i=1; i<=1000000; i++) {
db.visitors.insert({
"i" : i,
"visitor" : "visitor_"+i,
"score" : Math.floor(Math.random()*10+1),
"date" : new Date()
})
}
One Field Indexes
db.visitors.explain().find({"visitor" : "visitor_330"})
db.visitors.explain().find({"visitor" : "visitor_330"}).limit(1)
db.visitors.explain().find({"visitor" : "visitor_99999"}).limit(1)
db.visitors.createIndex({"visitor" : 1})
db.visitors.explain().find({"visitor" : "visitor_99999"})
Compound Indexes
db.visitors.find().sort({"score" : 1, "visitor" : 1})
db.visitors.createIndex({"visitor" : 1})
db.visitors.createIndex({"score" : 1, "visitor" : 1})
db.visitors.explain(true).find({"score" : 10}).sort({"visitor" : -1})
db.visitors.explain(true).find({"score" : {"$gte" : 10, "$lte" : 20}})
db.visitors.explain(true).find({"score" : {"$gte" : 10, "$lte" : 20}}).sort({"visitor" : 1})
db.visitors.createIndex({"visitor" : 1, "score" : 1})
db.visitors.explain(true).find({"score" : {"$gte" : 10, "$lte" : 20}}).sort({"visitor" : 1}).hint({"visitor" : 1, "score" : 1})
db.visitors.dropIndex("visitor_1")
Compound Indexes (2)
- pattern {SortKey : 1, CriteriaKey : 1}
- pattern {ExactMatch : 1, RangeCriteria : 1}
- sorting directions {"score" : -1, "visitor" : 1}
- {"score" : -1, "visitor" : 1} is equivalent to {"score" : 1, "visitor" : -1} (an index can be traversed in either direction)
- covered indexes (indexOnly : true or totalDocsExamined = 0 when totalKeysExamined > 0)
- {key1 : 1, key2 : 1, key3 : 1} eliminates the need of creating {key1 : 1} and {key1 : 1, key2 : 1}
- $-operators and indexes ($where, $exists, $nin, $ne, $not)
- queries with $or can use more than one index
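The prefix rule and the direction rule above can be expressed as two small checks on index specifications. The sketch below is plain JavaScript; isPrefix and isInverse are hypothetical helpers for reasoning about index definitions, not part of the MongoDB API.

```javascript
// True when the keys of "candidate" form a leading prefix of "index"
// with the same directions, so a separate index would be redundant.
function isPrefix(index, candidate) {
  const indexKeys = Object.keys(index);
  const candidateKeys = Object.keys(candidate);
  return candidateKeys.length <= indexKeys.length &&
    candidateKeys.every(function (key, i) {
      return indexKeys[i] === key && index[key] === candidate[key];
    });
}

// True when both specs list the same keys in the same order
// and every direction is flipped.
function isInverse(a, b) {
  const aKeys = Object.keys(a);
  const bKeys = Object.keys(b);
  return aKeys.length === bKeys.length &&
    aKeys.every(function (key, i) { return bKeys[i] === key && a[key] === -b[key]; });
}

const full = {key1 : 1, key2 : 1, key3 : 1};
console.log(isPrefix(full, {key1 : 1}));            // covered by the compound index
console.log(isPrefix(full, {key1 : 1, key2 : 1}));  // covered by the compound index
console.log(isPrefix(full, {key2 : 1}));            // not a prefix, needs its own index
console.log(isInverse({"score" : -1, "visitor" : 1}, {"score" : 1, "visitor" : -1}));
```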
Indexing Arrays and SubDocuments
- usually it behaves like normal index
- difference between indexing {"BirthPlace" : 1} and {"BirthPlace.Country" : 1}
- only one array field per index
- multikey indexes
db.people.createIndex({"Movie.Title" : 1})
db.people.createIndex({"BirthPlace.Country" : 1})
Indexes in Details
- low and high cardinality fields
- understanding .explain()
- using .hint()
- query optimizer (100 results, 1000 queries, index creation)
- fields in index must be smaller than 1kB
- options: unique, dropDups, sparse
db.visitors.createIndex({"visitor" : 1}, {"unique" : 1, "dropDups" : 1})
Sparse Index
- only contain entries for documents that have the indexed field (even if the index field contains a null value)
- useful with unique constraint
db.sparse_index.insert([{y:1, x:1}, {y:1, x:2}, {y:1, x:3}, {y:1}])
db.sparse_index.createIndex({x:1}, {unique:1})
db.sparse_index.insert({y:1})
db.sparse_index.dropIndexes()
db.sparse_index.createIndex({x:1}, {unique:1, sparse:1})
db.sparse_index.insert({y:1})
db.sparse_index.find({"x" : {"$ne" : 2}}).hint({"x" : 1})
db.sparse_index.find({"x" : {"$ne" : 2}}).hint({"$natural" : 1})
Index Administration
- system.indexes - read only collection that stores info about indexes
- .createIndex()
- .dropIndex(), .dropIndexes()
- .getIndexes()
- options: background, name
Capped Collections
- it has to be created before first insert occurs
- fixed size or size and number of documents (circular queue)
- forbidden operations on documents: removing, updating (if it will increase the size)
- can not be sharded or changed
- sorting: $natural : 1 (or -1)
db.createCollection("capped_collection", {"capped" : true, "size" : 100000})
db.createCollection("capped_collection", {"capped" : true, "size" : 100000, "max" : 20})
db.people.copyTo("capped_collection")
db.runCommand({"convertToCapped" : "capped_collection", "size" : 100000})
Tailable Cursors
- inspired by the tail -f command
- not closed when their results are exhausted
- can be used only on capped collection
- will die after 10 minutes of inactivity
TTL Indexes (time-to-live)
- TTL index allows you to set a timeout for each document
- removal is performed every 60 seconds
- can be created only on single field (date field)
db.ttl_collection.insert({"User" : "user1", "LastUpdated" : new Date()})
db.ttl_collection.createIndex({"LastUpdated" : 1}, {"expireAfterSeconds" : 30})
db.ttl_collection.find()
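The moment a document becomes eligible for removal is the value of the indexed date field plus expireAfterSeconds. The sketch below computes it in plain JavaScript; the helper names and the fixed sample date are assumptions for illustration. Since removal runs periodically, a document can remain visible for some time after this moment.

```javascript
// Earliest moment at which a document becomes eligible for removal.
function expiresAt(doc, field, expireAfterSeconds) {
  return new Date(doc[field].getTime() + expireAfterSeconds * 1000);
}

// True when "now" has reached the expiry moment of the document.
function isExpired(doc, field, expireAfterSeconds, now) {
  return now.getTime() >= expiresAt(doc, field, expireAfterSeconds).getTime();
}

// Same shape as the document inserted above, with a fixed date for repeatability.
const doc = {"User" : "user1", "LastUpdated" : new Date("2023-01-01T00:00:00Z")};
console.log(expiresAt(doc, "LastUpdated", 30).toISOString());
console.log(isExpired(doc, "LastUpdated", 30, new Date("2023-01-01T00:00:29Z")));
console.log(isExpired(doc, "LastUpdated", 30, new Date("2023-01-01T00:00:31Z")));
```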
Full-Text Indexes
- quick text search with built-in multi-language support
- very expensive, especially on busy collections
db.people.createIndex({"Name" : "text"})
db.people.createIndex({"Name" : "text", "Bio" : "text"}, {"weights" : {"Name": 2}})
db.people.createIndex({"$**" : "text"})
db.people.createIndex({"whatever" : "text"}, {"weights" : {"Name" : 5, "Movie.Name" : 2, "$**" : 1}})
db.runCommand({"text" : "people", "search" : "emma thompson"})
db.people.find({$text : {$search : "emma thompson"}})
db.people.find({$text : {$search : "\"emma thompson\""}})
db.people.find({$text : {$search : "-emma thompson"}})
db.people.find({$text : {$search : "emma thompson"}}, {score : {$meta : "textScore"}})
db.people.find({$text : {$search : "emma thompson"}}, {score : {$meta : "textScore"}}).sort({ score: { $meta: "textScore" } })
Geospatial Indexes
2d index
- for data stored as points on a two-dimensional plane
db.dots.insert([{Name:"A", location:[10, 5]}, {Name:"B", location:[17, -5]}, {Name:"C", location:[0, 2]}, {Name:"D", location:[-3, -3]}])
db.dots.createIndex({location:"2d", type:1})
db.dots.find({location:{$near:[0,0]}})
2dsphere index
- supports queries that calculate geometries on an earth-like sphere
- data stored as GeoJSON objects
- operators: http://docs.mongodb.org/master/reference/operator/query-geospatial/
- calculating distance: http://docs.mongodb.org/master/tutorial/calculate-distances-using-spherical-geometry-with-2d-geospatial-indexes/
db.cities.insert({Name:"Rzeszów", "location": {"type":"Point", "coordinates":[22.008606,50.040264]}})
db.cities.insert({Name:"Warszawa", "location": {"type":"Point", "coordinates":[21.0123237,52.2328474]}})
db.cities.insert({Name:"Wrocław", "location": {"type":"Point", "coordinates":[17.0342894,51.1174725]}})
db.cities.insert({Name:"Kraków", "location": {"type":"Point", "coordinates":[19.9012826,50.0719423]}})
db.cities.insert({Name:"Kielce", "location": {"type":"Point", "coordinates":[20.6156414,50.85404]}})
db.cities.createIndex({location:"2dsphere"})
db.cities.find({location:{$near:
{$geometry: {type:"Point", coordinates:[20.6156414, 50.85404]}, $minDistance: 0, $maxDistance:200000}
}})
db.runCommand({geoNear: "cities", near: [20.6156414, 50.85404], spherical: true, distanceMultiplier: 6378.1}) // distance in kilometers
db.cities.find({location: {$geoWithin:
{$geometry: {type: "Polygon", coordinates: [[ [22.008606,50.040264], [21.0123237,52.2328474], [19.9012826,50.0719423], [22.008606,50.040264] ]]}}
}})
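Distances returned by spherical queries can be sanity-checked with the haversine formula. The sketch below is plain JavaScript and uses the same Earth radius as the distanceMultiplier above; haversineKm is a hypothetical helper, and the server may compute distances differently, so small deviations are to be expected.

```javascript
// Great-circle distance between two [longitude, latitude] pairs, in kilometres.
function haversineKm(from, to) {
  const earthRadiusKm = 6378.1;   // same value as distanceMultiplier above
  const toRad = function (deg) { return deg * Math.PI / 180; };
  const dLat = toRad(to[1] - from[1]);
  const dLon = toRad(to[0] - from[0]);
  const h = Math.pow(Math.sin(dLat / 2), 2) +
    Math.cos(toRad(from[1])) * Math.cos(toRad(to[1])) * Math.pow(Math.sin(dLon / 2), 2);
  return 2 * earthRadiusKm * Math.asin(Math.sqrt(h));
}

// Coordinates copied from the inserts above (GeoJSON order: longitude, latitude).
const kielce = [20.6156414, 50.85404];
const krakow = [19.9012826, 50.0719423];
console.log(haversineKm(kielce, krakow).toFixed(1) + " km");
```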
Aggregation
Single Purpose Aggregation
- count, distinct, group
> db.people.count()
> db.people.count({"Occupation" : "Producer"})
>
> db.people.distinct("Movie.Title")
> db.runCommand({"distinct" : "people", "key" : "Movie.Title"})
>
> db.people.group({
... "key" : {"Nationality" : 1},
... //"$keyf" : function(curr) { return {"Nationality" : curr.Nationality.toLowerCase()}; },
... "initial" : {"BirthYear" : 0},
... "reduce" : function(curr, result) {
... if (curr.BirthYear > result.BirthYear) {
... result.BirthYear = curr.BirthYear;
... result.Nationality = curr.Nationality;
... }
... },
... "condition" : {"Nationality" : {"$ne" : null}},
... "finalize" : function(result) {
... result[result.Nationality] = result.BirthYear;
... delete result.Nationality;
... delete result.BirthYear;
... }
... })
Aggregation Pipelines
- aggregation is a pipeline
- some pipeline operators: $match, $project, $group, $sort, $limit, $skip, $unwind, $lookup, $out
- $group and $sort need to collect all documents
- results were limited to 16MB, now unlimited
- memory usage limited to 100MB, allowDiskUse
> db.people.aggregate([{"$project" : {"Nationality" : 1}},
... {"$group" : {"_id" : "$Nationality", "count" : {"$sum" : 1}}},
... {"$sort" : {"count" : -1}},
... {"$limit" : 3}
... ])
>
> db.people.aggregate([{"$project" : {"Country" : "$Nationality"}}])
> db.people.aggregate([{"$project" : {"Movie" : 1, "_id" : 0}}, {"$unwind" : "$Movie"}])
> db.people.aggregate([{"$project" : {"Movie" : 1, "_id" : 0}}, {"$unwind" : "$Movie"}, {$out:"aggregated"}])
> db.people.aggregate([...], {allowDiskUse: true, explain: true})
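What the first pipeline above computes can be followed stage by stage on an in-memory array. The sketch below is plain JavaScript and only mirrors the result of $project, $group, $sort and $limit; it says nothing about how the server executes the pipeline. The sample documents reuse names from the insert examples.

```javascript
// Sample input documents.
const people = [
  {"Name" : "Jack Nicholson", "Nationality" : "USA"},
  {"Name" : "Sean Connery", "Nationality" : "Great Britain"},
  {"Name" : "Morgan Freeman", "Nationality" : "USA"},
  {"Name" : "Bruce Willis", "Nationality" : "RFN"}
];

// $project + $group: keep only Nationality and count documents per value.
const counts = {};
people.forEach(function (person) {
  counts[person.Nationality] = (counts[person.Nationality] || 0) + 1;
});

// $sort by count descending, then $limit 3.
const top = Object.keys(counts)
  .map(function (nationality) { return {"_id" : nationality, "count" : counts[nationality]}; })
  .sort(function (x, y) { return y.count - x.count; })
  .slice(0, 3);

console.log(top);
```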
$project Expressions
- mathematical: $add, $subtract, $multiply, $divide, $mod
- date: $year, $month, $week, $dayOfMonth, $dayOfWeek, $dayOfYear, $hour, $minute, $second
- string: $substr, $concat, $toLower, $toUpper
- comparison: $cmp, $strcasecmp, $eq, $ne, $gt, $gte, $lt, $lte
- logical: $and, $or, $not, $cond, $ifNull
> db.people.aggregate([{"$match" : {"BirthDate" : {"$gte" : new Date("1970-01-01")}}}, {"$project" : {"BirthDate" : {"$year" : "$BirthDate"}}}])
> db.people.aggregate([{"$project" : {"Nationality" : {"$ifNull" : ["$Nationality", "unknown"]}}}])
$group Expressions
- arithmetical: $sum, $avg
- extreme: $min, $max, $first, $last
- array: $addToSet, $push
> db.people.aggregate([{$group:{_id:"$Nationality", peopleCount:{$sum:1}}}])
> db.people.aggregate([{$group:{_id:"$BirthYear", people:{$addToSet:"$Name"}}}])
Views
- new in version 3.4
- views are always read-only
- views are based on existing collections or other views
> db.createView(<view>, <source>, <pipeline>, <collation>)
> db.createView("peopleByBirthYear", "people", [{$group:{_id:"$BirthYear", people:{$addToSet:"$Name"}}}])
Map-Reduce
- powerful and flexible tool for aggregation
- MR uses JavaScript as a "query language"
- not very fast
- arguments: mapreduce, map, reduce, finalize, keeptemp, out, query, sort, limit, scope, verbose
> map = function () {
... for (var key in this) {
... emit(key, {count : 1});
... }}
>
> reduce = function (key, emits) {
... var total = 0;
... for (var i in emits) {
... total += emits[i].count;
... }
... return {"count" : total};
... }
>
> db.runCommand({"mapreduce" : "people", "map" : map, "reduce" : reduce, "out" : "mapreduce_output_collection"})
> db.mapreduce_output_collection.find()
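The map and reduce functions above can be exercised without a server by supplying a stand-in for emit. The sketch below is plain JavaScript: the groups object and the emit function are assumptions that replace what the server provides, and reduce is called once per key here, whereas the server may call it repeatedly on partial results.

```javascript
// Stand-in for the emit() function that the server provides to map.
const groups = {};
function emit(key, value) {
  (groups[key] = groups[key] || []).push(value);
}

// map and reduce as defined above: count how often each key occurs.
const map = function () {
  for (var key in this) {
    emit(key, {count : 1});
  }
};
const reduce = function (key, emits) {
  var total = 0;
  for (var i in emits) {
    total += emits[i].count;
  }
  return {"count" : total};
};

const docs = [
  {"Name" : "Sean Connery", "Nationality" : "Great Britain"},
  {"Name" : "Al Pacino"}
];
docs.forEach(function (doc) { map.call(doc); });   // "this" is the current document

const result = {};
Object.keys(groups).forEach(function (key) {
  result[key] = reduce(key, groups[key]);
});
console.log(result);
```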
Data models
- normalization vs. denormalization (referencing vs. embedding)
- a few things to consider:
- writes vs. reads
- immediate consistency vs. eventual consistency
- one-to-many vs. one-to-few
- document growth
| Embedding is better for | References are better for |
|---|---|
| small subdocuments | large subdocuments |
| data that does not change regularly | volatile data |
| when eventual consistency is acceptable | when immediate consistency is necessary |
| documents that grow by a small amount | documents that grow by a large amount |
| data that you'll often need to perform a second query to fetch | data that you'll often exclude from results |
| fast reads | fast writes |
- more resources:
- docs.mongodb.com/manual/core/data-modeling-introduction/
- developer.mongodb.com/article/mongodb-schema-design-best-practices/