MongoDB for Administrators

From Training Material


Introduction

  • MongoDB is a fast document-oriented database
  • replaces the concept of a "row" with a more flexible model - the "document"
  • convenient data storage for modern object-oriented languages
  • no predefined schemas
  • no multi-document transactions (writes are atomic per document; version 4.0 added multi-document transactions)
  • no SQL
  • supports indexes
  • easy to scale out horizontally


Getting Started

  • document is a basic unit of data (like row in RDBMS)
  • collection = table
  • multiple databases in single instance
  • _id in every document, unique within collection
  • JavaScript shell for administration and data manipulation


Documents

  • document is an ordered set of keys with associated values
  • representation of a document varies by programming language (hash in Perl and Ruby, dictionary in Python, map or object elsewhere)
  • objects in JavaScript {key:value}
  • example {"company" : "NobleProg", "training" : "MongoDB for Developers"}
  • key is a string; any UTF-8 is allowed, except \0 (keys also must not contain . and must not start with $)
  • type-sensitive {"age" : 3}, {"age" : "3"}
  • case-sensitive {"age" : 3}, {"Age" : 3}
  • documents cannot contain duplicated keys
  • key/value pairs are ordered {"x" : 1, "y" : 1} != {"y" : 1, "x" : 1}
    • order does not usually matter, MongoDB can reorder keys


{
  "_id" : ObjectId("545a414c7907b2a255b156c5"),
  "Name" : "Sean Connery",
  "Nationality" : "Great Britain",
  "BirthDate" : ISODate("1930-08-25T00:00:00Z"),
  "BirthYear" : 1930,
  "Occupation" : [
    "Actor",
    "Director",
    "Producer"
  ],
  "Movie" : [
    {
      "_id" : ObjectId("545a5f167907b2a255b156c7"),
      "Title" : "Dr. No"
    },
    {
      "_id" : ObjectId("545a5f317907b2a255b156c8"),
      "Title" : "From Russia with Love"
    },
    {
      "_id" : ObjectId("545a5ed67907b2a255b156c6"),
      "Title" : "Never Say Never Again"
    }
  ],
  "BirthPlace" : {
    "Country" : "United Kingdom, Scotland",
    "City" : "Edinburgh"
  }
}
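The ordering rule above can be seen with plain JavaScript serialization; a quick sketch (nothing MongoDB-specific is assumed):

```javascript
// Two logically identical objects serialize differently when their
// keys are in a different order -- this is why MongoDB treats
// {"x":1,"y":1} and {"y":1,"x":1} as distinct documents.
const a = JSON.stringify({ x: 1, y: 1 });
const b = JSON.stringify({ y: 1, x: 1 });
console.log(a);        // {"x":1,"y":1}
console.log(b);        // {"y":1,"x":1}
console.log(a === b);  // false
```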


Collections

  • a group of documents
  • dynamic schemas
    • {"company" : "NobleProg"}
    • {"age" : 5}
  • why should we use more than one collection?
    • mixing different kinds of documents in one collection is a nightmare for developers
    • it is much faster to list collections than to extract document types from a mixed collection
    • grouping documents of the same kind
    • indexes are defined per collection
  • collection name is a string; any UTF-8 is allowed, except:
    • the empty string, names with the "system." prefix, and names containing the \0 or $ characters
  • subcollections separated by the . character
    • example GridFS (fs.files, fs.chunks)


Databases

  • database is a group of collections
  • one database = one application
  • separated databases for different applications, users
  • database name is an alphanumeric string, case sensitive, max 64 bytes; the empty string is not allowed
  • the database name ends up as a file name on the filesystem (which explains the restrictions)
  • special databases: admin (root database), local (never replicated), config (when sharding)
  • namespace is a concatenation of database and collection name (fully qualified collection name)
    • max 121 bytes, should be less than 100


Getting and Starting MongoDB

  1. Installation on Windows
  2. Installation on Ubuntu


CRUD

Create

use NobleProg
person = {"Name" : "Sean Connery", "Nationality" : "Great Britain"}
db.people.insert(person)

Read

db.people.find()
db.people.findOne()
db.people.find().pretty()

Update

person.Occupation = "Actor"
db.people.update({"Name" : "Sean Connery"}, person)
db.people.findOne()

Delete

db.people.remove({"Name" : "Sean Connery"})
db.people.remove({})
db.people.findOne()


Data Types

  • JSON-like documents
    • 6 data types: null, boolean, number, string, array, object
  • MongoDB adds support for other datatypes
    • null {"x" : null}
    • boolean {"x" : true}
    • number (by default 64-bit floating point numbers) {"x" : 1.4142}
      • 4-byte integers {"x" : NumberInt(141)}
      • 8-byte integers {"x" : NumberLong(141)}
    • string (any UTF-8 character) {"x" : "NobleProg"}
    • date (stored as milliseconds since the Unix epoch) {"x" : new Date()}
    • regular expressions (in queries) {"x" : /bob/i}
    • array {"x" : [1.4142, true, "training"]}
    • embedded documents {"x" : {"y" : 100}}
    • ObjectId {"x" : ObjectId("54597591bb107f6ef5989771")}
    • binary data (for non-UTF-8 strings)
    • code {"x" : function() {/*...*/}}
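The default 64-bit float matters in practice: above 2^53 bare numbers silently lose integer precision, which is why the integer wrappers above exist. A plain-JavaScript illustration:

```javascript
// Every bare number in the shell is a 64-bit IEEE 754 double.
// Integers are exact only up to 2^53; past that, values collapse:
const exact = 9007199254740992;   // 2^53 -- still representable
const lost  = 9007199254740993;   // 2^53 + 1 -- rounds back to 2^53
console.log(lost === exact);      // true: the distinction is gone
```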


_id and ObjectId

  • every document in MongoDB must have _id
  • can be any type but it defaults to ObjectId
  • unique values in a single collection
  • ObjectId is designed to be lightweight and easy to generate
    • 12 bytes of storage (24 hexadecimal digits)
      • timestamp - byte 0-3
      • machine - byte 4-6
      • PID - byte 7-8
      • increment - byte 9-11
    • _id is generated automatically if not present in document
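Since the first four bytes are a timestamp, a document's creation time can be read straight out of its _id. A minimal sketch in plain JavaScript (the shell offers ObjectId(...).getTimestamp() for the same purpose; objectIdTimestamp is a made-up helper name):

```javascript
// The first 8 hex digits of an ObjectId are the 4 timestamp bytes,
// counting seconds since the Unix epoch:
function objectIdTimestamp(hex) {
  return new Date(parseInt(hex.substring(0, 8), 16) * 1000);
}

// _id taken from the example document earlier in this material:
const created = objectIdTimestamp("545a414c7907b2a255b156c5");
console.log(created.toISOString());  // 2014-11-05T15:25:00.000Z
```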


MongoDB Shell

$ mongo
MongoDB shell version: 4.0.6
connecting to: test
>

$ mongo HostName:PortNumber/DatabaseName
$ mongo localhost:27017/test

$ mongo --nodb
> conn = new Mongo("localhost:27017")
connection to localhost:27017
> db = conn.getDB("NobleProg")
NobleProg


Using help

  • mongo is a JavaScript shell, so standard JavaScript documentation applies
  • use the built-in help for MongoDB-specific functionality
  • type a function name without parentheses to see its source code
> help
    db.help()            help on db methods
    db.mycoll.help()     help on collection methods
    ...
    exit                 quit mongo shell
>
> db.NobleProg.stats
function ( scale ){
    return this._db.runCommand( { collstats : this._shortName , scale : scale } );
}
>
> db.NobleProg.stats()
{ "ok" : 0, "errmsg" : "Collection [test.NobleProg] not found." }
>


Running Scripts

  • mongo can execute JavaScript files
  • scripts have access to all global variables (e.g. "db")
  • shell helpers (e.g. "show collections") do not work from files; use valid JavaScript equivalents (e.g. "db.getCollectionNames()")
  • use --quiet to hide "MongoDB shell version..." when executing script
  • use load() to run script directly from Mongo Shell
  • use .mongorc.js for frequently-loaded scripts
    • located in user home directory, run when starting up the shell, --norc to disable it
    • useful also to customize prompt
$ mongo script.js
MongoDB shell version: 4.0.6
connecting to: test
script.js was executed successfully!
$
$ mongo --quiet script.js
script.js was executed successfully!
$
$ mongo
MongoDB shell version: 4.0.6
connecting to: test
> load("script.js")
script.js was executed successfully!
>
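As an example of a .mongorc.js customization, the prompt can be redefined; this sketch assumes the legacy mongo shell, where a global prompt function is honoured:

```javascript
// ~/.mongorc.js -- executed automatically at shell startup (--norc skips it).
// Redefine the prompt to show host and current database:
prompt = function () {
  // "db" is the shell's global handle to the current database
  return db.getMongo().host + "/" + db.getName() + "> ";
};
```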


Editing Complex Variables

  • limited multiline support in the shell
  • external editors are allowed
  • EDITOR="/usr/bin/gedit"
  • EDITOR="c:\\windows\\notepad.exe"
$ mongo --quiet
> EDITOR="c:\\windows\\notepad.exe"
> use training
switched to db training
> person = db.people.findOne()
{ "_id" : ObjectId("54568445cfc7c83518fa5430"), "Name" : "Sean Connery" }
> edit person
> person
{ "_id" : ObjectId("54568445cfc7c83518fa5430"), "Name" : "Sean Connery", "Nationality" : "Great Britain" }
> db.people.save(person)


Single-server Configuration and Deployment

Configuration File Options


mongod --config /etc/mongodb.conf
mongod -f /etc/mongodb.conf


net:
  port: 27017
  bindIp: 127.0.0.1
operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 10
storage:
  dbPath: c:\data\db
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1
systemLog:
  destination: file
  path: c:\data\logs\mongodb.log
security:
  authorization: enabled
  keyFile: c:\data\config\keyfile.txt
replication:
  replSetName: training
  oplogSizeMB: 128


Storage engine

  • A storage engine is the part of a database that is responsible for managing how data is stored on disk.
    • in other words, it's an interface between database and hardware
  • MMAPv1 Storage Engine
    • historically first storage engine
    • collection level locking
    • in-place updates
    • power of 2 sized document allocations
  • WiredTiger Storage Engine
    • new in version 3.0
    • default since 3.2
    • improved performance in most use cases
    • document level locking
    • compression (for data and indexes)
  • In-Memory (experimental)
  • RocksDB, HDFS, FusionIO (under development)


MMAPv1

  • --storageEngine mmapv1
  • locks:
    • multiple readers, single writer lock
    • shared resources: data, metadata (indexes, journal)
  • lock levels:
    • database level locking (2.2 - 2.6)
    • collection level locking (3.0)
  • journal - write-ahead transaction log
    • all operations are written to journal
    • and after that are applied to data files
  • data on disk is raw BSON, directly mapped into virtual memory
  • document allocations
    • no padding
    • padding factor (automatic, manual)
    • power of 2 sized allocations (from 32B to 2MB)


WiredTiger

  • --storageEngine wiredTiger
  • data is stored in B-Trees (similar to the B-Trees MMAPv1 uses for indexes)
    • initially documents get written into unused regions
    • and are then merged with the rest of the data in the background
  • WT uses two caches:
    • WT cache (half of RAM by default)
    • operating system cache
  • it uses a write-ahead transaction log in combination with checkpoints to ensure data persistence
  • MongoDB will commit a checkpoint to disk
    • 60 seconds after the end of the previous checkpoint, or
    • when there is too much dirty data in the WT cache (2 gigabytes of data)
  • document level locking
    • WT avoids locks in favour of optimistic concurrency control
    • writes should scale with the number of threads
  • compression (can be set for each collection separately)
    • snappy (fast, balance storage efficiency and processing requirements)
    • zlib (higher compression rates at the cost of more CPU)
    • off
  • WiredTiger options
  • how to test your own data?


db.createCollection( "email", { storageEngine: { wiredTiger: { configString: 'block_compressor=zlib' }}})


Authentication and Authorization

  • MongoDB employs Role-Based Access Control
  • MongoDB supports
    • password-based authentication (SCRAM-SHA-1, MONGODB-CR)
    • x.509 certificates
    • LDAP proxy (enterprise edition)
    • Kerberos (enterprise edition)
  • authentication is disabled by default
    • create at least one superuser account first,
    • then turn authentication on with the --auth switch
  • user belongs to a database and must be authenticated in that database
  • users created in admin or local database can perform operations on all databases
  • commonly used roles: read, readWrite, dbAdmin, userAdmin, dbOwner
  • http://docs.mongodb.org/manual/core/authorization/


> use admin
> db.createUser({user : "ubuntu", pwd : "NobleProg", roles : ["root"]})
$ mongo -u ubuntu -p --authenticationDatabase admin
$ mongo localhost/admin -u ubuntu -p
> use admin
> db.auth("ubuntu", "NobleProg")
> use test
> db.createUser({user: "testuser", pwd: "NobleProg", roles: ["dbOwner"]})
> db.getSiblingDB("admin").system.users.find().pretty()
> db.getUser("testuser")
> db.getRole("dbOwner")
> db.getRole("dbOwner", {showPrivileges: true})

> db.grantRolesToUser("testuser", [{role: "read", db: "dept"}])
> db.grantRolesToUser("testuser", [{role: "readWrite", db: "dept"}])
> db.revokeRolesFromUser("testuser", [{role: "read", db: "dept"}])
> db.grantRolesToUser("testuser", [{role: "dbAdmin", db: "dept"}])

> db.dropUser("testuser")
> db.logout()
//how to create a custom role
> use admin
> db.auth("ubuntu", "NobleProg")
> db.createUser({user: "monitoringuser", pwd: "NobleProg", roles: []})
> db.createRole({role: "statsWatcher", privileges: [{resource: {"anyResource": true}, actions: ["serverStatus"]}], roles: []})
> db.getRole("statsWatcher", {showPrivileges: true})
> db.grantRolesToUser("monitoringuser", [{role: "statsWatcher", db: "admin"}])


Monitoring MongoDB

  • mongotop
  • mongostat
  • MongoDB's Web Console on mongod's port + 1000 (28017 by default)
    • enable the rest and httpinterface options
  • MongoDB Monitoring Service (MMS)
    • plus Munin for monitoring hardware


> db.serverStatus()
> db.runCommand("serverStatus")
> rs.status()
> db.runCommand("replSetGetStatus")
> db.currentOp()
> db.stats()
> db.collection.stats()


Profiler

  • can be turned on per database or for the entire server
  • profiling levels:
    • 0 - no profiling
    • 1 - only includes "slow" operations
    • 2 - includes all operations


> db.getProfilingLevel()
> db.setProfilingLevel(1, 100)
> db.system.profile.find().sort({millis : -1}).pretty()


Indexes and Query Optimization

  • a database index works like the index in a book
  • indexes in MongoDB work almost identically to indexes in relational databases
  • indexes make reads faster but writes slower
  • at most 64 indexes per collection


Collection Without Indexes

for (var i=1; i<=1000000; i++) {
  db.visitors.insert({
    "i" : i,
    "visitor" : "visitor_"+i,
    "score" : Math.floor(Math.random()*10+1),
    "date" : new Date()
  })
}


One Field Indexes

db.visitors.explain().find({"visitor" : "visitor_330"})
db.visitors.explain().find({"visitor" : "visitor_330"}).limit(1)
db.visitors.explain().find({"visitor" : "visitor_99999"}).limit(1)
db.visitors.createIndex({"visitor" : 1})
db.visitors.explain().find({"visitor" : "visitor_99999"})


Compound Indexes

db.visitors.find().sort({"score" : 1, "visitor" : 1})
db.visitors.createIndex({"visitor" : 1})
db.visitors.createIndex({"score" : 1, "visitor" : 1})
db.visitors.explain(true).find({"score" : 10}).sort({"visitor" : -1})
db.visitors.explain(true).find({"score" : {"$gte" : 10, "$lte" : 20}})
db.visitors.explain(true).find({"score" : {"$gte" : 10, "$lte" : 20}}).sort({"visitor" : 1})
db.visitors.createIndex({"visitor" : 1, "score" : 1})
db.visitors.explain(true).find({"score" : {"$gte" : 10, "$lte" : 20}}).sort({"visitor" : 1}).hint({"visitor" : 1, "score" : 1})
db.visitors.dropIndex("visitor_1")


Compound Indexes (2)

  • pattern {SortKey : 1, CriteriaKey : 1}
  • pattern {ExactMatch : 1, RangeCriteria : 1}
  • sorting directions {"score" : -1, "visitor" : 1}
  • {"score" : -1, "visitor" : 1} = {"score" : 1, "visitor" : -1}
  • covered indexes (indexOnly : true or totalDocsExamined = 0 when totalKeysExamined > 0)
  • {key1 : 1, key2 : 1, key3 : 1} eliminates the need of creating {key1 : 1} and {key1 : 1, key2 : 1}
  • $-operators and indexes ($where, $exists, $nin, $ne, $not)
  • queries with $or can use more than one index
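The prefix rule above can be made concrete with a small helper (plain JavaScript, nothing MongoDB-specific; the function name is made up for illustration):

```javascript
// A compound index supports queries on any *prefix* of its key list,
// which is why {key1:1, key2:1, key3:1} makes {key1:1} and
// {key1:1, key2:1} redundant.
function supportedPrefixes(indexKeys) {
  return indexKeys.map((_, i) => indexKeys.slice(0, i + 1));
}

console.log(supportedPrefixes(["key1", "key2", "key3"]));
// [ ["key1"], ["key1","key2"], ["key1","key2","key3"] ]
```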


Indexing Arrays and SubDocuments

  • usually it behaves like normal index
  • difference between indexing {"BirthPlace" : 1} and {"BirthPlace.Country" : 1}
  • only one array field per index
  • multikey indexes
db.people.createIndex({"Movie.Title" : 1})
db.people.createIndex({"BirthPlace.Country" : 1})


Indexes in Details

  • low and high cardinality fields
  • understanding .explain()
  • using .hint()
  • query optimizer (100 results, 1000 queries, index creation)
  • fields in index must be smaller than 1kB
  • options: unique, dropDups, sparse
db.visitors.createIndex({"visitor" : 1}, {"unique" : 1, "dropDups" : 1})


Sparse Index

  • only contains entries for documents that have the indexed field (even if the field holds a null value)
  • useful with unique constraint
db.sparse_index.insert([{y:1, x:1}, {y:1, x:2}, {y:1, x:3}, {y:1}])
db.sparse_index.createIndex({x:1}, {unique:1})
db.sparse_index.insert({y:1})

db.sparse_index.dropIndexes()
db.sparse_index.createIndex({x:1}, {unique:1, sparse:1})
db.sparse_index.insert({y:1})

db.sparse_index.find({"x" : {"$ne" : 2}}).hint({"x" : 1})
db.sparse_index.find({"x" : {"$ne" : 2}}).hint({"$natural" : 1})


Index Administration

  • system.indexes - read only collection that stores info about indexes
  • .createIndex()
  • .dropIndex(), .dropIndexes()
  • .getIndexes()
  • options: background, name


Capped Collections

  • it has to be created before the first insert occurs
  • fixed size, or size plus a maximum number of documents (a circular queue)
  • forbidden operations on documents: removing, and updates that would increase the document size
  • can not be sharded, and can not be changed after creation
  • sorting: $natural : 1 (or -1)
db.createCollection("capped_collection", {"capped" : true, "size" : 100000})
db.createCollection("capped_collection", {"capped" : true, "size" : 100000, "max" : 20})

db.people.copyTo("capped_collection")
db.runCommand({"convertToCapped" : "capped_collection", "size" : 100000})


Tailable Cursors

  • inspired by the tail -f command
  • not closed when their results are exhausted
  • can be used only on capped collection
  • will die after 10 minutes of inactivity


TTL Indexes (time-to-live)

  • a TTL index allows you to set a timeout for each document
  • removal is performed every 60 seconds
  • can be created only on a single field (a date field)
db.ttl_collection.insert({"User" : "user1", "LastUpdated" : new Date()})
db.ttl_collection.createIndex({"LastUpdated" : 1}, {"expireAfterSeconds" : 30})
db.ttl_collection.find()


Full-Text Indexes

  • quick text search with built-in multi-language support
  • very expensive, especially on busy collections
db.people.createIndex({"Name" : "text"})
db.people.createIndex({"Name" : "text", "Bio" : "text"}, {"weights" : {"Name": 2}})
db.people.createIndex({"$**" : "text"})
db.people.createIndex({"whatever" : "text"}, {"weights" : {"Name" : 5, "Movie.Name" : 2, "$**" : 1}})

db.runCommand({"text" : "people", "search" : "emma thompson"})
db.people.find({$text : {$search : "emma thompson"}})
db.people.find({$text : {$search : "\"emma thompson\""}})
db.people.find({$text : {$search : "-emma thompson"}})
db.people.find({$text : {$search : "emma thompson"}}, {score : {$meta : "textScore"}})
db.people.find({$text : {$search : "emma thompson"}}, {score : {$meta : "textScore"}}).sort({ score: { $meta: "textScore" } })


Geospatial Indexes

2d index

  • for data stored as points on a two-dimensional plane
db.dots.insert([{Name:"A", location:[10, 5]}, {Name:"B", location:[17, -5]}, {Name:"C", location:[0, 2]}, {Name:"D", location:[-3, -3]}])
db.dots.createIndex({location:"2d", type:1})
db.dots.find({location:{$near:[0,0]}})

2dsphere index

db.cities.insert({Name:"Rzeszów", "location": {"type":"Point", "coordinates":[22.008606,50.040264]}})
db.cities.insert({Name:"Warszawa", "location": {"type":"Point", "coordinates":[21.0123237,52.2328474]}})
db.cities.insert({Name:"Wrocław", "location": {"type":"Point", "coordinates":[17.0342894,51.1174725]}})
db.cities.insert({Name:"Kraków", "location": {"type":"Point", "coordinates":[19.9012826,50.0719423]}})
db.cities.insert({Name:"Kielce", "location": {"type":"Point", "coordinates":[20.6156414,50.85404]}})
db.cities.createIndex({location:"2dsphere"})

db.cities.find({location:{$near: 
    {$geometry: {type:"Point", coordinates:[20.6156414, 50.85404]}, $minDistance: 0, $maxDistance:200000}
}})

db.runCommand({geoNear: "cities", near: [20.6156414, 50.85404], spherical: true, distanceMultiplier: 6378.1}) // distance in kilometers

db.cities.find({location: {$geoWithin: 
    {$geometry: {type: "Polygon", coordinates: [[ [22.008606,50.040264], [21.0123237,52.2328474], [19.9012826,50.0719423], [22.008606,50.040264] ]]}}
}})

Replication

  • what happens when a standalone server crashes?
  • replication is a way of keeping identical copies of data on multiple servers
  • recommended for all production deployments
  • a replica set consists of one primary server and many secondaries
  • replication in MongoDB is asynchronous
  • the primary server handles client requests
  • when the primary crashes, the secondaries elect a new primary
  • primary (master), secondary (slave)


Test Setup

  • this is the way to start 3 servers on ports 31000, 31001, 31002
    • or on ports 20000, 20001, 20002 starting from MongoDB 3.2
  • databases are stored in default directory (/data/db)


$ mongo --nodb
> replicaSet = new ReplSetTest({"nodes" : 3})
> replicaSet.startSet()
> replicaSet.initiate()


$ mongo --nodb
> conn0 = new Mongo("localhost:31000")
> db0 = conn0.getDB("test")
> db0.isMaster()
> for (i=0; i<100; i++) { db0.repl.insert({"x" : i}) }
> db0.repl.count()
>
> conn1 = new Mongo("localhost:31001")
> db1 = conn1.getDB("test")
> db1.repl.count()
> db1.setSlaveOk()
> db1.repl.count()
>
> db0.adminCommand({"shutdown" : 1})
> db1.isMaster()


Production-Like Setup

  • this example shows how to start a replica set with 3 members
  • everything runs on one server here, but it can easily be changed to multiple machines
  • only one server (localhost:27100) may contain data
  • the replica set name can be any UTF-8 string (np_rep in this example)
  • use --replSet np_rep when starting mongod
  • each member of the replica set must be able to connect to all other members


$ mongod --port 27100 --replSet np_rep --dbpath /data/np_rep/rep0 --logpath /data/np_rep/rep0.log --fork --bind_ip 127.0.0.1 --rest --httpinterface --logappend
$ mongod --port 27101 --replSet np_rep --dbpath /data/np_rep/rep1 --logpath /data/np_rep/rep1.log --fork ...
$ mongod --port 27102 --replSet np_rep --dbpath /data/np_rep/rep2 --logpath /data/np_rep/rep2.log --fork ...
$
$ mongod --shutdown --dbpath /data/np_rep/rep0


  • the Mongo Shell is the only way to configure a replica set
  • prepare a configuration document (np_rep_config)
    • the _id key is your replica set name
    • members is a list of all servers in the replica set
  • send the configuration to the server that holds the data
  • that server will change the configuration of the other members
  • the servers will elect a primary and start handling client requests
  • localhost as a server name can be used only for testing


> np_rep_config = { 
...   "_id" : "np_rep",
...   "members" : [
...     {"_id" : 0, "host" : "localhost:27100"},
...     {"_id" : 1, "host" : "localhost:27101"},
...     {"_id" : 2, "host" : "localhost:27102"}
...   ]
... }
>
> rs.initiate(np_rep_config)
> db = (new Mongo("localhost:27102")).getDB("test")


rs Helper

  • rs is a global variable containing replication helper functions
  • rs.help() - list of all functions
  • rs.initiate(np_rep_config) is a wrapper to adminCommand
    • db.adminCommand({"replSetInitiate" : np_rep_config})
> rs.help()
> rs.add("localhost:27103")
> rs.remove("localhost:27103")
> rs.config()
> np_rep_config = rs.config()
> edit np_rep_config 
> rs.reconfig(np_rep_config)


All About Majorities

  • how should you design a replica set?
  • majority = more than half of all members in the set
  • majority is based on the configuration, not on the current status of the replica set
    • e.g. in a 5-member replica set, 3 members are down
    • the two remaining members can not reach a majority
    • if one of the two was primary, it steps down
    • both remaining members end up as secondaries
  • why can two of five not keep a primary? because the three unreachable members may still be alive (e.g. behind a network partition) and may have elected a primary of their own
  • examples: 3+2 members, 2+2+1 members, only 2 members
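The arithmetic behind these examples is simple enough to sketch (the function name is just for illustration):

```javascript
// Majority is computed from the *configured* member count, never from
// how many members are currently reachable:
function majority(configuredMembers) {
  return Math.floor(configuredMembers / 2) + 1;
}

console.log(majority(5));  // 3 -- so 2 survivors of a 5-member set cannot elect a primary
console.log(majority(4));  // 3 -- an even count tolerates no more failures than 3 members
console.log(majority(3));  // 2
```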


How Election Works

  • an election starts when a secondary can not reach the primary
  • it will contact all other members and request to become primary
  • the others perform several checks:
    • can they reach the primary themselves?
    • is the candidate up to date with replication?
    • is there any member with a higher priority?
  • the election ends when the candidate receives "yes" votes from a majority
  • any member can veto the election (a veto counts as 10,000 votes)
  • a heartbeat is sent to all members every 2 seconds, with a 10-second timeout
    • based on this, the primary knows whether it can reach a majority
    • when an election results in a tie, all members wait 30 seconds before retrying


Member Configuration

  • arbiter
    • normal mongod process started with --replSet option and empty data directory
    • > rs.addArb("server:port")
    • > rs.add({"_id" : 7, "host" : "server:port", "arbiterOnly" : true})
    • use at most one arbiter, don't add arbiters "just in case"
    • arbiter once added can't be changed to normal mongod, and vice versa
    • use normal data member instead of an arbiter whenever possible

  • priority
    • priority range is from 0 to 100 (default 1)
    • it means how badly this member wants to become primary
    • member with priority 0 can never become primary (passive members)

  • hidden
    • hidden members don't handle client requests
    • hidden members are not preferred as replication sources
    • useful for backup or less powerful machines
    • use hidden : true with priority : 0
    • rs.isMaster(), rs.status(), rs.config()

  • slave delay
    • a delayed secondary will purposely lag by the specified number of seconds
    • use slaveDelay : seconds with priority : 0
    • ... and hidden : true if your clients read from secondaries

  • building indexes
    • useful for backup servers
    • prevents the member from building any indexes
    • use buildIndexes : false with priority : 0
    • non-index-building member can't be changed to normal easily


Sync Process

  • the oplog contains all operations that the primary performs
  • it is a capped collection in the local database on each member
  • any member can be used as a source for replication
  • steps: read an operation from the source's oplog -> apply the operation to the data -> write the operation to the local oplog
  • re-applying the same oplog operation is safe and handled properly (it gives the same result)
  • the oplog is fixed in size, not in time
    • usually one operation on data results in one operation in the oplog
    • exception: one operation that affects multiple documents is stored in the oplog as many single-document operations
  • a secondary may go stale when it had downtime, gets more writes than it can handle, or is too busy handling reads
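The safety of re-applying oplog entries comes from recording them in an idempotent form (for example, an $inc is logged as a $set of the resulting value). A toy illustration in plain JavaScript (applySet is a made-up helper):

```javascript
// Applying the same {$set: ...} twice yields the same document --
// this is what makes replaying an oplog entry harmless.
function applySet(doc, fields) {
  return Object.assign({}, doc, fields);
}

let doc = { _id: 1, counter: 41 };
doc = applySet(doc, { counter: 42 });  // the logged form of {$inc: {counter: 1}}
doc = applySet(doc, { counter: 42 });  // accidentally re-applied: no harm
console.log(doc.counter);              // 42
```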


Initial Sync Process

  1. initial checks (choose a source, drop existing databases)
  2. cloning data from source (longest operation)
  3. first oplog application
  4. second oplog application
  5. index building (long operation)
  6. third oplog application
  7. switching to normal syncing and normal operations
  • restoring from backup is often faster
  • cloning may ruin source's working set
  • initial sync may fail if oplog is too short


States of Replica Set Members

  1. Primary
  2. Secondary
  3. Startup (when you start a member for the first time)
  4. Startup2 (initial sync, short state on normal members: starts replication and election process)
  5. Recovering (failure state, member is operating correctly but not available for reads, occurs in many situations)
  6. Arbiter
  7. Down (when member was up and then becomes unreachable)
  8. Unknown (when member has never been able to reach another member)
  9. Removed (after removing from replica set, when added back it will return into "normal" state)
  10. Rollback (when rolling back data)
  11. Fatal (find a reason in log, grep "replSet FATAL", restore from backup or resync)


Rollbacks

  1. what a rollback is and when it is performed
  2. synchronization must be done manually
    • a collectionName.bson file appears in a rollback directory inside the data directory
    • mongorestore that file into a temporary collection and perform a manual merge
  3. a rollback will fail if there is more than 300 MB of data or about 30 minutes of operations to roll back
  4. how to prevent rollbacks
    • do not change the number of votes of replica set members
    • keep secondaries up to date


Connection to Replica Set and Replication Guarantees

  • connection strings: mongodb://user_name:password@host1:27017,host2:27018,host3:27019/?replicaSet=replicaName&connectTimeoutMS=10000&authMechanism=SCRAM-SHA-1


> db.runCommand({getLastError: 1, w: "majority"})
> db.testColl.insert({a: 1}, {writeConcern: {w: "majority"}})
> db.testColl.insert({a: 1}, {writeConcern: {w: "majority", wtimeout: 5000}})
> db.testColl.insert({a: 1}, {writeConcern: {w: 3, wtimeout: 5000}})


> config = rs.conf()
> config.settings = {}
> config.settings.getLastErrorDefaults = {w: "majority", wtimeout: 5000}
> rs.reconfig(config)


> config = rs.config()
> config.members[0].tags = {"dc": "PL"}
> config.members[1].tags = {"dc": "UK"}
> config.settings = config.settings || {}
> config.settings.getLastErrorModes = [{"allDataCenters" : {"dc": 2}}]
> rs.reconfig(config)
> db.testColl.insert({a: 1}, {writeConcern: {w: "allDataCenters", wtimeout: 5000}})


Read Preference

  • read preference describes how MongoDB clients route read operations to the members of a replica set
  • read preference modes:
    • primary - default mode
    • primaryPreferred
    • secondary
    • secondaryPreferred
    • nearest
  • tag sets


Administration of Replica Set


Run as Standalone

  • many operations can not be performed on secondaries
    1. stop the secondary
    2. run mongod without --replSet, on different port, with the same --dbpath
$ mongod --port 27999 --dbpath /var/lib/mongodb


Large Replica Sets

  • replica sets are limited to 50 members (12 before version 3.0) and only 7 voting members
  • the limit exists to reduce the network traffic generated by heartbeats
  • use a master-slave configuration when more than 50 secondaries are needed
> rs.add({"_id" : 51, "host" : "server-8:27017", "votes" : 0})


Forcing Reconfiguration

  • useful when your majority is lost permanently
  • force allows you to send the configuration to a secondary (not only to the primary)
  • the configuration needs to be prepared correctly
  • force will increase the configuration version dramatically
> var config = rs.config()
> edit config
> rs.reconfig(config, {"force" : true})


Changing Member Status Manually

  • there is no way to force a member to become primary directly
  • instead, demote the primary to a secondary (for 60 seconds by default)
> rs.stepDown()
> rs.stepDown(600)


Preventing Elections

  • rs.freeze(seconds) prevents a member from seeking election
  • rs.freeze(0) can also be used on a demoted primary to unfreeze it
> rs.freeze(600)
> rs.freeze(0)


Maintenance Mode

  • the server will go into the recovering state
  • useful if the server is performing a long operation or is far behind in replication
> db.adminCommand({"replSetMaintenanceMode" : true})
> db.adminCommand({"replSetMaintenanceMode" : false})


Monitoring Replication

  • status of the replica set (from the current server's perspective)
  • db.adminCommand("replSetGetStatus") or rs.status()
  • important fields: self, stateStr, uptime [s], optimeDate (last oplog operation), pingMs, errmsg
> rs.status()


Replication Source

  • rs.status() can be used to draw the replication graph
  • the replication source is chosen by ping time (the smallest wins)
  • this can lead to the creation of replication chains
  • replication loops must be avoided: s1 from s2, s2 from s3, s3 from s1
> db.adminCommand({"replSetSyncFrom" : "server:27017"})
> var config = rs.config()
> config.settings = config.settings || {}
> config.settings.chainingAllowed = false
> rs.reconfig(config)


Resizing the Oplog

  • a long oplog gives you time for maintenance
  • the oplog on the primary should cover at least a few days, or even weeks
> db.printReplicationInfo()
> db.printSlaveReplicationInfo()


procedure to resize the oplog:

  • demote primary into secondary
  • shut down and restart as standalone
  • copy the last insert from the oplog into temporary collection
> use local
> var cursor = db.oplog.rs.find({"op" : "i"})
> var lastInsert = cursor.sort({"$natural" : -1}).limit(1).next()
> db.tmpLastOp.save(lastInsert)
> db.tmpLastOp.findOne()
  • drop current oplog
> db.oplog.rs.drop()
  • create new oplog with different size
> db.createCollection("oplog.rs", {"capped" : true, "size" : 10000000})
  • move last operation from temporary collection into new oplog
> var lastInsert = db.tmpLastOp.findOne()
> db.oplog.rs.insert(lastInsert)
> db.oplog.rs.findOne()
  • shut down standalone server and restart as replica set member


Other operations

  • restoring data from delayed secondary
  • building indexes
  • replication on cheap machines: priority:0, hidden:true, buildIndexes:false, votes:0
  • master - slave configuration and mimicking this behaviour when using replica set
  • calculating lag: local.slaves, local.me, local.slaves.drop()


Backup and Restore

  • backups should generally be taken on secondaries
  • or on standalone servers during an off-peak time


Method #1: Filesystem Snapshot

  • the simplest way to make backup
  • filesystem must support snapshotting
  • mongod must run with journaling enabled
  • stop mongod before restoring


# Create the snapshot volume
lvcreate -L128M -s -n dbbackup /dev/ops/databases

# Mount the snapshot volume
mkdir /mnt/ops/dbbackup
mount /dev/ops/dbbackup /mnt/ops/dbbackup

# Do the backup
tar -cf /dev/rmt0 /mnt/ops/dbbackup

# Remove the snapshot
umount /mnt/ops/dbbackup
lvremove /dev/ops/dbbackup


Method #2: Copying Data Files

  • use fsyncLock() before copying files
    • fsyncLock() blocks any further writes to all databases
    • flushes all dirty data to disk
    • queues all write operations
  • the files are then consistent, and all of them can be copied
  • fsyncUnlock() releases the lock and returns the database to normal operation
  • if using authentication, do not log out of the console between fsyncLock() and fsyncUnlock()
  • stop mongod before restoring
    • do not restore a single database if a crash or hard shutdown occurred


> db.fsyncLock()
> db.fsyncUnlock()


Method #3: Using mongodump and mongorestore

  • mongodump/mongorestore is slower than the previous methods and has other downsides
  • good way to backup individual databases or collections
  • it will create a dump directory (with subdirectories for each database) in current directory
  • data is stored in .bson files
  • mongodump can be used when mongod is not running
    • do not use --dbpath when mongod is running
  • while mongodump is running, writes are still allowed, so already-dumped data may change before the dump finishes
    • do not use fsyncLock() when using mongodump
    • use --oplog if you are running mongod with --replSet
  • mongodump will choose secondary when connected to replica set
  • use mongodump and mongorestore in the same version
  • try to avoid this type of backup if you have unique indexes other than _id


mongodump --help
mongodump --port 27017
mongodump --dbpath /var/lib/mongodb
mongorestore --port 27017 --drop
mongodump --port 27017 --oplog
mongorestore --port 27017 --oplogReplay dump/
mongorestore -d dstDB -c dstCollection dump/srcDB/scrColl.bson


Backup of Replica Set

  • all previous methods are OK but 1st and 2nd are recommended (without any modifications)
  • when using mongodump use --oplog
  • when restoring with mongorestore:
    1. start a server as a standalone and
    2. restore the data with --oplogReplay and
    3. restore the oplog collection
    4. restart the server as a member of replica


> use local
> db.createCollection("oplog.rs", {"capped" : true, "size" : 1000000})
$ mongorestore -d local -c oplog.rs dump/oplog.bson


Backup of Sharded Cluster

  • a perfectly consistent backup of a sharded cluster is usually not possible
  • instead of backing up everything at once, backup servers separately
  • turn off balancer before making backup
  • run mongodump through mongos to backup entire cluster
  • problems with restoring single shard


Other Administrative Tasks

  • Preheating data
    • loading everything into memory
    • loading selected collections only
> db.runCommand({ touch: "collectionName", data: true, index: true })
    • loading specific indexes
    • loading recently created documents
    • replay application usage


  • Compacting
> db.runCommand({"compact" : "collectionName", "paddingFactor" : 1.5})


  • Repairing data
    • use --repair and optionally --repairpath
> db.repairDatabase()