MongoDB for Administrators

From Training Material


Introduction

  • MongoDB is a fast document-oriented database
  • replaces the concept of a "row" with a more flexible model - the "document"
  • convenient data storage for modern object-oriented languages
  • no predefined schemas
  • no multi-document transactions (writes are atomic per document; version 4.0 added multi-document transactions)
  • no SQL
  • supports indexes
  • easy to scale out horizontally


Getting Started

  • document is a basic unit of data (like row in RDBMS)
  • collection = table
  • multiple databases in single instance
  • _id in every document, unique within collection
  • JavaScript shell for administration and data manipulation


Documents

  • document is an ordered set of keys with associated values
  • representation of a document varies by programming language (hash in Perl and Ruby, dictionary in Python, map or object elsewhere)
  • objects in JavaScript {key:value}
  • example {"company" : "NobleProg", "training" : "MongoDB for Developers"}
  • key is a string; any UTF-8 is allowed, except \0 (keys also must not contain . and must not start with $)
  • type-sensitive {"age" : 3}, {"age" : "3"}
  • case-sensitive {"age" : 3}, {"Age" : 3}
  • documents cannot contain duplicated keys
  • key/value pairs are ordered {"x" : 1, "y" : 1} != {"y" : 1, "x" : 1}
    • order does not usually matter, MongoDB can reorder keys


{
  "_id" : ObjectId("545a414c7907b2a255b156c5"),
  "Name" : "Sean Connery",
  "Nationality" : "Great Britain",
  "BirthDate" : ISODate("1930-08-25T00:00:00Z"),
  "BirthYear" : 1930,
  "Occupation" : [
    "Actor",
    "Director",
    "Producer"
  ],
  "Movie" : [
    {
      "_id" : ObjectId("545a5f167907b2a255b156c7"),
      "Title" : "Dr. No"
    },
    {
      "_id" : ObjectId("545a5f317907b2a255b156c8"),
      "Title" : "From Russia with Love"
    },
    {
      "_id" : ObjectId("545a5ed67907b2a255b156c6"),
      "Title" : "Never Say Never Again"
    }
  ],
  "BirthPlace" : {
    "Country" : "United Kingdom, Scotland",
    "City" : "Edinburgh"
  }
}
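The ordering rule above can be seen with plain JavaScript serialization; a quick sketch (nothing MongoDB-specific is assumed):

```javascript
// Two logically identical objects serialize differently when their
// keys are in a different order -- this is why MongoDB treats
// {"x":1,"y":1} and {"y":1,"x":1} as distinct documents.
const a = JSON.stringify({ x: 1, y: 1 });
const b = JSON.stringify({ y: 1, x: 1 });
console.log(a);        // {"x":1,"y":1}
console.log(b);        // {"y":1,"x":1}
console.log(a === b);  // false
```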


Collections

  • a group of documents
  • dynamic schemas
    • {"company" : "NobleProg"}
    • {"age" : 5}
  • why should we use more than one collection?
    • mixing different kinds of documents in one collection is a nightmare for developers
    • it is much faster to list collections than to extract document types from a mixed collection
    • grouping documents of the same kind
    • indexes are defined per collection
  • collection name is a string; any UTF-8 is allowed, except:
    • the empty string, names with the "system." prefix, and names containing the \0 or $ characters
  • subcollections separated by the . character
    • example GridFS (fs.files, fs.chunks)


Databases

  • database is a group of collections
  • one database = one application
  • separated databases for different applications, users
  • database name is an alphanumeric string, case sensitive, max 64 bytes; the empty string is not allowed
  • the database name ends up as a file name on the filesystem (which explains the restrictions)
  • special databases: admin (root database), local (never replicated), config (when sharding)
  • namespace is a concatenation of database and collection name (fully qualified collection name)
    • max 121 bytes, should be less than 100


Getting and Starting MongoDB

  1. Installation on Windows
  2. Installation on Ubuntu


CRUD

Create

use NobleProg
person = {"Name" : "Sean Connery", "Nationality" : "Great Britain"}
db.people.insert(person)

Read

db.people.find()
db.people.findOne()
db.people.find().pretty()

Update

person.Occupation = "Actor"
db.people.update({"Name" : "Sean Connery"}, person)
db.people.findOne()

Delete

db.people.remove({"Name" : "Sean Connery"})
db.people.remove({})
db.people.findOne()


Data Types

  • JSON-like documents
    • 6 data types: null, boolean, number, string, array, object
  • MongoDB adds support for other datatypes
    • null {"x" : null}
    • boolean {"x" : true}
    • number (by default 64-bit floating point numbers) {"x" : 1.4142}
      • 4-byte integers {"x" : NumberInt(141)}
      • 8-byte integers {"x" : NumberLong(141)}
    • string (any UTF-8 character) {"x" : "NobleProg"}
    • date (stored as milliseconds since the Unix epoch) {"x" : new Date()}
    • regular expressions (in queries) {"x" : /bob/i}
    • array {"x" : [1.4142, true, "training"]}
    • embedded documents {"x" : {"y" : 100}}
    • ObjectId {"x" : ObjectId("54597591bb107f6ef5989771")}
    • binary data (for non-UTF-8 strings)
    • code {"x" : function() {/*...*/}}
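The default 64-bit float matters in practice: above 2^53 bare numbers silently lose integer precision, which is why the integer wrappers above exist. A plain-JavaScript illustration:

```javascript
// Every bare number in the shell is a 64-bit IEEE 754 double.
// Integers are exact only up to 2^53; past that, values collapse:
const exact = 9007199254740992;   // 2^53 -- still representable
const lost  = 9007199254740993;   // 2^53 + 1 -- rounds back to 2^53
console.log(lost === exact);      // true: the distinction is gone
```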


_id and ObjectId

  • every document in MongoDB must have _id
  • can be any type but it defaults to ObjectId
  • unique values in a single collection
  • ObjectId is designed to be lightweight and easy to generate
    • 12 bytes of storage (24 hexadecimal digits)
      • timestamp - byte 0-3
      • machine - byte 4-6
      • PID - byte 7-8
      • increment - byte 9-11
    • _id is generated automatically if not present in document
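Since the first four bytes are a timestamp, a document's creation time can be read straight out of its _id. A minimal sketch in plain JavaScript (the shell offers ObjectId(...).getTimestamp() for the same purpose; objectIdTimestamp is a made-up helper name):

```javascript
// The first 8 hex digits of an ObjectId are the 4 timestamp bytes,
// counting seconds since the Unix epoch:
function objectIdTimestamp(hex) {
  return new Date(parseInt(hex.substring(0, 8), 16) * 1000);
}

// _id taken from the example document earlier in this material:
const created = objectIdTimestamp("545a414c7907b2a255b156c5");
console.log(created.toISOString());  // 2014-11-05T15:25:00.000Z
```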


MongoDB Shell

$ mongo
MongoDB shell version: 4.0.6
connecting to: test
>

$ mongo HostName:PortNumber/DatabaseName
$ mongo localhost:27017/test

$ mongo --nodb
> conn = new Mongo("localhost:27017")
connection to localhost:27017
> db = conn.getDB("NobleProg")
NobleProg


Using help

  • mongo is a JavaScript shell, so standard JavaScript documentation applies
  • use the built-in help for MongoDB-specific functionality
  • type a function name without parentheses to see its source code
> help
    db.help()            help on db methods
    db.mycoll.help()     help on collection methods
    ...
    exit                 quit mongo shell
>
> db.NobleProg.stats
function ( scale ){
    return this._db.runCommand( { collstats : this._shortName , scale : scale } );
}
>
> db.NobleProg.stats()
{ "ok" : 0, "errmsg" : "Collection [test.NobleProg] not found." }
>


Running Scripts

  • mongo can execute JavaScript files
  • scripts have access to all global variables (e.g. "db")
  • shell helpers (e.g. "show collections") do not work from files; use valid JavaScript equivalents (e.g. "db.getCollectionNames()")
  • use --quiet to hide "MongoDB shell version..." when executing script
  • use load() to run script directly from Mongo Shell
  • use .mongorc.js for frequently-loaded scripts
    • located in user home directory, run when starting up the shell, --norc to disable it
    • useful also to customize prompt
$ mongo script.js
MongoDB shell version: 4.0.6
connecting to: test
script.js was executed successfully!
$
$ mongo --quiet script.js
script.js was executed successfully!
$
$ mongo
MongoDB shell version: 4.0.6
connecting to: test
> load("script.js")
script.js was executed successfully!
>
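As an example of a .mongorc.js customization, the prompt can be redefined; this sketch assumes the legacy mongo shell, where a global prompt function is honoured:

```javascript
// ~/.mongorc.js -- executed automatically at shell startup (--norc skips it).
// Redefine the prompt to show host and current database:
prompt = function () {
  // "db" is the shell's global handle to the current database
  return db.getMongo().host + "/" + db.getName() + "> ";
};
```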


Editing Complex Variables

  • limited multiline support in the shell
  • external editors are allowed
  • EDITOR="/usr/bin/gedit"
  • EDITOR="c:\\windows\\notepad.exe"
$ mongo --quiet
> EDITOR="c:\\windows\\notepad.exe"
> use training
switched to db training
> person = db.people.findOne()
{ "_id" : ObjectId("54568445cfc7c83518fa5430"), "Name" : "Sean Connery" }
> edit person
> person
{ "_id" : ObjectId("54568445cfc7c83518fa5430"), "Name" : "Sean Connery", "Nationality" : "Great Britain" }
> db.people.save(person)


Single-server Configuration and Deployment

Configuration File Options


mongod --config /etc/mongodb.conf
mongod -f /etc/mongodb.conf


net:
  port: 27017
  bindIp: 127.0.0.1
operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 10
storage:
  dbPath: c:\data\db
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1
systemLog:
  destination: file
  path: c:\data\logs\mongodb.log
security:
  authorization: enabled
  keyFile: c:\data\config\keyfile.txt
replication:
  replSetName: training
  oplogSizeMB: 128


Storage engine

  • A storage engine is the part of a database that is responsible for managing how data is stored on disk.
    • in other words, it's an interface between database and hardware
  • MMAPv1 Storage Engine
    • historically first storage engine
    • collection level locking
    • in-place updates
    • power of 2 sized document allocations
  • WiredTiger Storage Engine
    • new in version 3.0
    • default since 3.2
    • improved performance in most use cases
    • document level locking
    • compression (for data and indexes)
  • In-Memory (experimental)
  • RocksDB, HDFS, FusionIO (under development)


MMAPv1

  • --storageEngine mmapv1
  • locks:
    • multiple readers, single writer lock
    • shared resources: data, metadata (indexes, journal)
  • lock levels:
    • database level locking (2.2 - 2.6)
    • collection level locking (3.0)
  • journal - write-ahead transaction log
    • all operations are written to journal
    • and after that are applied to data files
  • data on disk is raw BSON, directly mapped into virtual memory
  • document allocations
    • no padding
    • padding factor (automatic, manual)
    • power of 2 sized allocations (from 32B to 2MB)


WiredTiger

  • --storageEngine wiredTiger
  • data is stored in B-Trees (similar to the B-Trees MMAPv1 uses for indexes)
    • initially documents get written into unused regions
    • and are then merged with the rest of the data in the background
  • WT uses two caches:
    • WT cache (half of RAM by default)
    • operating system cache
  • it uses a write-ahead transaction log in combination with checkpoints to ensure data persistence
  • MongoDB will commit a checkpoint to disk
    • 60 seconds after the end of the previous checkpoint, or
    • when there is too much dirty data in the WT cache (2 gigabytes of data)
  • document level locking
    • WT avoids locks in favour of optimistic concurrency control
    • writes should scale with the number of threads
  • compression (can be set for each collection separately)
    • snappy (fast, balance storage efficiency and processing requirements)
    • zlib (higher compression rates at the cost of more CPU)
    • off
  • WiredTiger options
  • how to test your own data?


db.createCollection( "email", { storageEngine: { wiredTiger: { configString: 'block_compressor=zlib' }}})


Authentication and Authorization

  • MongoDB employs Role-Based Access Control
  • MongoDB supports
    • password-based authentication (SCRAM-SHA-1, MONGODB-CR)
    • x.509 certificates
    • LDAP proxy (enterprise edition)
    • Kerberos (enterprise edition)
  • authentication is disabled by default
    • create at least one superuser account first,
    • then turn authentication on with the --auth switch
  • user belongs to a database and must be authenticated in that database
  • users created in admin or local database can perform operations on all databases
  • commonly used roles: read, readWrite, dbAdmin, userAdmin, dbOwner
  • http://docs.mongodb.org/manual/core/authorization/


> use admin
> db.createUser({user : "ubuntu", pwd : "NobleProg", roles : ["root"]})
$ mongo -u ubuntu -p --authenticationDatabase admin
$ mongo localhost/admin -u ubuntu -p
> use admin
> db.auth("ubuntu", "NobleProg")
> use test
> db.createUser({user: "testuser", pwd: "NobleProg", roles: ["dbOwner"]})
> db.getSiblingDB("admin").system.users.find().pretty()
> db.getUser("testuser")
> db.getRole("dbOwner")
> db.getRole("dbOwner", {showPrivileges: true})

> db.grantRolesToUser("testuser", [{role: "read", db: "dept"}])
> db.grantRolesToUser("testuser", [{role: "readWrite", db: "dept"}])
> db.revokeRolesFromUser("testuser", [{role: "read", db: "dept"}])
> db.grantRolesToUser("testuser", [{role: "dbAdmin", db: "dept"}])

> db.dropUser("testuser")
> db.logout()
//how to create a custom role
> use admin
> db.auth("ubuntu", "NobleProg")
> db.createUser({user: "monitoringuser", pwd: "NobleProg", roles: []})
> db.createRole({role: "statsWatcher", privileges: [{resource: {"anyResource": true}, actions: ["serverStatus"]}], roles: []})
> db.getRole("statsWatcher", {showPrivileges: true})
> db.grantRolesToUser("monitoringuser", [{role: "statsWatcher", db: "admin"}])


Monitoring MongoDB

  • mongotop
  • mongostat
  • MongoDB's Web Console on mongod's port + 1000 (28017 by default)
    • enable the rest and httpinterface options
  • MongoDB Monitoring Service (MMS)
    • plus Munin for monitoring hardware


> db.serverStatus()
> db.runCommand("serverStatus")
> rs.status()
> db.runCommand("replSetGetStatus")
> db.currentOp()
> db.stats()
> db.collection.stats()


Profiler

  • can be turned on per database or for the entire server
  • profiling levels:
    • 0 - no profiling
    • 1 - only includes "slow" operations
    • 2 - includes all operations


> db.getProfilingLevel()
> db.setProfilingLevel(1, 100)
> db.system.profile.find().sort({millis : -1}).pretty()


Indexes and Query Optimization

  • a database index works like the index in a book
  • indexes in MongoDB work almost identically to indexes in relational databases
  • indexes make reads faster but writes slower
  • at most 64 indexes per collection


Collection Without Indexes

for (var i=1; i<=1000000; i++) {
  db.visitors.insert({
    "i" : i,
    "visitor" : "visitor_"+i,
    "score" : Math.floor(Math.random()*10+1),
    "date" : new Date()
  })
}


One Field Indexes

db.visitors.explain().find({"visitor" : "visitor_330"})
db.visitors.explain().find({"visitor" : "visitor_330"}).limit(1)
db.visitors.explain().find({"visitor" : "visitor_99999"}).limit(1)
db.visitors.createIndex({"visitor" : 1})
db.visitors.explain().find({"visitor" : "visitor_99999"})


Compound Indexes

db.visitors.find().sort({"score" : 1, "visitor" : 1})
db.visitors.createIndex({"visitor" : 1})
db.visitors.createIndex({"score" : 1, "visitor" : 1})
db.visitors.explain(true).find({"score" : 10}).sort({"visitor" : -1})
db.visitors.explain(true).find({"score" : {"$gte" : 10, "$lte" : 20}})
db.visitors.explain(true).find({"score" : {"$gte" : 10, "$lte" : 20}}).sort({"visitor" : 1})
db.visitors.createIndex({"visitor" : 1, "score" : 1})
db.visitors.explain(true).find({"score" : {"$gte" : 10, "$lte" : 20}}).sort({"visitor" : 1}).hint({"visitor" : 1, "score" : 1})
db.visitors.dropIndex("visitor_1")


Compound Indexes (2)

  • pattern {SortKey : 1, CriteriaKey : 1}
  • pattern {ExactMatch : 1, RangeCriteria : 1}
  • sorting directions {"score" : -1, "visitor" : 1}
  • {"score" : -1, "visitor" : 1} = {"score" : 1, "visitor" : -1}
  • covered indexes (indexOnly : true or totalDocsExamined = 0 when totalKeysExamined > 0)
  • {key1 : 1, key2 : 1, key3 : 1} eliminates the need of creating {key1 : 1} and {key1 : 1, key2 : 1}
  • $-operators and indexes ($where, $exists, $nin, $ne, $not)
  • queries with $or can use more than one index
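The prefix rule above can be made concrete with a small helper (plain JavaScript, nothing MongoDB-specific; the function name is made up for illustration):

```javascript
// A compound index supports queries on any *prefix* of its key list,
// which is why {key1:1, key2:1, key3:1} makes {key1:1} and
// {key1:1, key2:1} redundant.
function supportedPrefixes(indexKeys) {
  return indexKeys.map((_, i) => indexKeys.slice(0, i + 1));
}

console.log(supportedPrefixes(["key1", "key2", "key3"]));
// [ ["key1"], ["key1","key2"], ["key1","key2","key3"] ]
```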


Indexing Arrays and SubDocuments

  • usually it behaves like normal index
  • difference between indexing {"BirthPlace" : 1} and {"BirthPlace.Country" : 1}
  • only one array field per index
  • multikey indexes
db.people.createIndex({"Movie.Title" : 1})
db.people.createIndex({"BirthPlace.Country" : 1})


Indexes in Details

  • low and high cardinality fields
  • understanding .explain()
  • using .hint()
  • query optimizer (100 results, 1000 queries, index creation)
  • fields in index must be smaller than 1kB
  • options: unique, dropDups, sparse
db.visitors.createIndex({"visitor" : 1}, {"unique" : 1, "dropDups" : 1})


Sparse Index

  • only contains entries for documents that have the indexed field (even if the field holds a null value)
  • useful with unique constraint
db.sparse_index.insert([{y:1, x:1}, {y:1, x:2}, {y:1, x:3}, {y:1}])
db.sparse_index.createIndex({x:1}, {unique:1})
db.sparse_index.insert({y:1})

db.sparse_index.dropIndexes()
db.sparse_index.createIndex({x:1}, {unique:1, sparse:1})
db.sparse_index.insert({y:1})

db.sparse_index.find({"x" : {"$ne" : 2}}).hint({"x" : 1})
db.sparse_index.find({"x" : {"$ne" : 2}}).hint({"$natural" : 1})


Index Administration

  • system.indexes - read only collection that stores info about indexes
  • .createIndex()
  • .dropIndex(), .dropIndexes()
  • .getIndexes()
  • options: background, name


Capped Collections

  • it has to be created before the first insert occurs
  • fixed size, or size plus a maximum number of documents (a circular queue)
  • forbidden operations on documents: removing, and updates that would increase the document size
  • can not be sharded, and can not be changed after creation
  • sorting: $natural : 1 (or -1)
db.createCollection("capped_collection", {"capped" : true, "size" : 100000})
db.createCollection("capped_collection", {"capped" : true, "size" : 100000, "max" : 20})

db.people.copyTo("capped_collection")
db.runCommand({"convertToCapped" : "capped_collection", "size" : 100000})


Tailable Cursors

  • inspired by the tail -f command
  • not closed when their results are exhausted
  • can be used only on capped collection
  • will die after 10 minutes of inactivity


TTL Indexes (time-to-live)

  • a TTL index allows you to set a timeout for each document
  • removal is performed every 60 seconds
  • can be created only on a single field (a date field)
db.ttl_collection.insert({"User" : "user1", "LastUpdated" : new Date()})
db.ttl_collection.createIndex({"LastUpdated" : 1}, {"expireAfterSeconds" : 30})
db.ttl_collection.find()


Full-Text Indexes

  • quick text search with built-in multi-language support
  • very expensive, especially on busy collections
db.people.createIndex({"Name" : "text"})
db.people.createIndex({"Name" : "text", "Bio" : "text"}, {"weights" : {"Name": 2}})
db.people.createIndex({"$**" : "text"})
db.people.createIndex({"whatever" : "text"}, {"weights" : {"Name" : 5, "Movie.Name" : 2, "$**" : 1}})

db.runCommand({"text" : "people", "search" : "emma thompson"})
db.people.find({$text : {$search : "emma thompson"}})
db.people.find({$text : {$search : "\"emma thompson\""}})
db.people.find({$text : {$search : "-emma thompson"}})
db.people.find({$text : {$search : "emma thompson"}}, {score : {$meta : "textScore"}})
db.people.find({$text : {$search : "emma thompson"}}, {score : {$meta : "textScore"}}).sort({ score: { $meta: "textScore" } })


Geospatial Indexes

2d index

  • for data stored as points on a two-dimensional plane
db.dots.insert([{Name:"A", location:[10, 5]}, {Name:"B", location:[17, -5]}, {Name:"C", location:[0, 2]}, {Name:"D", location:[-3, -3]}])
db.dots.createIndex({location:"2d", type:1})
db.dots.find({location:{$near:[0,0]}})

2dsphere index

db.cities.insert({Name:"Rzeszów", "location": {"type":"Point", "coordinates":[22.008606,50.040264]}})
db.cities.insert({Name:"Warszawa", "location": {"type":"Point", "coordinates":[21.0123237,52.2328474]}})
db.cities.insert({Name:"Wrocław", "location": {"type":"Point", "coordinates":[17.0342894,51.1174725]}})
db.cities.insert({Name:"Kraków", "location": {"type":"Point", "coordinates":[19.9012826,50.0719423]}})
db.cities.insert({Name:"Kielce", "location": {"type":"Point", "coordinates":[20.6156414,50.85404]}})
db.cities.createIndex({location:"2dsphere"})

db.cities.find({location:{$near: 
    {$geometry: {type:"Point", coordinates:[20.6156414, 50.85404]}, $minDistance: 0, $maxDistance:200000}
}})

db.runCommand({geoNear: "cities", near: [20.6156414, 50.85404], spherical: true, distanceMultiplier: 6378.1}) // distance in kilometers

db.cities.find({location: {$geoWithin: 
    {$geometry: {type: "Polygon", coordinates: [[ [22.008606,50.040264], [21.0123237,52.2328474], [19.9012826,50.0719423], [22.008606,50.040264] ]]}}
}})

Replication

  • what happens when a standalone server crashes?
  • replication is a way of keeping identical copies of data on multiple servers
  • recommended for all production deployments
  • a replica set consists of one primary server and many secondaries
  • replication in MongoDB is asynchronous
  • the primary server handles client requests
  • when the primary crashes, the secondaries elect a new primary
  • primary (master), secondary (slave)


Test Setup

  • this is the way to start 3 servers on ports 31000, 31001, 31002
    • or on ports 20000, 20001, 20002 starting from MongoDB 3.2
  • databases are stored in default directory (/data/db)


$ mongo --nodb
> replicaSet = new ReplSetTest({"nodes" : 3})
> replicaSet.startSet()
> replicaSet.initiate()


$ mongo --nodb
> conn0 = new Mongo("localhost:31000")
> db0 = conn0.getDB("test")
> db0.isMaster()
> for (i=0; i<100; i++) { db0.repl.insert({"x" : i}) }
> db0.repl.count()
>
> conn1 = new Mongo("localhost:31001")
> db1 = conn1.getDB("test")
> db1.repl.count()
> db1.setSlaveOk()
> db1.repl.count()
>
> db0.adminCommand({"shutdown" : 1})
> db1.isMaster()


Production-Like Setup

  • this example shows how to start a replica set with 3 members
  • everything runs on one server here, but it can easily be changed to multiple machines
  • only one server (localhost:27100) may contain data
  • the replica set name can be any UTF-8 string (np_rep in this example)
  • use --replSet np_rep when starting mongod
  • each member of the replica set must be able to connect to all other members


$ mongod --port 27100 --replSet np_rep --dbpath /data/np_rep/rep0 --logpath /data/np_rep/rep0.log --fork --bind_ip 127.0.0.1 --rest --httpinterface --logappend
$ mongod --port 27101 --replSet np_rep --dbpath /data/np_rep/rep1 --logpath /data/np_rep/rep1.log --fork ...
$ mongod --port 27102 --replSet np_rep --dbpath /data/np_rep/rep2 --logpath /data/np_rep/rep2.log --fork ...
$
$ mongod --shutdown --dbpath /data/np_rep/rep0


  • the Mongo Shell is the only way to configure a replica set
  • prepare a configuration document (np_rep_config)
    • the _id key is your replica set name
    • members is a list of all servers in the replica set
  • send the configuration to the server that holds the data
  • that server will change the configuration of the other members
  • the servers will elect a primary and start handling client requests
  • localhost as a server name can be used only for testing


> np_rep_config = { 
...   "_id" : "np_rep",
...   "members" : [
...     {"_id" : 0, "host" : "localhost:27100"},
...     {"_id" : 1, "host" : "localhost:27101"},
...     {"_id" : 2, "host" : "localhost:27102"}
...   ]
... }
>
> rs.initiate(np_rep_config)
> db = (new Mongo("localhost:27102")).getDB("test")


rs Helper

  • rs is a global variable containing replication helper functions
  • rs.help() - list of all functions
  • rs.initiate(np_rep_config) is a wrapper to adminCommand
    • db.adminCommand({"replSetInitiate" : np_rep_config})
> rs.help()
> rs.add("localhost:27103")
> rs.remove("localhost:27103")
> rs.config()
> np_rep_config = rs.config()
> edit np_rep_config 
> rs.reconfig(np_rep_config)


All About Majorities

  • how should you design a replica set?
  • majority = more than half of all members in the set
  • majority is based on the configuration, not on the current status of the replica set
    • e.g. in a 5-member replica set, 3 members are down
    • the two remaining members can not reach a majority
    • if one of the two was primary, it steps down
    • both remaining members end up as secondaries
  • why can two of five not keep a primary? because the three unreachable members may still be alive (e.g. behind a network partition) and may have elected a primary of their own
  • examples: 3+2 members, 2+2+1 members, only 2 members
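The arithmetic behind these examples is simple enough to sketch (the function name is just for illustration):

```javascript
// Majority is computed from the *configured* member count, never from
// how many members are currently reachable:
function majority(configuredMembers) {
  return Math.floor(configuredMembers / 2) + 1;
}

console.log(majority(5));  // 3 -- so 2 survivors of a 5-member set cannot elect a primary
console.log(majority(4));  // 3 -- an even count tolerates no more failures than 3 members
console.log(majority(3));  // 2
```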


How Election Works

  • an election starts when a secondary can not reach the primary
  • it will contact all other members and request to become primary
  • the others perform several checks:
    • can they reach the primary themselves?
    • is the candidate up to date with replication?
    • is there any member with a higher priority?
  • the election ends when the candidate receives "yes" votes from a majority
  • any member can veto the election (a veto counts as 10,000 votes)
  • a heartbeat is sent to all members every 2 seconds, with a 10-second timeout
    • based on this, the primary knows whether it can reach a majority
    • when an election results in a tie, all members wait 30 seconds before retrying


Member Configuration

  • arbiter
    • normal mongod process started with --replSet option and empty data directory
    • > rs.addArb("server:port")
    • > rs.add({"_id" : 7, "host" : "server:port", "arbiterOnly" : true})
    • use at most one arbiter, don't add arbiters "just in case"
    • arbiter once added can't be changed to normal mongod, and vice versa
    • use normal data member instead of an arbiter whenever possible

  • priority
    • priority range is from 0 to 100 (default 1)
    • it means how badly this member wants to become primary
    • member with priority 0 can never become primary (passive members)

  • hidden
    • hidden members don't handle client requests
    • hidden members are not preferred as replication sources
    • useful for backup or less powerful machines
    • use hidden : true with priority : 0
    • rs.isMaster(), rs.status(), rs.config()

  • slave delay
    • a delayed secondary will purposely lag by the specified number of seconds
    • use slaveDelay : seconds with priority : 0
    • ... and hidden : true if your clients read from secondaries

  • building indexes
    • useful for backup servers
    • prevents the member from building any indexes
    • use buildIndexes : false with priority : 0
    • non-index-building member can't be changed to normal easily


Sync Process

  • the oplog contains all operations that the primary performs
  • it is a capped collection in the local database on each member
  • any member can be used as a source for replication
  • steps: read an operation from the source's oplog -> apply the operation to the data -> write the operation to the local oplog
  • re-applying the same oplog operation is safe and handled properly (it gives the same result)
  • the oplog is fixed in size, not in time
    • usually one operation on data results in one operation in the oplog
    • exception: one operation that affects multiple documents is stored in the oplog as many single-document operations
  • a secondary may go stale when it had downtime, gets more writes than it can handle, or is too busy handling reads
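The safety of re-applying oplog entries comes from recording them in an idempotent form (for example, an $inc is logged as a $set of the resulting value). A toy illustration in plain JavaScript (applySet is a made-up helper):

```javascript
// Applying the same {$set: ...} twice yields the same document --
// this is what makes replaying an oplog entry harmless.
function applySet(doc, fields) {
  return Object.assign({}, doc, fields);
}

let doc = { _id: 1, counter: 41 };
doc = applySet(doc, { counter: 42 });  // the logged form of {$inc: {counter: 1}}
doc = applySet(doc, { counter: 42 });  // accidentally re-applied: no harm
console.log(doc.counter);              // 42
```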


Initial Sync Process

  1. initial checks (choose a source, drop existing databases)
  2. cloning data from source (longest operation)
  3. first oplog application
  4. second oplog application
  5. index building (long operation)
  6. third oplog application
  7. switching to normal syncing and normal operations
  • restoring from backup is often faster
  • cloning may ruin source's working set
  • initial sync may fail if oplog is too short


States of Replica Set Members

  1. Primary
  2. Secondary
  3. Startup (when you start a member for the first time)
  4. Startup2 (initial sync, short state on normal members: starts replication and election process)
  5. Recovering (failure state, member is operating correctly but not available for reads, occurs in many situations)
  6. Arbiter
  7. Down (when member was up and then becomes unreachable)
  8. Unknown (when member has never been able to reach another member)
  9. Removed (after removing from replica set, when added back it will return into "normal" state)
  10. Rollback (when rolling back data)
  11. Fatal (find a reason in log, grep "replSet FATAL", restore from backup or resync)


Rollbacks

  1. what a rollback is and when it is performed
  2. synchronization must be done manually
    • a collectionName.bson file appears in a rollback directory inside the data directory
    • mongorestore that file into a temporary collection and perform a manual merge
  3. a rollback will fail if there is more than 300 MB of data or about 30 minutes of operations to roll back
  4. how to prevent rollbacks
    • do not change the number of votes of replica set members
    • keep secondaries up to date


Connection to Replica Set and Replication Guarantees

  • connection strings: mongodb://user_name:password@host1:27017,host2:27018,host3:27019/?replicaSet=replicaName&connectTimeoutMS=10000&authMechanism=SCRAM-SHA-1


> db.runCommand({getLastError: 1, w: "majority"})
> db.testColl.insert({a: 1}, {writeConcern: {w: "majority"}})
> db.testColl.insert({a: 1}, {writeConcern: {w: "majority", wtimeout: 5000}})
> db.testColl.insert({a: 1}, {writeConcern: {w: 3, wtimeout: 5000}})


> config = rs.conf()
> config.settings = {}
> config.settings.getLastErrorDefaults = {w: "majority", wtimeout: 5000}
> rs.reconfig(config)


> config = rs.config()
> config.members[0].tags = {"dc": "PL"}
> config.members[1].tags = {"dc": "UK"}
> config.settings = config.settings || {}
> config.settings.getLastErrorModes = [{"allDataCenters" : {"dc": 2}}]
> rs.reconfig(config)
> db.testColl.insert({a: 1}, {writeConcern: {w: "allDataCenters", wtimeout: 5000}})


Read Preference

  • read preference describes how MongoDB clients route read operations to the members of a replica set
  • read preference modes:
    • primary - default mode
    • primaryPreferred
    • secondary
    • secondaryPreferred
    • nearest
  • tag sets


Administration of Replica Set


Run as Standalone

  • many operations can not be performed on secondaries
    1. stop the secondary
    2. run mongod without --replSet, on different port, with the same --dbpath
$ mongod --port 27999 --dbpath /var/lib/mongodb


Large Replica Sets

  • replica sets are limited to 50 members (12 before version 3.0) and only 7 voting members
  • the limit exists to reduce the network traffic generated by heartbeats
  • use a master-slave configuration when more than 50 secondaries are needed
> rs.add({"_id" : 51, "host" : "server-8:27017", "votes" : 0})


Forcing Reconfiguration

  • useful when your majority is lost permanently
  • force allows you to send the configuration to a secondary (not only to the primary)
  • the configuration needs to be prepared correctly
  • force will increase the configuration version dramatically
> var config = rs.config()
> edit config
> rs.reconfig(config, {"force" : true})


Changing Member Status Manually

  • there is no way to force a member to become primary directly
  • instead, demote the primary to a secondary (for 60 seconds by default)
> rs.stepDown()
> rs.stepDown(600)


Preventing Elections

  • rs.freeze(seconds) prevents a member from seeking election
  • rs.freeze(0) can also be used on a demoted primary to unfreeze it
> rs.freeze(600)
> rs.freeze(0)


Maintenance Mode

  • the server will go into the recovering state
  • useful if the server is performing a long operation or is far behind in replication
> db.adminCommand({"replSetMaintenanceMode" : true})
> db.adminCommand({"replSetMaintenanceMode" : false})


Monitoring Replication

  • status of the replica set (from the current server's perspective)
  • db.adminCommand("replSetGetStatus") or rs.status()
  • important fields: self, stateStr, uptime [s], optimeDate (last oplog operation), pingMs, errmsg
> rs.status()


Replication Source

  • rs.status() can be used to draw the replication graph
  • the replication source is chosen by ping time (the smallest wins)
  • this can lead to the creation of replication chains
  • replication loops must be avoided: s1 from s2, s2 from s3, s3 from s1
> db.adminCommand({"replSetSyncFrom" : "server:27017"})
> var config = rs.config()
> config.settings = config.settings || {}
> config.settings.chainingAllowed = false
> rs.reconfig(config)


Resizing the Oplog

  • a long oplog gives you time for maintenance
  • the oplog on the primary should cover at least a few days, or even weeks
> db.printReplicationInfo()
> db.printSlaveReplicationInfo()


procedure to resize the oplog:

  • demote primary into secondary
  • shut down and restart as standalone
  • copy the last insert from the oplog into temporary collection
> use local
> var cursor = db.oplog.rs.find({"op" : "i"})
> var lastInsert = cursor.sort({"$natural" : -1}).limit(1).next()
> db.tmpLastOp.save(lastInsert)
> db.tmpLastOp.findOne()
  • drop current oplog
> db.oplog.rs.drop()
  • create new oplog with different size
> db.createCollection("oplog.rs", {"capped" : true, "size" : 10000000})
  • move last operation from temporary collection into new oplog
> var lastInsert = db.tmpLastOp.findOne()
> db.oplog.rs.insert(lastInsert)
> db.oplog.rs.findOne()
  • shut down standalone server and restart as replica set member


Other operations

  • restoring data from delayed secondary
  • building indexes
  • replication on cheap machines: priority:0, hidden:true, buildIndexes:false, votes:0
  • master - slave configuration and mimicking this behaviour when using replica set
  • calculating lag: local.slaves, local.me, local.slaves.drop()


Backup and Restore

  • backups should generally be taken on secondaries
  • or on standalone servers during an off-peak time


Method #1: Filesystem Snapshot

  • the simplest way to make backup
  • filesystem must support snapshotting
  • mongod must run with journaling enabled
  • stop mongod before restoring


# Create the snapshot volume
lvcreate -L128M -s -n dbbackup /dev/ops/databases

# Mount the snapshot volume
mkdir /mnt/ops/dbbackup
mount /dev/ops/dbbackup /mnt/ops/dbbackup

# Do the backup
tar -cf /dev/rmt0 /mnt/ops/dbbackup

# Remove the snapshot
umount /mnt/ops/dbbackup
lvremove /dev/ops/dbbackup


Method #2: Copying Data Files

  • use fsyncLock() before copying files
    • fsyncLock() blocks any further writes to all databases
    • flushes all dirty data to disk
    • queues all write operations
  • the files are then consistent, and all of them can be copied
  • fsyncUnlock() releases the lock and returns the database to normal operation
  • if using authentication, do not log out of the console between fsyncLock() and fsyncUnlock()
  • stop mongod before restoring
    • do not restore a single database if a crash or hard shutdown occurred


> db.fsyncLock()
> db.fsyncUnlock()


Method #3: Using mongodump and mongorestore

  • mongodump/mongorestore is slower than the previous methods and has other downsides
  • good way to backup individual databases or collections
  • it will create a dump directory (with subdirectories for each database) in current directory
  • data is stored in .bson files
  • mongodump can be used when mongod is not running
    • do not use --dbpath when mongod is running
  • while mongodump is running, writes are still allowed, so already-dumped data may change before the dump finishes
    • do not use fsyncLock() when using mongodump
    • use --oplog if you are running mongod with --replSet
  • mongodump will choose secondary when connected to replica set
  • use mongodump and mongorestore in the same version
  • try to avoid this type of backup if you have unique indexes other than _id


mongodump --help
mongodump --port 27017
mongodump --dbpath /var/lib/mongodb
mongorestore --port 27017 --drop
mongodump --port 27017 --oplog
mongorestore --port 27017 --oplogReplay dump/
mongorestore -d dstDB -c dstCollection dump/srcDB/scrColl.bson


Backup of Replica Set

  • all previous methods are OK but 1st and 2nd are recommended (without any modifications)
  • when using mongodump use --oplog
  • when restoring with mongorestore:
    1. start a server as a standalone and
    2. restore the data with --oplogReplay and
    3. restore the oplog collection
    4. restart the server as a member of replica


> use local
> db.createCollection("oplog.rs", {"capped" : true, "size" : 1000000})
$ mongorestore -d local -c oplog.rs dump/oplog.bson


Backup of Sharded Cluster

  • a perfectly consistent backup of a sharded cluster is usually not possible
  • instead of backing up everything at once, backup servers separately
  • turn off balancer before making backup
  • run mongodump through mongos to backup entire cluster
  • problems with restoring single shard


Other Administrative Tasks

  • Preheating data
    • loading everything into memory
    • loading selected collections only
> db.runCommand({ touch: "collectionName", data: true, index: true })
    • loading specific indexes
    • loading recently created documents
    • replay application usage


  • Compacting
> db.runCommand({"compact" : "collectionName", "paddingFactor" : 1.5})


  • Repairing data
    • use --repair and optionally --repairpath
> db.repairDatabase()