MongoDB for Administrators
Copyright Notice
Copyright © 2004-2023 by NobleProg Limited All rights reserved.
This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise.
Introduction
- MongoDB is a fast document-oriented database
- replaces the concept of a "row" with a more flexible model - the "document"
- convenient data storage for modern object-oriented languages
- no predefined schemas
- no multi-document transactions (single-document operations are atomic)
- no SQL
- supports indexes
- easy to scale out horizontally
Getting Started
- document is a basic unit of data (like row in RDBMS)
- collection = table
- multiple databases in single instance
- _id in every document, unique within collection
- JavaScript shell for administration and data manipulation
Documents
- document is an ordered set of keys with associated values
- the representation of a document varies by programming language (map, hash in Perl and Ruby, dictionary in Python)
- objects in JavaScript {key:value}
- example {"company" : "NobleProg", "training" : "MongoDB for Developers"}
- a key is a string; any UTF-8 character is allowed, except \0 and the reserved . and $ characters
- type-sensitive {"age" : 3}, {"age" : "3"}
- case-sensitive {"age" : 3}, {"Age" : 3}
- documents cannot contain duplicated keys
- key/value pairs are ordered {"x" : 1, "y" : 1} != {"y" : 1, "x" : 1}
- order does not usually matter, MongoDB can reorder keys
{
    "_id" : ObjectId("545a414c7907b2a255b156c5"),
    "Name" : "Sean Connery",
    "Nationality" : "Great Britain",
    "BirthDate" : ISODate("1930-08-25T00:00:00Z"),
    "BirthYear" : 1930,
    "Occupation" : [
        "Actor",
        "Director",
        "Producer"
    ],
    "Movie" : [
        {
            "_id" : ObjectId("545a5f167907b2a255b156c7"),
            "Title" : "Dr. No"
        },
        {
            "_id" : ObjectId("545a5f317907b2a255b156c8"),
            "Title" : "From Russia with Love"
        },
        {
            "_id" : ObjectId("545a5ed67907b2a255b156c6"),
            "Title" : "Never Say Never Again"
        }
    ],
    "BirthPlace" : {
        "Country" : "United Kingdom, Scotland",
        "City" : "Edinburgh"
    }
}
Collections
- a group of documents
- dynamic schemas
- {"company" : "NobleProg"}
- {"age" : 5}
- why should we use more than one collection?
- mixing document types in a single collection is a nightmare for developers
- it is much faster to list collections than to extract document types from one collection
- grouping documents of the same kind keeps related data together
- indexes are defined per collection and work best on homogeneous documents
- collection name is a string, any UTF-8 is allowed except:
- empty string, start with "system." prefix, contain \0 character, $ character
- subcollections separated by the . character
- example GridFS (fs.files, fs.chunks)
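- a minimal sketch of addressing a subcollection from the shell (blog.posts is a hypothetical name):
> db.blog.posts.insert({"title" : "Hello"})
> db.getCollection("blog.posts").findOne()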
Databases
- database is a group of collections
- one database = one application
- separated databases for different applications, users
- database name is an alphanumeric string, case-sensitive, max 64 bytes; an empty string is not allowed
- database name will end up as file on filesystem (this explains restrictions)
- special databases: admin (root database), local (never replicated), config (when sharding)
- namespace is a concatenation of database and collection name (fully qualified collection name)
- max 121 bytes, should be less than 100
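- example: reading a collection's fully qualified namespace from the shell (getFullName() is built into the legacy shell):
> use NobleProg
> db.people.getFullName()
NobleProg.people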
Getting and Starting MongoDB
- Installation on Windows
- Installation on Ubuntu
CRUD
Create
use NobleProg
person = {"Name" : "Sean Connery", "Nationality" : "Great Britain"}
db.people.insert(person) // .insert() is deprecated now, use .insertOne() instead
Read
db.people.find()
db.people.findOne()
db.people.find().pretty() // .pretty() is the default now, so can be omitted
Update
personUp = {"Occupation" : "Actor"}
db.people.update({"Name" : "Sean Connery"}, {$set: personUp}) // .update() is depricated now, use .updateOne() instead
db.people.findOne()
Delete
db.people.remove({"Name" : "Sean Connery"}) // .remove() is depricated now, use .deleteOne() instead
db.people.remove({})
db.people.findOne()
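- the same CRUD flow using the one-document methods recommended in the comments above (a sketch; these methods exist since MongoDB 3.2):
db.people.insertOne({"Name" : "Sean Connery", "Nationality" : "Great Britain"})
db.people.updateOne({"Name" : "Sean Connery"}, {$set : {"Occupation" : "Actor"}})
db.people.deleteOne({"Name" : "Sean Connery"})
db.people.deleteMany({})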
Data Types
- JSON-like documents
- 6 data types: null, boolean, numeric, string, array, object
- MongoDB adds support for other datatypes
- null {"x" : null}
- boolean {"x" : true}
- number (by default 64-bit floating point numbers) {"x" : 1.4142}
- 4-byte integers {"x" : NumberInt(141)}
- 8-byte integers {"x" : NumberLong(141)}
- string (any UTF-8 character) {"x" : "NobleProg"}
- date (stored as milliseconds since the Unix epoch) {"x" : new Date()}
- regular expressions (in queries) {"x" : /bob/i}
- array {"x" : [1.4142, true, "training"]}
- embedded documents {"x" : {"y" : 100}}
- ObjectId {"x" : ObjectId("54597591bb107f6ef5989771")}
- binary data (for non-UTF-8 strings)
- code {"x" : function() {/*...*/}}
_id and ObjectId
- every document in MongoDB must have _id
- can be any type but it defaults to ObjectId
- unique values in a single collection
- ObjectId is designed to be lightweight and easy to generate
- 12 bytes of storage (24 hexadecimal digits)
- timestamp - byte 0-3
- machine - byte 4-6
- PID - byte 7-8
- increment - byte 9-11
- _id is generated automatically if not present in document
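- because bytes 0-3 are a timestamp, the creation time can be read back from any ObjectId (here from the sample document above):
> ObjectId("545a414c7907b2a255b156c5").getTimestamp()
ISODate("2014-11-05T15:25:00Z")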
MongoDB Shell
$ mongo
MongoDB shell version: 4.0.6
connecting to: test
>
$ mongo HostName:PortNumber/DatabaseName
$ mongo localhost:27017/test
$ mongo --nodb
> conn = new Mongo("localhost:27017")
connection to localhost:27017
> db = conn.getDB("NobleProg")
NobleProg
Using help
- mongo is a JavaScript shell, so general JavaScript on-line documentation also applies
- use built-in help for MongoDB-specific functionality
- type function name without parentheses to see what the function is doing
> help
db.help() help on db methods
db.mycoll.help() help on collection methods
...
exit quit mongo shell
>
> db.NobleProg.stats
function ( scale ){
    return this._db.runCommand( { collstats : this._shortName , scale : scale } );
}
>
> db.NobleProg.stats()
{ "ok" : 0, "errmsg" : "Collection [test.NobleProg] not found." }
>
Running Scripts
- mongo can execute JavaScript files
- scripts have access to all global variables (e.g. "db")
- shell helpers (e.g. "show collections") do not work from files; use valid JavaScript equivalents (e.g. "db.getCollectionNames()")
- use --quiet to hide "MongoDB shell version..." when executing script
- use load() to run script directly from Mongo Shell
- use .mongorc.js for frequently-loaded scripts
- located in user home directory, run when starting up the shell, --norc to disable it
- useful also to customize prompt
$ mongo script.js
MongoDB shell version: 4.0.6
connecting to: test
script.js was executed successfully!
$
$ mongo --quiet script.js
script.js was executed successfully!
$
$ mongo
MongoDB shell version: 4.0.6
connecting to: test
> load("script.js")
script.js was executed successfully!
>
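- a minimal .mongorc.js sketch customizing the prompt, as mentioned in the bullets above (any valid JavaScript works here):
// ~/.mongorc.js - runs on every shell startup, skipped with --norc
prompt = function() {
    return db + "> ";
};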
Editing Complex Variables
- limited multiline support in the shell
- external editors are allowed
- EDITOR="/usr/bin/gedit"
- EDITOR="c:\\windows\\notepad.exe"
$ mongo --quiet
> EDITOR="c:\\windows\\notepad.exe"
> use training
switched to db training
> person = db.people.findOne()
{ "_id" : ObjectId("54568445cfc7c83518fa5430"), "Name" : "Sean Connery" }
> edit person
> person
{ "_id" : ObjectId("54568445cfc7c83518fa5430"), "Name" : "Sean Connery", "Nationality" : "Great Britain" }
> db.people.save(person)
Single-server Configuration and Deployment
Configuration File Options
- YAML-based configuration file format (since 2.6)
mongod --config /etc/mongodb.conf
mongod -f /etc/mongodb.conf
net:
  port: 27017
  bindIp: 127.0.0.1
operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 10
storage:
  dbPath: c:\data\db
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1
systemLog:
  destination: file
  path: c:\data\logs\mongodb.log
security:
  authorization: enabled
  keyFile: c:\data\config\keyfile.txt
replication:
  replSetName: training
  oplogSizeMB: 128
Storage engine
- A storage engine is the part of a database that is responsible for managing how data is stored on disk.
- in other words, it's an interface between database and hardware
- MMAPv1 Storage Engine
- historically first storage engine
- collection level locking
- in-place updates
- power of 2 sized document allocations
- WiredTiger Storage Engine
- new in version 3.0
- default since 3.2
- improved performance in most use cases
- document level locking
- compression (for data and indexes)
- In-Memory (experimental)
- RocksDB, HDFS, FusionIO (under development)
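- to check which storage engine a running mongod uses (available since 3.0):
> db.serverStatus().storageEngine.name
wiredTiger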
MMAPv1
- --storageEngine mmapv1
- locks:
- multiple readers, single writer lock
- shared resources: data, metadata (indexes, journal)
- lock levels:
- database level locking (2.2 - 2.6)
- collection level locking (3.0)
- journal - write-ahead transaction log
- all operations are written to journal
- and after that are applied to data files
- data on disk is raw BSON, directly mapped into virtual memory
- document allocations
- no padding
- padding factor (automatic, manual)
- power of 2 sized allocations (from 32B to 2MB)
WiredTiger
- --storageEngine wiredTiger
- data is stored in B-Trees (similar to B-Trees MMAPv1 uses for indexes)
- initially documents get written into unused regions
- and then merged with the rest of the data in background later
- WT uses two caches:
- WT cache (half of RAM by default)
- operating system cache
- it uses a write-ahead transaction log in combination with checkpoints to ensure data persistence
- MongoDB will commit a checkpoint to disk
- 60 seconds after the end of previous checkpoint or
- when there is too much dirty data in the WT cache (2 gigabytes of data)
- document level locking
- WT avoids long-held locks in favour of optimistic concurrency protocols
- writes should scale with the number of threads
- compression (can be set for each collection separately)
- snappy (fast, balance storage efficiency and processing requirements)
- zlib (higher compression rates at the cost of more CPU)
- off
- WiredTiger options
- how to test your own data?
db.createCollection( "email", { storageEngine: { wiredTiger: { configString: 'block_compressor=zlib' }}})
Authentication and Authorization
- MongoDB employs Role-Based Access Control
- MongoDB supports
- password-based authentication (SCRAM-SHA-1, MONGODB-CR)
- x.509 certificates
- LDAP proxy (enterprise edition)
- Kerberos (enterprise edition)
- authentication is disabled by default
- to turn it on, use the --auth switch
- create at least one superuser account first
- user belongs to a database and must be authenticated in that database
- users created in admin or local database can perform operations on all databases
- commonly used roles: read, readWrite, dbAdmin, userAdmin, dbOwner
- http://docs.mongodb.org/manual/core/authorization/
> use admin
> db.createUser({user : "ubuntu", pwd : "NobleProg", roles : ["root"]})
$ mongo -u ubuntu -p --authenticationDatabase admin
$ mongo localhost/admin -u ubuntu -p
> use admin
> db.auth("ubuntu", "NobleProg")
> use test
> db.createUser({user: "testuser", pwd: "NobleProg", roles: ["dbOwner"]})
> db.getSiblingDB("admin").system.users.find().pretty()
> db.getUser("testuser")
> db.getRole("dbOwner")
> db.getRole("dbOwner", {showPrivileges: true})
> db.grantRolesToUser("testuser", [{role: "read", db: "dept"}])
> db.grantRolesToUser("testuser", [{role: "readWrite", db: "dept"}])
> db.revokeRolesFromUser("testuser", [{role: "read", db: "dept"}])
> db.grantRolesToUser("testuser", [{role: "dbAdmin", db: "dept"}])
> db.dropUser("testuser")
> db.logout()
//how to create a custom role
> use admin
> db.auth("ubuntu", "NobleProg")
> db.createUser({user: "monitoringuser", pwd: "NobleProg", roles: []})
> db.createRole({role: "statsWatcher", privileges: [{resource: {"anyResource": true}, actions: ["serverStatus"]}], roles: []})
> db.getRole("statsWatcher", {showPrivileges: true})
> db.grantRolesToUser("monitoringuser", [{role: "statsWatcher", db: "admin"}])
Monitoring MongoDB
- mongotop
- mongostat
- MongoDB's Web Console on the mongod port + 1000
- enable rest and httpinterface
- MongoDB Monitoring Service (MMS)
- plus Munin for monitoring hardware
> db.serverStatus()
> db.runCommand("serverStatus")
> rs.status()
> db.runCommand("replSetGetStatus")
> db.currentOp()
> db.stats()
> db.collection.stats()
Profiler
- can be turned on at the database level or for the entire server
- profiling levels:
- 0 - no profiling
- 1 - only includes "slow" operations
- 2 - includes all operations
> db.getProfilingLevel()
> db.setProfilingLevel(1, 100)
> db.system.profile.find().sort({millis : -1}).pretty()
Indexes and Query Optimization
- database index works like index in a book
- indexes in MongoDB work almost identically to those in relational databases
- with indexes reads are faster but writes are slower
- a maximum of 64 indexes per collection
Collection Without Indexes
for (var i=1; i<=1000000; i++) {
    db.visitors.insert({
        "i" : i,
        "visitor" : "visitor_"+i,
        "score" : Math.floor(Math.random()*10+1),
        "date" : new Date()
    })
}
One Field Indexes
db.visitors.explain().find({"visitor" : "visitor_330"})
db.visitors.explain().find({"visitor" : "visitor_330"}).limit(1)
db.visitors.explain().find({"visitor" : "visitor_99999"}).limit(1)
db.visitors.createIndex({"visitor" : 1})
db.visitors.explain().find({"visitor" : "visitor_99999"})
Compound Indexes
db.visitors.find().sort({"score" : 1, "visitor" : 1})
db.visitors.createIndex({"visitor" : 1})
db.visitors.createIndex({"score" : 1, "visitor" : 1})
db.visitors.explain(true).find({"score" : 10}).sort({"visitor" : -1})
db.visitors.explain(true).find({"score" : {"$gte" : 10, "$lte" : 20}})
db.visitors.explain(true).find({"score" : {"$gte" : 10, "$lte" : 20}}).sort({"visitor" : 1})
db.visitors.createIndex({"visitor" : 1, "score" : 1})
db.visitors.explain(true).find({"score" : {"$gte" : 10, "$lte" : 20}}).sort({"visitor" : 1}).hint({"visitor" : 1, "score" : 1})
db.visitors.dropIndex("visitor_1")
Compound Indexes (2)
- pattern {SortKey : 1, CriteriaKey : 1}
- pattern {ExactMatch : 1, RangeCriteria : 1}
- sorting directions {"score" : -1, "visitor" : 1}
- {"score" : -1, "visitor" : 1} = {"score" : 1, "visitor" : -1}
- covered queries (indexOnly : true, or totalDocsExamined = 0 when totalKeysExamined > 0)
- {key1 : 1, key2 : 1, key3 : 1} eliminates the need of creating {key1 : 1} and {key1 : 1, key2 : 1}
- some $-operators can not use indexes efficiently ($where, $exists, $nin, $ne, $not)
- queries with $or can use more than one index
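- a sketch of a covered query using the {"visitor" : 1, "score" : 1} index created above; projecting away _id lets the index alone answer the query:
db.visitors.explain(true).find({"visitor" : "visitor_42"}, {"_id" : 0, "visitor" : 1, "score" : 1})
// expect totalDocsExamined : 0 with totalKeysExamined > 0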
Indexing Arrays and SubDocuments
- usually it behaves like normal index
- difference between indexing {"BirthPlace" : 1} and {"BirthPlace.Country" : 1}
- only one array field per index
- multikey indexes
db.people.createIndex({"Movie.Title" : 1})
db.people.createIndex({"BirthPlace.Country" : 1})
Indexes in Detail
- low and high cardinality fields
- understanding .explain()
- using .hint()
- query optimizer (100 results, 1000 queries, index creation)
- indexed field values must be smaller than 1 kB
- options: unique, dropDups, sparse
db.visitors.createIndex({"visitor" : 1}, {"unique" : 1, "dropDups" : 1})
Sparse Index
- only contain entries for documents that have the indexed field (even if the index field contains a null value)
- useful with unique constraint
db.sparse_index.insert([{y:1, x:1}, {y:1, x:2}, {y:1, x:3}, {y:1}])
db.sparse_index.createIndex({x:1}, {unique:1})
db.sparse_index.insert({y:1})
db.sparse_index.dropIndexes()
db.sparse_index.createIndex({x:1}, {unique:1, sparse:1})
db.sparse_index.insert({y:1})
db.sparse_index.find({"x" : {"$ne" : 2}}).hint({"x" : 1})
db.sparse_index.find({"x" : {"$ne" : 2}}).hint({"$natural" : 1})
Index Administration
- system.indexes - read only collection that stores info about indexes
- .createIndex()
- .dropIndex(), .dropIndexes()
- .getIndexes()
- options: background, name
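- example of the background and name options, then inspecting and dropping the result (idx_visitors_date is a made-up name):
db.visitors.createIndex({"date" : 1}, {"background" : true, "name" : "idx_visitors_date"})
db.visitors.getIndexes()
db.visitors.dropIndex("idx_visitors_date")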
Capped Collections
- a capped collection has to be created before the first insert occurs
- fixed size or size and number of documents (circular queue)
- forbidden operations on documents: removing, updating (if it will increase the size)
- can not be sharded or changed
- sorting: $natural : 1 (or -1)
db.createCollection("capped_collection", {"capped" : true, "size" : 100000})
db.createCollection("capped_collection", {"capped" : true, "size" : 100000, "max" : 20})
db.people.copyTo("capped_collection")
db.runCommand({"convertToCapped" : "capped_collection", "size" : 100000})
Tailable Cursors
- inspired by the tail -f command
- not closed when their results are exhausted
- can be used only on capped collection
- will die after 10 minutes
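- a minimal sketch of a tailable cursor in the legacy mongo shell, reusing the capped_collection created above (awaitData reduces polling):
var cursor = db.capped_collection.find().addOption(DBQuery.Option.tailable).addOption(DBQuery.Option.awaitData)
while (cursor.hasNext()) {
    printjson(cursor.next())
}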
TTL Indexes (time-to-live)
- TTL index allows you to set a timeout for each document
- removing is performed every 60 seconds
- can be created only on single field (date field)
db.ttl_collection.insert({"User" : "user1", "LastUpdated" : new Date()})
db.ttl_collection.createIndex({"LastUpdated" : 1}, {"expireAfterSeconds" : 30})
db.ttl_collection.find()
Full-Text Indexes
- quick text search with built-in multi-language support
- very expensive, especially on busy collections
db.people.createIndex({"Name" : "text"})
db.people.createIndex({"Name" : "text", "Bio" : "text"}, {"weights" : {"Name": 2}})
db.people.createIndex({"$**" : "text"})
db.people.createIndex({"whatever" : "text"}, {"weights" : {"Name" : 5, "Movie.Name" : 2, "$**" : 1}})
db.runCommand({"text" : "people", "search" : "emma thompson"})
db.people.find({$text : {$search : "emma thompson"}})
db.people.find({$text : {$search : "\"emma thompson\""}})
db.people.find({$text : {$search : "-emma thompson"}})
db.people.find({$text : {$search : "emma thompson"}}, {score : {$meta : "textScore"}})
db.people.find({$text : {$search : "emma thompson"}}, {score : {$meta : "textScore"}}).sort({ score: { $meta: "textScore" } })
Geospatial Indexes
2d index
- for data stored as points on a two-dimensional plane
db.dots.insert([{Name:"A", location:[10, 5]}, {Name:"B", location:[17, -5]}, {Name:"C", location:[0, 2]}, {Name:"D", location:[-3, -3]}])
db.dots.createIndex({location:"2d", type:1})
db.dots.find({location:{$near:[0,0]}})
2dsphere index
- supports queries that calculate geometries on an earth-like sphere
- data stored as GeoJSON objects
- operators: http://docs.mongodb.org/master/reference/operator/query-geospatial/
- calculating distance: http://docs.mongodb.org/master/tutorial/calculate-distances-using-spherical-geometry-with-2d-geospatial-indexes/
db.cities.insert({Name:"Rzeszów", "location": {"type":"Point", "coordinates":[22.008606,50.040264]}})
db.cities.insert({Name:"Warszawa", "location": {"type":"Point", "coordinates":[21.0123237,52.2328474]}})
db.cities.insert({Name:"Wrocław", "location": {"type":"Point", "coordinates":[17.0342894,51.1174725]}})
db.cities.insert({Name:"Kraków", "location": {"type":"Point", "coordinates":[19.9012826,50.0719423]}})
db.cities.insert({Name:"Kielce", "location": {"type":"Point", "coordinates":[20.6156414,50.85404]}})
db.cities.createIndex({location:"2dsphere"})
db.cities.find({location:{$near:
{$geometry: {type:"Point", coordinates:[20.6156414, 50.85404]}, $minDistance: 0, $maxDistance:200000}
}})
db.runCommand({geoNear: "cities", near: [20.6156414, 50.85404], spherical: true, distanceMultiplier: 6378.1}) // distance in kilometers
db.cities.find({location: {$geoWithin:
{$geometry: {type: "Polygon", coordinates: [[ [22.008606,50.040264], [21.0123237,52.2328474], [19.9012826,50.0719423], [22.008606,50.040264] ]]}}
}})
Replication
- what will happen when standalone server crashes?
- replication is a way of keeping identical server copies
- recommended for all production deployments
- replication is a set of one primary server and many secondaries
- replication in MongoDB is asynchronous
- primary server handles client requests
- when primary crashes secondaries will elect new primary
- primary (master), secondary (slave)
Test Setup
- this is the way to start 3 servers on ports 31000, 31001, 31002
- or on ports 20000, 20001, 20002 starting from MongoDB 3.2
- databases are stored in default directory (/data/db)
$ mongo --nodb
> replicaSet = new ReplSetTest({"nodes" : 3})
> replicaSet.startSet()
> replicaSet.initiate()
$ mongo --nodb
> conn0 = new Mongo("localhost:31000")
> db0 = conn0.getDB("test")
> db0.isMaster()
> for (i=0; i<100; i++) { db0.repl.insert({"x" : i}) }
> db0.repl.count()
>
> conn1 = new Mongo("localhost:31001")
> db1 = conn1.getDB("test")
> db1.repl.count()
> db1.setSlaveOk()
> db1.repl.count()
>
> db0.adminCommand({"shutdown" : 1})
> db1.isMaster()
Production-Like Setup
- this example shows how to start a replica set with 3 members
- it runs on the same server but can easily be changed to multiple machines
- at most one server (localhost:27100) may already contain data
- the replica set name can be any UTF-8 string (np_rep in this example)
- use --replSet np_rep when starting mongod
- each member of the replica set must be able to connect with all members
$ mongod --port 27100 --replSet np_rep --dbpath /data/np_rep/rep0 --logpath /data/np_rep/rep0.log --fork --bind_ip 127.0.0.1 --rest --httpinterface --logappend
$ mongod --port 27101 --replSet np_rep --dbpath /data/np_rep/rep1 --logpath /data/np_rep/rep1.log --fork ...
$ mongod --port 27102 --replSet np_rep --dbpath /data/np_rep/rep2 --logpath /data/np_rep/rep2.log --fork ...
$
$ mongod --shutdown --dbpath /data/np_rep/rep0
- Mongo Shell is the only way to configure replica set
- prepare configuration document (np_rep_config)
- _id key is your replica set name
- members is a list of all servers in replica set
- send configuration to server with data
- this server will change configuration of other members
- servers will elect primary and start handling clients requests
- localhost as a server name can be used only for testing
> np_rep_config = {
... "_id" : "np_rep",
... "members" : [
... {"_id" : 0, "host" : "localhost:27100"},
... {"_id" : 1, "host" : "localhost:27101"},
... {"_id" : 2, "host" : "localhost:27102"}
... ]
... }
>
> rs.initiate(np_rep_config)
> db = (new Mongo("localhost:27102")).getDB("test")
rs Helper
- rs is a global variable containing replication helper functions
- rs.help() - list of all functions
- rs.initiate(np_rep_config) is a wrapper to adminCommand
- db.adminCommand({"replSetInitiate" : np_rep_config})
> rs.help()
> rs.add("localhost:27103")
> rs.remove("localhost:27103")
> rs.config()
> np_rep_config = rs.config()
> edit np_rep_config
> rs.reconfig(np_rep_config)
All About Majorities
- how to design replica set?
- majority = more than half of all members in the set
- majority is based on configuration not on the current status of replica
- e.g. a 5-member replica set where 3 members are down
- the two remaining members can not reach a majority
- if one of the two was primary, it would step down
- both remaining members will be secondaries
- why can two of five not reach a majority? otherwise each side of a network partition could elect its own primary
- examples: 3+2 members, 2+2+1 members, only 2 members
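- majority by set size: 1 of 1, 2 of 2, 2 of 3, 3 of 4, 3 of 5, 4 of 6, 4 of 7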
How Election Works
- election starts when secondary can not reach primary
- it will contact all other members and request that it be primary
- others do several checks:
- can they reach primary?
- is candidate up to date with replication?
- is there any member with higher priority?
- election ends when candidate receives "yes" from a majority
- members can send a veto (counted as 10000 votes against)
- heartbeat is sent every 2 seconds with 10 seconds time-out to all members
- based on this, the primary knows if it can reach a majority
- when election results in a tie all members will wait 30 seconds
Member Configuration
- arbiter
- normal mongod process started with --replSet option and empty data directory
- > rs.addArb("server:port")
- > rs.add({"_id" : 7, "host" : "server:port", "arbiterOnly" : true})
- use at most one arbiter, don't add arbiters "just in case"
- arbiter once added can't be changed to normal mongod, and vice versa
- use normal data member instead of an arbiter whenever possible
- priority
- priority range is from 0 to 1000 (default 1)
- it means how badly this member wants to become primary
- member with priority 0 can never become primary (passive members)
- hidden
- hidden members don't handle client requests
- hidden members are not preferred as replication sources
- useful for backup or less powerful machines
- use hidden : true with priority : 0
- rs.isMaster(), rs.status(), rs.config()
- slave delay
- a delayed secondary will purposely lag by the specified number of seconds
- use slaveDelay : seconds with priority : 0
- ... and hidden : true if your clients read from secondaries
- building indexes
- useful for backup servers
- prevents the member from building any indexes
- use buildIndexes : false with priority : 0
- non-index-building member can't be changed to normal easily
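- a sketch combining the options above into one reconfiguration (member index 2 is just an example):
> var config = rs.config()
> config.members[2].priority = 0
> config.members[2].hidden = true
> config.members[2].slaveDelay = 3600
> rs.reconfig(config)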
Sync Process
- oplog contains all operations that primary performs
- capped collection in local database on each member
- any member can be used as a source for replication
- steps: read data from primary -> apply operation to data -> write to local oplog
- re-applying the same oplog operation is safe and handled properly (idempotent, same result)
- oplog is fixed in size not in time
- usually one operation on data results in one operation in oplog
- exception: one operation that affects multiple documents is stored in the oplog as many single-document operations
- a secondary may go stale when it had downtime, receives more writes than it can handle, or is too busy handling reads
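- the oplog can be inspected like any other capped collection; a quick look at the newest entry and the oplog window:
> use local
> db.oplog.rs.find().sort({"$natural" : -1}).limit(1).pretty()
> db.printReplicationInfo()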
Initial Sync Process
- initial checks (choose a source, drop existing databases)
- cloning data from source (longest operation)
- first oplog application
- second oplog application
- index building (long operation)
- third oplog application
- switching to normal syncing and normal operations
- restoring from backup is often faster
- cloning may ruin source's working set
- initial sync may fail if oplog is too short
States of Replica Set Members
- Primary
- Secondary
- Startup (when you start a member for the first time)
- Startup2 (initial sync, short state on normal members: starts replication and election process)
- Recovering (failure state, member is operating correctly but not available for reads, occurs in many situations)
- Arbiter
- Down (when member was up and then becomes unreachable)
- Unknown (when member has never been able to reach another member)
- Removed (after removing from replica set, when added back it will return into "normal" state)
- Rollback (when rolling back data)
- Fatal (find a reason in log, grep "replSet FATAL", restore from backup or resync)
Rollbacks
- what a rollback is and when it is performed
- synchronization must be done manually
- collectionName.bson file in a rollback directory in data directory
- mongorestore above file into temporary collection and perform manual merge
- rollback will fail if there is more than 300 MB of data or about 30 minutes of operations to roll back
- how to prevent rollbacks
- do not change number of votes of replica set members
- keep secondaries up to date
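- a sketch of recovering rolled-back documents (the path follows the bullets above; database and collection names are examples):
$ mongorestore -d test -c tmpRollback /data/db/rollback/collectionName.bson
> db.tmpRollback.find().pretty() // inspect, then merge manually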
Connection to Replica Set and Replication Guarantees
- connection strings: mongodb://user_name:password@host1:27017,host2:27018,host3:27019/?replicaSet=replicaName&connectTimeoutMS=10000&authMechanism=SCRAM-SHA-1
> db.runCommand({getLastError: 1, w: "majority"})
> db.testColl.insert({a: 1}, {writeConcern: {w: "majority"}})
> db.testColl.insert({a: 1}, {writeConcern: {w: "majority", wtimeout: 5000}})
> db.testColl.insert({a: 1}, {writeConcern: {w: 3, wtimeout: 5000}})
> config = rs.conf()
> config.settings = {}
> config.settings.getLastErrorDefaults = {w: "majority", wtimeout: 5000}
> rs.reconfig(config)
> config = rs.config()
> config.members[0].tags = {"dc": "PL"}
> config.members[1].tags = {"dc": "UK"}
> config.settings = config.settings || {}
> config.settings.getLastErrorModes = [{"allDataCenters" : {"dc": 2}}]
> rs.reconfig(config)
> db.testColl.insert({a: 1}, {writeConcern: {w: "allDataCenters", wtimeout: 5000}})
Read Preference
- read preference describes how MongoDB clients route read operations to the members of a replica set
- read preference modes:
- primary - default mode
- primaryPreferred
- secondary
- secondaryPreferred
- nearest
- tag sets
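- read preference can be set per connection or per query in the shell; a sketch reusing the "dc" tags defined earlier:
> db.getMongo().setReadPref("secondaryPreferred")
> db.getMongo().setReadPref("nearest", [{"dc" : "PL"}, {}])
> db.testColl.find().readPref("secondary")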
Administration of Replica Set
Run as Standalone
- many operations can not be done on secondaries
- stop the secondary
- run mongod without --replSet, on different port, with the same --dbpath
$ mongod --port 27999 --dbpath /var/lib/mongodb
Large Replica Sets
- replica sets are limited to 50 members (12 before version 3.0) and only 7 voting members
- this limit exists to reduce the network traffic generated by heartbeats
- use a Master-Slave configuration when more than 50 secondaries are needed
> rs.add({"_id" : 51, "host" : "server-8:27017", "votes" : 0})
Forcing Reconfiguration
- useful when the majority is lost permanently
- force allows you to send the configuration to secondary (not only to primary)
- configuration needs to be prepared correctly
- force will change version dramatically
> var config = rs.config()
> edit config
> rs.reconfig(config, {"force" : true})
Changing Member Status Manually
- there is no way to force a member to become primary
- demote primary to a secondary (by default 60 seconds)
> rs.stepDown()
> rs.stepDown(600)
Preventing Elections
- it can also be used on a demoted primary to unfreeze it
> rs.freeze(600)
> rs.freeze(0)
Maintenance Mode
- server will go into recovery state
- useful if server is performing long operations or is far behind in replication
> db.adminCommand({"replSetMaintenanceMode" : true})
> db.adminCommand({"replSetMaintenanceMode" : false})
Monitoring Replication
- status of the replica set (from the current server perspective)
- db.adminCommand("replSetGetStatus") or rs.status()
- important fields: self, stateStr, uptime [s], optimeDate (last oplog operation), pingMs, errmsg
> rs.status()
Replication Source
- rs.status() can be used to create the replication graph
- the replication source is determined by ping time (smallest wins)
- this can cause the creation of replication chains
- beware of replication loops: s1 from s2, s2 from s3, s3 from s1
> db.adminCommand({"replSetSyncFrom" : server:27017})
> var config = rs.config()
> config.settings = config.settings || {}
> config.settings.chainingAllowed = false
> rs.reconfig(config)
Resizing the Oplog
- long oplog gives you time for maintenance
- the oplog on the primary should be at least a few days or even weeks long
> db.printReplicationInfo()
> db.printSlaveReplicationInfo()
procedure to resize the oplog:
- demote primary into secondary
- shut down and restart as standalone
- copy the last insert from the oplog into temporary collection
> use local
> var cursor = db.oplog.rs.find({"op" : "i"})
> var lastInsert = cursor.sort({"$natural" : -1}).limit(1).next()
> db.tmpLastOp.save(lastInsert)
> db.tmpLastOp.findOne()
- drop current oplog
> db.oplog.rs.drop()
- create new oplog with different size
> db.createCollection("oplog.rs", {"capped" : true, "size" : 10000000})
- move last operation from temporary collection into new oplog
> var lastInsert = db.tmpLastOp.findOne()
> db.oplog.rs.insert(lastInsert)
> db.oplog.rs.findOne()
- shut down standalone server and restart as replica set member
Other operations
- restoring data from delayed secondary
- building indexes
- replication on cheap machines: priority:0, hidden:true, buildIndexes:false, votes:0
- master - slave configuration and mimicking this behaviour when using replica set
- calculating lag: local.slaves, local.me, local.slaves.drop()
Backup and Restore
- backups generally should be done on secondaries
- or on standalone servers at an off time
Method #1: Filesystem Snapshot
- the simplest way to make backup
- filesystem must support snapshotting
- mongod must run with journaling enabled
- stop mongod before restoring
# Create the snapshot volume
lvcreate -L128M -s -n dbbackup /dev/ops/databases
# Mount the snapshot volume
mkdir /mnt/ops/dbbackup
mount /dev/ops/dbbackup /mnt/ops/dbbackup
# Do the backup
tar -cf /dev/rmt0 /mnt/ops/dbbackup
# Remove the snapshot
umount /mnt/ops/dbbackup
lvremove /dev/ops/dbbackup
Method #2: Copying Data Files
- use fsyncLock() before copying files
- fsyncLock() protects all databases against any further writes
- saves all dirty data to disk
- queues all write operations
- now files are consistent and copying of all files is possible
- fsyncUnlock() releases the lock and brings back database to normal operations
- if using authentication do not log out from the console between fsyncLock() and fsyncUnlock()
- stop mongod before restoring
- do not restore single database if crash or hard shutdown occurs
> db.fsyncLock()
> db.fsyncUnlock()
Method #3: Using mongodump and mongorestore
- mongodump/mongorestore is slower than the previous methods and has other downsides
- good way to backup individual databases or collections
- it will create a dump directory (with subdirectories for each database) in current directory
- data is stored in .bson files
- mongodump can be used when mongod is not running
- do not use --dbpath when mongod is running
- when mongodump is running writes are allowed, so already backed-up data may change before mongodump finishes
- do not use fsyncLock() when using mongodump
- use --oplog if you are running mongod with --replSet
- mongodump will choose secondary when connected to replica set
- use mongodump and mongorestore in the same version
- try to avoid this type of backup if you have unique indexes other than _id
mongodump --help
mongodump --port 27017
mongodump --dbpath /var/lib/mongodb
mongorestore --port 27017 --drop
mongodump --port 27017 --oplog
mongorestore --port 27017 --oplogReplay dump/
mongorestore -d dstDB -c dstCollection dump/srcDB/srcColl.bson
Backup of Replica Set
- all previous methods are OK but 1st and 2nd are recommended (without any modifications)
- when using mongodump use --oplog
- when restoring with mongorestore:
- start a server as a standalone and
- restore the data with --oplogReplay and
- restore the oplog collection
- restart the server as a member of replica
> use local
> db.createCollection("oplog.rs", {"capped" : true, "size" : 1000000})
$ mongorestore -d local -c oplog.rs dump/oplog.bson
Backup of Sharded Cluster
- usually it can not be done perfectly
- instead of backing up everything at once, back up servers separately
- turn off balancer before making backup
- run mongodump through mongos to backup entire cluster
- problems with restoring single shard
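- a sketch of the balancer dance around a cluster-wide dump (run the shell commands against mongos; the host name is an example):
> sh.stopBalancer()
$ mongodump --host mongos-host:27017
> sh.startBalancer()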
Other Administrative Tasks
- Preheating data
- loading everything into memory
- loading selected collections only
> db.runCommand({ touch: "collectionName", data: true, index: true })
- loading specific indexes
- loading recently created documents
- replay application usage
- Compacting
> db.runCommand({"compact" : "collectionName", "paddingFactor" : 1.5})
- Repairing data
- use --repair and optionally --repairpath
> db.repairDatabase()