Sharding in MongoDB

What is Sharding?

  • the process of splitting data across multiple machines
  • manual sharding is possible with any database engine
  • MongoDB supports automatic sharding (autosharding)
  • sharding is the most difficult and complex configuration of MongoDB
  • familiarise yourself with standalone servers and replica sets before attempting to shard


When to use it?

  • not too early because it adds complexity to the deployment
  • not too late because it's difficult to shard an overloaded system without downtime
  • to increase available RAM and disk space
  • to reduce the load on a single server
  • to read or write data with greater throughput than a single server can handle


How to deploy a MongoDB Sharded Cluster?

MongoDB Config Servers

  • config servers store the metadata for a sharded cluster
  • from version 3.4 they must be deployed as a replica set using the WiredTiger storage engine
  • the admin database and the config database exist on the config servers
  • they should be deployed on separate physical machines, ideally geographically distributed
  • they need few resources (roughly 1 MB of metadata per 200 GB of cluster data)
  • they can be deployed on machines running other services
  • back them up often, especially before any maintenance
$ mongod --configsvr --replSet sh_config --dbpath c:\data\sh_config_1 --port 27019
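The config server replica set must be initiated before a mongos can use it. A minimal sketch, assuming a single-member set on localhost (a production deployment would use three members on separate machines):
$ mongo localhost:27019
> rs.initiate({_id : "sh_config", configsvr : true, members : [{_id : 0, host : "localhost:27019"}]})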


MongoDB Router (the mongos process)

  • mongos is the process that client applications connect to
  • it replaces the mongod that clients would otherwise connect to on a standalone server, or the primary of a replica set
  • the number of mongos instances is not limited
  • a common setup is one mongos per application server (on the same machine)
  • it must know where the config servers are
  • it does not need a data directory
  • set up clients to send all requests to the mongos!
$ mongos --configdb sh_config/localhost:27019 --port 27017
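Drivers connect to the mongos address just as they would to a single mongod; with the port used above, the connection string is simply:
mongodb://localhost:27017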


Adding a new shard into the cluster

  • a shard should be a replica set
  • the replica set name will be used as the name of the shard
$ mongod --shardsvr --replSet shard_a --dbpath c:\data\shard_a_1 --port 27018
$ mongo localhost:27017
> sh.addShard("shard_a/localhost:27018")
> sh.status()
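Note that sh.addShard() will fail unless the shard's replica set has been initiated. A minimal sketch, assuming a single-member set (to be run before sh.addShard above):
$ mongo localhost:27018
> rs.initiate({_id : "shard_a", members : [{_id : 0, host : "localhost:27018"}]})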


Test deployment of a MongoDB Sharded Cluster

  • the commands below will start a sharded cluster with:
    • 3 shards on ports 30000, 30001, 30002 (MongoDB 3.2: 20000, 20001, 20002)
    • one mongos process on port 30999 (MongoDB 3.2: 20006)


$ mkdir c:\data\db\test0
$ mongo --nodb
> cluster = new ShardingTest({shards : 3, config : 3, other : { rs : true }})
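ShardingTest starts everything inside the current shell session; to experiment, attach a second shell to the test mongos on the port listed above:
$ mongo localhost:30999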


How Does Sharding Work?

  • databases and collections are not sharded by default
  • you must enable sharding for each database and each collection separately
  • key concepts: shard key, chunks and chunk ranges, chunk splitting, split threshold
  • each chunk covers a half-open range of shard key values: $minKey <= chunk1 < split point 1 <= chunk2 < $maxKey
  • the balancer evens out the load by moving chunks between shards
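Chunk metadata can be inspected directly in the config database through a mongos; the test.visitors namespace used below is the collection sharded in the next section:
> use config
> db.chunks.find({ns : "test.visitors"}).pretty()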


Range Based Sharding

> sh.status()
> sh.enableSharding("test")
> use test
> db.visitors.createIndex({"visitor" : 1})
> sh.shardCollection("test.visitors", {"visitor" : 1})
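> // hypothetical sample data, so the queries below have documents and chunks to target
> for (var i = 0; i < 100000; i++) { db.visitors.insert({visitor : "visitor" + i, i : i}) }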
> db.visitors.explain("executionStats").find({visitor : "visitor1234"})
> db.visitors.explain("executionStats").find({i : 1234})
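In the explain output, the first query contains the shard key, so the mongos targets a single shard (a SINGLE_SHARD stage); the second does not, so it is broadcast to all shards and the results are merged (a SHARD_MERGE stage).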

Hash Based Sharding
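A hashed shard key distributes documents by a hash of the key value, giving an even write distribution at the cost of range-query locality. A minimal sketch, assuming a hypothetical test.events collection (sharding is already enabled for the test database in the example above):
> use test
> db.events.createIndex({"_id" : "hashed"})
> sh.shardCollection("test.events", {"_id" : "hashed"})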

Zone Based Sharding
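Zones (tag-aware sharding before MongoDB 3.4) pin ranges of the shard key to specific shards. A minimal sketch, assuming the shard_a shard from earlier and a hypothetical EU zone on the test.visitors collection:
> sh.addShardToZone("shard_a", "EU")
> sh.updateZoneKeyRange("test.visitors", {visitor : MinKey}, {visitor : "visitor_5"}, "EU")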

Sharded Cluster Administration

Balancer

> sh.getBalancerState()
> sh.stopBalancer()
> sh.isBalancerRunning()
> sh.setBalancerState(true)
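The balancer can also be limited to a maintenance window by editing its settings document in the config database; the times below are hypothetical:
> use config
> db.settings.update({_id : "balancer"}, {$set : {activeWindow : {start : "23:00", stop : "06:00"}}}, {upsert : true})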


Operations on chunks

> use config
> // reduce the chunk size to 1 MB (the default is 64 MB) so splits are easy to observe
> db.settings.save({_id : "chunksize", value : 1})
> // split the chunk containing the matching document at an automatically chosen point
> sh.splitFind("test.visitors", {visitor : "visitor_10"})
> // split the chunk at exactly this shard key value
> sh.splitAt("test.visitors", {visitor : "visitor_10"})
> use admin
> // merge contiguous chunks (the bounds must match existing chunk boundaries on one shard)
> db.runCommand({mergeChunks : "test.visitors", bounds : [{visitor : "visitor_10"}, {visitor : "visitor_20"}]})
> // manually move the chunk containing the matching document to shard_a
> db.adminCommand({moveChunk : "test.visitors", find : {visitor : "visitor_99999"}, to : "shard_a"})


Refreshing mongos cache

> use admin
> db.runCommand({flushRouterConfig : 1})