Sharding in MongoDB

From Training Material



What is Sharding?

  • the process of splitting data across multiple machines
  • manual sharding is possible with any database engine
  • MongoDB supports autosharding (automatic sharding)
  • sharding is the most complex configuration of MongoDB
  • familiarise yourself with standalone servers and replica sets before attempting sharding


When to use it?

  • not too early because it adds complexity to the deployment
  • not too late because it's difficult to shard an overloaded system without downtime
  • to increase available RAM and disk space
  • to reduce the load on a single server
  • to read or write data with greater throughput than a single server can handle


How to deploy a MongoDB Sharded Cluster?

MongoDB Config Servers

  • config servers store the metadata for a sharded cluster
  • from version 3.4 config servers must be deployed as a replica set using the WiredTiger storage engine
  • the admin database and the config database exist on the config servers
  • they should be deployed on separate physical machines, geographically distributed
  • config servers need few resources (roughly 1MB of metadata per 200GB of data)
  • they can be deployed on machines running other services
  • backups should be done often and especially before any maintenance
  • backups should be done often and especially before any maintenance
$ mongod --configsvr --replSet sh_config --dbpath c:\data\sh_config_1 --port 27019
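
The config server replica set still has to be initiated before mongos can use it; a minimal single-member sketch, assuming the replica set name and port from the command above (a production deployment would list three members):
$ mongo localhost:27019
> rs.initiate({_id : "sh_config", configsvr : true, members : [{_id : 0, host : "localhost:27019"}]})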


MongoDB Router (the mongos process)

  • mongos is the process for client applications to connect to
  • clients connect to mongos instead of mongod (which they would use with a standalone server or a replica set primary)
  • the number of mongos instances is not limited
  • a common setup is one mongos per application server (on the same machine)
  • mongos must be told where the config servers are (--configdb)
  • it does not need a data directory
  • set up clients to send all requests to the mongos!
$ mongos --configdb sh_config/localhost:27019 --port 27017


Adding a new shard into the cluster

  • a shard should be a replica set
  • the replica set name will be used as the name of the shard
$ mongod --shardsvr --replSet shard_a --dbpath c:\data\shard_a_1 --port 27018
$ mongo localhost:27017
> sh.addShard("shard_a/localhost:27018")
> sh.status()
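
Note that sh.addShard() expects the shard's replica set to be initiated first; a sketch assuming the single-member set started above:
$ mongo localhost:27018
> rs.initiate({_id : "shard_a", members : [{_id : 0, host : "localhost:27018"}]})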


Test deployment of MongoDB Sharded Cluster

  • the commands below start a test sharded cluster with
    • 3 shards on ports 30000, 30001, 30002 (MongoDB 3.2: 20000, 20001, 20002)
    • one mongos process on port 30999 (MongoDB 3.2: 20006)


$ mkdir c:\data\db\test0
$ mongo --nodb
> cluster = new ShardingTest({shards : 3, config : 3, other : { rs : true }})


How Does Sharding Work?

  • databases and collections are not sharded by default
  • you must enable sharding for each database and each collection separately
  • shard key, chunks and chunk ranges, splitting chunks, split threshold
  • $minKey <= chunk1 < split point 1 <= chunk2 < $maxKey
  • balancer and moving chunks
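
The chunk ranges mentioned above are stored as metadata on the config servers and can be inspected from a mongos; a sketch assuming a sharded collection test.visitors:
> use config
> db.chunks.find({ns : "test.visitors"}, {min : 1, max : 1, shard : 1})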


Range Based Sharding

> sh.status()
> sh.enableSharding("test")
> use test
> db.visitors.createIndex({"visitor" : 1})
> sh.shardCollection("test.visitors", {"visitor" : 1})
> db.visitors.explain("executionStats").find({visitor:"visitor1234"})  // shard key query, targeted to one shard
> db.visitors.explain("executionStats").find({i:1234})                 // not the shard key, scatter-gather to all shards
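
To actually see chunks being split and balanced, the collection needs enough data to cross the split threshold; a sketch with an arbitrary number of documents (field names taken from the examples above):
> use test
> for (var i = 0; i < 100000; i++) { db.visitors.insertOne({visitor : "visitor" + i, i : i}) }
> sh.status()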

Hash Based Sharding
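
A hashed shard key distributes writes evenly across shards at the cost of efficient range queries; a minimal sketch on a hypothetical test.events collection (collection and field names are only examples):
> sh.enableSharding("test")
> sh.shardCollection("test.events", {"user_id" : "hashed"})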

Zone Based Sharding
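
Zones (MongoDB 3.4+) associate ranges of shard key values with specific shards; a sketch using a hypothetical "EU" zone on the test.visitors collection from above (zone name and range boundary are only examples):
> sh.addShardToZone("shard_a", "EU")
> sh.updateZoneKeyRange("test.visitors", {visitor : MinKey}, {visitor : "m"}, "EU")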

Sharded Cluster Administration

Balancer

> sh.getBalancerState()
> sh.stopBalancer()
> sh.isBalancerRunning()
> sh.setBalancerState(true)
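
The balancer can also be restricted to a time window so that chunk migrations run only off-peak; a sketch with example hours:
> use config
> db.settings.update({_id : "balancer"}, {$set : {activeWindow : {start : "23:00", stop : "06:00"}}}, {upsert : true})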


Operations on chunks

> use config
> db.settings.save({ _id:"chunksize", value:1 })          // chunk size in MB
> sh.splitFind("test.visitors", {visitor:"visitor_10"})   // split the containing chunk at its median
> sh.splitAt("test.visitors", {visitor:"visitor_10"})     // split the containing chunk exactly at this value
> use admin
> db.runCommand({mergeChunks:"test.visitors", bounds:[{visitor:"visitor_10"}, {visitor:"visitor_20"}]})
> db.adminCommand({moveChunk:"test.visitors", find:{visitor:"visitor_99999"}, to:"shard_a"})
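
After splitting or moving chunks, the per-shard distribution of a collection can be checked with:
> use test
> db.visitors.getShardDistribution()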


Refreshing mongos cache

> use admin
> db.runCommand({flushRouterConfig : 1})