Sharding in MongoDB
Copyright Notice
Copyright © 2004-2023 by NobleProg Limited. All rights reserved.
This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise.
What is Sharding?
- process of splitting data across multiple machines
- manual sharding (any database engine)
- MongoDB supports autosharding
- sharding is the most complex configuration MongoDB offers
- familiarise yourself with standalone servers and replica sets before sharding
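The idea of splitting data across machines can be sketched as manual, application-level sharding in plain Python — a toy model, not MongoDB code: plain dicts stand in for separate database servers, and the routing rule and function names are invented for illustration.

```python
# Toy model of manual sharding: the application decides which "server"
# stores each document. Dicts stand in for separate machines.
NUM_SHARDS = 3
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(key):
    # Naive routing rule: hash the key, take it modulo the shard count.
    # (Deterministic within one process run, which is all routing needs.)
    return hash(key) % NUM_SHARDS

def insert(key, doc):
    shards[shard_for(key)][key] = doc

def find(key):
    # Only the owning shard has to be contacted.
    return shards[shard_for(key)].get(key)

for i in range(100):
    insert(f"visitor_{i}", {"visitor": f"visitor_{i}"})
```

The pain points of this approach (rebalancing when a shard fills up, changing the number of shards, routing logic duplicated in every application) are exactly what MongoDB's autosharding automates.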
When to use it?
- not too early because it adds complexity to the deployment
- not too late because it's difficult to shard an overloaded system without downtime
- to increase available RAM and disk space
- to reduce the load on a single server
- to read or write data with greater throughput than a single server can handle
How to deploy a MongoDB Sharded Cluster?
MongoDB Config Servers
- config servers store the metadata for a sharded cluster
- from version 3.4 config servers must be deployed as a replica set using the WiredTiger storage engine
- the admin database and the config database exist on the config servers
- should be deployed on separate physical machines, ideally geographically distributed
- they need few resources (roughly 1MB of metadata per 200GB of cluster data)
- it can be deployed on machines with other services
- backups should be done often and especially before any maintenance
$ mongod --configsvr --replSet sh_config --dbpath c:\data\sh_config_1 --port 27019
MongoDB Router (the mongos process)
- mongos is the process for client applications to connect to
- clients connect to it instead of to a mongod (a standalone server or the primary of a replica set)
- the number of instances is not limited
- common setup is one mongos per application server (on the same machine)
- must know where the config servers are
- it does not need a data directory
- set up clients to send all requests to the mongos!
$ mongos --configdb sh_config/localhost:27019 --port 27017
Adding a new shard into the cluster
- a shard should be a replica set
- replica set name will be used as a name of a shard
$ mongod --shardsvr --replSet shard_a --dbpath c:\data\shard_a_1 --port 27018
$ mongo localhost:27017
> sh.addShard("shard_a/localhost:27018")
> sh.status()
Test deployment of MongoDB Sharded Cluster
- the commands below start a sharded cluster with
- 3 shards on ports 30000, 30001, 30002 (MongoDB 3.2: 20000, 20001, 20002)
- one mongos process on port 30999 (MongoDB 3.2: 20006)
$ mkdir c:\data\db\test0
$ mongo --nodb
> cluster = new ShardingTest({shards : 3, config : 3, other : { rs : true }})
How Does Sharding Work?
- databases and collections are not sharded by default
- you must enable sharding for each database and each collection separately
- shard key, chunks and chunk ranges, splitting chunks, split threshold
- chunk ranges are half-open intervals: chunk1 = [$minKey, split point 1), chunk2 = [split point 1, $maxKey)
- balancer and moving chunks
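The chunk metadata and routing can be sketched in plain Python — an illustrative model, not MongoDB's implementation; the split points, shard names, and helper functions are invented for the example:

```python
import bisect

# Toy model of chunk routing: the config metadata is a sorted list of
# split points; chunk i covers the half-open range
# [split_points[i-1], split_points[i]), with $minKey / $maxKey at the ends.
split_points = ["visitor_3", "visitor_6"]        # two split points -> three chunks
chunk_owner = ["shard_a", "shard_b", "shard_c"]  # which shard holds each chunk

def chunk_index(shard_key_value):
    # bisect_right counts how many split points are <= the value,
    # which is exactly the index of the owning chunk.
    return bisect.bisect_right(split_points, shard_key_value)

def route(shard_key_value):
    return chunk_owner[chunk_index(shard_key_value)]
```

Note that a split point belongs to the right-hand chunk: `route("visitor_3")` lands on the second chunk, matching the half-open ranges above.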
Range Based Sharding
> sh.status()
> sh.enableSharding("test")
> use test
> db.visitors.createIndex({"visitor" : 1})
> sh.shardCollection("test.visitors", {"visitor" : 1})
> db.visitors.explain("executionStats").find({visitor:"visitor1234"})  // targeted: filter contains the shard key
> db.visitors.explain("executionStats").find({i:1234})  // scatter-gather: no shard key in the filter
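Why the second query is more expensive can be sketched in Python — a toy model of the router, with invented shard names and ranges, not MongoDB code:

```python
# Toy model: a filter on the shard key can be routed to one shard;
# any other filter must be broadcast to every shard (scatter-gather).
shards = {
    "shard_a": [{"visitor": "visitor_1", "i": 1}],
    "shard_b": [{"visitor": "visitor_5", "i": 5}],
}
# half-open chunk ranges of the shard key owned by each shard
ranges = {"shard_a": ("visitor_0", "visitor_4"),
          "shard_b": ("visitor_4", "visitor_~")}

def find(query):
    if "visitor" in query:  # shard key present: targeted query
        v = query["visitor"]
        targets = [s for s, (lo, hi) in ranges.items() if lo <= v < hi]
    else:                   # no shard key: ask every shard
        targets = list(shards)
    docs = [d for s in targets for d in shards[s]
            if all(d.get(k) == val for k, val in query.items())]
    return targets, docs
```

Both queries return the right documents, but the targeted one touches a single shard while the other must contact all of them.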
Hash Based Sharding
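The motivation for hashed shard keys can be shown with a small Python sketch (a toy model, not MongoDB's hashing): monotonically increasing keys, which would all land in the last chunk under range-based sharding, spread evenly once they are hashed first.

```python
import hashlib

# Toy model of hashed sharding: chunks cover ranges of a hash of the key
# rather than the raw key, so strictly increasing keys (timestamps,
# ObjectIds) still distribute across all shards.
NUM_SHARDS = 4

def hashed_shard(key):
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

counts = [0] * NUM_SHARDS
for i in range(10_000):          # strictly increasing keys
    counts[hashed_shard(i)] += 1
```

The trade-off: writes distribute evenly, but range queries on the key become scatter-gather operations, because adjacent key values hash to unrelated chunks.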
Zone Based Sharding
Sharded Cluster Administration
Balancer
> sh.getBalancerState()
> sh.stopBalancer()
> sh.isBalancerRunning()
> sh.setBalancerState(true)
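What the balancer does can be sketched in a few lines of Python — a deliberately simplified model (the real balancer's migration thresholds and protocol are more involved, and the function and names here are invented):

```python
# Simplified sketch of balancing: while the gap in chunk counts between
# the most- and least-loaded shards is at or above a threshold, migrate
# one chunk at a time from the fullest shard to the emptiest.
def balance(chunk_counts, threshold=2):
    moves = []
    while max(chunk_counts.values()) - min(chunk_counts.values()) >= threshold:
        src = max(chunk_counts, key=chunk_counts.get)
        dst = min(chunk_counts, key=chunk_counts.get)
        chunk_counts[src] -= 1
        chunk_counts[dst] += 1
        moves.append((src, dst))
    return moves
```

Because chunks migrate one at a time, balancing a badly skewed cluster takes many migrations — one reason not to shard too late.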
Operations on chunks
> use config
> db.settings.save({ _id:"chunksize", value:1 })  // chunk size in MB
> sh.splitFind("test.visitors", {visitor:"visitor_10"})  // split the containing chunk at its median
> sh.splitAt("test.visitors", {visitor:"visitor_10"})  // split exactly at this shard key value
> use admin
> db.runCommand({mergeChunks:"test.visitors", bounds:[{visitor:"visitor_10"}, {visitor:"visitor_20"}]})
> db.adminCommand({moveChunk:"test.visitors", find:{visitor:"visitor_99999"}, to:"shard_a"})
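The difference between the two split commands can be modelled in Python — an illustrative sketch of the semantics, not MongoDB code; a chunk is represented simply as a sorted list of its shard key values:

```python
def split_at(chunk_keys, value):
    """Split using `value` itself as the split point (like sh.splitAt):
    the value becomes the lower bound of the new right-hand chunk."""
    left = [k for k in chunk_keys if k < value]
    right = [k for k in chunk_keys if k >= value]
    return left, right

def split_find(chunk_keys, value):
    """Split the chunk containing `value` at its median key
    (like sh.splitFind), giving two roughly equal halves."""
    keys = sorted(chunk_keys)
    median = keys[len(keys) // 2]
    return split_at(chunk_keys, median)
```

So splitAt gives precise control over the boundary (and can create very uneven chunks), while splitFind only picks which chunk to split and lets the median decide where.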
Refreshing mongos cache
> use admin
> db.runCommand({flushRouterConfig : 1})