Sharding in MongoDB

What is Sharding?

  • the process of splitting data across multiple machines
  • manual sharding is possible with any database engine
  • MongoDB supports automatic sharding (autosharding)
  • sharding is the most difficult and complex configuration of MongoDB
  • familiarise yourself with standalone servers and replica sets before attempting to shard


When to use it?

  • not too early because it adds complexity to the deployment
  • not too late because it's difficult to shard an overloaded system without downtime
  • to increase available RAM and disk space
  • to reduce the load on a single server
  • to read or write data with greater throughput than a single server can handle


How to deploy a MongoDB Sharded Cluster?

MongoDB Config Servers

  • config servers store the metadata for a sharded cluster
  • from version 3.4 they must be deployed as a replica set using the WiredTiger storage engine
  • the admin database and the config database exist on the config servers
  • they should be deployed on separate physical machines, ideally geographically distributed
  • they need few resources (roughly 1 MB of metadata per 200 GB of cluster data)
  • they can be deployed on machines running other services
  • back them up often, especially before any maintenance
$ mongod --configsvr --replSet sh_config --dbpath c:\data\sh_config_1 --port 27019
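The config server replica set must be initiated before a mongos can use it. A minimal sketch, assuming a single-member set on localhost (a production deployment would use three members on separate machines):
$ mongo localhost:27019
> rs.initiate({_id : "sh_config", configsvr : true, members : [{_id : 0, host : "localhost:27019"}]})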


MongoDB Router (the mongos process)

  • mongos is the process that client applications connect to
  • it replaces the mongod that clients would otherwise connect to on a standalone server, or the primary of a replica set
  • the number of mongos instances is not limited
  • a common setup is one mongos per application server (on the same machine)
  • it must know where the config servers are
  • it does not need a data directory
  • set up clients to send all requests to the mongos!
$ mongos --configdb sh_config/localhost:27019 --port 27017
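Drivers connect to the mongos address just as they would to a single mongod; with the port used above, the connection string is simply:
mongodb://localhost:27017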


Adding a new shard into the cluster

  • a shard should be a replica set
  • the replica set name will be used as the name of the shard
$ mongod --shardsvr --replSet shard_a --dbpath c:\data\shard_a_1 --port 27018
$ mongo localhost:27017
> sh.addShard("shard_a/localhost:27018")
> sh.status()
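Note that sh.addShard() will fail unless the shard's replica set has been initiated. A minimal sketch, assuming a single-member set (to be run before sh.addShard above):
$ mongo localhost:27018
> rs.initiate({_id : "shard_a", members : [{_id : 0, host : "localhost:27018"}]})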


Test deployment of a MongoDB Sharded Cluster

  • the commands below will start a sharded cluster with:
    • 3 shards on ports 30000, 30001, 30002 (MongoDB 3.2: 20000, 20001, 20002)
    • one mongos process on port 30999 (MongoDB 3.2: 20006)


$ mkdir c:\data\db\test0
$ mongo --nodb
> cluster = new ShardingTest({shards : 3, config : 3, other : { rs : true }})
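ShardingTest starts everything inside the current shell session; to experiment, attach a second shell to the test mongos on the port listed above:
$ mongo localhost:30999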


How Does Sharding Work?

  • databases and collections are not sharded by default
  • you must enable sharding for each database and each collection separately
  • key concepts: shard key, chunks and chunk ranges, chunk splitting, split threshold
  • each chunk covers a half-open range of shard key values: $minKey <= chunk1 < split point 1 <= chunk2 < $maxKey
  • the balancer evens out the load by moving chunks between shards
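Chunk metadata can be inspected directly in the config database through a mongos; the test.visitors namespace used below is the collection sharded in the next section:
> use config
> db.chunks.find({ns : "test.visitors"}).pretty()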


Range Based Sharding

> sh.status()
> sh.enableSharding("test")
> use test
> db.visitors.createIndex({"visitor" : 1})
> sh.shardCollection("test.visitors", {"visitor" : 1})
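> // hypothetical sample data, so the queries below have documents and chunks to target
> for (var i = 0; i < 100000; i++) { db.visitors.insert({visitor : "visitor" + i, i : i}) }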
> db.visitors.explain("executionStats").find({visitor : "visitor1234"})
> db.visitors.explain("executionStats").find({i : 1234})
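In the explain output, the first query contains the shard key, so the mongos targets a single shard (a SINGLE_SHARD stage); the second does not, so it is broadcast to all shards and the results are merged (a SHARD_MERGE stage).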

Hash Based Sharding
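A hashed shard key distributes documents by a hash of the key value, giving an even write distribution at the cost of range-query locality. A minimal sketch, assuming a hypothetical test.events collection (sharding is already enabled for the test database in the example above):
> use test
> db.events.createIndex({"_id" : "hashed"})
> sh.shardCollection("test.events", {"_id" : "hashed"})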

Zone Based Sharding
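Zones (tag-aware sharding before MongoDB 3.4) pin ranges of the shard key to specific shards. A minimal sketch, assuming the shard_a shard from earlier and a hypothetical EU zone on the test.visitors collection:
> sh.addShardToZone("shard_a", "EU")
> sh.updateZoneKeyRange("test.visitors", {visitor : MinKey}, {visitor : "visitor_5"}, "EU")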

Sharded Cluster Administration

Balancer

> sh.getBalancerState()
> sh.stopBalancer()
> sh.isBalancerRunning()
> sh.setBalancerState(true)
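The balancer can also be limited to a maintenance window by editing its settings document in the config database; the times below are hypothetical:
> use config
> db.settings.update({_id : "balancer"}, {$set : {activeWindow : {start : "23:00", stop : "06:00"}}}, {upsert : true})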


Operations on chunks

> use config
> // reduce the chunk size to 1 MB (the default is 64 MB) so splits are easy to observe
> db.settings.save({_id : "chunksize", value : 1})
> // split the chunk containing the matching document at an automatically chosen point
> sh.splitFind("test.visitors", {visitor : "visitor_10"})
> // split the chunk at exactly this shard key value
> sh.splitAt("test.visitors", {visitor : "visitor_10"})
> use admin
> // merge contiguous chunks (the bounds must match existing chunk boundaries on one shard)
> db.runCommand({mergeChunks : "test.visitors", bounds : [{visitor : "visitor_10"}, {visitor : "visitor_20"}]})
> // manually move the chunk containing the matching document to shard_a
> db.adminCommand({moveChunk : "test.visitors", find : {visitor : "visitor_99999"}, to : "shard_a"})


Refreshing mongos cache

> use admin
> db.runCommand({flushRouterConfig : 1})