Sharding refers to the concept of splitting your database (both relational and non-relational) horizontally, and distributing your data across multiple servers a.k.a. shards, where each shard contains a subset of the data.
This allows for horizontal scaling as the data continues to grow, allowing you to handle more queries by spreading the load across several machines.#
Advantages of Sharding
- Sharding reduces dataset size which improves performance due to smaller index size.
- Reduced datasets also ensure higher Cache Locality.
- Allows for a higher throughput of writes due to shards allowing independent writes instead of having to go through a central master database.
Disadvantages of Sharding
- Application logic will have to be updated to work with shards and lead to complex SQL queries.
- Distribution of data across shards might be lopsided. Some shards may be under increased load compared to the others.
- To fix this, rebalancing of data across shards is needed. This may require moving a potentially large amount of data which can be resource intensive.
- Ensuring a system remains operational during rebalancing adds significant complexity.
Additional Topics
- Consistent Hashing using ring-based structures in sharding.