Database Resilience

Introduction

MongoDB has comprehensive support for resilience. Replication is straightforward to configure; however, there are some important points to consider when planning the architecture and physical locations of your database servers.

Configuration Options

For the purposes of this document we assume that only 3 database servers will be used for resilience. More can be used; however, there should always be an odd number of voting servers. The voting mechanism that decides which server is promoted when the primary fails needs a majority of votes, and adding a server to make an even number does not increase the number of failures the set can tolerate.
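The arithmetic behind the odd-number guidance can be sketched in a few lines. This is an illustration only: the function names `majority` and `fault_tolerance` are made up for this example and are not part of any MongoDB API.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is more than half of the members."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """How many members can fail while a majority can still be formed."""
    return voting_members - majority(voting_members)


# Compare 3, 4 and 5 voting members.
for n in (3, 4, 5):
    print(f"{n} members: majority={majority(n)}, tolerates={fault_tolerance(n)} failure(s)")
```

With 3 members the set tolerates one failure, and a fourth member does not improve on that; a fifth member is needed before two failures can be tolerated.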

Primary, Secondary and Arbiter (PSA)

In this configuration only 2 database servers synchronise data: the primary and the secondary. The arbiter holds no data; it is there only to vote, so that the secondary can be promoted if the primary fails.

Pros:

  • More cost-effective, because an arbiter requires little compute or storage capacity and so can be hosted alongside other applications.

Cons:

  • Less resilient, because there are only two copies of the data.

  • Less checking to ensure the secondary has received updates from the primary.

  • Requires the same effort to configure as a system with 3 data servers.

Because of the reduced resilience and the minimal saving in storage costs, this architecture is no longer recommended.

Primary and Two Secondary Servers

In this configuration all 3 servers synchronise and maintain their own data storage.

Pros:

  • More resilient because there are 3 copies of the data.

  • Better verification that data has been replicated at least once.

  • No more complex than the PSA architecture.

Cons:

  • Requires 3 properly configured data servers, each with sufficient capacity for the expected load.

We recommend this architecture for all resilient setups when deploying ClearWay™ and AdvanceGuard®.
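As a rough illustration of how the two architectures differ in configuration, the sketch below builds the kind of replica set configuration document that is passed to `rs.initiate()` in the MongoDB shell. The field names (`_id`, `members`, `host`, `arbiterOnly`) follow MongoDB's replica set configuration format; the helper `replica_set_config`, the set name `rs0` and the `example.net` host names are assumptions made up for this example.

```python
import json


def replica_set_config(name, data_hosts, arbiter_host=None):
    """Build a replica set configuration document as a plain dict.

    data_hosts   -- "host:port" strings for the data-bearing members
    arbiter_host -- optional "host:port" for an arbiter (PSA layout only)
    """
    members = [{"_id": i, "host": host} for i, host in enumerate(data_hosts)]
    if arbiter_host is not None:
        # An arbiter votes in elections but stores no data.
        members.append(
            {"_id": len(members), "host": arbiter_host, "arbiterOnly": True}
        )
    return {"_id": name, "members": members}


# Hypothetical host names, for illustration only.
pss = replica_set_config(
    "rs0",
    ["db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"],
)
psa = replica_set_config(
    "rs0",
    ["db1.example.net:27017", "db2.example.net:27017"],
    arbiter_host="arb1.example.net:27017",
)

print(json.dumps(pss, indent=2))
```

The only difference between the two documents is the `arbiterOnly` flag on the third member, which is consistent with the point above that the recommended architecture is no more complex to configure than PSA.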

System Resilience

System resilience keeps the database services running if one of the servers has a hardware or software failure. If the primary server fails in any way, one of the secondary servers will automatically take over.

[Figure: Shared Data Centre - System Resilience]

However, because all these servers are co-located, there is no site resilience. In other words, if this data centre became inoperable, all database services would be lost.

The network interconnections between the servers are not covered within this documentation. It is assumed that for maximum resilience the network infrastructure will have its own resilient setup with redundant links and hardware.

Site Resilience

Site resilience offers protection if a whole data centre fails. This might occur as a result of a fire, flood or major power loss. To achieve this with a MongoDB replica set, all 3 servers need to be in separate physical locations.

The reason is that a majority of the servers (2 of the 3) must be online at any one time for the system to function fully. If you only use a primary and a secondary location, one of these locations has to host two servers, which means you still have a single point of failure. With two data centres you will always retain a copy of the data, but if you lose the location that hosts the majority of your servers, the whole replica set will become read-only. To avoid this, you should use 3 separate locations.
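The reasoning above can be checked with a small sketch that tests whether a given placement of servers still has a majority after any one site is lost. The function name, server names and site names are hypothetical and used only for illustration.

```python
from collections import Counter


def survives_any_single_site_loss(placement):
    """Return True if a majority of members remains after losing any one site.

    placement maps each member name to the site that hosts it.
    """
    total = len(placement)
    needed = total // 2 + 1  # votes required for a majority
    members_per_site = Counter(placement.values())
    return all(total - lost >= needed for lost in members_per_site.values())


# Two data centres: one of them has to host two of the three servers.
two_sites = {"db1": "site-a", "db2": "site-a", "db3": "site-b"}

# Three data centres: one server in each.
three_sites = {"db1": "site-a", "db2": "site-b", "db3": "site-c"}

print(survives_any_single_site_loss(two_sites))    # False
print(survives_any_single_site_loss(three_sites))  # True
```

In the two-site layout, losing `site-a` leaves a single server, which cannot form a majority on its own; in the three-site layout, any single site can be lost and two servers remain.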

 

[Figure: Separate Data Centres - Site Resilience]

For further information on site resilience, please refer to the MongoDB documentation: https://www.mongodb.com/docs/manual/core/replica-set-architecture-geographically-distributed/

