What is a Distributed Database?
Distributed architecture has long since become mainstream in the world of software development. For a long time, the database lagged behind. But now, distributed databases are mainstream, too. So what is a distributed database, and when should you use one?
What is a Distributed Database?
A distributed database is a database that consists of two or more files located on different sites either on the same network or on entirely different networks. Portions of the database are stored in multiple physical locations and processing is distributed among multiple database nodes.
A distributed database management system (DDBMS) manages the database centrally and integrates the data logically, so it can be handled as if it were all stored in the same location. The DDBMS synchronizes the data periodically and ensures that updates and deletes performed at one location are automatically reflected in the data stored elsewhere.
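As a rough illustration of periodic synchronization, the sketch below keeps a list of pending changes at each site and ships them to every other site when `synchronize` is called. The `Site` class and `synchronize` function are hypothetical names invented for this example, and the sketch assumes that no two sites change the same key between synchronizations. A real DDBMS also has to order and reconcile conflicting changes.

```python
class Site:
    """One physical location holding a copy of the data (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.pending = []  # local changes not yet shipped to other sites

    def put(self, key, value):
        self.rows[key] = value
        self.pending.append(("put", key, value))

    def delete(self, key):
        self.rows.pop(key, None)
        self.pending.append(("delete", key, None))


def synchronize(sites):
    """Apply each site's pending changes to every other site."""
    for source in sites:
        for op, key, value in source.pending:
            for target in sites:
                if target is source:
                    continue
                if op == "put":
                    target.rows[key] = value
                else:
                    target.rows.pop(key, None)
        source.pending.clear()


berlin, tokyo = Site("berlin"), Site("tokyo")
berlin.put("order:1", "paid")
tokyo.put("order:2", "new")
synchronize([berlin, tokyo])   # both sites now hold both orders

berlin.delete("order:2")
synchronize([berlin, tokyo])   # the delete reaches tokyo as well
```

After the second call, both sites hold only `order:1`, which is the "reflected elsewhere" behavior described above.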
By contrast, a centralized database consists of a single database file located at one site using a single network.
How does it work?
A distributed system is a group of interconnected computers that appears to its users as a single system. Typically, a DDBMS manages several “sites” and presents them as a single logical database, as if it were stored at one site. Distributed databases provide location transparency, which means that applications do not need to know the exact site where the data is stored. When a query is run on a distributed database, a set of sites, possibly in different data centers, work together to answer it.
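Location transparency can be pictured with a small sketch. The `Cluster` class below is a hypothetical, in-memory stand-in for a DDBMS: the caller uses `put`, `get`, and `count` without ever naming a site, and a hash of the key decides where each row lives. Hashing is only one possible placement scheme and is an assumption of this example.

```python
import zlib


class Cluster:
    """Presents several sites as one logical key-value database (toy model)."""

    def __init__(self, site_names):
        self._site_names = list(site_names)
        self._sites = {name: {} for name in self._site_names}

    def _site_for(self, key):
        # Deterministic hash, so the same key always maps to the same site.
        index = zlib.crc32(key.encode("utf-8")) % len(self._site_names)
        return self._site_names[index]

    def put(self, key, value):
        self._sites[self._site_for(key)][key] = value

    def get(self, key):
        return self._sites[self._site_for(key)].get(key)

    def count(self):
        # A query that every site contributes to.
        return sum(len(rows) for rows in self._sites.values())


cluster = Cluster(["site-a", "site-b", "site-c"])
for i in range(10):
    cluster.put(f"user:{i}", {"id": i})

user = cluster.get("user:7")   # the caller never names a site
total = cluster.count()        # answered by all sites together
```

The application code at the bottom would look the same if the cluster had one site or fifty, which is the point of location transparency.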
Why use Distributed Databases?
They provide location transparency while retaining local control. This means that, even though applications don’t know where the data is stored, each site can govern its own data locally: it can manage security, log transactions, and recover from local failures. This autonomy remains even if connectivity to other sites is lost. It also offers greater flexibility when specialized data kept in specific locations is subject to stricter security and compliance requirements than other data.
For example, customer data maintained for retail clients in the EU must comply with GDPR rules.
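A minimal sketch of region-aware placement follows, assuming an organization has decided to keep EU customer records at an EU site. The site names, the `RESIDENCY_RULES` table, and the helper functions are all hypothetical. The sketch only shows how local control over placement might be expressed and does not represent what GDPR itself requires.

```python
# Hypothetical policy: which site must hold records for a given region.
RESIDENCY_RULES = {"EU": "frankfurt"}
DEFAULT_SITE = "virginia"


def site_for_customer(customer):
    """Pick the storage site from the customer's region."""
    return RESIDENCY_RULES.get(customer["region"], DEFAULT_SITE)


def store(customer, sites):
    """Store the record at the site chosen by the policy and return its name."""
    site = site_for_customer(customer)
    sites.setdefault(site, {})[customer["id"]] = customer
    return site


sites = {}
eu_site = store({"id": "c1", "region": "EU", "name": "Anna"}, sites)
other_site = store({"id": "c2", "region": "US", "name": "Bob"}, sites)
```

Here the EU record lands only at `frankfurt`, where that site's own security and logging rules would apply.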
Advantages and Disadvantages
Advantages
- Modular Development. A distributed database can be expanded to new locations or units by adding new servers and data to the existing setup and connecting them to the distributed system, without interrupting the sites that are already running.
- Reliability. Distributed databases offer greater reliability than centralized databases. If a centralized database fails, the whole system comes to a complete stop. In a distributed database, the system keeps functioning when a failure occurs, delivering reduced performance until the issue is resolved.
- Lower Communication Cost. Storing data close to where it is used reduces the communication cost of reading and manipulating it. In a centralized database, every remote user has to reach the single central site.
- Better Response. Efficient data distribution provides a faster response when user requests are met locally. In a centralized database, all user requests pass through the central machine, which increases response time, especially under heavy query load.
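The reliability and response-time points can be sketched together: a read tries the local copy first and falls back to another replica if the local site is down. The `Replica` class, the `SiteDown` exception, and `read_with_failover` are hypothetical names for this example, which assumes every replica holds the same data.

```python
class SiteDown(Exception):
    """Raised when a site cannot be reached (hypothetical)."""


class Replica:
    def __init__(self, name, rows, up=True):
        self.name = name
        self.rows = rows
        self.up = up

    def read(self, key):
        if not self.up:
            raise SiteDown(self.name)
        return self.rows[key]


def read_with_failover(replicas, key):
    """Try replicas in order (nearest first) and return (value, site name)."""
    for replica in replicas:
        try:
            return replica.read(key), replica.name
        except SiteDown:
            continue  # this site failed, try the next one
    raise SiteDown("all replicas unavailable")


data = {"sku:42": "in stock"}
local = Replica("local", dict(data), up=False)   # simulate a local outage
remote = Replica("remote", dict(data))

value, served_by = read_with_failover([local, remote], "sku:42")
```

The request still succeeds during the outage, but it is served by the remote site, which matches the "reduced performance until the issue is resolved" behavior described above.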
Disadvantages
- Costly Software. Ensuring data transparency and coordination across multiple sites often requires using expensive software in a distributed database system.
- Large Overhead. Many operations on multiple sites require numerous calculations and constant synchronization when database replication is used, causing a lot of processing overhead.
- Data Integrity. When database replication is used, data integrity can be compromised if the same data is updated at multiple sites at the same time and the copies diverge.
- Improper Data Distribution. Responsiveness to user requests largely depends on proper data distribution. That means responsiveness can be reduced if data is not correctly distributed across multiple sites.
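One common way to guard against the integrity problem is a version check: a write is accepted only if the writer saw the latest version of the row. The sketch below is a simplified, single-process model of that idea, with hypothetical names (`VersionedStore`, `StaleWrite`). It is not how any particular DDBMS implements conflict handling.

```python
class StaleWrite(Exception):
    """Raised when a write is based on an outdated version (hypothetical)."""


class VersionedStore:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); version 0 means the key does not exist."""
        return self._data.get(key, (0, None))

    def write(self, key, value, expected_version):
        current_version, _ = self.read(key)
        if expected_version != current_version:
            raise StaleWrite(f"{key}: expected v{expected_version}, "
                             f"found v{current_version}")
        self._data[key] = (current_version + 1, value)
        return current_version + 1


store = VersionedStore()
store.write("stock:42", 10, expected_version=0)

# Two sites read the same version, then both try to update it.
version_seen_by_a, _ = store.read("stock:42")
version_seen_by_b, _ = store.read("stock:42")

store.write("stock:42", 9, expected_version=version_seen_by_a)  # accepted
try:
    store.write("stock:42", 8, expected_version=version_seen_by_b)
    conflict_detected = False
except StaleWrite:
    conflict_detected = True  # site B must re-read and retry
```

The second update is rejected instead of silently overwriting the first. The extra read and check on every write is part of the processing overhead mentioned above.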
Types of Distributed Databases
There are two types of distributed database systems: homogeneous and heterogeneous. A homogeneous system is made up of identical databases at different sites. It is relatively easy to manage because all sites run the same distributed database management system, data structures, and operating system. Heterogeneous systems run different operating systems, DDBMS products, and schemas, so one site may not be aware of changes happening at other connected sites.
Data is stored in a distributed database by fragmentation, by replication, or by a combination of both. Data fragmentation is when data is broken up into smaller chunks that are stored at different sites. Data replication is when all connected sites hold a copy of the same data, not just a part of it. Every update must then be propagated to all connected sites. In return, any site can answer a read, which allows queries to be served in parallel.
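The difference between the two approaches can be seen in a short sketch. The `fragment` and `replicate` functions below are hypothetical helpers, and the round-robin split is just one simple way to fragment rows. Real systems more often split by key range, by hash, or by column.

```python
def fragment(rows, site_names):
    """Horizontal fragmentation: each row is stored at exactly one site."""
    placement = {name: [] for name in site_names}
    for index, row in enumerate(rows):
        target = site_names[index % len(site_names)]  # round-robin placement
        placement[target].append(row)
    return placement


def replicate(rows, site_names):
    """Full replication: every site stores a copy of every row."""
    return {name: list(rows) for name in site_names}


rows = [{"id": i} for i in range(6)]
site_names = ["site-a", "site-b", "site-c"]

fragmented = fragment(rows, site_names)
replicated = replicate(rows, site_names)

rows_stored_fragmented = sum(len(r) for r in fragmented.values())
rows_stored_replicated = sum(len(r) for r in replicated.values())
```

With six rows and three sites, fragmentation stores six rows in total (two per site), while replication stores eighteen (six per site). Replication costs more storage and update traffic, and fragmentation means a query may need to visit several sites.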
Conclusion
Like any other technology, distributed databases have their advantages and drawbacks. However, for many modern use cases, their advantages outweigh the drawbacks. There are several types of distributed database architecture, and you should choose one only after carefully considering which best fits your needs.