Choosing which database to use is one of the most important decisions you can make when working on a new service or application.
More than 300 database management systems are available in the market, and choosing one can be overwhelming for developers. You have a variety of options available in SQL, NoSQL, NewSQL, Online Transactional Processing (OLTP), Online Analytical Processing (OLAP), and Hybrid Transaction/Analytical Processing (HTAP) database. Still, you must understand that none of them fits all kinds of project requirements. Moreover, each one of them has some strengths and weaknesses.
In this blog post, we will talk about SQL and NoSQL only. This is the most frequent choice of the developers.
Find out the main database selection criteria and decide which database to choose!
How to choose a database? First of all, you need to answer several questions in database selection:
1. How much data do you expect to store in the future?
2. How many users do you expect to handle simultaneously at peak load?
3. What kind of actions will be more: save or read from/to database?
4.What availability, scalability, latency, throughput, and data consistency does your application need?
5. Do you have budget? If yes, how much money can you spend on the database?
Now let’s talk about some key aspects that will answer the above questions and help you choose the proper database for your application.
In relational databases, data is stored in different tables containing multiple records (rows). These tables are connected with one or more relations. Structured Query language (SQL) is the standard language for dealing with Relational Databases.
In general, SQL databases can scale vertically, meaning you can increase the load on a server by migrating to a larger server that adds more CPU, RAM, or SSD capability. While vertical scalability is used most frequently, SQL databases can also scale horizontally through sharding or partitioning logic, although that’s not well-supported.
SQL database schema organizes data in relational, tabular ways, using tables with columns or attributes and rows of records. Because SQL works with such a strictly predefined schema, it requires organizing and structuring data before starting with the SQL database.
RDBMS, which uses SQL, must exhibit four properties, known by the acronym ACID. These ensure that transactions are processed successfully and that the SQL database has a high level of reliability:
- Atomicity: All transactions must succeed or fail completely and cannot be left partially complete, even in the case of system failure.
- Consistency: The database must follow rules that validate and prevent corruption at every step.
- Isolation: Concurrent transactions cannot affect each other.
- Durability: Transactions are final, and even system failure cannot “roll back” a complete transaction.
Because SQL databases have a long history now, they have huge communities, and many examples of their stable codebases online. There are many experts available to support SQL and programming relational data.
Relational databases: PostgreSQL, MySQL, Oracle, Ms SQL Server, Sybase, etc.
PostgreSQL has many advantages over the other relational databases, in particlular:
- firstly, it's an open source and totally free database. For example, Oracle is the most expensive and massive database with a complex documentation
- supports advanced data types such as arrays, hstore, and user-defined types. Furthermore, it supports json type called jsonb with complex searching
- all popular sql databases support by well-known cloud platforms like AWS, GCP, and Azure. In Axon, we prefer AWS cloud platform for our projects. So if we need high performance and availability cloud solution - this is Amazon Aurora that supports MySql and PostgreSQL
SQL is a good choice when dealing with related data. Relational databases are powerful, flexible, and easily accessible to any application. The advantage of a relational database is that when one user updates a particular record, each database instance is automatically updated, and this information is provided in real-time.
SQL and a relational database make it easy to process large volumes of information, scale as needed and provide flexible access to data—for example, only a one-time update of data is required instead of changing multiple files. This database selection is also best suited for assessing data integrity. Since every piece of information is stored in one place, there is no problem that previous versions distort the picture.
NoSQL is a non-relational DMS that does not require a fixed schema, avoids joins, and is easy to scale. NoSQL database is used for distributed data stores with humongous data storage needs. NoSQL is used for Big data and real-time web apps. For example, companies like Twitter, Facebook, and Google, collect terabytes of user data daily.
NoSQL databases scale better horizontally, which means one can add additional servers or nodes as needed to increase the load.
NoSQL databases are not relational, so they don’t solely store data in rows and tables. Instead, they generally fall into one of four types of structures:
- Column-oriented, where data is stored in cells grouped in a virtually unlimited number of columns rather than rows.
- Key-value stores, which use an associative array (also known as a dictionary or map) as their data model. This model represents data as a collection of key-value pairs.
- Graph databases, which represent data on a graph showing how different sets of data relate to each other. Neo4j, RedisGraph (a graph module built into Redis) and OrientDB are examples of graph databases.
While SQL calls for ACID properties, NoSQL follows the CAP theory (although some NoSQL databases — such as IBM’s DB2, MongoDB, AWS’s DynamoDB, and Apache’s CouchDB — can also integrate and follow ACID rules).
The CAP theorem says that distributed data systems allow a trade-off that can guarantee only two of the following three properties (which form the acronym CAP) at any one time:
- Consistency: Every request receives either the most recent result or an error. MongoDB is an example of a strongly consistent system, whereas others such as Cassandra offer eventual consistency.
- Availability: Every request has a non-error result.
- Partition tolerance: Any delays or losses between nodes do not interrupt the system operation.
While NoSQL has quickly been adopted, it has smaller user communities and, therefore, less support. However, NoSQL users benefit from open-source systems, as opposed to the many proprietary SQL languages.
While SQL is valued for providing data validity, NoSQL is good when the fast availability of big data is more critical. It is also a good choice when a company needs to scale due to changing requirements. NoSQL is easy to use, flexible, and offers high performance.
NoSQL is also a good choice when you have large (or constantly changing) datasets or when dealing with flexible data models or needs that don't fit into the relational model. Document databases (such as CouchDB, MongoDB, and Amazon DocumentDB) are well-suited when working with large amounts of unstructured data. For quick access to a key and value store without strong integrity guarantees, Redis might be the best choice. Elastic Search is a good choice when complex or flexible searches are required across large amounts of data.
Scalability is an essential advantage of NoSQL databases. Unlike SQL, their built-in sharding and high availability requirements allow for horizontal scaling. In addition, NoSQL databases such as Facebook's Cassandra handle vast amounts of data distributed across multiple servers, have no single points of failure, and provide maximum availability.
Database selection is the primary decision to make when starting work with a new application. In the beginning, we have determined the main questions to ask before choosing a database. We hope that in the end you can already answer the main question - “What database should I use?”.
Pay attention to the database selection criteria mentioned here and address the technical specialists to make sure that the choice is right!