NoSQL is a type of non-relational database. These systems make it possible to store and analyze Big Data. Discover everything you need to know: definition, history, operation, use cases, advantages, training and more.
In the age of Big Data, relational databases are no longer always suitable. Supporting, storing, and analyzing immense volumes of data calls for new solutions.
A NoSQL database is a “non-relational” database. Data can be stored in it in an unstructured form, without following a fixed schema. Joins are no longer necessary, and scaling is made easier.
NoSQL databases are used above all for distributed data stores with very large storage needs. NoSQL is therefore used for big data and real-time web applications. Tech giants like Twitter, Facebook and Google collect several terabytes of data on their users every day.
The term “NoSQL” is usually read as “Not Only SQL”. The name refers to SQL, the language that relational databases use for storing and analyzing data.
This is not the case with a non-relational database. NoSQL systems are compatible with a wide variety of technologies allowing the storage of structured, unstructured, semi-structured or polymorphic data.
The history of NoSQL
The term NoSQL was coined in 1998 by Carlo Strozzi as the name of his lightweight, open-source relational database. The concept was later adopted and popularized by GAFAM companies such as Google, Facebook and Amazon, which faced immense volumes of data and found relational databases too slow.
Rather than upgrading their hardware to increase RDBMS (Relational Database Management System) performance, the tech giants chose to distribute the load across multiple host servers. This is the so-called “scaling out” method. NoSQL databases are well suited to scaling out, since they are non-relational.
In 2000, work began on the Neo4j graph database. It was then the turn of Google Bigtable, in 2004, then CouchDB in 2005. The history of NoSQL databases was also marked by Amazon Dynamo in 2007.
Then, in 2008, Facebook open-sourced Cassandra, the non-relational database it used internally. The tool became a benchmark for NoSQL databases and put the term NoSQL back in the spotlight, giving it its current meaning and popularity.
The characteristics of NoSQL
The main characteristic of NoSQL databases is that they do not follow the relational model and do not organize data into tables with fixed columns. These databases do not require data normalization or object-relational mapping. Data can be accessed without complex query languages.
Another characteristic is that schemas are absent or flexible. There is no need to define a data schema in advance, so data with different structures can be stored together in a single system.
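As a rough illustration of schema flexibility, the sketch below stores records with different fields side by side. A plain Python list of dictionaries stands in for a schemaless store here; the names collection and field_names are hypothetical and do not belong to any particular NoSQL product.

```python
# Hypothetical in-memory "collection": a list of dicts standing in for a schemaless store.
collection = []

# Each record carries its own structure; no schema is declared beforehand.
collection.append({"id": 1, "name": "Alice", "email": "alice@example.com"})
collection.append({"id": 2, "name": "Bob", "tags": ["admin", "beta"]})
collection.append({"id": 3, "title": "Sensor reading", "value": 21.5, "unit": "C"})


def field_names(records):
    """Return the set of all field names seen across the records."""
    names = set()
    for record in records:
        names.update(record.keys())
    return names


all_fields = field_names(collection)
print(sorted(all_fields))
```

The trade-off is that the application, not the database, has to cope with records that lack a given field.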
Non-relational databases are also distinguished by an easy-to-use interface for storing and querying data. APIs make it possible to manipulate the data with various selection methods. The protocols are mostly text-based, typically HTTP REST with JSON. There is usually no standardized query language: each system tends to provide its own.
The last characteristic of a NoSQL database is that it is distributed. Several NoSQL database instances can run in a distributed fashion, offering auto-scaling and fail-over capabilities. ACID guarantees (atomicity, consistency, isolation, durability) are often relaxed in favor of elasticity and performance.
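To make the idea of distributing data across servers more concrete, here is a minimal sketch of hash-based partitioning. It assumes each "node" is just a Python dictionary in the same process; the class ShardedStore and the node names are hypothetical, and production systems generally use more elaborate schemes (consistent hashing, replication) than this simple modulo rule.

```python
import hashlib


class ShardedStore:
    """Toy key/value store that spreads keys over several in-memory 'nodes'."""

    def __init__(self, node_names):
        # Each node is simulated by a dict; a real node would be a separate server.
        self.node_names = sorted(node_names)
        self.nodes = {name: {} for name in self.node_names}

    def _node_for(self, key):
        # A stable hash of the key decides which node owns it.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        index = int(digest, 16) % len(self.node_names)
        return self.node_names[index]

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key, default=None):
        return self.nodes[self._node_for(key)].get(key, default)


store = ShardedStore(["node-a", "node-b", "node-c"])
for i in range(100):
    store.put(f"user:{i}", {"id": i})

# Show how many keys each simulated node ended up holding.
for name, data in store.nodes.items():
    print(name, len(data))
```

With a plain modulo rule, adding or removing a node changes the owner of most keys, which is one reason real systems tend to prefer consistent hashing.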
The different types of NoSQL databases
There are four main types of NoSQL databases: key/value, column-oriented, graph-oriented, and document-oriented. Each of these categories has its own characteristics and specific limits.
However, none of these four types of databases can solve every problem. It is necessary to choose the appropriate database according to the use case.
In key/value databases, data is stored as key/value pairs. This makes it possible to support large volumes of data and heavy loads. The data is stored in a hash table in which each key is unique. The value can be a JSON document, a BLOB, a string, and so on.
This type of database is the most basic. It makes it easy for the developer to store data without a schema. Examples include Redis and Dynamo. Amazon Dynamo is in fact the original model for this category of database.
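The key/value model described above can be sketched in a few lines. The class below is a hypothetical in-memory stand-in, not the API of Redis or Dynamo: it wraps a Python dict (itself a hash table), keeps each key unique, and treats values as opaque.

```python
import json


class KeyValueStore:
    """Minimal in-memory key/value store backed by a hash table (a dict)."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        # Writing to an existing key overwrites the old value, so keys stay unique.
        self._table[key] = value

    def get(self, key, default=None):
        return self._table.get(key, default)

    def delete(self, key):
        # Return True if the key existed and was removed, False otherwise.
        if key in self._table:
            del self._table[key]
            return True
        return False


kv = KeyValueStore()
kv.put("session:1", json.dumps({"user": "alice", "cart": [3, 7]}))  # JSON text
kv.put("avatar:1", b"\x89PNG...")                                   # binary blob
kv.put("session:1", json.dumps({"user": "alice", "cart": []}))      # overwrite

print(json.loads(kv.get("session:1")))
```

The store never inspects the value, which is why lookups can only be done by key in this model.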
Column-oriented databases, as the name suggests, organize data by column. They are based on Google’s Bigtable model. Each column is processed separately, and the values of a column are stored contiguously.
This database category provides high performance for aggregate queries like SUM, COUNT, AVG, and MIN, because the values needed are already grouped together in a single column. Examples include HBase, Cassandra and Hypertable.
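The following sketch illustrates why a column layout suits aggregates. It is only an in-memory analogy under the assumption that every row has the same fields; the sample data and the helper to_columns are hypothetical.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "customer": "alice", "amount": 30.0},
    {"order_id": 2, "customer": "bob", "amount": 12.5},
    {"order_id": 3, "customer": "alice", "amount": 7.5},
]


def to_columns(records):
    """Pivot row records into one list per column (assumes identical fields per row)."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate only needs to read the one column it concerns.
amounts = columns["amount"]
total = sum(amounts)       # SUM
count = len(amounts)       # COUNT
average = total / count    # AVG
minimum = min(amounts)     # MIN

print(total, count, average, minimum)
```

In the row layout, the same SUM would have to walk through every record and skip over the order_id and customer fields.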
Graph-oriented databases store entities and the relationships between these entities. Each entity is stored as a node, and each relationship as an edge. It is thus easy to visualize the relationships between the nodes. Each node and each edge has a unique identifier.
This type of database is multi-relational. It is mainly used for social networks, logistics or spatial data. Some of the more popular examples include Neo4j, InfiniteGraph, OrientDB, and FlockDB.
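As a small illustration of the node-and-edge model, the sketch below keeps nodes and edges in dictionaries keyed by unique identifiers and follows relationships with a breadth-first traversal. The Graph class, the FOLLOWS label and the sample data are all hypothetical; real graph databases expose their own query languages and storage engines.

```python
from collections import deque


class Graph:
    """Toy property graph: nodes and directed, labeled edges with unique ids."""

    def __init__(self):
        self.nodes = {}          # node_id -> properties
        self.edges = {}          # edge_id -> (source_id, label, target_id)
        self._next_edge_id = 1

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source_id, label, target_id):
        edge_id = self._next_edge_id
        self._next_edge_id += 1
        self.edges[edge_id] = (source_id, label, target_id)
        return edge_id

    def neighbors(self, node_id, label=None):
        """Targets of edges leaving node_id, optionally filtered by label."""
        return [
            target
            for (source, edge_label, target) in self.edges.values()
            if source == node_id and (label is None or edge_label == label)
        ]

    def reachable(self, start_id, label=None):
        """All nodes reachable from start_id by following edges (breadth-first)."""
        seen = {start_id}
        queue = deque([start_id])
        while queue:
            current = queue.popleft()
            for neighbor in self.neighbors(current, label):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        seen.discard(start_id)
        return seen


social = Graph()
for person in ("alice", "bob", "carol", "dave"):
    social.add_node(person, kind="user")
social.add_edge("alice", "FOLLOWS", "bob")
social.add_edge("bob", "FOLLOWS", "carol")
social.add_edge("carol", "FOLLOWS", "alice")  # a cycle, handled by the 'seen' set

print(sorted(social.reachable("alice", "FOLLOWS")))
```

This toy version scans every edge on each lookup; a real graph store would index edges per node so that traversals stay fast.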
Document-oriented databases also store and retrieve data as key/value pairs. However, the value is stored as a document in JSON or XML format. The database can therefore interpret the value, and documents can be found by querying their contents.
This type of database therefore offers increased flexibility. It is mainly used for content management systems, blogging platforms, or e-commerce applications. However, it is not suitable for complex transactions requiring multiple operations or queries on variable aggregate structures. The best-known examples in this category are Amazon SimpleDB, CouchDB, MongoDB, Riak, and Lotus Notes.
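The difference from a plain key/value store is that the database can look inside the value. The sketch below shows this under simple assumptions: documents arrive as JSON text, are parsed on insert, and can then be matched on their top-level fields. The DocumentStore class and its find method are hypothetical and do not mirror the API of MongoDB or CouchDB.

```python
import json


class DocumentStore:
    """Toy document store: values are JSON objects that can be queried by field."""

    def __init__(self):
        self._docs = {}  # doc_id -> parsed JSON object (a dict)

    def insert(self, doc_id, json_text):
        document = json.loads(json_text)
        if not isinstance(document, dict):
            raise ValueError("each document must be a JSON object")
        self._docs[doc_id] = document

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def find(self, **criteria):
        """Return ids of documents whose top-level fields equal all given criteria."""
        return [
            doc_id
            for doc_id, doc in self._docs.items()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


catalog = DocumentStore()
catalog.insert("p1", '{"category": "book", "title": "NoSQL basics", "price": 25}')
catalog.insert("p2", '{"category": "game", "title": "Chess set", "players": 2}')
catalog.insert("p3", '{"category": "book", "title": "Graph theory", "price": 40}')

print(catalog.find(category="book"))
print(catalog.find(category="book", price=40))
```

Documents p1 and p2 have different fields, yet both live in the same store and can be queried the same way.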
Advantages and Disadvantages of NoSQL
NoSQL has many advantages, but also disadvantages. These databases are well suited to big data storage and analysis, and also avoid a single point of failure.
They facilitate replication, and do not require a separate caching layer. Performance is high, and horizontal scalability is possible. NoSQL databases can handle structured and unstructured data in the same way.
In addition, they lend themselves to flexible, easy-to-use object-oriented programming. NoSQL databases also do not require a high-performance dedicated server. They are compatible with the main programming languages. Implementation is simpler than with an RDBMS. The flexible schema can be altered easily without interruption.
However, this type of database also has weaknesses. These include the absence of common standards and limited query capabilities. Traditional database guarantees, such as consistency when multiple transactions are performed simultaneously, may also be lacking.
It also becomes difficult to keep key values unique as the volume of data increases. The model does not work as well for relational data. The learning curve can be steep for new developers, and open source options are not always popular with businesses. In general, relational databases and their tools are more mature, more proven and therefore more widely adopted.
Why use NoSQL?
NoSQL databases are suitable for several use cases. They are suitable for storing and retrieving large volumes of data. They are also suitable when the relationships between the data are not particularly important.
They can also be used when the data changes over time and is unstructured. Finally, they are suitable when the volume of data is continuously increasing and the database must be scaled regularly to keep up.