Comparison Between Non-Relational Databases

SAURAV SANAP
7 min read · Jun 28, 2022

I. Amazon SimpleDB

Amazon SimpleDB is a highly available, scalable, and flexible non-relational data store that lets clients store and query data items through web service requests. It requires no schema, indexes data automatically, and provides a simple API for storage and access. Data is accessed over HTTP through REST and SOAP interfaces.

SimpleDB consists of multiple domains and each domain stores a set of records. Each record has a unique key and a set of attributes, which may or may not be present in every record. Within each domain, the records are automatically indexed by every attribute. The main operations are to read/write a record by key or to search for a set of records by one or more attributes.

A. SimpleDB Data Model:

Data is stored in domains, which are defined only by their name; the size of a domain cannot exceed 10 GB. Domains contain items. An item is made of an item name (a unique ID within the domain) and several attributes. An attribute is made of an attribute name and one or more values. Attribute values can only be strings. Items in one domain can have different attribute names.

Figure 1: Amazon SimpleDB data model

SimpleDB lets the client organize the data into domains, which can be compared with tables in relational databases, with the difference that a domain can contain a different set of attributes for each item. Every attribute value is stored as a string with a maximum size of 1024 bytes, and each item can have multiple values for each attribute. Due to restrictions in the query model, it is impossible to model relations between objects in SimpleDB without creating redundant information.
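To make the domain, item and attribute structure concrete, here is a minimal in-memory sketch in Python. It is not the AWS API: the `Domain` class and its `put`, `get` and `select` methods are hypothetical names chosen for illustration, and only the rules described above (string values, multiple values per attribute, a 1024-byte limit per value) are modelled.

```python
from collections import defaultdict

MAX_VALUE_BYTES = 1024  # per-value size limit described in the text


class Domain:
    """Toy stand-in for a SimpleDB domain (hypothetical, not the AWS API)."""

    def __init__(self, name):
        self.name = name
        # item name -> attribute name -> set of string values
        self.items = defaultdict(lambda: defaultdict(set))

    def put(self, item_name, attribute, value):
        if not isinstance(value, str):
            raise TypeError("attribute values must be strings")
        if len(value.encode("utf-8")) > MAX_VALUE_BYTES:
            raise ValueError("attribute value exceeds 1024 bytes")
        self.items[item_name][attribute].add(value)

    def get(self, item_name):
        # Plain dict with sorted value lists, so the output is stable.
        attrs = self.items.get(item_name, {})
        return {attr: sorted(values) for attr, values in attrs.items()}

    def select(self, attribute, value):
        """Names of items that have `value` among the values of `attribute`."""
        return sorted(
            name for name, attrs in self.items.items()
            if value in attrs.get(attribute, ())
        )


products = Domain("products")
products.put("item1", "color", "red")
products.put("item1", "color", "blue")   # second value for the same attribute
products.put("item1", "size", "M")
products.put("item2", "color", "red")
products.put("item2", "weight", "2kg")   # attribute that item1 does not have

print(products.get("item1"))
print(products.select("color", "red"))
```

Note that `item1` and `item2` live in the same domain but carry different attribute sets, which is the main difference from a relational table.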

Queries can only be processed against one domain, so if a client needs to aggregate data from different domains, this must be done in the application layer. However, it would be unwise to put everything into one domain, because SimpleDB uses domains to partition the data, which restricts the size of a single domain. Furthermore, query performance depends on the size of the domain.
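The application-layer aggregation mentioned above could look roughly like the following sketch. The two domains are represented as plain Python dictionaries, and the names `customers`, `orders` and `totals_by_customer` are made up for this example; a real client would first fetch the items from each domain with separate queries.

```python
# Two hypothetical domains, each mapping item name -> attributes.
customers = {
    "c1": {"name": "Alice"},
    "c2": {"name": "Bob"},
}
orders = {
    "o1": {"customer_id": "c1", "total": "30"},
    "o2": {"customer_id": "c1", "total": "12"},
    "o3": {"customer_id": "c2", "total": "7"},
}


def totals_by_customer(customers, orders):
    """Join orders to customers in the application and sum the order totals."""
    totals = {}
    for order in orders.values():
        customer = customers.get(order["customer_id"])
        if customer is None:
            continue  # order refers to an unknown customer, so skip it
        name = customer["name"]
        # Values are strings in SimpleDB, so convert before doing arithmetic.
        totals[name] = totals.get(name, 0) + int(order["total"])
    return totals


print(totals_by_customer(customers, orders))
```

The `customer_id` attribute on each order is the kind of redundant information the data model forces the client to maintain by hand.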

B. SimpleDB Architecture:

SimpleDB is based on Amazon’s S3 (Simple Storage Service), in which users are granted unlimited data storage capacity at very inexpensive rates. Data in the S3 system is stored across a number of servers or storage devices in Amazon’s scalable distributed network. SimpleDB and S3 are extensions of the cloud computing architecture, in which computing resources, software applications, and data are shared across the web on demand.

Figure 2: Amazon SimpleDB Architecture and its Components

Software resources such as applications and data are stored on distributed servers, so when a user requests a word processing application, an instance is provided to the user via the web browser. The user therefore need not keep applications or data on their own computing devices, but depends on the reliability and availability of the internet for access.

II. Apache CouchDB

CouchDB is a document-oriented database server, accessible through REST APIs. Couch is an acronym for “Cluster Of Unreliable Commodity Hardware”, emphasizing the distributed nature of the database. CouchDB is designed for document-oriented applications such as forums, bug trackers, wikis and email. The CouchDB project is part of the Apache Foundation and is written entirely in Erlang. CouchDB is ad hoc and schema-free, with a flat address space. It is not only a NoSQL database, but also a web server for applications written in JavaScript.

A. CouchDB Data Model:

Data in CouchDB is organized into documents. Each document can have any number of attributes, and each attribute can itself contain lists or even objects. Documents are stored and accessed as JSON objects, which is why CouchDB supports JSON data types such as String, Number, Boolean and Array. Each CouchDB document has a unique identifier and, because CouchDB uses optimistic replication on both the server side and the client side, each document also has a revision identifier. CouchDB updates the revision ID every time a document is rewritten.

Update operations in CouchDB are performed on whole documents. If a client wants to modify a value in a document, it must first load the document, make the modifications, and then send the whole document back to the database.
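The load, modify and write-back cycle, together with the revision identifier, can be sketched with a small in-memory store. This is an illustration only, not CouchDB's HTTP API: `DocumentStore` and `ConflictError` are hypothetical names, and revisions are simplified to integers, whereas real CouchDB revision IDs are opaque strings.

```python
class ConflictError(Exception):
    """Raised when a write is based on an outdated revision."""


class DocumentStore:
    """Toy whole-document store with revision checks (hypothetical)."""

    def __init__(self):
        self._docs = {}  # document id -> (revision number, document body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)  # copy, so callers edit their own version

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            raise ConflictError(f"{doc_id}: expected revision {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))  # the whole document is replaced
        return new_rev


store = DocumentStore()
store.put("bug-1", {"title": "Crash on start", "status": "open"})

# Load the document, change one value, send the whole document back.
rev, doc = store.get("bug-1")
doc["status"] = "closed"
new_rev = store.put("bug-1", doc, rev=rev)

# A second writer still holding the old revision is rejected.
try:
    store.put("bug-1", {"title": "Crash on start", "status": "wontfix"}, rev=rev)
    stale_rejected = False
except ConflictError:
    stale_rejected = True

print(new_rev, store.get("bug-1"), stale_rejected)
```

Rejecting the stale write is what makes the replication scheme optimistic: clients do not take locks, they simply retry after reloading the latest revision.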

B. CouchDB Architecture:

CouchDB has three main components: the Storage Engine, the View Engine and the Replicator.

Figure 3: Simple Architecture of Apache CouchDB database

Storage Engine: It is B-tree based and is the core of the system, managing the storage of internal data, documents and views. Data in CouchDB is accessed by keys or key ranges, which map directly to the underlying B-tree operations. This direct mapping improves speed significantly.

View Engine: It is based on Mozilla SpiderMonkey, and views are written in JavaScript. It allows creating ad hoc views that are made of MapReduce jobs. The definitions of the views are stored in design documents. When a user reads data in a view, CouchDB makes sure the result is up to date. Views can be used to create indices and extract data from documents.

Replicator: It is responsible for replicating data to a local or remote database and synchronizing design documents.
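The interplay between key-ordered storage and MapReduce views described above can be approximated in a few lines of Python. This is a sketch under loud assumptions: real CouchDB views are JavaScript functions stored in design documents, and the index is a B-tree, whereas here a sorted list with binary search stands in for it. The function names `build_view`, `query_range` and `reduce_count` are invented for this example.

```python
import bisect


def build_view(docs, map_fn):
    """Run map_fn over every document and return (key, value) rows sorted by key."""
    rows = [(key, value) for doc in docs for key, value in map_fn(doc)]
    rows.sort(key=lambda row: row[0])
    return rows


def query_range(rows, start_key, end_key):
    """Rows with start_key <= key < end_key, located by binary search."""
    keys = [key for key, _ in rows]
    lo = bisect.bisect_left(keys, start_key)
    hi = bisect.bisect_left(keys, end_key)
    return rows[lo:hi]


def reduce_count(rows):
    """Reduce step: sum the emitted values per key."""
    counts = {}
    for key, value in rows:
        counts[key] = counts.get(key, 0) + value
    return counts


def map_by_status(doc):
    """Map step: emit (status, 1) for every document that has a status."""
    if "status" in doc:
        yield doc["status"], 1


docs = [
    {"_id": "bug-1", "status": "open"},
    {"_id": "bug-2", "status": "closed"},
    {"_id": "note-1", "text": "no status here"},
    {"_id": "bug-3", "status": "open"},
]

view = build_view(docs, map_by_status)
print(view)
print(query_range(view, "o", "p"))  # keys starting with "o"
print(reduce_count(view))
```

Because the rows are kept in key order, a key-range query touches only a contiguous slice of the index, which is the property the B-tree provides in the real system.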

III. Google’s BigTable

BigTable is a distributed storage system for managing structured data. It is designed to reliably scale to petabytes of data and thousands of machines. Its goals include wide applicability, scalability, high performance, and high availability. It is built on the Google File System (GFS). Google offers access to it through Google App Engine, which is built on top of DataStore, which in turn is built on top of Bigtable. Bigtable is used by more than sixty Google products and projects, including Google Analytics, Google Finance, Orkut, Personalized Search and Google Earth.

➢BigTable is a distributed hash mechanism built on top of GFS.

➢It doesn’t support joins or SQL type queries.

➢Machines can be added and deleted while the system is running and the whole system just works.

A. BigTable Data Model:

Instances of Bigtable run on clusters, and each instance can have multiple tables. A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, a column key and a timestamp; each value in the map is an uninterpreted array of bytes.
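The sorted map described above can be pictured with a small single-process sketch. It ignores distribution and persistence entirely and only shows the indexing scheme; `SparseTable` and its methods are hypothetical names, and the row and column keys in the usage are made-up examples.

```python
import time


class SparseTable:
    """Toy model of a (row, column, timestamp) -> bytes map (hypothetical)."""

    def __init__(self):
        # (row key, column key) -> {timestamp: value}; absent cells take no space
        self._cells = {}

    def put(self, row, column, value, timestamp=None):
        if timestamp is None:
            timestamp = time.time_ns() // 1000  # assumption: microsecond timestamps
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Latest version, or the latest version at or before `timestamp`."""
        versions = self._cells.get((row, column))
        if not versions:
            return None
        if timestamp is None:
            return versions[max(versions)]
        eligible = [t for t in versions if t <= timestamp]
        return versions[max(eligible)] if eligible else None

    def scan(self, start_row, end_row):
        """Yield (row, column, latest value) for start_row <= row < end_row, in key order."""
        for row, column in sorted(self._cells):
            if start_row <= row < end_row:
                yield row, column, self.get(row, column)


table = SparseTable()
table.put("com.example.www", "contents:", b"<html>v1", timestamp=1)
table.put("com.example.www", "contents:", b"<html>v2", timestamp=2)
table.put("com.example.www", "anchor:other.example", b"Example", timestamp=1)

print(table.get("com.example.www", "contents:"))
print(table.get("com.example.www", "contents:", timestamp=1))
print(list(table.scan("com.a", "com.z")))
```

Values are plain `bytes` because the store does not interpret them; giving them meaning is left to the client.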

The data in the tables is organized into three dimensions: rows, columns and timestamps.

(row:string, column:string, time:int64) → string

Rows: Bigtable maintains data in lexicographic order by row key. The row keys in a table are arbitrary strings. Rows are the unit of transactional consistency.

Columns: Column keys are grouped into column families. Data stored in a column family is usually of the same type. A column family must be created before data can be stored under any column key in that family.

Timestamps: Each cell can hold multiple versions of the same data; these versions are indexed by timestamp (64-bit integers). Timestamps can be set by Bigtable or by client applications.

B. BigTable Architecture:

Bigtable is built on top of Google File System. The underlying file format is SSTable. SSTables are designed so that a data access requires, at most, a single disk access.

Each data item is stored in a cell, which is addressed by a row key, a column key, and a timestamp. Rows are grouped into tablets, and a tablet's data is stored in a file format called SSTable, which is a sequence of 64 KB blocks.

Figure 4: Google BigTable Architecture

BigTable has three different types of servers: Master servers, Tablet servers and Lock servers.

Master servers: The master servers assign tablets to tablet servers, balance the tablet server load, detect the loss or addition of tablet servers, and perform garbage collection and some other chores. Importantly, client data doesn’t move through the master. In fact, the system is architected in such a way that most clients never communicate with the master, which helps keep the master lightly loaded in practice.

Tablet servers: Each tablet server typically manages between 10 and 1,000 tablets. Each tablet averages about 100–200 MB in size, and each BigTable consists of multiple tablets. Each tablet contains all the data of a group of rows. A newly created table consists of one tablet. As it grows, it is dynamically broken into multiple tablets. This allows the system to scale horizontally in an automatic fashion.
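The way a table starts as one tablet and splits as it grows can be sketched as follows. This is a simplified illustration, not Google's implementation: the `Tablet` and `Table` classes are hypothetical, the split point is simply the median row key, and the size threshold is scaled down to a few bytes instead of the 100–200 MB mentioned above.

```python
class Tablet:
    """A contiguous range of rows (hypothetical toy model)."""

    def __init__(self, rows=None):
        self.rows = dict(rows or {})  # row key -> row data (bytes)

    def size(self):
        return sum(len(data) for data in self.rows.values())

    def split(self):
        """Split at the median row key into two tablets of adjacent ranges."""
        keys = sorted(self.rows)
        mid = len(keys) // 2
        left = Tablet({k: self.rows[k] for k in keys[:mid]})
        right = Tablet({k: self.rows[k] for k in keys[mid:]})
        return left, right


class Table:
    def __init__(self, max_tablet_bytes):
        self.max_tablet_bytes = max_tablet_bytes
        self.tablets = [Tablet()]  # a new table consists of one tablet

    def _find(self, row_key):
        # Tablets are kept in row-key order: pick the last tablet whose
        # smallest key is <= row_key, falling back to the first tablet.
        index = 0
        for i, tablet in enumerate(self.tablets):
            if tablet.rows and min(tablet.rows) <= row_key:
                index = i
        return index

    def put(self, row_key, data):
        i = self._find(row_key)
        tablet = self.tablets[i]
        tablet.rows[row_key] = data
        # Split once the tablet outgrows the threshold (needs at least two rows).
        if tablet.size() > self.max_tablet_bytes and len(tablet.rows) > 1:
            self.tablets[i:i + 1] = tablet.split()


table = Table(max_tablet_bytes=30)
for n in range(8):
    table.put(f"row{n:02d}", b"x" * 10)

for tablet in table.tablets:
    print(sorted(tablet.rows))
```

Each resulting tablet covers a contiguous row range, so in a real deployment the pieces could be handed to different tablet servers, which is what makes the horizontal scaling automatic.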
