What is a Database?
A database is a structured collection of information or data that is typically stored electronically in a computer system. The management of the database is usually done by a Database Management System (DBMS). The combination of the data and the DBMS, along with the related applications, is commonly referred to as a database system, which is often shortened to just database.
What is DBMS?
A database is a collection of data that requires a comprehensive database software program, known as a Database Management System (DBMS), to function. Essentially, a DBMS serves as an interface between the database and its end-users or programs, allowing users to retrieve, update, and manage how the information is organized and optimized. Additionally, a DBMS enables oversight and control of databases, enabling a variety of administrative operations such as performance monitoring, tuning, backup, and recovery.
Components
Here are some commonly found components that are present in different databases:
Schema
A schema is responsible for defining the structure of a data storage system and specifying the types of data that can be stored in it. Schemas can be strictly enforced throughout the entire database, enforced loosely in some areas, or not enforced at all.
Table
Each table can have anywhere from two to over a hundred columns, depending on the type of information it contains, similar to a spreadsheet.
Column
A database column contains a single data value of a particular type for each row. The data value can be text, number, enum, timestamp, etc.
Row
Data in a table is recorded in rows. There can be thousands or millions of rows in a table having any particular information.
Types
Below are different types of databases:
Challenges
Here's a revised version of the text to make it clearer and free of spelling, grammar, and punctuation errors:
Running databases at scale can pose some common challenges that need to be addressed.
Absorbing significant increases in data volume: The explosion of data from sensors, connected machines, and other sources is increasing rapidly.
Ensuring data security: Data breaches are becoming increasingly common, making it more important than ever to prioritize both data security and user accessibility.
Keeping up with demand: For timely decision-making and seizing new opportunities, companies require real-time access to their data.
Managing and maintaining the database and infrastructure: As data volumes increase and databases become more complex, companies face the expense of hiring additional talent to manage them.
Removing scalability limits: A business needs to grow to survive, and its data management must grow with it. However, predicting the required capacity of on-premises databases can be challenging.
Ensuring data residency, data sovereignty, or latency requirements: Some organizations have use cases that are better suited to run on-premises. In those cases, engineered systems that are pre-configured and pre-optimized for running the database are ideal..
SQL databases
A SQL or relational database is a collection of items that are related to each other and organized into tables consisting of columns and rows. These tables hold the information about the objects to be represented in the database. Each column in a table holds a specific type of data, and each field stores the actual value of an attribute. The rows in the table represent a collection of related values of one object or entity.
Each row in a table can be identified uniquely with a primary key, and the rows among multiple tables can be related to each other using foreign keys. This data can be accessed in many different ways without the need to reorganize the database tables themselves. SQL databases usually follow the ACID consistency model.
Materialized views
A materialized view is a pre-computed dataset that is derived from a query specification and stored for later use. The data is pre-computed, which makes querying a materialized view faster than executing a query against the base table of the view. This performance difference is significant when a query is run frequently or is complex.
Materialized views enable data subsetting, which improves the performance of complex queries that run on large datasets and reduces network loads. Though there are other uses of materialized views, they are primarily used for performance and replication purposes.
N+1 query problem
The N+1 query problem occurs when the data access layer executes additional SQL statements (N) to retrieve the same data that could have been obtained when executing the primary SQL query. The greater the value of N, the more queries will be executed, leading to a performance impact.
This problem is commonly observed in GraphQL and ORM (Object-Relational Mapping) tools. It can be solved by optimizing the SQL query or by using a dataloader that batches consecutive requests and makes a single data request under the hood.
Advantages
Let’s look at some advantages of using relational databases:
Simple and accurate
Accessibility
Data consistency
Flexibility
Disadvantages
Below are the disadvantages of relational databases:
Expensive to maintain
Difficult schema evolution
Performance hits (join, denormalization, etc.)
Difficult to scale due to poor horizontal scalability
Examples
Here are some commonly used relational databases:
NoSQL databases
NoSQL refers to a category of databases that do not primarily use SQL as their data access language. These databases are also known as non-relational databases. Unlike relational databases, NoSQL databases do not require data to follow a predefined schema. NoSQL databases follow BASE consistency model.
Below are different types of NoSQL databases:
Document
A document database, also known as a document-oriented database or a document store, is a versatile database that stores information in the form of documents. These databases can be used for transactional and analytical applications.
Advantages
Intuitive and flexible
Easy horizontal scaling
Schemaless
Disadvantages
Schemaless
Non-relational
Examples
Key-value
One of the simplest types of NoSQL databases, key-value databases save data as a group of key-value pairs made up of two data items each. They’re also sometimes referred to as a key-value store.
Advantages
Simple and performant
Highly scalable for high volumes of traffic
Session management
Optimized lookups
Disadvantages
Basic CRUD
Values can’t be filtered
Lacks indexing and scanning capabilities
Not optimized for complex queries
Examples
Graph
A graph database is a type of NoSQL database that uses graph structures to store and represent data. Instead of tables or documents, it uses nodes, edges, and properties to create relationships between data items.
The nodes represent the data items, while the edges represent the relationships between them. The relationships allow data in the store to be linked together directly and can be retrieved with a single operation in many cases.
Advantages
Query speed
Agile and flexible
Explicit data representation
Disadvantages
Complex
No standardized query language
Use cases
Fraud detection
Recommendation engines
Social networks
Network mapping
Examples
Time series
A time-series database is a database optimized for time-stamped, or time series, data.
Advantages
Fast insertion and retrieval
Efficient data storage
Use cases
IoT data
Metrics analysis
Application monitoring
Understand financial trends
Examples
Wide column
Wide column databases, also known as wide column stores, are schema-agnostic. Data is stored in column families, rather than in rows and columns.
Advantages
Highly scalable, can handle petabytes of data
Ideal for real-time big data applications
Disadvantages
Expensive
Increased write time
Use cases
Business analytics
Attribute-based data storage
Examples
Multi-model
Multi-model databases combine different database models (i.e. relational, graph, key-value, document, etc.) into a single, integrated backend. This means they can accommodate various data types, indexes, queries, and store data in more than one model.
Advantages
Flexibility
Suitable for complex projects
Data consistent
Disadvantages
Complex
Less mature
Examples
That's all folks.
In the next blog, I will try to cover more about SQL V/s NoSQL Databases.
Feel free to comment on how you like my blog on the system design series or shoot me an email at connect@nandan.dev If you have any queries I will try to answer them.
You can also visit my website to read some of the articles at https://nandan.dev/
Stay tuned & connect with me on my social media channels. Subscribe to my newsletter to get regular updates on my upcoming posts.
If you're interested in learning about system design, you can join Design Guru's course on System Design Fundamentals.