What is a data warehouse?
A data warehouse is typically a relational database designed for analytical needs. Data warehousing is the act of organizing and storing data so that its retrieval is efficient and insightful; it is also described as the process of transforming data into information. A data warehouse works on the basis of OLAP (Online Analytical Processing) and serves as a central location where consolidated data from multiple sources (databases) is stored.
In short, a data warehouse combines data from multiple sources into one database that can be used for reporting and analysis.
Features of Data Warehouse:
Subject oriented
Integrated
Time variant
Non-volatile
Architecture of a Data Warehouse:
The goal is to organize and store data in the warehouse so that it can be accessed later, and so that what is retrieved is meaningful information rather than a plain copy of the data held in an operational database.
Our Understanding: –
Our data lives in different data sources and has to be transferred into the data warehouse.
To make that transfer possible, there is an intermediate layer called the staging area.
The staging area contains a staging database, which serves as temporary storage.
Data is moved from the sources into the staging database by an ETL process (Extract, Transform, Load).
Once the data is in temporary storage, a second ETL pass transforms it and loads it into the data warehouse.
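The flow described above (source, then staging, then warehouse) can be sketched in a few lines of Python. This is only an illustration using the standard sqlite3 module with an in-memory database; the table names (staging_orders, dw_orders), the sample rows and the cleaning rules are assumptions made up for this example, not part of any particular ETL tool.

```python
import sqlite3

# Hypothetical source rows: (order_id, customer, amount as text, order date)
SOURCE_ROWS = [
    (1, " alice ", "120.50", "2023-01-05"),
    (2, "BOB", "80", "2023-01-06"),
    (3, "alice", "not-a-number", "2023-01-07"),  # bad record, rejected in pass 2
]


def run_etl(source_rows):
    conn = sqlite3.connect(":memory:")
    # Staging area: temporary storage holding the data exactly as extracted.
    conn.execute(
        "CREATE TABLE staging_orders "
        "(order_id INTEGER, customer TEXT, amount TEXT, order_date TEXT)"
    )
    # Warehouse table: cleaned, typed data ready for analysis.
    conn.execute(
        "CREATE TABLE dw_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )

    # ETL pass 1: extract from the source and load into staging unchanged.
    conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?, ?)", source_rows)

    # ETL pass 2: read from staging, transform, and load into the warehouse.
    staged = conn.execute("SELECT * FROM staging_orders").fetchall()
    for order_id, customer, amount, order_date in staged:
        try:
            clean_amount = float(amount)
        except ValueError:
            continue  # skip rows whose amount is not numeric
        conn.execute(
            "INSERT INTO dw_orders VALUES (?, ?, ?, ?)",
            (order_id, customer.strip().lower(), clean_amount, order_date),
        )

    conn.execute("DROP TABLE staging_orders")  # staging is only temporary
    conn.commit()
    return conn


warehouse = run_etl(SOURCE_ROWS)
loaded = warehouse.execute(
    "SELECT customer, amount FROM dw_orders ORDER BY order_id"
).fetchall()
print(loaded)  # [('alice', 120.5), ('bob', 80.0)]
```

In a real project the two passes would usually be handled by a dedicated ETL tool, but the shape stays the same: land the data untouched first, then clean it on the way into the warehouse.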
Once the data has been loaded into the data warehouse, it falls into two categories:
Raw data – the actual rows and columns that were transferred, which make up the bulk of the data.
Metadata – data about the raw data.
Once the data is in the data warehouse, end users can analyze it by running queries.
Running analytical queries on a data warehouse is called Online Analytical Processing (OLAP).
Data marts: – A data mart is not an entirely separate system; it is a part of the data warehouse that holds the data for a particular domain. End users can run queries against either the data warehouse or a data mart.
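One simple way to picture a data mart is as a domain-specific slice of the warehouse. The sketch below models that slice as a SQL view in an in-memory SQLite database. The view is a simplification chosen for brevity (data marts are often separate physical stores), and the names dw_transactions and mart_sales are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Warehouse table covering every department (hypothetical data).
CREATE TABLE dw_transactions (
    txn_id INTEGER PRIMARY KEY,
    department TEXT,
    amount REAL
);
INSERT INTO dw_transactions VALUES
    (1, 'sales', 100.0),
    (2, 'hr', 40.0),
    (3, 'sales', 60.0);

-- Data mart: only the rows that belong to the sales domain.
CREATE VIEW mart_sales AS
    SELECT txn_id, amount FROM dw_transactions WHERE department = 'sales';
""")

# End users can query either the whole warehouse or just the mart.
warehouse_total = conn.execute("SELECT SUM(amount) FROM dw_transactions").fetchone()[0]
mart_total = conn.execute("SELECT SUM(amount) FROM mart_sales").fetchone()[0]
print(warehouse_total, mart_total)  # 200.0 160.0
```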
Data Warehouse Concepts: –
OLAP (Online Analytical Processing) –
OLAP is a flexible way to perform complex analysis of multidimensional data.
A DWH (data warehouse) is modeled on the concept of OLAP, whereas operational databases are modeled on the concept of OLTP (Online Transaction Processing).
OLTP systems store data in two-dimensional tables made up of rows and columns, much like an Excel sheet.
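To make the contrast concrete, here is a small sketch that runs an OLTP-style operation and an OLAP-style query against the same table. It uses Python's sqlite3 module purely for illustration (SQLite is not an OLAP engine), and the sales table and its values are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales "
    "(sale_id INTEGER PRIMARY KEY, region TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, product, amount) VALUES (?, ?, ?)",
    [
        ("north", "laptop", 1000.0),
        ("north", "phone", 500.0),
        ("south", "laptop", 1200.0),
        ("south", "phone", 300.0),
        ("north", "laptop", 900.0),
    ],
)

# OLTP style: a short transaction that reads or changes a single row.
conn.execute("UPDATE sales SET amount = 950.0 WHERE sale_id = 5")
one_row = conn.execute(
    "SELECT region, product, amount FROM sales WHERE sale_id = 5"
).fetchone()

# OLAP style: summarize many rows along two dimensions (region and product).
by_region_product = conn.execute(
    "SELECT region, product, SUM(amount) FROM sales "
    "GROUP BY region, product ORDER BY region, product"
).fetchall()

print(one_row)  # ('north', 'laptop', 950.0)
for row in by_region_product:
    print(row)
```

The first query cares about one record; the second cares about totals per region and product, which is the kind of question a data warehouse is built to answer.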
Advantages of OLAP over OLTP –
Opens new views of looking at data.
Supports filtering/sorting of data.
Data can be refined.
Dimensions: –
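A dimension is a descriptive attribute, such as time, product or location, by which data is analyzed. The tables that describe the dimensions involved are called dimension tables.
Dividing a data warehouse project into dimensions provides structured information for analysis and reporting.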
Facts and measures: –
A fact is a measure that can be summed, averaged, or otherwise aggregated.
A fact table contains two kinds of data: dimension keys and measures.
Every dimension table is linked to a fact table.
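The relationship between fact and dimension tables is easier to see with a small example. The sketch below builds one fact table with two dimension keys and two measures, then aggregates the measures by a dimension attribute. All table names, column names and values (fact_sales, dim_product, dim_date and so on) are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one row per key.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table: dimension keys plus numeric measures.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    units_sold INTEGER,   -- measure
    revenue REAL          -- measure
);

INSERT INTO dim_product VALUES (1, 'laptop'), (2, 'phone');
INSERT INTO dim_date VALUES (20230101, 2023, 1), (20230201, 2023, 2);
INSERT INTO fact_sales VALUES
    (1, 20230101, 2, 2000.0),
    (2, 20230101, 5, 2500.0),
    (1, 20230201, 1, 1000.0);
""")

# Sum the measures, grouped by an attribute taken from a dimension table.
revenue_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.units_sold), SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(revenue_by_product)  # [('laptop', 3, 3000.0), ('phone', 5, 2500.0)]
```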
Schemas: –
A schema gives the logical description of the entire database.
It gives details about constraints placed on the tables, key values present & how the key values are linked between the different tables.
A database uses the relational model with primary and foreign keys, usually designed with an entity-relationship model, while a data warehouse uses a star, snowflake, or fact constellation schema.
Types of schemas: –
Star schema: – A star schema is a database organizational structure optimized for use in a data warehouse or in business intelligence. It is called a star schema because the fact table sits at the center of the logical diagram and the smaller dimension tables branch off to form the points of the star. Each star schema has only a single fact table.
Snowflake schema: – The snowflake schema is a variant of the star schema. Here, the centralized fact table is connected to multiple dimensions, and the dimensions are stored in normalized form across multiple related tables.
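As a rough sketch of what "normalized dimensions" means, the example below splits a product dimension into two related tables, so a query has to follow the chain from the fact table through dim_product to dim_category. The table names and data are hypothetical and the schema is kept deliberately tiny.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The product dimension is normalized: category details live in their own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    revenue REAL
);

INSERT INTO dim_category VALUES (1, 'computers'), (2, 'accessories');
INSERT INTO dim_product VALUES (10, 'laptop', 1), (11, 'desktop', 1), (12, 'mouse', 2);
INSERT INTO fact_sales VALUES (10, 1000.0), (11, 800.0), (12, 25.0), (12, 30.0);
""")

# Two joins are needed to reach the category name: fact -> product -> category.
revenue_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(revenue_by_category)  # [('accessories', 55.0), ('computers', 1800.0)]
```

In a star schema the category name would simply be a column of dim_product; the snowflake layout trades that extra join for less repeated data in the dimension.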
Galaxy schema: – The galaxy schema is also known as the fact constellation schema. It contains more than one fact table, and these fact tables share the same dimension tables. Dimensions that are shared in this way are called conformed dimensions. The arrangement of fact tables and dimension tables looks like a collection of stars, which is where the name galaxy schema comes from. This schema is difficult to maintain because of its complexity.
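A minimal sketch of a fact constellation, assuming two hypothetical fact tables (fact_sales and fact_returns) that share one conformed dimension (dim_product). Each fact table is aggregated on its own before the results are combined through the shared dimension, which avoids double counting rows when the two facts are brought together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conformed dimension shared by both fact tables.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);

-- Two fact tables, each with its own measure.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    units_sold INTEGER
);
CREATE TABLE fact_returns (
    product_key INTEGER REFERENCES dim_product(product_key),
    units_returned INTEGER
);

INSERT INTO dim_product VALUES (1, 'laptop'), (2, 'phone');
INSERT INTO fact_sales VALUES (1, 10), (2, 20), (1, 5);
INSERT INTO fact_returns VALUES (1, 2), (2, 1);
""")

# Aggregate each fact table separately, then line the totals up by product.
sold_vs_returned = conn.execute("""
    SELECT p.product_name, s.sold, COALESCE(r.returned, 0)
    FROM dim_product AS p
    JOIN (SELECT product_key, SUM(units_sold) AS sold
          FROM fact_sales GROUP BY product_key) AS s
      ON s.product_key = p.product_key
    LEFT JOIN (SELECT product_key, SUM(units_returned) AS returned
               FROM fact_returns GROUP BY product_key) AS r
      ON r.product_key = p.product_key
    ORDER BY p.product_name
""").fetchall()
print(sold_vs_returned)  # [('laptop', 15, 2), ('phone', 20, 1)]
```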
This brings us to the end of this overview of data warehousing concepts. This article covered what a data warehouse is, its features and architecture, and data warehouse concepts such as OLAP, dimensions, facts and measures, and schemas and their types.
We now have a better idea of data warehouse concepts.
Please share your thoughts and suggestions in the space below, and I’ll do my best to respond to all of them as time allows.
Keep Learning…!!!!!