What is data virtualization (DV)?
The main motto of Data Virtualization is: data recipients, tools and applications are independent of the underlying physical and logical data structures.
The traditional Business Intelligence, Data Warehouse and ETL approach has always been linked to issues with excessive data growth, data silos, moving data from one store to another, complex data and IT development processes, and data latency (in most cases meaning daily DWH loads).
Data virtualization, on the other hand, focuses on real-time (or near real-time) 'light' processing, big data techniques and getting the most out of external data sources, which in many cases means data in the cloud. The DV development process is light and not very resource- or cost-intensive. A Data Virtualization platform is a bridge between IT and business users and, in an ideal scenario, can be used by business users in self-service mode.
What does data virtualization mean in practice?
The key features of a data virtualization platform:
- Canonical business views of the data - a meaningful business representation of the data as high-level entities and the relationships between them.
- Pre-integrated information for discovery and self-service - fast, flexible, accurate and near real-time access.
- Consistent performance regardless of the type, location or technology of the source system.
- Easy handling of multiple data sources and systems with different query languages, data models and security approaches.
- Quick adaptation to dynamically changing business conditions such as acquisitions and mergers.
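The idea of canonical business views over heterogeneous sources can be sketched in a few lines of Python. This is a minimal illustration, assuming two hypothetical systems: a CRM that exports flat rows with its own column names, and a billing system that stores accounts keyed by ID with a nested address. Every name in it (`Customer`, `CrmSource`, `BillingSource`, `customer_view`) is made up for the example and is not the API of any data virtualization product. Each adapter maps its physical layout onto one canonical `Customer` entity, so the consumer code stays the same whichever source the records come from.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Protocol


@dataclass(frozen=True)
class Customer:
    """Canonical business entity seen by every consumer (hypothetical)."""
    customer_id: str
    name: str
    country: str


class Source(Protocol):
    """Any adapter that can yield canonical Customer records."""
    def customers(self) -> Iterator[Customer]: ...


class CrmSource:
    """Hypothetical CRM: flat rows with source-specific column names."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    def customers(self) -> Iterator[Customer]:
        for row in self._rows:
            # Map the CRM's physical column names onto the canonical entity.
            yield Customer(row["cust_id"], row["full_name"], row["cntry"])


class BillingSource:
    """Hypothetical billing system: accounts keyed by ID, nested address."""

    def __init__(self, accounts: dict[str, dict]) -> None:
        self._accounts = accounts

    def customers(self) -> Iterator[Customer]:
        for account_id, account in self._accounts.items():
            # Flatten the nested structure into the same canonical entity.
            yield Customer(account_id, account["name"], account["address"]["country"])


def customer_view(sources: Iterable[Source]) -> list[Customer]:
    """Canonical view: consumers never see the sources' physical layouts."""
    return [customer for source in sources for customer in source.customers()]


crm = CrmSource([{"cust_id": "C1", "full_name": "Alice", "cntry": "PL"}])
billing = BillingSource({"B7": {"name": "Bob", "address": {"country": "DE"}}})
view = customer_view([crm, billing])
for customer in view:
    print(customer.customer_id, customer.name, customer.country)
# C1 Alice PL
# B7 Bob DE
```

Adding a third source means writing one more adapter; code that consumes `customer_view` does not change, which is the independence from physical data structures that data virtualization aims for.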
Benefits of data virtualization
The key DV benefits:
- Information quality - data silos are integrated easily, including social, unstructured, cloud, web and big data sources
- More agile and flexible operation - adapts easily to a changing environment, thus lowering maintenance costs
- First results visible quickly - a POC can be delivered in days and projects in weeks, and the investment usually pays back within half a year
Data virtualization architecture
The most common data virtualization layers are: connection to the data sources, intermediate processing, bringing the data to a canonical form, and a publishing layer that makes the data available for reporting.
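These layers can be illustrated with a small Python sketch, assuming two hypothetical sources: an in-memory SQLite table standing in for a relational orders database, and a JSON string standing in for a cloud REST API response. The function names are illustrative only. The point is that the report is assembled at query time: every call to `publish_revenue_report` reads both sources again, and no integrated copy of the data is stored.

```python
import json
import sqlite3
from collections import defaultdict

# --- Connection layer: one adapter per (hypothetical) source ---

def connect_orders_db() -> sqlite3.Connection:
    """Relational source: an in-memory SQLite table standing in for an ERP database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, cust_id TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "C1", 120.0), (2, "C1", 80.0), (3, "C2", 50.0)],
    )
    return conn


def fetch_orders(conn: sqlite3.Connection) -> list[tuple]:
    return conn.execute("SELECT cust_id, amount FROM orders").fetchall()


def fetch_customers() -> list[dict]:
    """Cloud source: a JSON string standing in for a REST API response."""
    payload = '[{"id": "C1", "segment": "retail"}, {"id": "C2", "segment": "b2b"}]'
    return json.loads(payload)


# --- Intermediate processing: join the sources and compute a business measure ---

def revenue_by_segment(orders: list[tuple], customers: list[dict]) -> dict:
    segment_of = {c["id"]: c["segment"] for c in customers}
    totals = defaultdict(float)
    for cust_id, amount in orders:
        totals[segment_of[cust_id]] += amount
    return totals


# --- Canonical form + publishing layer: rows with business names, built per query ---

def publish_revenue_report(conn: sqlite3.Connection) -> list[dict]:
    totals = revenue_by_segment(fetch_orders(conn), fetch_customers())
    return [{"segment": s, "revenue": r} for s, r in sorted(totals.items())]


conn = connect_orders_db()
report = publish_revenue_report(conn)
print(report)
# [{'segment': 'b2b', 'revenue': 50.0}, {'segment': 'retail', 'revenue': 200.0}]
```

Real platforms typically add query optimization, caching and security on top of these layers; the sketch deliberately leaves all of that out.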
Data virtualization vs ETL
| Extract, Transform, Load (ETL) | Data Virtualization (DV) |
|---|---|
Although the two concepts are different, in real-life scenarios ETL and Data Virtualization are complementary, and most organizations will benefit from having both solutions in-house.
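The practical difference in data latency between the two approaches can be shown with a toy Python example, assuming an in-memory list as the source system and another list as the warehouse table (both hypothetical). The same transformation runs in both cases: ETL runs it once per load and stores the result, while the virtual view runs it on every query. A change in the source is therefore visible in the view immediately but reaches the warehouse only at the next load.

```python
# Hypothetical operational source system (e.g. a stock table).
source = [{"sku": "A", "stock": 10}, {"sku": "B", "stock": 4}]


def transform(row: dict) -> dict:
    """Shared business rule: flag items with fewer than 5 units in stock."""
    return {**row, "low_stock": row["stock"] < 5}


# ETL: the transformation runs at load time (e.g. a nightly batch)
# and the result is physically stored as a separate copy.
warehouse = [transform(row) for row in source]


def inventory_view() -> list[dict]:
    """DV: nothing is stored; the transformation runs on every query."""
    return [transform(row) for row in source]


# The source system changes after the load.
source[0]["stock"] = 2

print(warehouse[0])         # {'sku': 'A', 'stock': 10, 'low_stock': False}
print(inventory_view()[0])  # {'sku': 'A', 'stock': 2, 'low_stock': True}
```

The stored copy is not only a drawback: it gives a stable snapshot and keeps heavy queries off the source system, which is one reason the two approaches complement each other.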