Common challenges with the traditional data warehouse
Cannot scale anymore
Maintainability
Cost-intensive
Single point of failure
As digitalization spread, data multiplied and diversified, and the warehouse began to struggle with the 3 Vs:
Volume
Velocity
Variety
Far behind in meeting customer demands
Few options to enhance or extend functionality
Processing then
Store data in huge SQL databases
Complex SQL queries and stored procedures to process data
High-performance enterprise servers to process smaller amounts of data
Manual testing
Aging monolithic solution
Processing now
Store data in fault-tolerant distributed storage
High-level languages to process data
Economical distributed clusters to process huge amounts of data
Fully functional and unit tested
New age modular solution
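As a rough illustration of the shift from manual testing to unit-tested logic in a high-level language, a transformation can be written as a plain function and checked automatically. This is only a sketch: the function name total_by_customer and the (customer_id, amount) record layout are hypothetical, and the source does not say which language or test framework was actually used.

```python
from collections import defaultdict


def total_by_customer(rows):
    """Sum order amounts per customer; rows are (customer_id, amount) pairs.

    Hypothetical transformation, used only to show a testable unit of logic.
    """
    totals = defaultdict(float)
    for customer_id, amount in rows:
        totals[customer_id] += amount
    return dict(totals)


def test_total_by_customer():
    # A small fixed input replaces the manual check against a live database.
    rows = [("c1", 10.0), ("c2", 5.0), ("c1", 2.5)]
    assert total_by_customer(rows) == {"c1": 12.5, "c2": 5.0}
    assert total_by_customer([]) == {}


test_total_by_customer()
```

Keeping the logic in a small pure function like this means it can be verified on a laptop before it ever runs against the cluster.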
Solution
Leverage the big data stack to run the regular ETL process without disrupting the existing platform
Use Sqoop to fetch data from the existing relational data warehouse into HDFS
Use Hive to normalize or de-normalize the fetched data as the application requires
Use Apache Spark to process the data in memory and write the results back to HDFS
Export the results back to the existing relational data warehouse
The whole ETL process is simplified and orchestrated with Apache NiFi data pipelines to provide a seamless, automated operational experience
Troubleshooting is easier, even in production
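The four steps above (Sqoop import, Hive transform, Spark processing, Sqoop export) can be sketched as an ordered list of commands. This is a minimal sketch under stated assumptions: the JDBC URL, table names, HDFS paths, and the script names normalize_orders.hql and process_orders.py are all hypothetical, and credentials are left out. How these commands were wired into NiFi is not stated in the source, so they are only built and printed here, not executed.

```python
import shlex

# Hypothetical connection details and paths, for illustration only.
JDBC_URL = "jdbc:mysql://dwh.example.com/sales"
STAGING_DIR = "/data/staging/orders"
RESULT_DIR = "/data/results/orders_summary"


def build_pipeline(jdbc_url, source_table, target_table, staging_dir, result_dir):
    """Return the four ETL steps as (name, argv) pairs, in execution order."""
    # Step 1: pull the source table from the relational warehouse into HDFS.
    # A real job would also pass credentials (e.g. --username, --password-file).
    sqoop_import = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", source_table,
        "--target-dir", staging_dir,
        "--num-mappers", "4",
    ]
    # Step 2: run a Hive script that reshapes the staged data (hypothetical file).
    hive_transform = ["hive", "-f", "normalize_orders.hql"]
    # Step 3: submit a Spark job that reads the staged data and writes results to HDFS.
    spark_job = [
        "spark-submit", "--master", "yarn",
        "process_orders.py", staging_dir, result_dir,
    ]
    # Step 4: push the results back into the relational warehouse.
    sqoop_export = [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", target_table,
        "--export-dir", result_dir,
    ]
    return [
        ("import", sqoop_import),
        ("transform", hive_transform),
        ("process", spark_job),
        ("export", sqoop_export),
    ]


def render(steps):
    """Format each step as a shell-safe command line for review or logging."""
    return [f"{name}: {shlex.join(argv)}" for name, argv in steps]


steps = build_pipeline(JDBC_URL, "orders", "orders_summary", STAGING_DIR, RESULT_DIR)
for line in render(steps):
    print(line)
```

Keeping each step as a separate command mirrors the modular design: an orchestrator can run, retry, and monitor every stage independently, which helps troubleshooting.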
Deployment Infrastructure
Cloudera’s CDH
Apache NiFi
Sqoop
Hive
Spark
Business impact
Infrastructure operating cost is significantly reduced
Reduced a day’s effort to 2-3 hours
Helped create new products and applications that attracted interest from potential customers
Opened a gateway to perform analytics & machine learning at scale to derive key business insights