Transitioning from a traditional data warehouse to Big Data

Challenge

  • Recurring challenges with the current traditional data warehouse:
    ◦ Could no longer scale
    ◦ Hard to maintain
    ◦ Cost-intensive
    ◦ Single point of failure
  • As digitalization spread everywhere, data grew rapidly in both amount and variety, exposing challenges with the 3 V’s:
    ◦ Volume
    ◦ Velocity
    ◦ Variety
  • Falling well short of customers’ demands
  • Few options to enhance the platform or make it more functional

Processing then

  • Store data in huge SQL databases
  • Complex SQL queries and stored procedures to process data
  • High-performance enterprise servers to process smaller amounts of data
  • Manual testing
  • Aging monolithic solution

Processing now

  • Store data in fault-tolerant distributed storage
  • High-level languages and frameworks such as Spark to process data
  • Cost-effective distributed clusters to process huge amounts of data
  • Fully functional, unit-tested code
  • Modern, modular solution
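To illustrate the shift from logic buried in stored procedures to tested, high-level code, here is a minimal Python sketch of one transformation written as a small pure function with its own unit tests. The `Order` record, its fields (`order_id`, `amount`, `currency`), and the conversion rates are hypothetical and chosen only for illustration; they do not come from the original project.

```python
import unittest
from dataclasses import dataclass
from typing import Iterable, List


# Hypothetical record type; the field names are illustrative assumptions.
@dataclass(frozen=True)
class Order:
    order_id: int
    amount: float
    currency: str


# Hypothetical static conversion rates, used only for this example.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.1}


def normalize_amounts(orders: Iterable[Order]) -> List[Order]:
    """Convert every order amount to USD, dropping orders in unknown currencies."""
    result = []
    for order in orders:
        rate = RATES_TO_USD.get(order.currency)
        if rate is None:
            continue  # skip records we cannot convert instead of failing the whole batch
        result.append(Order(order.order_id, round(order.amount * rate, 2), "USD"))
    return result


class NormalizeAmountsTest(unittest.TestCase):
    def test_converts_known_currencies(self):
        out = normalize_amounts([Order(1, 10.0, "EUR"), Order(2, 5.0, "USD")])
        self.assertEqual(out, [Order(1, 11.0, "USD"), Order(2, 5.0, "USD")])

    def test_drops_unknown_currencies(self):
        self.assertEqual(normalize_amounts([Order(3, 1.0, "XYZ")]), [])
```

Run the tests with `python -m unittest path/to/file.py`. Because the function does no I/O, it can be tested in isolation and then called from whichever processing job needs it.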

Solution

  • Leverage the Big Data stack to run the regular ETL process without disrupting the existing platform
  • Use Sqoop to fetch data from the existing relational data warehouse into HDFS
  • Use Hive to normalize or denormalize the fetched data as each application requires
  • Use Apache Spark to process the data in memory and write the results back to HDFS
  • Export the results back to the existing relational data warehouse
  • Orchestrate the entire ETL process with Apache NiFi data pipelines, simplifying it and providing a seamless, automated operational experience
  • Make troubleshooting easier, even in production
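The Sqoop import and export steps above can be sketched as command lines assembled in Python, ready to be passed to `subprocess.run` or to whatever scheduler runs them. The JDBC URL, user name, password-file path, table names, HDFS directories, and mapper count are all hypothetical placeholders, since the source does not name the database or its tables. The flags themselves (`--connect`, `--username`, `--password-file`, `--table`, `--target-dir`, `--export-dir`, `--fields-terminated-by`, `--input-fields-terminated-by`, `--num-mappers`) are standard Sqoop 1 options, but check them against the Sqoop version bundled with your CDH release.

```python
import shlex
from typing import List

# Hypothetical placeholders: the source does not name the RDBMS, tables, or paths.
JDBC_URL = "jdbc:postgresql://dw-host:5432/dwh"
USERNAME = "etl_user"
PASSWORD_FILE = "/user/etl/.dw_password"  # file holding the DB password, readable only by the ETL user


def sqoop_import(table: str, target_dir: str, mappers: int = 4) -> List[str]:
    """Build a `sqoop import` command that copies one warehouse table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--username", USERNAME,
        "--password-file", PASSWORD_FILE,
        "--table", table,
        "--target-dir", target_dir,          # HDFS directory that receives the files
        "--fields-terminated-by", ",",
        "--num-mappers", str(mappers),       # number of parallel map tasks
    ]


def sqoop_export(table: str, export_dir: str, mappers: int = 4) -> List[str]:
    """Build a `sqoop export` command that writes processed HDFS files back to the warehouse."""
    return [
        "sqoop", "export",
        "--connect", JDBC_URL,
        "--username", USERNAME,
        "--password-file", PASSWORD_FILE,
        "--table", table,                    # must already exist in the warehouse
        "--export-dir", export_dir,          # HDFS directory holding the results
        "--input-fields-terminated-by", ",",
        "--num-mappers", str(mappers),
    ]


if __name__ == "__main__":
    # Print the commands rather than running them; on a node with the Sqoop
    # client installed, each list could be passed to subprocess.run(cmd, check=True).
    print(shlex.join(sqoop_import("SALES", "/data/raw/sales")))
    print(shlex.join(sqoop_export("SALES_SUMMARY", "/data/out/sales_summary")))
```

Sqoop export does not create the target table, so `SALES_SUMMARY` would need to exist in the warehouse beforehand. Building argument lists instead of shell strings avoids quoting problems when the commands are launched programmatically.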

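The Hive normalize/denormalize step is essentially SQL run over the imported tables. As a stand-in that runs anywhere, the sketch below uses Python's built-in `sqlite3` rather than Hive to show a typical denormalization: joining a fact table with its dimension table into one wide table via `CREATE TABLE ... AS SELECT`, a statement form Hive also supports. The `sales` and `customers` tables and their columns are hypothetical; on the cluster, the same query would be submitted to Hive instead of SQLite.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical normalized tables, as they might look after the Sqoop import.
SAMPLE_SCHEMA = """
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT, region TEXT);
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Acme', 'EU'), (2, 'Globex', 'US');
INSERT INTO sales VALUES (10, 1, 100.0), (11, 2, 250.0), (12, 1, 50.0);
"""

# Denormalize: one wide row per sale, with the customer attributes copied in.
# Hive accepts the same CREATE TABLE ... AS SELECT form.
DENORMALIZE_SQL = """
CREATE TABLE sales_wide AS
SELECT s.sale_id, s.amount, c.customer_name, c.region
FROM sales s
JOIN customers c ON s.customer_id = c.customer_id
"""


def build_sample_db() -> sqlite3.Connection:
    """Create an in-memory database holding the hypothetical normalized tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SAMPLE_SCHEMA)
    return conn


def denormalize(conn: sqlite3.Connection) -> List[Tuple]:
    """Build the wide table and return its rows ordered by sale_id."""
    conn.execute(DENORMALIZE_SQL)
    return conn.execute(
        "SELECT sale_id, amount, customer_name, region FROM sales_wide ORDER BY sale_id"
    ).fetchall()


if __name__ == "__main__":
    for row in denormalize(build_sample_db()):
        print(row)
```

Denormalizing this way trades extra storage for fewer joins in downstream jobs, which suits read-heavy processing; normalizing would go the other way by splitting repeated attributes back into separate tables.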
Deployment Infrastructure

  • Cloudera’s CDH
  • Apache NiFi
  • Sqoop
  • Hive
  • Spark

Business impact

  • Significantly reduced infrastructure operating costs
  • Reduced a full day’s effort to 2-3 hours
  • Helped create new products and applications that attracted interest from potential customers
  • Opened the door to analytics and machine learning at scale to derive key business insights