Global Investment Bank

Establish a central data repository holding all of the bank's enterprise balance sheet data for use in risk-related analytics and Fed reporting.

Data management layer to support security, governance, standards, maintainability, and self-service capability.

Ability to perform data analytics and data science for ad hoc reporting.

Build a data warehouse of Mortgage-Backed Securities (MBS) loan-level data by connecting with three agencies (Fannie Mae, Freddie Mac, and Ginnie Mae), for use in analyzing MBS investment exposure.


  • A central conformed data layer was created in Hive as a single view of all business data.
  • Hierarchies and GL balances were stored as reference data.
  • Cloudera Navigator was used to store data lineage and the data dictionary.
  • Sqoop was used for data ingestion.
  • Data was layered to support data governance and functional separation and to improve data consistency.


  • Spark
  • Sqoop
  • Hive
  • Cloudera Navigator
  • Tableau
  • Airflow
  • Oozie


Delivered a centralized data hub with a modern architecture on Cloudera Hadoop that can handle data growth, governance, security, and maintainability. Reduced data management cost by 60% while providing self-service capability.


Global Financial Leader


Business Problem: A number of SOX-critical business processes were handled manually by multiple business teams, who pulled data from a multitude of sources, ran data quality checks and data transformations, and generated reconciliation and management review control reports at varying frequencies (weekly, monthly, quarterly, and yearly).

These processes were often tedious and error-prone, and their manual nature involved a lot of back and forth between teams. The estimated effort was several hours by multiple team members on an ongoing basis.

Solution: Worked with the key business stakeholders to capture requirements, then implemented end-to-end automated data analytics workflows for these business processes. The workflows handle processing, cleansing, blending, and enrichment of complex, heterogeneous, large-volume data (structured and unstructured), and prepare key insights and reports with automated data controls in place.

Technology Stack: Alteryx, SQL Server, Oracle, MongoDB, SharePoint On-Prem & Cloud, Microsoft Office, Data Warehouse

Outcome: Automation removed the manual processes, improved efficiency, and provided automated data check controls for SOX, audit, and regulatory needs. It cut the time required by about 99% and increased productivity for business users, empowering them to make data-driven decisions with quicker turnaround times.
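
To illustrate the kind of automated reconciliation control described above, here is a minimal sketch in Python. The actual solution was built in Alteryx, so this is not the implemented workflow: the function name `reconcile`, the `Break` record, the account keys, the amounts, and the 0.01 tolerance are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Break:
    """One reconciliation break (hypothetical record layout)."""
    key: str
    source_amount: Optional[Decimal]
    target_amount: Optional[Decimal]


def reconcile(
    source: Dict[str, Decimal],
    target: Dict[str, Decimal],
    tolerance: Decimal = Decimal("0.01"),  # assumed tolerance, not from the source
) -> List[Break]:
    """Flag keys missing on either side or differing by more than the tolerance."""
    breaks = []
    for key in sorted(source.keys() | target.keys()):
        s = source.get(key)
        t = target.get(key)
        if s is None or t is None or abs(s - t) > tolerance:
            breaks.append(Break(key, s, t))
    return breaks


# Hypothetical sample balances from two systems.
ledger = {
    "ACC-1": Decimal("100.00"),
    "ACC-2": Decimal("250.50"),
    "ACC-3": Decimal("75.00"),
}
warehouse = {
    "ACC-1": Decimal("100.00"),
    "ACC-2": Decimal("250.75"),
    "ACC-4": Decimal("10.00"),
}

breaks = reconcile(ledger, warehouse)
for b in breaks:
    print(b.key, b.source_amount, b.target_amount)
```

Using `Decimal` rather than floats keeps monetary comparisons exact, which matters when the break list feeds a control report that reviewers sign off on.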

Large Insurance Company

Refine or redesign the current NiFi flows to reduce node resource consumption and improve overall stability of the cluster.
Optimize performance to handle high-volume streaming data in NiFi.
Provide a solution for security and data-in-transit encryption best practices in Apache NiFi.


  • Developed an optimized NiFi workflow to prevent resource contention, and developed best practices to be shared among NiFi developers
  • Redesigned the current production workflow so that it runs distributed across the cluster
  • Designed a NiFi workflow that encrypts traffic in transit to Apache NiFi
  • Prepared and applied HDF or CFM best practices, configuring the NYL system and NiFi for high-performance data flows
  • Fixed numerous misconfigured processors that were polluting nifi-app.log with errors
  • Prepared, tested, and managed a detailed data transformation and tuning plan, including best practices and an optimization plan for the following year
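
As an illustration of how noisy processors can be spotted in nifi-app.log, the sketch below tallies ERROR lines per logger name. It assumes the log uses a `<date> <time> <LEVEL> [<thread>] <logger> <message>` layout; the actual layout depends on the cluster's logback configuration, so the pattern may need adjusting. The function name and the sample lines (including the processor ids) are hypothetical and are not taken from the engagement.

```python
import re
from collections import Counter

# Assumed line layout: "<date> <time> <LEVEL> [<thread>] <logger> <message>".
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} "
    r"(?P<level>[A-Z]+) \[(?P<thread>[^\]]*)\] (?P<logger>\S+) "
)


def count_errors_by_logger(lines):
    """Count ERROR-level lines per logger; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("level") == "ERROR":
            counts[match.group("logger")] += 1
    return counts


# Hypothetical sample lines for demonstration only.
SAMPLE = [
    "2024-01-01 10:00:00,001 ERROR [Timer-Driven Process Thread-3] "
    "o.a.n.processors.standard.InvokeHTTP InvokeHTTP[id=hypothetical-1] Routing to failure",
    "2024-01-01 10:00:01,002 INFO [main] o.a.nifi.NiFi Started",
    "2024-01-01 10:00:02,003 ERROR [Timer-Driven Process Thread-7] "
    "o.a.n.processors.standard.PutSQL PutSQL[id=hypothetical-2] Failed to update database",
    "    at java.base/java.lang.Thread.run(Thread.java:829)",
    "2024-01-01 10:00:03,004 ERROR [Timer-Driven Process Thread-3] "
    "o.a.n.processors.standard.InvokeHTTP InvokeHTTP[id=hypothetical-1] Routing to failure",
]

error_counts = count_errors_by_logger(SAMPLE)
for logger, count in error_counts.most_common():
    print(f"{count:5d}  {logger}")
```

To run it against a real file, pass an open file handle instead of `SAMPLE`, since the function accepts any iterable of lines. Stack-trace continuation lines do not match the pattern and are ignored.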

Telecommunications Company

Analyze the customer's HBase and CDP 7.1.6 cluster and provide a solution to optimize performance.
HBase management, performance tuning, cluster and component configuration, and optimization recommendations and implementation.


  • Validated and improved HBase fault tolerance for a better customer experience
  • Prepared ETL recommendations and best practices to follow
  • Prepared software recommendations and best practices to follow
  • Prepared optimized HBase parameters in Cloudera Manager, with practical documentation of all steps to install and evaluate them
  • Reviewed HBase configurations, change history, and log files to prepare an optimization plan
  • Reviewed HBase upgrade and version recommendations against the support matrix to verify coexistence with other applications
  • Prepared, tested, and managed a detailed data processing tuning plan, including all optimization steps needed to incorporate best practices

Telecommunications Company

Transition an HDP 2.6.5 cluster to a CDP Private Cloud Base 7.1.7 cluster.
Upgrade scenarios, upgrade strategy, and detailed pre-upgrade, upgrade, and post-upgrade tasks.


  • Prepared and executed the upgrade and migration plan and architecture
  • Prepared and validated the migration of streaming workloads from varied sources
  • Performed the pre-transition steps and the steps that followed
  • Configured and optimized services in Cloudera Manager and managed the CDP cluster for better performance
  • Prepared detailed step-by-step instructions for upgrading services as a runbook
  • Reviewed all clusters and prepared documentation to ensure each cluster was ready and met all the criteria
  • Verified that all integrated products were certified to work with the HDP intermediate bits and CDP Private Cloud Base
  • Estimated the impact of upgrading and prepared a solution for integrating with the cloud platform