Quite often, while building the Data Integration Pipeline, Performance is a critical factor. The factors below are vital towards following the guidelines while working on ETL processing with Informatica PowerCenter.

The following items are to be considered during ETL DEV.,

Pre-Requisite Checks/Analysis
Basic Tuning Guidelines
Additional Tuning Practices

 

Tuning Approach

Pre-Requisite Checks/Analysis : 

Before we get into subjecting an ETL Mapping against Performance Improvements, below steps to be adopted.,

Deep Dive into the Mapping to gather Basic Info.

Complexity of the Mapping (# of SRCs/TGTs, Transformations, Technical Logic)
Design of the Mapping (End-End Flow, Single/Multiple Pipelines)
Whether Best Practices followed

 

Verify the As-Is Metrics of the Mapping

Data Volume (SRC/TGT)
Duration of the Job Completion
Throughput
Busy Percentage of the Threads (Reader/Writer/Transformation)
Collect Performance Statistics

 

Ensure the ETL Server/System is not the reason for processing slowness

Are there frequent Network Connectivity issues?
Does the ETL System/Server has required H/W Capabilities?
Does the ETL Metadata DB have Enough Space?
Whether the System has Accumulated Log/Cache files blocking Server space?
DBs Slow with READ/WRITE?

 

After ensuring the above Pre-requisites are taken care and bottlenecks identified, if the ETL DEV is identified as the Root cause for slowness, Tuning practices can be applied on the Mappings if we expect significant improvement to meet the SLAs/Other Business benefits.

 

 

Basic Tuning Guidelines

 

Basic Guidelines are listed below.,

 

 

Design Perspective

Bring relevant/required fields on subsequent Transformations
Perform Incremental Extracts to limit Processing
Use Informatica CDC Drivers to process only Changed Data
Filter Data as early in the Pipelines
Limit the Data via Equi-Joins (JNR) upfront before Left-Joins (JNR) on Large Tables

 

 

DB Perspective

Build Indexes (High Volume Tables on Frequently used Joins/Predicates)
Create DB Partitions (for Large Fact Tables)
Collect STATISTICS
DB performs Faster processing (Complex Transformation Logic) than ETL

 

 

Delegation Perspective

Use PDO if DB Server has appreciable Computing Abilities
If DB Server has High Workload, push Functions Logic to Informatica Transformations
If DB has difficulty with Aggregations/Sorting, use Informatica Transformations

 

 

Space Perspective

Have a Retention period for Log/Cache files
Increase SRT/AGG/JNR Cache Size and DTM Buffer Size

 

 

Transformations/Load Perspective

Sorted Input data before LKP/AGG/JNR Transformations
JNR with Master Source having Less Records/Distinct Values
Consider BULK Load/External Loaders for Data Dump (after removing Index)
Use LKP Persistent Cache for Re-use requirements
Datatype consistency helps ETL operating with SRT/AGG/JNR
Optimize LKPs by Looking up only relevant Data (Override Filters) instead of Entire table
Avoid LKP Override Sort for Small Tables
Use UPD Strategy Transformation (only if necessary), can go for Session Level Updates
If LKP on High Volume table causes Performance issue, consider JNR Transformation

 

 

Addl. Tuning Practices

 

 

Additional Tuning Practices are listed below.,

 

Use Informatica Partitions (Pass Through/Key Range/Hash Key etc.) if High Data Volume
Do not use SRC and TGT as Same DB Table. Do a SRC – File TGT. Then FILE – DB TGT
Do not perform all ETL Operations in 1 Mapping. Divide ETL works with Series of Mappings
Use Concurrent Workflow Exec setting to enable parallel loads with different Params
Process ETL in multiple batches (ex. 2 times a day) to release the Table post load
If Complex ETL logic causes slowness, use FILE as TGT. Then 1:1 Load from FILE-TGT DB
Monitor Storage Space (Logs), use ETL Automation to clear files by Frequency (Mly/Qly)

 

 

Conclusion

 

 

On a high level, below are the inferences.,

 

Tuning need not be performed on every ETL mapping. Only those ETL jobs that are pain points to meeting Data Extraction/Loads SLAs be considered as potential candidates for further investigations and tuning.
DB Query optimization also plays a crucial role with SQL Overrides when used.
Delegate load b/w DB and ETL Servers.
Optimize ETL Design by following the Best Practices.
Monitor Storage Space and Computing Abilities.
Consider deploying Informatica Nodes on a GRID for High Availability and Load Balancing.