Tuning Fact Table Process
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 26
- Joined: Tue May 24, 2005 6:20 am
- Location: Tallahassee
I have a fact table loading process that is not performing to my expectations. My ETL process takes 33 minutes to load 170,000 rows (3 MB) of data (approx. 85 rows/sec). The data is extracted from and loaded into tables in the same Oracle database (using the DRS stage). The process populates a fact table and has 4 major transformations. Each transformation has 5 reference lookups (used to obtain the correct surrogate keys). Some lookups are performed against hashed files while others are against database tables.
Performance was much worse than this before tuning. The following tuning steps got me to this point.
1. Added IPC stages to force the process to use all available processors
2. Tuned the data extraction SQL and the lookup SQL
3. Added indexes to the lookup tables
4. Tuned the hashed files by adjusting the Minimum Modulus
5. Tuned the Array Size and Transaction Size
In addition to these techniques, I also replaced the source and target tables with Sequential files, and performance improved only marginally.
Below are some particulars for my environment:
- DataStage Server Edition 7.5.1A (Located on Server A)
- Server A (Solaris OS, 4 CPU, 2GB)
- Oracle 10g Database (Located on Server B)
- Server B (Solaris OS, 12 CPU, 24GB, Holds 15 DBs)
Any ideas on how I can improve my performance?
Byron
How are you loading your fact table now? Are you collecting all your data into a flat file in one job after all the lookups and transformations and then loading it into your table in a second job, or is everything in one job without an intermediate Sequential File stage?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Are you watching prstat on both servers to see how your CPU utilization is doing?
Why just some hashed lookups, not all hashed lookups?
What about multiple job instances?
How about DIRECT path bulk-load inserts and a "bulk update" script? (DIRECT path load the updates into a work table, then apply them with a parallel-DML-enabled correlated UPDATE statement.)
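For illustration, the "bulk update" pattern suggested above might be sketched in Oracle SQL roughly as follows; every table and column name here is a placeholder, not anything from the actual job:

Code: Select all
-- Sketch only: assumes changed rows were direct-path loaded into WORK_TABLE,
-- and that FACT_TABLE and WORK_TABLE share the key FACT_KEY.
ALTER SESSION ENABLE PARALLEL DML;

UPDATE /*+ PARALLEL(f, 4) */ fact_table f
   SET (f.measure1, f.measure2) =
       (SELECT w.measure1, w.measure2
          FROM work_table w
         WHERE w.fact_key = f.fact_key)
 WHERE EXISTS (SELECT 1
                 FROM work_table w
                WHERE w.fact_key = f.fact_key);

COMMIT;

The win is that one set-based, parallelized UPDATE replaces row-by-row OCI updates from the ETL tool.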
How about just using OCI --> XFM --> SEQ to extract your source data and see the maximum speed? If swapping Sequential File stages in as the output yields very little performance gain, you've confirmed that the OCI lookups or the OCI extract is the limiter.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Thanks for the replies.
I'm monitoring the process with prstat, mpstat, and sar simultaneously. There are no major bottlenecks in memory or CPU utilization; in fact, CPU utilization peaked at 81% on all 4 CPUs during processing.
For SCD Type II, some of the lookups require date range criteria. This cannot be accommodated in a hashed file, and performance was terrible when I attempted it with the UV stage.
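The kind of range lookup involved would look roughly like the sketch below; the table and column names are assumptions (loosely based on the ConcatKey / Eff_Start_Dt / Eff_End_Dt fields visible in the index listing later in this thread), not the actual job's SQL:

Code: Select all
-- Hypothetical SCD Type II surrogate key lookup: find the dimension row
-- whose effective date range covers the transaction date.
SELECT surrogate_key
  FROM d_jrnl_source
 WHERE concat_key = :in_key
   AND :txn_date BETWEEN eff_start_dt AND eff_end_dt;

An equality-only hashed file lookup cannot express the BETWEEN predicate, which is why the UV stage (or a database lookup) gets pulled in for these.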
Prior to making my initial posting, I created a simple job to measure the base performance for my environment.
DRS ------> IPC -----> Xform ------> IPC ------> DRS
The performance was 1,500 rows/sec. Considering the performance metrics I have read in other posts, that is slow as well.
I have not attempted the bulk loaders because it seems my degradation is not due to the extraction or loading events.
Byron
DSguru2B wrote: How are you loading your fact table now? Are you collecting all your data into a flat file in one job after all the lookups and transformations and then loading it into your table in a second job, or is everything in one job without an intermediate Sequential File stage?
Yes, all in one job.
Byron
itcconsulting wrote: For SCD II, some of the lookups require date range criteria. This cannot be accommodated in a hashed file, and performance was terrible when I attempted it with the UV stage.

Sounds like you missed the need to create an index over the key fields so that the UV lookup isn't dog slow. Search the forum for the CREATE.INDEX and BUILD.INDEX syntax for this.
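For reference, the general shape of those commands at the UniVerse TCL prompt is sketched below; the file name is taken from the listing later in this thread, and the choice of ConcatKey as the indexed field is only an assumption:

Code: Select all
>CREATE.INDEX HASH_D_JRNL_SOURCE_LKP2 ConcatKey
>BUILD.INDEX HASH_D_JRNL_SOURCE_LKP2 ConcatKey
>LIST.INDEX HASH_D_JRNL_SOURCE_LKP2 ALL

CREATE.INDEX defines the secondary index on a field, BUILD.INDEX populates it from existing data, and LIST.INDEX confirms the result.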
-craig
"You can never have too many knives" -- Logan Nine Fingers
Below is the output for one of my UV tables. If data existed in the table before creating the index, is an index rebuild required? If so, what is the syntax?
>LIST.INDEX HASH_D_JRNL_SOURCE_LKP2 ALL
Alternate Key Index Summary for file HASH_D_JRNL_SOURCE_LKP2
File........... HASH_D_JRNL_SOURCE_LKP2
Indices........ 1 (0 A-type, 0 C-type, 0 D-type, 0 I-type, 1 SQL, 0 S-type)
Index Updates.. Enabled, No updates pending
Index name Type Build Nulls In DICT S/M Just Unique Field num/I-type
JRNL_SOURCE_IDX SQL Not Reqd Yes Yes S R N ConcatKey;Eff_Start_Dt;Eff_End_Dt;@1:@TM:@2:@TM:@3
Byron
As far as I know, you must always BUILD an index after CREATE'ing it. And, as noted, the syntax can be found by searching the forum - this is one such example.
-craig
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
It would have been nice to encase your LIST.INDEX between Code tags - in fact, why not edit your previous post to make it so?
BUILD.INDEX is required if there are data in a hashed file on which an index is created. BUILD.INDEX is not required if the hashed file is empty when the index is created.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod wrote: It would have been nice to encase your LIST.INDEX between Code tags - in fact, why not edit your previous post to make it so? BUILD.INDEX is required if there are data in a hashed file on which an index is created. BUILD.INDEX is not required if the hashed file is empty when the index is created.

Thanks Ray. Here is the LIST.INDEX for one of my UV table indexes.
Code: Select all
>LIST.INDEX HASH_D_JRNL_SOURCE_LKP2 ALL
Alternate Key Index Summary for file HASH_D_JRNL_SOURCE_LKP2
File........... HASH_D_JRNL_SOURCE_LKP2
Indices........ 1 (0 A-type, 0 C-type, 0 D-type, 0 I-type, 1 SQL, 0 S-type)
Index Updates.. Enabled, No updates pending
Index name Type Build Nulls In DICT S/M Just Unique Field num/I-type
JRNL_SOURCE_IDX SQL Not Reqd Yes Yes S R N ConcatKey;Eff_Start_Dt;Eff_End_Dt;@1:@TM:@2:@TM:@3
Code: Select all
>BUILD.INDEX HASH_D_JRNL_SOURCE_LKP2 JRNL_SOURCE_IDX;
JRNL_SOURCE_IDX; is not a secondary index field.
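A possible reading of that error, offered tentatively: BUILD.INDEX expects the name of an indexed field (or the keyword ALL), not the SQL index name, and the trailing semicolon would also be taken as part of the argument. Assuming that is the cause, a sketch of the corrected command would be:

Code: Select all
>BUILD.INDEX HASH_D_JRNL_SOURCE_LKP2 ALL

ALL builds every index defined on the file, which sidesteps having to name the underlying indexed field exactly.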
Byron