Tuning Fact Table Process

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

itcconsulting
Premium Member
Posts: 26
Joined: Tue May 24, 2005 6:20 am
Location: Tallahassee

Tuning Fact Table Process

Post by itcconsulting »

I have a fact table loading process that is not performing to my expectations. My ETL process is taking 33 minutes to load 170,000 rows (3 MB) of data (approx. 85 rows/sec). The data is extracted from and loaded into tables in the same Oracle database (using the DRS stage). This process populates a fact table and has 4 major transformations. Each transformation has 5 reference lookups (used to obtain the correct surrogate keys). Some lookups are performed against hashed files while others are against database tables.

Performance was much worse than this to begin with. The following tuning steps have been done to get me to this point:

1. Added IPC stages to force the process to use all available processors

2. Tuned the data extraction SQL and the lookup SQL

3. Added indexes to the lookup tables

4. Tuned the hashed files by adjusting the Minimum Modulus

5. Tuned the Array Size and Transaction Size

In addition to these techniques, I also replaced the source and target tables with Sequential Files, and performance improved only marginally.

Below are some particulars for my environment:
- DataStage Server Edition 7.5.1A (Located on Server A)
- Server A (Solaris OS, 4 CPU, 2GB)
- Oracle 10G DataBase (Located on Server B)
- Server B (Solaris OS, 12 CPU, 24GB, Holds 15 DBs)

Any ideas on how I can improve my performance?
Byron
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

How are you loading your fact table now? Are you collecting all your data into a flat file in one job, after all the lookups and transformations, and then loading it into your table in a second job? Or is everything in one job, without an intermediate Sequential File stage?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Are you watching prstat on both servers to see how your cpu utilization is doing?

Why just some hashed lookups, not all hashed lookups?

What about multiple job instances?

How about DIRECT path bulk loading the inserts and using a "bulk update" script? (DIRECT path load the updates into a work table, then use a parallel-DML-enabled correlated update statement.)
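For illustration, a rough sketch of that bulk-update pattern, with hypothetical table and column names (FACT_SALES as the target, W_FACT_UPD as the DIRECT-path-loaded work table) -- a sketch of the idea, not the exact script:

```sql
-- Load the update rows into W_FACT_UPD with sqlldr DIRECT=TRUE, then:
ALTER SESSION ENABLE PARALLEL DML;

UPDATE /*+ PARALLEL(f, 4) */ fact_sales f
SET    (f.amount, f.load_dt) =
       (SELECT w.amount, w.load_dt
        FROM   w_fact_upd w
        WHERE  w.fact_key = f.fact_key)
WHERE  EXISTS (SELECT 1
               FROM   w_fact_upd w
               WHERE  w.fact_key = f.fact_key);

COMMIT;
```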

How about just using OCI-->XFM-->SEQ to extract your source data and see the maximum speed? If putting SEQ as the output stage yields very little performance gain, you've confirmed that the OCI lookups or the OCI extract is the limiter.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

I know this is off topic but Ken, someday you have to tell us how to do the correlated updates, aka bulk updates.
Thanks,
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You bulk load the updates into a work table and issue a MERGE. For Oracle, anywho.
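For illustration (hypothetical table and column names, same work-table idea as above), the statement would look something like:

```sql
MERGE INTO fact_sales f
USING w_fact_upd w
ON (f.fact_key = w.fact_key)
WHEN MATCHED THEN
  UPDATE SET f.amount  = w.amount,
             f.load_dt = w.load_dt
WHEN NOT MATCHED THEN
  INSERT (f.fact_key, f.amount, f.load_dt)
  VALUES (w.fact_key, w.amount, w.load_dt);
```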
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

That's it? Is it that simple :? Well, this way MERGE can also be used for inserts, right :?:
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
itcconsulting
Premium Member
Posts: 26
Joined: Tue May 24, 2005 6:20 am
Location: Tallahassee

Post by itcconsulting »

Thanks for the replies.

I monitored the process with prstat, mpstat, and sar simultaneously. There were no major bottlenecks in memory or CPU utilization. In fact, CPU utilization peaked at 81% on all 4 CPUs during processing.

For SCD Type II, some of the lookups require date range criteria. This is not accommodated in a hashed file, and performance was terrible when I attempted it with the UV stage.
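For context, the kind of range lookup involved looks roughly like this (table and column names are illustrative); a hashed file can only do an equality match on its key, so the BETWEEN predicate is what rules it out:

```sql
SELECT d.dim_sk
FROM   d_jrnl_source d
WHERE  d.natural_key = :src_key
AND    :txn_date BETWEEN d.eff_start_dt AND d.eff_end_dt;
```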

Prior to making my initial post, I created a simple job to measure the base performance for my environment.

DRS ------> IPC----->Xform ------> IPC ------>DRS

The performance was 1,500 rows/sec. Considering the performance metrics I have read in other posts, that is slow as well.

I have not attempted the bulk loaders because it seems my degradation is not due to the extraction or loading events.
Byron
itcconsulting
Premium Member
Posts: 26
Joined: Tue May 24, 2005 6:20 am
Location: Tallahassee

Post by itcconsulting »

DSguru2B wrote:How are you loading your fact table now? Are you collecting all your data into a flat file in one job, after all the lookups and transformations, and then loading it into your table in a second job? Or is everything in one job, without an intermediate Sequential File stage?

Yes all in 1 job.
Byron
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

itcconsulting wrote:For SCD Type II, some of the lookups require date range criteria. This is not accommodated in a hashed file, and performance was terrible when I attempted it with the UV stage.
Sounds like you missed the need to create an index over the key fields so that the UV lookup isn't dog slow. Search the forum for the CREATE.INDEX and BUILD.INDEX syntax for this.
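For reference, the general shape of those commands from the TCL prompt, against a hypothetical hashed file HASH_MY_LOOKUP indexed on a column DIM_KEY (search the forum for the full details as noted):

```
>CREATE.INDEX HASH_MY_LOOKUP DIM_KEY
>BUILD.INDEX HASH_MY_LOOKUP DIM_KEY
```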
-craig

"You can never have too many knives" -- Logan Nine Fingers
itcconsulting
Premium Member
Posts: 26
Joined: Tue May 24, 2005 6:20 am
Location: Tallahassee

Post by itcconsulting »

Below is the output for one of my UV tables. If data existed in the table before creating the index, is an index rebuild required? If so, what is the syntax?

>LIST.INDEX HASH_D_JRNL_SOURCE_LKP2 ALL
Alternate Key Index Summary for file HASH_D_JRNL_SOURCE_LKP2
File........... HASH_D_JRNL_SOURCE_LKP2
Indices........ 1 (0 A-type, 0 C-type, 0 D-type, 0 I-type, 1 SQL, 0 S-type)
Index Updates.. Enabled, No updates pending

Index name Type Build Nulls In DICT S/M Just Unique Field num/I-type
JRNL_SOURCE_IDX SQL Not Reqd Yes Yes S R N ConcatKey;Eff_St
art_Dt;Eff_End_D
t;@1:@TM:@2:@TM:
@3
Byron
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As far as I know, you must always BUILD an index after CREATE'ing it. And, as noted, the syntax can be found by searching the forum - this is one such example.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It would have been nice to encase your LIST.INDEX between Code tags - in fact, why not edit your previous post to make it so?

BUILD.INDEX is required if there are data in a hashed file on which an index is created. BUILD.INDEX is not required if the hashed file is empty when the index is created.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
itcconsulting
Premium Member
Posts: 26
Joined: Tue May 24, 2005 6:20 am
Location: Tallahassee

Post by itcconsulting »

ray.wurlod wrote:It would have been nice to encase your LIST.INDEX between Code tags - in fact, why not edit your previous post to make it so?

BUILD.INDEX is required if there are data in a hashed file on which an index is created. BUILD.INDEX is not required if the hashed file is empty when the index is created.
Thanks Ray. Here is the LIST.INDEX for one of my UV Table indexes.

Code: Select all

 
>LIST.INDEX HASH_D_JRNL_SOURCE_LKP2 ALL
Alternate Key Index Summary for file HASH_D_JRNL_SOURCE_LKP2
File........... HASH_D_JRNL_SOURCE_LKP2
Indices........ 1 (0 A-type, 0 C-type, 0 D-type, 0 I-type, 1 SQL, 0 S-type)
Index Updates.. Enabled, No updates pending

Index name      Type  Build    Nulls  In DICT  S/M  Just Unique Field num/I-type
JRNL_SOURCE_IDX  SQL  Not Reqd  Yes    Yes      S    R     N    ConcatKey;Eff_St
                                                                art_Dt;Eff_End_D
                                                                t;@1:@TM:@2:@TM:
                                                                @3
When I attempt to rebuild the index, I get the following message.

Code: Select all

>BUILD.INDEX  HASH_D_JRNL_SOURCE_LKP2 JRNL_SOURCE_IDX;

JRNL_SOURCE_IDX; is not a secondary index field.
Am I using the wrong syntax for a UV table index?
Byron
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Byron - take away the trailing ";" semicolon.
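That is, the command should read:

```
>BUILD.INDEX HASH_D_JRNL_SOURCE_LKP2 JRNL_SOURCE_IDX
```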
itcconsulting
Premium Member
Posts: 26
Joined: Tue May 24, 2005 6:20 am
Location: Tallahassee

Post by itcconsulting »

ArndW wrote:Byron - take away the trailing ";" semicolon.
That works! Amazing how one small character can make a big difference. Thanks.
Byron