Slow running Job in Datastage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

subbulakshmi_su
Participant
Posts: 1
Joined: Sat Jun 26, 2010 3:16 am

Post by subbulakshmi_su »

Hi,

I have a requirement to load a source file into an Oracle table. The table has a composite primary key made up of 7 columns. I need to check whether the file contains duplicate values for that primary key combination and, if so, reject those rows to an error file.

I designed the job with a Sort stage after the Sequential File stage (which reads the source file) to sort on all the primary key columns. A Transformer then sets a flag using stage variables, and a constraint filters out the duplicate rows according to that flag.
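
As a rough illustration of that sort-then-compare logic outside DataStage, here is a minimal Python sketch. It is not the actual job: the function name, the sample rows, and the assumption that the key columns come first in each row are all hypothetical. The `previous_key` variable plays the role of the stage variable that remembers the last key seen.

```python
def flag_duplicates(sorted_rows, key_count=7):
    """Yield (row, is_duplicate) for rows already sorted on their first key_count fields.

    Assumption: the primary key columns are the leading fields of each row.
    """
    previous_key = None  # stands in for the stage variable holding the last key
    for row in sorted_rows:
        key = tuple(row[:key_count])
        # A row is a duplicate when its key equals the key of the row before it.
        yield row, key == previous_key
        previous_key = key


# Hypothetical sample with a 2-column key, already sorted on that key.
rows = [
    ("a", "1", "x"),
    ("a", "1", "y"),
    ("b", "2", "z"),
]
flags = [is_dup for _, is_dup in flag_duplicates(rows, key_count=2)]
```

Rows flagged `True` are the ones the constraint would send to the error file. Note that this keeps the first row of each key and rejects only the later ones.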


Since I am using the Sort stage to sort on 7 fields, it takes a long time (nearly 3 hours) to sort 24 lakh (2.4 million) records. Can anyone suggest how to improve the performance?
antonyraj.deva
Premium Member
Posts: 138
Joined: Wed Jul 16, 2008 9:51 pm
Location: Kolkata

Post by antonyraj.deva »

Hi Subbulakshmi,

Welcome aboard. Sorting on 7 keys is for sure going to take some time.

How many nodes are there in your environment and what partitioning method are you using?
TONY
ETL Manager
Infotrellis India

"Do what you can, with what you have, from where you are and to the best of your abilities."
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The post is marked as a Server issue, even though we're in the PX forum, so I'm guessing there are no nodes involved here. Once confirmed, I'll move the post. :wink:

The Server sort stage is not known for its speed. A better alternative would be any kind of command line sort option and you could leverage that using the Filter option in the Sequential File stage.
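
A minimal sketch of what such a filter command might look like, written as a small Python helper that only builds the command string. It assumes a pipe-delimited file whose seven key columns are the first seven fields; the helper name and both defaults are hypothetical. The `-t` (field separator) and `-k` (key range) options are standard options of the UNIX `sort` command.

```python
import shlex


def build_sort_filter(delimiter="|", key_count=7):
    """Build a UNIX sort command line that sorts on fields 1..key_count.

    Assumptions: the key columns are the leading fields, and the file uses
    a single-character delimiter.
    """
    # shlex.quote protects shell-special delimiters such as '|'.
    return f"sort -t {shlex.quote(delimiter)} -k1,{key_count}"


command = build_sort_filter()
print(command)
```

The resulting string is the sort of thing that would be entered as the filter command, so the rows arrive in the job already sorted and the Sort stage can be dropped.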
-craig

"You can never have too many knives" -- Logan Nine Fingers
antonyraj.deva
Premium Member
Posts: 138
Joined: Wed Jul 16, 2008 9:51 pm
Location: Kolkata

Post by antonyraj.deva »

Didn't notice the "Server" bit Craig. :oops:
TONY
ETL Manager
Infotrellis India

"Do what you can, with what you have, from where you are and to the best of your abilities."
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Use the UNIX sort command or a third-party sort utility such as CoSort or SyncSort.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

It would be better to sort and locate duplicates before the data comes into DataStage.

This avoids committing records only to find a duplicate down the line.
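
One possible way to do that pre-check, sketched in Python under stated assumptions: the file is pipe-delimited, the key columns are the first fields, and the first occurrence of each key is kept while later ones are rejected. This version uses an in-memory set of keys rather than a sort, which is a different technique from the one discussed above; the function name and sample data are hypothetical.

```python
def split_on_key(lines, key_count=7, delimiter="|"):
    """Split lines into (clean, rejects) by the first key_count fields.

    Assumption: the first row seen for a key is kept; later rows with the
    same key go to the reject list.
    """
    seen = set()
    clean, rejects = [], []
    for line in lines:
        key = tuple(line.rstrip("\n").split(delimiter)[:key_count])
        if key in seen:
            rejects.append(line)
        else:
            seen.add(key)
            clean.append(line)
    return clean, rejects


# Hypothetical sample with a 2-column key.
sample = ["1|a|x\n", "1|a|y\n", "2|b|z\n"]
clean, rejects = split_on_key(sample, key_count=2)
```

Only the clean rows would then be handed to the load job, and the rejects written to the error file.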