Job Design Issue

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

mgendy
Premium Member
Posts: 44
Joined: Thu Sep 10, 2009 5:30 am

Job Design Issue

Post by mgendy »

Hi,

I have a job design problem and would like an expert opinion on how to approach it.

I need to design a daily job that reads about 1 billion records (this may grow to 1.5 billion within a few months), ranks them by transaction date, and returns each primary key identifier with its maximum transaction date.

First, I selected the row data, then sorted and removed duplicates in DataStage. The scratch space filled up and the job aborted. I tried to get more scratch space, but we are now at the maximum we can get.

Second, I selected the distinct primary key identifier and transaction date to reduce the number of rows scanned. It took about one hour before data started arriving, and the query returned about 0.5 billion records, which were then sorted and de-duplicated. The job succeeded but took a long time.

Third, I selected the primary key identifier and the maximum transaction date, so that the ranking is done at the database level. This took about 4 hours.

We use DataStage 7.5.3 and Teradata. Please advise on the best method for handling and ranking data of this size, and on when to do the work at the database level and when to do it in DataStage.


Regards
Gendy
Mohmmed Elgendy
Senior System Analyst
Data Integration Team
Etisalat Egypt
+20 1118511161
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

There is no such thing as maximum scratch space - you can add as many file systems to your configuration file as you like. If you can justify having more disk, then more disk must be obtained. Somewhere, whether in the source database or in ETL processing, this work simply has to be done.
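
To illustrate adding file systems as scratch space, here is a minimal sketch of one node entry in a parallel configuration file with two scratch disk resources. The node name, host name, and all paths are hypothetical placeholders, not values from this thread; a real file would use your own hosts and mount points and would normally define several nodes.

```
{
	node "node1"
	{
		/* Hypothetical host name; replace with your own server. */
		fastname "etl_host"
		pools ""
		/* Hypothetical path for persistent data sets. */
		resource disk "/ds/data1" {pools ""}
		/* Each scratchdisk line adds one file system for sort and buffer spill files. */
		resource scratchdisk "/ds/scratch1" {pools ""}
		resource scratchdisk "/ds/scratch2" {pools ""}
	}
}
```

Each scratch path should sit on a separate file system with real free space behind it; pointing several entries at the same file system adds no capacity.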
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.