
Job Design Issue

Posted: Sun Jan 10, 2010 10:17 am
by mgendy
hi,
I have a problem designing a job and need an expert opinion on the approach.

The story is as follows: I need to design a daily job that reads about 1 billion records (likely to grow to 1.5 billion within a few months), ranks them by transaction date, and returns the maximum transaction date together with the primary key identifier.

When I select the raw row data, then sort, then remove duplicates, the scratch space fills up and the job aborts. I tried to get more scratch space, but we have already reached the maximum available. I then tried selecting the distinct primary key identifier and transaction date to reduce the number of rows scanned; it took about an hour before data started to be returned, produced roughly 0.5 billion records, and the sort and remove-duplicates steps succeeded but ran for a long time. When I instead select the primary key identifier and the maximum transaction date, so the ranking is done at the database level, the query takes about 4 hours.

We use DataStage 7.5.3 and Teradata. Please advise on the best method for handling and ranking this volume of data, and on when to do the work at the database level versus in DataStage.
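For clarity, the database-level version I mean is roughly the following (table and column names are simplified placeholders):

    -- Placeholder names: source_table, pk_id, transaction_date
    -- Option 1: aggregate to get only the latest transaction date per key
    SELECT  pk_id,
            MAX(transaction_date) AS max_transaction_date
    FROM    source_table
    GROUP BY pk_id;

    -- Option 2 (Teradata): keep the whole latest row per key
    SELECT  *
    FROM    source_table
    QUALIFY ROW_NUMBER() OVER (PARTITION BY pk_id
                               ORDER BY transaction_date DESC) = 1;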


Regards
Gendy

Posted: Sun Jan 10, 2010 1:35 pm
by ray.wurlod
There is no such thing as maximum scratch space - you can add as many file systems to your configuration file as you like. If you can justify having more disk, then more disk must be obtained. Somewhere, whether in the source database or in ETL processing, this work simply has to be done.
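For example, a single node in the configuration file can list several scratch file systems (the host name and paths below are placeholders):

    {
      node "node1"
      {
        fastname "etl_server"
        pools ""
        resource disk "/ds/resource/node1" {pools ""}
        resource scratchdisk "/scratch1/node1" {pools ""}
        resource scratchdisk "/scratch2/node1" {pools ""}
        resource scratchdisk "/scratch3/node1" {pools ""}
      }
    }

Parallel sorts spread their temporary files across all scratchdisk entries in the node's default pool, so each additional file system listed there directly increases the scratch space available to the job.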