Job Design Issue
Posted: Sun Jan 10, 2010 10:17 am
Hi,

I have a problem designing a job and need an expert opinion on the design. The story is as follows.

I need to build a daily job that reads about 1 billion records (likely growing to 1.5 billion within a few months), ranks them by transaction date, and returns the maximum transaction date along with the identifier primary key. When I select the raw data, then sort, then remove duplicates, the scratch space fills up and the job aborts. I tried to get more scratch space, but we are already at the maximum.

I then tried selecting only the distinct primary identifier key and transaction date to reduce the number of scanned rows. It takes about an hour before data starts arriving, returns about 0.5 billion records, and the sort plus remove-duplicates then succeeds, but the whole run takes a long time. When I instead select the primary key identifier and the maximum transaction date, so that the ranking is done at the database level, it takes about 4 hours.

We use DataStage 7.5.3 and Teradata. Please advise on the best method for handling and ranking huge data volumes: when should the work be done at the database level, and when in DataStage?
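As a sketch of the database-level ranking described above: Teradata supports ordered analytic functions with a QUALIFY clause, which lets the "latest transaction per key" logic run in one pass inside the database instead of a separate sort and remove-duplicates in the job. The table and column names here (`txns`, `id_key`, `txn_date`) are placeholders, not the real schema:

```
-- Keep only the row with the latest transaction date per key.
-- txns, id_key and txn_date are assumed names for illustration.
SELECT id_key, txn_date
FROM   txns
QUALIFY ROW_NUMBER() OVER (PARTITION BY id_key ORDER BY txn_date DESC) = 1;
```

If only the key and its maximum date are needed (no other columns from the winning row), a plain `SELECT id_key, MAX(txn_date) ... GROUP BY id_key` is usually cheaper than the windowed form, since the database can aggregate without ranking every row.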
Regards
Gendy