Aggregator + Performance

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dsuser_cai
Premium Member
Premium Member
Posts: 151
Joined: Fri Feb 13, 2009 4:19 pm

Aggregator + Performance

Post by dsuser_cai »

Hi

I have the following data:

Code: Select all

Loan_Id   Amount  Produts_Id  Store_Id
1             1000.9   55              12
1              .082     55              12
2             2000     55               32
3              0.8776  55              05
2             9.8        55              32
3            1000      55               05
the total number of records is 28.4 million.
No Of Nodes: 2
stages used:

Code: Select all

 oracle---->agg (group by loan_id, product_id, store_id, sum(amount) partition was set to auto ----> sequential file  
the total run time for this job was 11.06 minutes.

now i modified the job with the following:

Code: Select all

 oracle---->sort_stage (order by loan_id)--->agg (group by loan_id, product_id, store_id, sum(amount) partition was set to Hash (loan_id, products_id, store_id) ----> sequential file  
and the run time was 11.01 minutes.
i do not see any significat improvement in the performance. can somebody help me where im wrong.

This is the only job that is running in the environment, no other jobs or applications are running in the environment.
Thanks
Karthick
dsuser_cai
Premium Member
Premium Member
Posts: 151
Joined: Fri Feb 13, 2009 4:19 pm

Post by dsuser_cai »

The oracle stage has a simple select query.
Thanks
Karthick
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Why not just do all that in Oracle? And why would you expect adding the overhead of a sort stage sorting on one field to improve overall performance? If you want to go that route, you'd need to sort by all three grouping columns (not just the first one) and somehow make sure the Aggregator knew you'd done that. It looks like PX would need the optional Method to be set to Sort for that. Make sure you read the help text for Method when you're in there, and understand the prerequisites.
-craig

"You can never have too many knives" -- Logan Nine Fingers
dsuser_cai
Premium Member
Premium Member
Posts: 151
Joined: Fri Feb 13, 2009 4:19 pm

Post by dsuser_cai »

Yeah i tried in Oracle, and the entire run time was only 5.02 minutes. thanks criag.
Thanks
Karthick
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You probably got the wrong answers using Auto partitioning.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mgendy
Premium Member
Premium Member
Posts: 44
Joined: Thu Sep 10, 2009 5:30 am
Contact:

Post by mgendy »

You probably got the wrong answers using Auto partitioning
first i agree with what ray said , you may get wrong results using auto partitioning while you are using Hash aggregator , that propably happen with massive number of data

second , why do you expected more performance when using sorting and partitioning explicity ? it doesn't happend in all times because auto partitioning method by default try to fine the best partitioning method to use to get the most performance , so there are many casese 'like your case' while change the method of partitioning and sorting data doesn't imporve performance

Tird , just reminder to change agregator method to sort 'from agregator stage --> options --> mthod = sort' while using EXPLICIT sort , and partitioning , that may imporve the performance in your case , because sorting data and using hash partition method , does not make any effect to hash agregator type

Regards
Mohmmed Elgendy
Senior System Analyst
Data IntegrationTeam
Etisalat Egypt
+20 1118511161
Post Reply