I suspect the Aggregator stage is what is hurting performance in this job. The job currently runs for 1 hour for 1 million records, and in the near future we expect 5-10 million records.
dataset --> transformer --> aggregator stage
                                 |
                                 | Auto partition
                                 V
Input data from a table --> lookup stage --> output stage
Details of the Aggregator stage:
Grouping on two columns (Col_A, Col_B), with a sum calculation on each of the other columns.
Aggregation type = Calculation
Column for Calculation = Col_C
Sum Output Column = Col_C
Column for Calculation = Col_D
Sum Output Column = Col_D
...
and so on, for 12 columns.
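To make the stage settings above concrete, here is a rough Python sketch of the same calculation: group on Col_A and Col_B, then sum each calculation column. It only illustrates the logic and is not how DataStage implements the stage. It uses a hash-style approach that keeps one accumulator per group in memory; the function name and sample rows are made up for the example, and only two of the 12 sum columns are shown.

```python
from collections import defaultdict


def aggregate_sums(rows, group_cols=("Col_A", "Col_B"), sum_cols=("Col_C", "Col_D")):
    """Group rows on group_cols and sum every column listed in sum_cols."""
    # One list of running totals per distinct (Col_A, Col_B) key.
    totals = defaultdict(lambda: [0] * len(sum_cols))
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        acc = totals[key]
        for i, col in enumerate(sum_cols):
            acc[i] += row[col]
    # Emit one output row per group, in first-seen order.
    return [
        {**dict(zip(group_cols, key)), **dict(zip(sum_cols, acc))}
        for key, acc in totals.items()
    ]


# Hypothetical sample data, not taken from the real job.
sample_rows = [
    {"Col_A": "x", "Col_B": 1, "Col_C": 10, "Col_D": 1},
    {"Col_A": "x", "Col_B": 1, "Col_C": 5, "Col_D": 2},
    {"Col_A": "y", "Col_B": 2, "Col_C": 7, "Col_D": 3},
]
result = aggregate_sums(sample_rows)
```

Memory use in this approach grows with the number of distinct groups, not with the number of input rows.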
Possibilities tried so far, based on the forums:
1. Changed the reference link partitioning from Auto to Entire and tested the job: same performance.
2. Sorted on the grouping columns and tested the job: no improvement.
Additional Details:
Job type: parallel.
Version: 8.1
Configuration: One node configuration.
Joins, lookups, or aggregations on large volumes can be much more efficient in the database, if you can move some of the work back there (especially as you have only 1 node).
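As a rough illustration of pushing the aggregation back to the database, the sketch below runs the equivalent GROUP BY in SQL and fetches only the summarised rows. It assumes the data being aggregated is available in a table; `source_table`, the function name, and the sample rows are hypothetical, and an in-memory SQLite database stands in for the real one. Only two of the 12 sum columns are shown.

```python
import sqlite3


def aggregate_in_database(conn):
    """Let the database do the grouping and return one row per (Col_A, Col_B)."""
    query = """
        SELECT Col_A, Col_B, SUM(Col_C) AS Col_C, SUM(Col_D) AS Col_D
        FROM source_table
        GROUP BY Col_A, Col_B
        ORDER BY Col_A, Col_B
    """
    return conn.execute(query).fetchall()


# In-memory stand-in for the real database (assumption for the example).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_table (Col_A TEXT, Col_B INTEGER, Col_C INTEGER, Col_D INTEGER)"
)
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?, ?, ?)",
    [("x", 1, 10, 1), ("y", 2, 7, 3), ("x", 1, 5, 2)],
)
pushed_down = aggregate_in_database(conn)
```

In the job itself, a query of this shape would go into the source stage's user-defined SQL, so the Aggregator stage receives far fewer rows or is not needed at all.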
Hi,
1) I would always sort the data before the Aggregator stage.
2) Use a Join stage if possible.
3) If the reference data is large, select the Sparse lookup option.
4) Lookup File Sets give better performance.
5) Use partitioning that suits the stage and the data requirements, for example try Entire partitioning with the Lookup stage.
Thanks
I have experience with Parallel Extender DataStage and am ready to give or take help from others.
I hope we all help each other, hand in hand.
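On the advice to sort before the Aggregator stage: sorting pays off when the aggregation can rely on the sort order, so that only one group has to be held in memory at a time. The Python sketch below illustrates that idea only; it is not DataStage code, and the function name and sample rows are made up for the example. It assumes the input has already been sorted on the grouping columns.

```python
from itertools import groupby


def aggregate_sorted(rows, group_cols=("Col_A", "Col_B"), sum_cols=("Col_C", "Col_D")):
    """Stream over rows already sorted on group_cols, yielding one summed row per group."""

    def key_of(row):
        return tuple(row[c] for c in group_cols)

    # groupby only merges adjacent rows with equal keys, hence the sort requirement.
    for key, group in groupby(rows, key=key_of):
        sums = dict.fromkeys(sum_cols, 0)
        for row in group:
            for col in sum_cols:
                sums[col] += row[col]
        yield {**dict(zip(group_cols, key)), **sums}


# Hypothetical unsorted sample data.
unsorted_rows = [
    {"Col_A": "y", "Col_B": 2, "Col_C": 7, "Col_D": 3},
    {"Col_A": "x", "Col_B": 1, "Col_C": 10, "Col_D": 1},
    {"Col_A": "x", "Col_B": 1, "Col_C": 5, "Col_D": 2},
]
# Sort on the grouping columns first, as the Sort stage would.
sorted_rows = sorted(unsorted_rows, key=lambda r: (r["Col_A"], r["Col_B"]))
streamed = list(aggregate_sorted(sorted_rows))
```

If the Aggregator is left on its hash method, an upstream sort adds cost without this benefit, which may be one reason a sort alone shows no improvement.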