I have a question regarding partition methods and performance. I have a simple job that extracts data from an Oracle table (three columns), uses an Aggregator stage to sum and group, and then loads the result into an Oracle table (I create the target table).
oracle_stage----->agg_stage---->oracle_stage
Column names: amount, new_seq, old_seq.
Total rows: 16 million.
I did not use a Sort stage, but used the sort option in the Aggregator stage (sorted by the new_seq column).
These are the run times for the job with the different partition methods:
Hash Partition: 12 minutes
Round Robin: 10 minutes, 44 seconds
Modulus: 11 minutes, 25 seconds
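
For reference, here is a minimal Python sketch of how these three partitioning methods assign rows in general. It is not DataStage's internal code: the function names, the two-partition setup, the sample rows, and the choice of new_seq as the grouping key are all assumptions made for illustration. It shows that round robin can spread the rows of one group over several partitions, while hash and modulus keep each key in a single partition, which matters if the aggregation runs separately on each partition.

```python
from collections import defaultdict

def round_robin(rows, n):
    # Row i goes to partition i % n, regardless of its key.
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def modulus(rows, n, key):
    # Integer key value modulo the number of partitions.
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[key] % n].append(row)
    return parts

def hash_partition(rows, n, key):
    # Hash of the key value modulo the number of partitions.
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def groups_split(parts, key):
    # Return the key values that appear in more than one partition.
    seen = defaultdict(set)
    for p, part in enumerate(parts):
        for row in part:
            seen[row[key]].add(p)
    return sorted(k for k, ps in seen.items() if len(ps) > 1)

# Hypothetical sample data: four groups of two rows each.
rows = [{"new_seq": k, "amount": 10} for k in (1, 1, 2, 2, 3, 3, 4, 4)]

rr_split = groups_split(round_robin(rows, 2), "new_seq")
mod_split = groups_split(modulus(rows, 2, "new_seq"), "new_seq")
hash_split = groups_split(hash_partition(rows, 2, "new_seq"), "new_seq")
print(rr_split, mod_split, hash_split)
```

In this toy data every group ends up split under round robin, so a per-partition sum would produce more than one output row per new_seq value. Comparing the output row counts of the three runs would be one way to check whether that is happening in the real job.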
Also, in all three runs I got this warning message:
Aggregator_stg: Hash table has grown to xxxx entries.
I searched the forum and found (viewtopic.php?t=126271&highlight=Hash+t ... s+grown+to) that aggregation is done with a hash table by default. Can anybody explain this warning to me?
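
To make the question concrete, here is a rough Python sketch of the general difference between hash-based and sort-based aggregation as I understand it. It is not the Aggregator stage's actual implementation; the function names and sample rows are made up for illustration. With a hash table there is one entry per distinct group key, so the table keeps growing as new keys arrive. With input already sorted on the key, only the current group has to be held at a time.

```python
from itertools import groupby
from operator import itemgetter

def hash_aggregate(rows):
    # One table entry per distinct key; track the largest table size seen.
    table = {}
    peak = 0
    for key, amount in rows:
        table[key] = table.get(key, 0) + amount
        peak = max(peak, len(table))
    return table, peak

def sort_aggregate(sorted_rows):
    # Input must already be sorted on the key; one group is summed at a time.
    for key, group in groupby(sorted_rows, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)

# Hypothetical (new_seq, amount) rows in arbitrary order.
rows = [(3, 5), (1, 10), (3, 7), (2, 1), (1, 4)]

totals, peak = hash_aggregate(rows)
sorted_totals = dict(sort_aggregate(sorted(rows)))
print(totals, peak, sorted_totals)
```

Both approaches give the same sums here. The difference is only in how much state is held while the rows stream through, which is what I assume the "has grown to" message is reporting.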
Thanks in advance