Bottleneck Advice
Posted: Tue Sep 28, 2004 6:54 am
Hi Guys
Just something I was pondering while tuning a job. I have a Sequential stage doing a lookup against a HASH stage and writing to a different HASH stage. It has to work this way because that second HASH stage is used as a lookup/reference for another transformer.
The SEQ stage has 19'000'000 records in it, 3 columns wide (char 16 [key], integer 10, decimal 4); this is how it comes from the DB.
The lookup HASH stage has 300'000 records in it and is two columns wide (decimal 4 [key], char 3) (dynamic/type 30 HASH).
The output HASH stage I am writing is 3 columns wide and will hold all 19'000'000 records as (char 16 [key], integer 10, char 3 (derived from the lookup)) (static/type 10 HASH -- at the moment).
Unfortunately, I need to process all these records, so limiting the number of records to be written is not an option.
The problem is this: when I created the SEQ stage I was writing at 25'000 rows per second, and the lookup HASH was created at a good speed. But when I do the matching between the lookup HASH and the SEQ stage, I only get 6'000 rows per second. I know I can improve performance by tuning the output HASH stage, but at what point do I conclude that the bottleneck is actually the read from the SEQ stage?
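One general way to answer that question (outside of DataStage itself) is to isolate each step and time it: read only, read plus lookup, then the full read/lookup/write pipeline. In the job this corresponds to pointing the output link at a dummy/null target to measure the read-and-lookup rate alone. Below is a hedged, scaled-down Python sketch of that diagnostic; the row counts, column shapes, and function names are all illustrative assumptions, not DataStage code:

```python
import time

def make_rows(n):
    # Stand-in for the SEQ stage: (char-16 key, integer, decimal lookup key),
    # scaled down from the real 19'000'000 rows.
    return [("K%015d" % i, i, i % 300) for i in range(n)]

def run(n=100_000):
    # Stand-in for the 300'000-row lookup HASH (decimal key -> char 3 value).
    lookup = {k: "ABC" for k in range(300)}
    source = make_rows(n)

    # Step 1: read only.
    t0 = time.perf_counter()
    read_count = sum(1 for _ in source)
    t_read = time.perf_counter() - t0

    # Step 2: read + lookup (no write).
    t0 = time.perf_counter()
    matched = sum(1 for r in source if r[2] in lookup)
    t_lookup = time.perf_counter() - t0

    # Step 3: full pipeline, writing to a stand-in for the output HASH.
    out = {}
    t0 = time.perf_counter()
    for key, val, dec in source:
        out[key] = (val, lookup.get(dec, "???"))
    t_full = time.perf_counter() - t0

    # The bottleneck is wherever adding a step dominates the elapsed time:
    # if t_full is close to t_lookup, the write is cheap and the read/lookup
    # side is the limit; if t_full is much larger, tune the output stage.
    return {"read": t_read, "read+lookup": t_lookup, "full": t_full,
            "rows_read": read_count, "rows_matched": matched,
            "rows_written": len(out)}
```

Comparing the three timings tells you which stage to attack first: tuning the output HASH only helps to the extent that `full` exceeds `read+lookup`.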