Problem with Auto partitioning scheme
Posted: Tue Oct 11, 2011 2:50 pm
Hello all,
As part of upgrading our IIS installation to 8.5, we are running all of our jobs in parallel with production (which is still on 8.0.1).
Most of our transformation jobs get keys from their parent table using the same pattern (the job design is shown in the diagram below).
The problem is that, for some of these jobs, Auto partitioning does not behave correctly and the left outer join breaks: the output link of the Join stage carries more rows than its input. In addition, some keys are not looked up correctly (they come back as zeroes), which causes the job to abort further downstream.
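A minimal Python sketch (not DataStage code) of how both symptoms can show up in a left outer join: if the right (parent key) input still holds two rows for one natural key, the matching left row is emitted twice, and if a left row finds no match, its surrogate key falls back to a default. The function `left_outer_join`, the sample keys, and the default of 0 are hypothetical, assuming a non-nullable integer key column.

```python
from typing import Dict, List, Tuple


def left_outer_join(
    left: List[str], right: List[Tuple[str, int]], default: int = 0
) -> List[Tuple[str, int]]:
    """Join each left natural key to every matching right row.

    Unmatched left keys get `default` as their surrogate key (hypothetical
    stand-in for the default value of a non-nullable integer column).
    """
    index: Dict[str, List[int]] = {}
    for natural_key, surrogate_key in right:
        index.setdefault(natural_key, []).append(surrogate_key)

    out: List[Tuple[str, int]] = []
    for natural_key in left:
        matches = index.get(natural_key)
        if matches:
            # One output row per matching right row: duplicates multiply rows.
            out.extend((natural_key, sk) for sk in matches)
        else:
            # No match: the key comes back as the default (zero).
            out.append((natural_key, default))
    return out


left_input = ["A", "B", "C"]
# Right input where "A" was not de-duplicated and "C" is missing.
right_input = [("A", 101), ("A", 102), ("B", 201)]

result = left_outer_join(left_input, right_input)
print(len(left_input), len(result))  # 3 4
print(result)  # [('A', 101), ('A', 102), ('B', 201), ('C', 0)]
```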
parent table unld [Dataset] --- Funnel --- Delta load of parent table [Dataset]
|
|
Sort on natural key and pop info {a number to indicate load} [Sort stage - Auto partition]
|
|
Remove Dups on natural key and retain the first value [Remove Duplicates stage - Auto partition]
|
|
Transformer with some Trim functions [Transformer stage]
|
| (right link)
input dataset (left link) --- Left Join [Join stage] --- Output ... more logic
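A minimal Python sketch (not DataStage code) of one plausible way the Sort / Remove Duplicates branch above can leave duplicate keys: in a parallel job each partition is sorted and de-duplicated on its own, so a key is only reduced to one row if all of its rows land in the same partition. If Auto resolves to a method that ignores the key, such as round robin, duplicates can survive. The sample rows, the two partitions, `round_robin`, `hash_by_key` (a deterministic stand-in for a real hash partitioner), and the assumption that "first value" means the lowest load number are all hypothetical.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical row layout: (natural_key, load_number)
Row = Tuple[str, int]
Partitioner = Callable[[List[Row], int], List[List[Row]]]


def round_robin(rows: List[Row], n: int) -> List[List[Row]]:
    """Deal rows out to partitions in turn, ignoring the key."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_by_key(rows: List[Row], n: int) -> List[List[Row]]:
    """Send every row with the same natural key to the same partition.

    sum(map(ord, key)) is a deterministic stand-in for a real hash function
    (Python's built-in hash() on strings is randomized per process).
    """
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[sum(map(ord, row[0])) % n].append(row)
    return parts


def sort_and_dedup(partition: List[Row]) -> List[Row]:
    """Sort one partition by (key, load) and keep the first row per key.

    Only the rows inside this single partition are visible here.
    """
    kept: List[Row] = []
    seen: Set[str] = set()
    for row in sorted(partition):
        if row[0] not in seen:
            seen.add(row[0])
            kept.append(row)
    return kept


def run(rows: List[Row], partitioner: Partitioner, n: int = 2) -> List[Row]:
    """Partition, de-duplicate each partition independently, then collect."""
    out: List[Row] = []
    for part in partitioner(rows, n):
        out.extend(sort_and_dedup(part))
    return sorted(out)


# Funnel output: parent unload (load 1) followed by the delta (load 2).
funnel: List[Row] = [("A", 1), ("B", 1), ("C", 1), ("A", 2), ("B", 2), ("C", 2)]

print(run(funnel, round_robin))  # every key survives twice
print(run(funnel, hash_by_key))  # one row per key
```

With round robin, each partition happens to hold only one row per key, so nothing is removed; with key-based partitioning, both rows for a key meet in one partition and the later load is dropped.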
Workaround
If I add Hash partitioning on the natural key in the Sort stage, the job produces the desired result. These same jobs are also running in production (IIS 8.0.1), where they work well without this change. However, this is not a desirable workaround, because it means changing the design in many places across many jobs.
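A similarly hypothetical Python sketch of why key-based partitioning also matters for the join itself: a partitioned join only compares rows within the same partition, so the parent keys must be distributed with the same key-based function as the left input, otherwise matching rows never meet and the lookup falls back to a default. The names, sample data, and the default of 0 for unmatched rows are assumptions for illustration only.

```python
from typing import Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")


def bucket(key: str, n: int) -> int:
    """Deterministic stand-in for a key-based hash partitioner."""
    return sum(map(ord, key)) % n


def by_key(rows: List[T], n: int, key_of: Callable[[T], str]) -> List[List[T]]:
    """Hash-partition rows on their natural key."""
    parts: List[List[T]] = [[] for _ in range(n)]
    for row in rows:
        parts[bucket(key_of(row), n)].append(row)
    return parts


def round_robin(rows: List[T], n: int) -> List[List[T]]:
    """Distribute rows in turn, ignoring the key."""
    parts: List[List[T]] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def partitioned_left_join(
    left_parts: List[List[str]], right_parts: List[List[Tuple[str, int]]]
) -> List[Tuple[str, int]]:
    """Left-join partition i of the left input with partition i of the right only.

    Assumes natural keys are unique within each right partition; an unmatched
    left key gets the default surrogate key 0.
    """
    out: List[Tuple[str, int]] = []
    for left, right in zip(left_parts, right_parts):
        index: Dict[str, int] = dict(right)
        out.extend((key, index.get(key, 0)) for key in left)
    return sorted(out)


left_keys = ["A", "B", "C"]
parent_keys = [("A", 101), ("B", 201), ("C", 301)]
n = 2

left_parts = by_key(left_keys, n, lambda k: k)
mismatched = partitioned_left_join(left_parts, round_robin(parent_keys, n))
aligned = partitioned_left_join(left_parts, by_key(parent_keys, n, lambda r: r[0]))

print(mismatched)  # [('A', 0), ('B', 0), ('C', 0)]
print(aligned)     # [('A', 101), ('B', 201), ('C', 301)]
```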
Question(s)
I would like to know whether anybody else has experienced this issue. I am also curious why the behavior in 8.5 differs from 8.0.1, since one would expect the same job design, especially its partitioning, to behave the same across versions. Please let me know if I need to provide more information to help find a solution.