
Problem with Auto partitioning scheme

Posted: Tue Oct 11, 2011 2:50 pm
by vivekgadwal
Hello all,

As part of upgrading our IIS to 8.5, we are testing all of our jobs in parallel with production (which is on 8.0.1).

Most of our transformation jobs use the same logic to get keys from their parent table, roughly like this:


parent table unld [Dataset] --- Funnel --- Delta load of parent table [Dataset]
                                 |
                                 |
     Sort on natural key and pop info {a number to indicate load} [sort stage - Auto partition]
                                 |
                                 |
      Remove Dups on natural key and retain the first value [Rem Dup stage - Auto partition]
                                 |
                                 |
      Transformer with some Trim functions [Transformer stage]
                                 |
                                 | (right link)
input dataset (left link) --- Left Join [Join stage] --- Output ... more logic
The problem is that, for some jobs, Auto partitioning is not working properly in this design, which causes extra rows to be produced at the join. In essence, the left outer join is broken: the Join stage outputs more rows than the input. Also, the keys are not being looked up properly (resulting in ZEROES), which causes the process to abort later.

Workaround
If I add a Hash partition on the natural key in the Sort stage, the job produces the desired result. Also, these jobs are in production (IIS 8.0.1) and work really well there. However, this is not a desirable workaround, as it involves changing the code in a lot of places and in a lot of jobs.
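To illustrate why hash partitioning on the natural key fixes the symptom, here is a toy Python model of the Remove Dups and Join steps. It is only a sketch, not DataStage code: `round_robin` stands in for any partitioning that ignores the key (no claim is made about what Auto actually picks in 8.5), the sample rows and surrogate-key values are made up, and the join is modelled as a plain keyed left join after the per-partition de-duplication.

```python
from collections import defaultdict
from zlib import crc32

# Hypothetical sample: (natural_key, surrogate_key) rows after the Funnel.
PARENT = [("A", 101), ("B", 102), ("A", 101), ("C", 103), ("B", 102), ("D", 104)]
LEFT_KEYS = ["A", "B", "C", "D"]  # natural keys on the left link of the join


def round_robin(rows, nodes):
    """Partition without looking at the key (stand-in for a non-key scheme)."""
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts


def hash_on_key(rows, nodes):
    """Partition so that equal natural keys always share a partition."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        parts[crc32(row[0].encode()) % nodes].append(row)
    return parts


def remove_dups(parts):
    """Sort each partition on the key and keep the first row per key.
    Each partition only sees its own rows, as in a parallel stage."""
    survivors = []
    for part in parts:
        seen = set()
        for row in sorted(part, key=lambda r: r[0]):
            if row[0] not in seen:
                seen.add(row[0])
                survivors.append(row)
    return survivors


def left_join(left_keys, right_rows):
    """Keyed left outer join; unmatched keys get None."""
    lookup = defaultdict(list)
    for key, sk in right_rows:
        lookup[key].append(sk)
    return [(key, sk) for key in left_keys for sk in lookup.get(key, [None])]


def run(partitioner, nodes=2):
    return left_join(LEFT_KEYS, remove_dups(partitioner(PARENT, nodes)))


print(len(LEFT_KEYS), len(run(round_robin)), len(run(hash_on_key)))  # 4 5 4
```

In this model the key-blind partitioning leaves one copy of "B" in each partition, so the join emits five rows for four inputs, while the key-based partitioning keeps the row count at four. The model does not attempt to reproduce the ZEROES symptom.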

Question(s)
I want to find out if anybody else has experienced this issue. I am really curious why the behavior in 8.5 is different from 8.0.1, as one would expect the code, especially the code related to partitioning, to be the same between versions. Please let me know if I need to provide any more information in order to find a solution to this problem.

Posted: Wed Oct 12, 2011 10:04 pm
by prakashdasika
Did you check the config file in 8.5 and 8.0.1? I think the number of nodes might be different. This would cause inconsistent results like the ones you mentioned.
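As a rough illustration of how node count can matter, the sketch below counts how many rows survive a per-partition remove-duplicates for different numbers of nodes. It is a toy model under stated assumptions, not DataStage behavior: the key list is invented, `survivors` is a hypothetical helper, and "not by key" here means simple round robin.

```python
from zlib import crc32

ROWS = ["A", "B", "A", "C", "B", "D"]  # natural keys, duplicates included


def survivors(rows, nodes, by_key):
    """Rows left after each partition removes its own duplicates."""
    parts = [[] for _ in range(nodes)]
    for i, key in enumerate(rows):
        # by_key: hash of the natural key; otherwise round robin on position
        target = crc32(key.encode()) % nodes if by_key else i % nodes
        parts[target].append(key)
    return sum(len(set(part)) for part in parts)


for nodes in (1, 2, 4):
    print(nodes, survivors(ROWS, nodes, by_key=False), survivors(ROWS, nodes, by_key=True))
# 1 4 4
# 2 5 4
# 4 6 4
```

With key-blind partitioning the surviving row count changes with the number of nodes, while partitioning on the key gives four rows every time.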

The hash partitioning you added is the standard procedure during development, especially for stages like Sort, Join, Remove Duplicates, etc.

In this case you can hash partition on the Sort stage and propagate it to subsequent stages until a need arises to change it or you encounter a sequential operator.

Posted: Thu Oct 13, 2011 5:28 am
by vivekgadwal
Thanks for your reply Prakash.
prakashdasika wrote: Did you check the config file in 8.5 and 8.0.1? I think the number of nodes might be different. This would cause inconsistent results like the ones you mentioned.
I had a new project created in 8.5 and moved the 8.0.1 config file into it. Of course, I changed it so that it has the right server name in it. We even had the exact same directory structures as Prod 8.0.1 created on the new server.

Can you please explain why this would cause inconsistent results if both config files are 4-node and, except for the server name, everything else is the same?
prakashdasika wrote: The hash partitioning you added is the standard procedure during development, especially for stages like Sort, Join, Remove Duplicates, etc.

In this case you can hash partition on the Sort stage and propagate it to subsequent stages until a need arises to change it or you encounter a sequential operator.
I totally agree with you that Hash partitioning is standard procedure, and I do that too. However, a lot of this code was developed by other people in 2007-08. It is also working well in Prod 8.0.1 at this moment. I already did the workaround you proposed and it is working as expected, but I am very surprised that one job behaves differently from the others (we have a lot of jobs with similar logic)!