Partitions and Nodes configuration problem

Maveric · Post by **Maveric** » Tue Aug 21, 2007 10:59 pm

What stages are you using? Any Join, Lookup, or Aggregator stages?

Maveric · Post by **Maveric** » Tue Aug 21, 2007 11:00 pm

The problem is clearly with partitioning. For Join and Lookup stages, you need to hash-partition on the keys specified in the stage.
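To see why the keys matter for a Join, here is a toy Python model (not DataStage code). Partitions are plain lists, the helper names and sample rows are hypothetical, and `key % n` stands in for a real hash function. Each partition is joined only with the same-numbered partition of the other input, so round-robin dealing can separate matching keys, while hash partitioning on the join key keeps them together.

```python
# Toy model of a partitioned join (not DataStage code; names are hypothetical).

def round_robin(rows, n):
    """Deal rows to n partitions in turn, ignoring their contents."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, key, n):
    """Send each row to partition row[key] % n, so equal keys share a partition.
    The modulo is a stand-in for a real hash function."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[key] % n].append(row)
    return parts

def partitioned_join(left_parts, right_parts, key):
    """Inner-join each left partition only with the same-numbered right partition."""
    out = []
    for left, right in zip(left_parts, right_parts):
        for l in left:
            for r in right:
                if l[key] == r[key]:
                    out.append({**l, **r})
    return out

orders = [{"emp": 2, "order": "A"}, {"emp": 1, "order": "B"},
          {"emp": 4, "order": "C"}, {"emp": 3, "order": "D"}]
employees = [{"emp": e, "name": f"name{e}"} for e in (1, 2, 3, 4)]

rr = partitioned_join(round_robin(orders, 2), round_robin(employees, 2), "emp")
hp = partitioned_join(hash_partition(orders, "emp", 2),
                      hash_partition(employees, "emp", 2), "emp")
print(len(rr), len(hp))  # prints "0 4"
```

With this sample, the round-robin run joins no rows at all, while the hash-partitioned run finds all 4 matches.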

lokesh_chopade · Post by **lokesh_chopade** » Wed Aug 22, 2007 1:59 am

The results are correct in a different environment. So is this a problem with the environment, or does the data always need to be partitioned?

ag_ram · Post by **ag_ram** » Wed Aug 22, 2007 2:05 am

Data always needs to be partitioned!

Raghava · Post by **Raghava** » Wed Aug 22, 2007 2:11 am

Please elaborate on the problem.

lokesh_chopade · Post by **lokesh_chopade** » Wed Aug 22, 2007 2:28 am

If data always needs to be partitioned, then why does the job give correct results in production, where the data volume is very large?

ArndW · Post by **ArndW** » Wed Aug 22, 2007 2:43 am

I'm not sure what the previous poster meant about data always having to be partitioned. In most cases the default partitioning and the default configuration file work just fine, and the developer is not forced to think about partitioning with regard to performance. Only when doing such things as lookups and sorts does the designer have to understand the implications of partitioning.

The most common case is doing a lookup in a job while using more than 1 node in the configuration file. Unless the data is partitioned according to the lookup key, or the designer explicitly specifies "entire" partitioning for the lookup reference link, the result of the job will be wrong.

ArndW · Post by **ArndW** » Wed Aug 22, 2007 4:06 am

If your job delivers different results in a 1-node configuration than in any multinode configuration, you have made design mistakes. Most likely (as per my previous post) in lookup stages.

So, as per my previous post, do you have lookups? If so, is the reference link set to "entire" partitioning or have you ensured that you have partitioned both streams on the lookup key? (These are rhetorical questions, since I am sure that this is your problem)

lokesh_chopade · Post by **lokesh_chopade** » Wed Aug 22, 2007 5:22 am

I know the PX job designs are not partition-based, and hence the results differ with the 4-node configuration file. But in production, even with 4 nodes, the results are correct. In most places, the Join stage is used.

There is no problem with lookup.

Since we can't run the same job repeatedly in production for testing, we tried it in a different environment, where the results differ. To avoid this, we tried a 1-node configuration and the result is correct.

So is this a problem with the environment?

Maveric · Post by **Maveric** » Wed Aug 22, 2007 5:32 am

If possible, export the job from production, import it into the test environment, and test it there. See whether there is any difference in the output.

balajisr · Post by **balajisr** » Wed Aug 22, 2007 5:48 am

It would be easier to find the cause if you could explain what is wrong or different.

Also, are your development/test and production jobs in sync?

ArndW · Post by **ArndW** » Wed Aug 22, 2007 5:52 am

lokesh_chopade wrote:...There is no problem with lookup. ...

Humour me please, set the lookup reference link(s) to "entire" partitioning and see if a multinode configuration works.

ArndW · Post by **ArndW** » Wed Aug 22, 2007 4:06 pm

Lokesh - 'auto' DOES do partitioning. What happened when you made the reference links 'entire'?

lokesh_chopade · Post by **lokesh_chopade** » Wed Aug 22, 2007 11:08 pm

Thanks Arnd, it's working fine.

But in general, since, as you said, 'AUTO' also partitions, will the results always be consistent across multiple nodes if only AUTO is used?

ArndW · Post by **ArndW** » Wed Aug 22, 2007 11:33 pm

lokesh - how often does this need repeating? Using "auto" partitioning will not guarantee that a job will work correctly!

Let's take a very simple example. The source file has 2 columns, a numeric sequential row number and a numeric Employee number. Let us assume the default "auto" partitioning is done using a round robin algorithm on the first column and that you have a 2 node configuration.

Assume the following source data:

```
Key,Emp
001,002
002,001
003,004
004,003
```

This means rows 1 and 3 go to node 0, while rows 2 and 4 go to node 1, and so on, alternating.

Now you put in a lookup stage looking up your Employee table, which has the same key as column 2 of the source file and contains the employee name in another column. You also set this link to AUTO, and again we'll assume round-robin partitioning.

So now node 0 on the lookup gets employees 001 and 003, and node 1 gets 002 and 004.

Now take the first row: source node 0 looks up employee 002 in lookup node 0 and gets no match. The next row on node 0 looks up employee 004 and also gets no match. The same thing happens twice on source node 1 (employees 001 and 003, while its lookup partition holds only 002 and 004), and both lookups fail there as well.

In this example you will get 4 failed lookups using "AUTO"; if you went to a 1-node configuration, the program would work correctly, but only because you are masking the real programming error. This is why I always recommend designing with a configuration of more than 1 node, since if it works in one multinode configuration it will work in any configuration.
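The failure mode described above can be simulated with a small Python sketch. It is a toy model, not DataStage code; `partitioned_lookup` and the sample rows are hypothetical. The sample is chosen so that each source row's employee sits in the other reference partition, so on 2 nodes every lookup fails, while on 1 node the same logic finds every match and hides the design error.

```python
# Toy model of a lookup with round-robin ("AUTO" assumed) partitioning on both links.

def round_robin(rows, n):
    """Deal rows to n partitions in turn, ignoring their contents."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def partitioned_lookup(source, reference, n):
    """Each source partition can only see the reference partition with the same number."""
    src_parts = round_robin(source, n)
    ref_parts = round_robin(reference, n)
    matched, failed = [], []
    for src, ref in zip(src_parts, ref_parts):
        names = {emp: name for emp, name in ref}  # this node's slice of the reference
        for key, emp in src:
            if emp in names:
                matched.append((key, emp, names[emp]))
            else:
                failed.append((key, emp))
    return matched, failed

# Hypothetical sample: source is (row number, employee number), reference is (employee, name)
source = [(1, 2), (2, 1), (3, 4), (4, 3)]
employees = [(1, "Ann"), (2, "Bob"), (3, "Cid"), (4, "Dee")]

for nodes in (2, 1):
    matched, failed = partitioned_lookup(source, employees, nodes)
    print(f"{nodes} node(s): {len(matched)} matched, {len(failed)} failed")
# prints:
# 2 node(s): 0 matched, 4 failed
# 1 node(s): 4 matched, 0 failed
```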

The correct solution in this case is to make the lookup file link "ENTIRE" so that each node gets a full copy of all the rows, or to hash-partition both the source and the lookup on the Employee key.
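Both fixes can be checked in the same kind of toy model (not DataStage code; the helper names and sample rows are hypothetical, and a modulo stands in for the real hash function). With "entire", every source partition sees the whole reference; with hashing on the employee number, matching keys from both links land in the same partition.

```python
# Toy model comparing the two fixes for a 2-node lookup (names are hypothetical).

def round_robin(rows, n):
    """Deal rows to n partitions in turn."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_on(rows, n, idx):
    """Partition by row[idx] % n (stand-in for a hash on the employee key)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[idx] % n].append(row)
    return parts

def entire(rows, n):
    """Every partition receives a full copy of the reference data."""
    return [list(rows) for _ in range(n)]

def count_matches(src_parts, ref_parts):
    """Count source rows whose employee is found in the same-numbered reference partition."""
    matched = 0
    for src, ref in zip(src_parts, ref_parts):
        names = {emp: name for emp, name in ref}
        matched += sum(1 for _, emp in src if emp in names)
    return matched

source = [(1, 2), (2, 1), (3, 4), (4, 3)]           # (row number, employee)
employees = [(1, "Ann"), (2, "Bob"), (3, "Cid"), (4, "Dee")]
N = 2

# Fix 1: source keeps round-robin partitioning, reference link is "entire".
fix_entire = count_matches(round_robin(source, N), entire(employees, N))
# Fix 2: both links hash-partitioned on the employee number.
fix_hash = count_matches(hash_on(source, N, 1), hash_on(employees, N, 0))
print(fix_entire, fix_hash)  # prints "4 4"
```

In this model, "entire" trades memory for simplicity, since every node holds a full copy of the reference data; hashing both links keeps the reference split across nodes but requires the source to be repartitioned on the key.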

DSXchange
