Join Not Working on multiple nodes

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


this_is_venki
Participant
Posts: 23
Joined: Fri Nov 04, 2005 8:34 am

Join Not Working on multiple nodes

Post by this_is_venki »

Hi

In one of my parallel jobs I had been using single.apt (1 node) as the configuration file. The job reads from Oracle, runs the data through Transformer stages and a Join stage, and finally writes back to Oracle. The job had been tested in the test environment.

For performance reasons I changed the configuration file from single.apt to medium.apt (4 nodes).

This produced incorrect data in the final table. When I investigated a specific record, I found that the join was not matching it.

When I ran again with single.apt, that same record joined fine.

This is really baffling...
Does the Join stage work on multiple nodes or not?
Someone please suggest something.

TIA,
Venky
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The Join and Merge stages require that their input Data Sets be identically partitioned and sorted on all of the join keys. Have you configured this? A Lookup stage has the same partitioning requirement, unless you give the reference inputs Entire partitioning, in which case you do not need to sort the primary input Data Set.

Yes, they DO work in a parallel execution environment. It would be a strange thing to call it a parallel execution environment if they did not!
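To see why the job "worked" on one node but dropped matches on four, here is a small illustrative sketch (plain Python, not DataStage code; all function names are hypothetical). Each "node" joins only the rows it receives, exactly as the Join stage does, so if the two inputs are partitioned without regard to the key (e.g. Round Robin), matching keys can land on different nodes and those rows silently fail to join. Hash-partitioning both inputs on the join key guarantees co-location:

```python
def hash_partition(rows, key, n_nodes):
    """Route each row to the node chosen by hashing its join key."""
    parts = [[] for _ in range(n_nodes)]
    for row in rows:
        parts[hash(row[key]) % n_nodes].append(row)
    return parts

def round_robin_partition(rows, n_nodes):
    """Distribute rows evenly WITHOUT looking at the key."""
    parts = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        parts[i % n_nodes].append(row)
    return parts

def partitioned_join(left_parts, right_parts, key):
    """Each 'node' joins only the rows it was given."""
    out = []
    for lp, rp in zip(left_parts, right_parts):
        lookup = {r[key]: r for r in rp}
        out += [{**l, **lookup[l[key]]} for l in lp if l[key] in lookup]
    return out

left  = [{"id": i, "a": i * 10}  for i in range(8)]
right = [{"id": i, "b": i * 100} for i in range(8)]

# Key-blind partitioning: matching ids can end up on different nodes,
# so the partition-wise join loses rows.
bad = partitioned_join(round_robin_partition(left, 4),
                       round_robin_partition(list(reversed(right)), 4), "id")

# Hash on the join key: equal keys always land on the same node,
# so every match is found.
good = partitioned_join(hash_partition(left, "id", 4),
                        hash_partition(right, "id", 4), "id")

print(len(bad), len(good))
```

On a single node there is only one partition, so both inputs trivially see all rows and the problem never shows up, which is why the job tested clean with single.apt.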
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Try sorting and hash-partitioning both inputs on the join key before the Join or Merge stage, so that rows with the same key are routed to the same node for both sets of data.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
pavankvk
Participant
Posts: 202
Joined: Thu Dec 04, 2003 7:54 am

Post by pavankvk »

Yeah, a simple hash partition should solve this problem. You may lose records without specifying the correct partitioning options. We had the same problem two years ago: we tested the software in a test environment on a single node, and then had to fix the issue later in the UAT environment, which is multi-node.