Join stage not recognising match

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
vivekreddy
Participant
Posts: 43
Joined: Mon Jan 15, 2007 10:53 pm

Join stage not recognising match

Post by vivekreddy »

I have a parallel job wherein I am expected to check whether a record in source data contains valid data.
The format of the record is as follows:
ACCESSID BIGINT 18 NOT NULL
USERID VARCHAR 6 NOT NULL
PURPOSEID SMALLINT 5 NOT NULL
CLIENTID INTEGER 10 NOT NULL
MASTERID INTEGER 10 NOT NULL

Now, I have to check whether the userid, purposeid, clientid and masterid stated in this source data is present in lookup tables, and have used join stage (left outer join mode) to check the same.
Initially, to check whether the job worked for valid data combinations, I populated the source table with values from the lookup tables themselves. However, despite doing this, the output shows that there is no match, and all records in source are rejected, when the expected result is that all be accepted as valid :shock: .

Could anyone tell me what may possibly be the problem? :?:
Regards,
Vivek D. Reddy

__________________________________________
If knowledge can create problems, it is not through ignorance that we can solve them. - Isaac Asimov
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard. :D

Before Join stage will work properly both its inputs much be identically partitioned and sorted on the join key(s). Is this the case in your job?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vivekreddy
Participant
Posts: 43
Joined: Mon Jan 15, 2007 10:53 pm

Post by vivekreddy »

ray.wurlod wrote:Welcome aboard. :D

Before Join stage will work properly both its inputs much be identically partitioned and sorted on the join key(s). Is this the case in your job? ...
Hi.

The inputs are sorted on the join key(s), but the problem still persists.
Regards,
Vivek D. Reddy

__________________________________________
If knowledge can create problems, it is not through ignorance that we can solve them. - Isaac Asimov
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What about partitioned on the join key(s)?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vivekreddy
Participant
Posts: 43
Joined: Mon Jan 15, 2007 10:53 pm

Post by vivekreddy »

ray.wurlod wrote:What about partitioned on the join key(s)? ...
Thanks for your assistance.
Partitioning on the join keys worked. But I am not able to understand as to why it is necessary or rather why my not having partitioned the input data set(s) on the join key(s) was causing a problem. Could you please explain the reason(s) for this problem?
Regards,
Vivek D. Reddy

__________________________________________
If knowledge can create problems, it is not through ignorance that we can solve them. - Isaac Asimov
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Non partitioned or Auto partitioned data will not assure you that identical key data will flow in the same partition.
Say if you have 4 partitions in your system (as decided by you config file), the data will be split in to 4 streams in each active stage. So Join stage will have 4 streams in left and right input. The data will be joined only if the identical key flows in corresponding partition. Thats the reason, Entire partition was recomended for Lookup, to get a match for the key irrespective of the partition.
We have discussed this a lot many times. If you do search, you can find humpty no of posts.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Post Reply