I have a parallel job wherein I am expected to check whether a record in source data contains valid data.
The format of the record is as follows:
ACCESSID BIGINT 18 NOT NULL
USERID VARCHAR 6 NOT NULL
PURPOSEID SMALLINT 5 NOT NULL
CLIENTID INTEGER 10 NOT NULL
MASTERID INTEGER 10 NOT NULL
Now, I have to check whether the userid, purposeid, clientid and masterid stated in this source data is present in lookup tables, and have used join stage (left outer join mode) to check the same.
Initially, to check whether the job worked for valid data combinations, I populated the source table with values from the lookup tables themselves. However, despite doing this, the output shows that there is no match, and all records in source are rejected, when the expected result is that all be accepted as valid .
Could anyone tell me what may possibly be the problem?
Join stage not recognising match
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 43
- Joined: Mon Jan 15, 2007 10:53 pm
Join stage not recognising match
Regards,
Vivek D. Reddy
__________________________________________
If knowledge can create problems, it is not through ignorance that we can solve them. - Isaac Asimov
Vivek D. Reddy
__________________________________________
If knowledge can create problems, it is not through ignorance that we can solve them. - Isaac Asimov
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Welcome aboard. :D
Before Join stage will work properly both its inputs much be identically partitioned and sorted on the join key(s). Is this the case in your job?
Before Join stage will work properly both its inputs much be identically partitioned and sorted on the join key(s). Is this the case in your job?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 43
- Joined: Mon Jan 15, 2007 10:53 pm
Hi.ray.wurlod wrote:Welcome aboard. :D
Before Join stage will work properly both its inputs much be identically partitioned and sorted on the join key(s). Is this the case in your job? ...
The inputs are sorted on the join key(s), but the problem still persists.
Regards,
Vivek D. Reddy
__________________________________________
If knowledge can create problems, it is not through ignorance that we can solve them. - Isaac Asimov
Vivek D. Reddy
__________________________________________
If knowledge can create problems, it is not through ignorance that we can solve them. - Isaac Asimov
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Participant
- Posts: 43
- Joined: Mon Jan 15, 2007 10:53 pm
Thanks for your assistance.ray.wurlod wrote:What about partitioned on the join key(s)? ...
Partitioning on the join keys worked. But I am not able to understand as to why it is necessary or rather why my not having partitioned the input data set(s) on the join key(s) was causing a problem. Could you please explain the reason(s) for this problem?
Regards,
Vivek D. Reddy
__________________________________________
If knowledge can create problems, it is not through ignorance that we can solve them. - Isaac Asimov
Vivek D. Reddy
__________________________________________
If knowledge can create problems, it is not through ignorance that we can solve them. - Isaac Asimov
Non partitioned or Auto partitioned data will not assure you that identical key data will flow in the same partition.
Say if you have 4 partitions in your system (as decided by you config file), the data will be split in to 4 streams in each active stage. So Join stage will have 4 streams in left and right input. The data will be joined only if the identical key flows in corresponding partition. Thats the reason, Entire partition was recomended for Lookup, to get a match for the key irrespective of the partition.
We have discussed this a lot many times. If you do search, you can find humpty no of posts.
Say if you have 4 partitions in your system (as decided by you config file), the data will be split in to 4 streams in each active stage. So Join stage will have 4 streams in left and right input. The data will be joined only if the identical key flows in corresponding partition. Thats the reason, Entire partition was recomended for Lookup, to get a match for the key irrespective of the partition.
We have discussed this a lot many times. If you do search, you can find humpty no of posts.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'