Join stage output quantity mismatch query

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
vivekreddy
Participant
Posts: 43
Joined: Mon Jan 15, 2007 10:53 pm

Join stage output quantity mismatch query

Post by vivekreddy »

I have a job wherein I am joining two datasets using a join stage and join type as left outer.
I have sorted and partitioned both datasets on the join key.
However, still I am not getting the desired output.

I have around 6670 rows as input to the join stage, but as output, I get only 2099. I thought that 6670 remains the minimum possible when performing left-outer join.

Any suggestions on what I should do?
Regards,
Vivek D. Reddy

__________________________________________
If knowledge can create problems, it is not through ignorance that we can solve them. - Isaac Asimov
vijayrc
Participant
Posts: 197
Joined: Sun Apr 02, 2006 10:31 am
Location: NJ

Re: Join stage output quantity mismatch query

Post by vijayrc »

vivekreddy wrote:I have a job wherein I am joining two datasets using a join stage and join type as left outer.
I have sorted and partitioned both datasets on the join key.
However, still I am not getting the desired output.

I have around 6670 rows as input to the join stage, but as output, I get only 2099. I thought that 6670 remains the minimum possible when performing left-outer join.

Any suggestions on what I should do?
Though you have mentioned LEFT OUTER, make sure the files are Link Ordered properly [Left and Right]
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It may also be possible that your data are not partitioned based upon the join key (in addition to any other potential cause). Check that also.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vivekreddy
Participant
Posts: 43
Joined: Mon Jan 15, 2007 10:53 pm

Post by vivekreddy »

All done, still not working :oops:
Regards,
Vivek D. Reddy

__________________________________________
If knowledge can create problems, it is not through ignorance that we can solve them. - Isaac Asimov
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

What type of partition is used?
Just be aware that, 6670 will be output, only if its a Outer Join on the dataset which has 6670 number of records.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
vivekreddy
Participant
Posts: 43
Joined: Mon Jan 15, 2007 10:53 pm

Post by vivekreddy »

Entire
Regards,
Vivek D. Reddy

__________________________________________
If knowledge can create problems, it is not through ignorance that we can solve them. - Isaac Asimov
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Entire wont be a prescribed partition for Join stage. But still this will increase the number of resultant rows and not decrease.
Now explain more on what are the keys, and what is the partition that used on which stage and especially on the join stage, for both the input.
Basically need more details on job design.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
vivekreddy
Participant
Posts: 43
Joined: Mon Jan 15, 2007 10:53 pm

Post by vivekreddy »

The key is a character field of length 2. In one dataset, partitioning method is Auto, whereas from the other, the left link, it is entire.
Regards,
Vivek D. Reddy

__________________________________________
If knowledge can create problems, it is not through ignorance that we can solve them. - Isaac Asimov
vivekreddy
Participant
Posts: 43
Joined: Mon Jan 15, 2007 10:53 pm

Post by vivekreddy »

The partitioning method on the Join stage is Entire.
Regards,
Vivek D. Reddy

__________________________________________
If knowledge can create problems, it is not through ignorance that we can solve them. - Isaac Asimov
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

I would suggest, do a hash partition on the Key, well before join, i.e,. the stages where you sort the data. And use same partition till Join stage.
Check if by any chance, you have any unique sort option enabled, and it removes duplicates.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Post Reply