Missing Records when using Join Stage with Datasets

mgray412
Participant
Posts: 12
Joined: Tue Jan 18, 2011 10:18 am

Missing Records when using Join Stage with Datasets

Post by mgray412 »

I have a Dataset which contains approximately 2.0 million records. When I connect the Dataset directly to a Join stage, only 1.3 million of the records are read. If I put a Transformer or Copy stage between the Dataset and the Join, all records from the Dataset are read. Any ideas on why connecting directly to a Join would cause records to be omitted? I am running version 8.5.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

How have you confirmed the record count in the Data Set?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

If you are doing an inner join...

Once the stage has read all records from one input link, there is no need to continue reading from any other input link: the inputs are sorted on the join key, so no further matches are possible.

Mike
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Re: Missing Records when using Join Stage with Datasets

Post by kwwilliams »

mgray412 wrote: Any ideas on why connection directly to a Join would cause records to be omitted?
1) Data in one link and not in the other
2) Improper sorting
3) Improper partitioning
Kirtikumar
Participant
Posts: 437
Joined: Fri Oct 15, 2004 6:13 am
Location: Pune, India

Post by Kirtikumar »

Did you check the DS record count? Try running $ORCHADMIN ll against it to check.

Also, how was the DS created: by a job or with $ORCHADMIN cp? I once used $ORCHADMIN cp and the newly created DS had a similar issue, i.e. fewer records were read, without any warning. There was some issue with character encoding while writing to the new DS. I recreated the DS and it worked.
Regards,
S. Kirtikumar.
mgray412
Participant
Posts: 12
Joined: Tue Jan 18, 2011 10:18 am

Re: Missing Records when using Join Stage with Datasets

Post by mgray412 »

Thanks for your responses. Here are the answers to some of the questions that were asked:

1) Confirmed dataset record count using the Data Set Management tool in Designer
2) Using a Left Outer Join
3) Dataset was created using a DataStage job

Thanks
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Which now leaves you with Keith's suggested last two possibilities... improper partitioning or improper sorting.

Mike
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

On second thought... a left join will read all of the data from the left input. Improper partitioning or improper sorting will just lead to bad results. Double check the link order to make sure that the left input is really the one that you think it is.
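
To illustrate why improper partitioning gives bad results without dropping any left-link records, here is a rough Python sketch. It is not DataStage code: the function names, the integer keys, and the per-partition dictionary lookup are all assumptions made for the illustration (the real Join stage works on key-sorted inputs rather than a lookup table).

```python
def hash_partition(records, n):
    """Send each (key, value) record to partition key % n (integer keys assumed)."""
    parts = [[] for _ in range(n)]
    for key, value in records:
        parts[key % n].append((key, value))
    return parts


def round_robin_partition(records, n):
    """Deal records out in arrival order, ignoring the key."""
    parts = [[] for _ in range(n)]
    for i, record in enumerate(records):
        parts[i % n].append(record)
    return parts


def left_join_per_partition(left_parts, right_parts):
    """Left outer join where each partition only sees its own right-side records."""
    rows = []
    for left_part, right_part in zip(left_parts, right_parts):
        lookup = dict(right_part)  # hypothetical stand-in for the per-node match
        for key, value in left_part:
            rows.append((key, value, lookup.get(key)))
    return rows


left = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
right = [(4, "w"), (3, "x"), (2, "y"), (1, "z")]

good = left_join_per_partition(hash_partition(left, 2), hash_partition(right, 2))
bad = left_join_per_partition(round_robin_partition(left, 2),
                              round_robin_partition(right, 2))

good_matches = sum(1 for row in good if row[2] is not None)
bad_matches = sum(1 for row in bad if row[2] is not None)
print(len(good), good_matches)  # 4 rows out, 4 matched
print(len(bad), bad_matches)    # 4 rows out, 0 matched
```

In both runs every left record comes out of the join; with key-unaware partitioning the matching right records simply sit on a different partition, so the right-side columns come back empty.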

Mike
mgray412
Participant
Posts: 12
Joined: Tue Jan 18, 2011 10:18 am

Post by mgray412 »

Mike,

The left link is to the source file and the right is to the dataset.

Thanks
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

OK.

So there you have it. Once all of the data has been read from the left link of a left join (source file), there is no reason to keep reading from the right link (dataset).
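
A minimal Python sketch of the idea, not the actual Join stage implementation: it assumes both inputs are sorted on the join key and that right-side keys are unique, and the function name and counter are made up for the illustration. The real engine reads in blocks, so its counts will not line up record for record.

```python
def left_outer_merge_join(left, right):
    """Merge-join two key-sorted lists of (key, value) pairs.

    Returns the joined rows and how many right records were actually read.
    Assumes right-side keys are unique (a simplification for this sketch).
    """
    right_iter = iter(right)
    right_read = 0

    def next_right():
        nonlocal right_read
        record = next(right_iter, None)
        if record is not None:
            right_read += 1
        return record

    rows = []
    current = next_right()
    for left_key, left_value in left:
        # Skip right records whose key is lower than the current left key.
        while current is not None and current[0] < left_key:
            current = next_right()
        if current is not None and current[0] == left_key:
            rows.append((left_key, left_value, current[1]))
        else:
            rows.append((left_key, left_value, None))
    # The left input is exhausted here; the rest of the right input is never read.
    return rows, right_read


left = [(1, "a"), (2, "b"), (3, "c")]
right = [(k, f"r{k}") for k in range(1, 11)]  # 10 records on the right link

rows, right_read = left_outer_merge_join(left, right)
print(f"{right_read} of {len(right)} right records read")  # 3 of 10
```

The output is complete even though most of the right input was never touched, which is the same pattern as 1.3 million of 2.0 million Dataset records being read.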

Mike
mgray412
Participant
Posts: 12
Joined: Tue Jan 18, 2011 10:18 am

Post by mgray412 »

Oh Ok. That makes sense now!! Thank You for your help!!!