Page 1 of 1

Missing Records when using Join Stage with Datasets

Posted: Thu Dec 15, 2011 10:50 am
by mgray412
I have a Dataset which contains approximately 2.0 million records. When I connect the dataset to a Join stage only 1.3 million of the records are being read. I can put a Transformer or Copy stage between the Dataset and the Join and all records from the Dataset will be read. Any ideas on why connection directly to a Join would cause records to be omitted? I am running version 8.5.

Thanks

Posted: Thu Dec 15, 2011 2:45 pm
by ray.wurlod
How have you confirmed the record count in the Data Set?

Posted: Thu Dec 15, 2011 3:14 pm
by Mike
If you are doing an inner join...

Once the stage has read all records from one input link, there is no need to continue reading from any other input links.

Mike

Re: Missing Records when using Join Stage with Datasets

Posted: Thu Dec 15, 2011 8:00 pm
by kwwilliams
mgray412 wrote: Any ideas on why connection directly to a Join would cause records to be omitted
Data in one link and not in the other.
Improper sorting
Improper partitioning

Posted: Fri Dec 16, 2011 12:23 am
by Kirtikumar
Did you check the DS record count? Try using the $ORCHADMIN ll and check it.

Also how was the DS created - using a job or $ORCHADMIN cp. Once I had used $ORCHADMIN cp and the newly created DS has a similar issue i.e. records read were less without any warning. There was some issue with char encoding while writing to new. I recreated the DS and it worked.

Re: Missing Records when using Join Stage with Datasets

Posted: Fri Dec 16, 2011 7:02 am
by mgray412
Thanks for your responses. Here are the answers to some of the questions that were asked:

1) Confirmed dataset record count using the Data Set Management tool in Designer
2) Using a Left Outer Join
3) Dataset was created using a DataStage job

Thanks

Posted: Fri Dec 16, 2011 7:10 am
by Mike
Which now leaves you with Keith's suggested last two possibilities... improper partitioning or improper sorting.

Mke

Posted: Fri Dec 16, 2011 7:46 am
by Mike
On second thought... a left join will read all of the data from the left input. Improper partitioning or improper sorting will just lead to bad results. Double check the link order to make sure that the left input is really the one that you think it is.

Mike

Posted: Fri Dec 16, 2011 7:59 am
by mgray412
Mike,

The left link is to the source file and the right is to the dataset.

Thanks

Posted: Fri Dec 16, 2011 8:04 am
by Mike
OK.

So there you have it. Once all of the data has been read from the left link of a left join (source file), there is no reason to keep reading from the right link (dataset).

Mike

Posted: Fri Dec 16, 2011 8:13 am
by mgray412
Oh Ok. That makes sense now!! Thank You for your help!!!