Page 1 of 1

dataset issue

Posted: Thu Sep 20, 2012 12:00 am
by cuckoo
Hi All,

I have designed two jobs. First job has some processing logic and the out put is written to a dataset. this dataset created is used as reference in my second jbb. the second job flow is : db2 --> join with data from dataset -->output to a file.

The problem i am facing is that, when i create the dataset there are 2 lakh records. . but in my second job when i read same ds, it reads only 99,000 records. there are no warnings, no errors in both the jobs.

but if i change my second job design to dataset-->transformer --> then join with db2.. all records are being read.

why is the dataset when used with join reading less no of rows?

Thank you very much in advance.

Posted: Thu Sep 20, 2012 1:24 am
by ArndW
This looks very much like an issue in your join. When you monitor your job I would wager that you are getting all of your dataset records going to the join. Is the dataset source the "right" link into an inner join?

Posted: Thu Sep 20, 2012 2:07 am
by cuckoo
hi ArndW,

Thank you for your suggestion.
But i am opting for a left outer join and this dataset is the reference dataset[right link]. The left link corresponds to the data incoming from a table.

Posted: Thu Sep 20, 2012 2:10 am
by ArndW
Wouldn't a left outer join mean that for each row coming in on the LEFT it would find a match on the RIGHT and output a row with the matching data or nulls; i.e. the number of ouput rows will be identical to the number of input rows from the LEFT link?

Posted: Thu Sep 20, 2012 2:52 am
by jerome_rajan
A few points to troubleshoot

1. Ensure RCP isn't causing clashes.
2. If your dataset was created with a non-keyed partitioning method, ensure that you don't preserve partitioning while performing the join. Go ahead and do a hash partition and sort on the join keys.
3. Try to verify if the issue persists with a lookup instead of a join stage.

Posted: Thu Sep 20, 2012 4:52 am
by BI-RMA
Left outer join stops reading data from the right link when the last row from the left link was read. Remember that data on both input-links to the join needs to be sorted by the join-key and is read synchronously.

If DataStage identifies that your right link contains no data matching any row from your left link it will stop reading the right link after the first row.