dataset issue

cuckoo · Post by **cuckoo** » Thu Sep 20, 2012 12:00 am

Hi All,

I have designed two jobs. First job has some processing logic and the out put is written to a dataset. this dataset created is used as reference in my second jbb. the second job flow is : db2 --> join with data from dataset -->output to a file.

The problem i am facing is that, when i create the dataset there are 2 lakh records. . but in my second job when i read same ds, it reads only 99,000 records. there are no warnings, no errors in both the jobs.

but if i change my second job design to dataset-->transformer --> then join with db2.. all records are being read.

why is the dataset when used with join reading less no of rows?

Thank you very much in advance.

ArndW · Post by **ArndW** » Thu Sep 20, 2012 1:24 am

This looks very much like an issue in your join. When you monitor your job I would wager that you are getting all of your dataset records going to the join. Is the dataset source the "right" link into an inner join?

cuckoo · Post by **cuckoo** » Thu Sep 20, 2012 2:07 am

hi ArndW,

Thank you for your suggestion.
But i am opting for a left outer join and this dataset is the reference dataset[right link]. The left link corresponds to the data incoming from a table.

ArndW · Post by **ArndW** » Thu Sep 20, 2012 2:10 am

Wouldn't a left outer join mean that for each row coming in on the LEFT it would find a match on the RIGHT and output a row with the matching data or nulls; i.e. the number of ouput rows will be identical to the number of input rows from the LEFT link?

jerome_rajan · Post by **jerome_rajan** » Thu Sep 20, 2012 2:52 am

A few points to troubleshoot

1. Ensure RCP isn't causing clashes.
2. If your dataset was created with a non-keyed partitioning method, ensure that you don't preserve partitioning while performing the join. Go ahead and do a hash partition and sort on the join keys.
3. Try to verify if the issue persists with a lookup instead of a join stage.

BI-RMA · Post by **BI-RMA** » Thu Sep 20, 2012 4:52 am

Left outer join stops reading data from the right link when the last row from the left link was read. Remember that data on both input-links to the join needs to be sorted by the join-key and is read synchronously.

If DataStage identifies that your right link contains no data matching any row from your left link it will stop reading the right link after the first row.