dataset issue

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
cuckoo
Participant
Posts: 26
Joined: Mon Mar 22, 2010 4:03 am
Location: bangalore

dataset issue

Post by cuckoo »

Hi All,

I have designed two jobs. First job has some processing logic and the out put is written to a dataset. this dataset created is used as reference in my second jbb. the second job flow is : db2 --> join with data from dataset -->output to a file.

The problem i am facing is that, when i create the dataset there are 2 lakh records. . but in my second job when i read same ds, it reads only 99,000 records. there are no warnings, no errors in both the jobs.

but if i change my second job design to dataset-->transformer --> then join with db2.. all records are being read.

why is the dataset when used with join reading less no of rows?

Thank you very much in advance.
thank you
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

This looks very much like an issue in your join. When you monitor your job I would wager that you are getting all of your dataset records going to the join. Is the dataset source the "right" link into an inner join?
cuckoo
Participant
Posts: 26
Joined: Mon Mar 22, 2010 4:03 am
Location: bangalore

Post by cuckoo »

hi ArndW,

Thank you for your suggestion.
But i am opting for a left outer join and this dataset is the reference dataset[right link]. The left link corresponds to the data incoming from a table.
thank you
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Wouldn't a left outer join mean that for each row coming in on the LEFT it would find a match on the RIGHT and output a row with the matching data or nulls; i.e. the number of ouput rows will be identical to the number of input rows from the LEFT link?
jerome_rajan
Premium Member
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Post by jerome_rajan »

A few points to troubleshoot

1. Ensure RCP isn't causing clashes.
2. If your dataset was created with a non-keyed partitioning method, ensure that you don't preserve partitioning while performing the join. Go ahead and do a hash partition and sort on the join keys.
3. Try to verify if the issue persists with a lookup instead of a join stage.
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn

Life is really simple, but we insist on making it complicated.
BI-RMA
Premium Member
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

Left outer join stops reading data from the right link when the last row from the left link was read. Remember that data on both input-links to the join needs to be sorted by the join-key and is read synchronously.

If DataStage identifies that your right link contains no data matching any row from your left link it will stop reading the right link after the first row.
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
Post Reply