Hi All,
I have designed two jobs. The first job has some processing logic and writes its output to a dataset. This dataset is used as a reference in my second job. The second job flow is: DB2 --> join with data from the dataset --> output to a file.
The problem I am facing is that the dataset is created with 2 lakh (200,000) records, but when my second job reads the same dataset, it reads only 99,000 records. There are no warnings or errors in either job.
But if I change the second job design to dataset --> Transformer --> join with DB2, all records are read.
Why does the dataset read fewer rows when it is used with the join?
Thank you very much in advance.
dataset issue
thank you
This looks very much like an issue in your join. When you monitor your job I would wager that you are getting all of your dataset records going to the join. Is the dataset source the "right" link into an inner join?
Wouldn't a left outer join mean that for each row coming in on the LEFT it would find a match on the RIGHT and output a row with the matching data or nulls; i.e. the number of output rows will be identical to the number of input rows from the LEFT link?
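To make the row-count point concrete, here is a minimal sketch in plain Python (not DataStage code). The link names, column names and row counts are made up for illustration, and it assumes the join key is unique on the right link; with duplicate keys on the right, a left outer join can output more rows than the left link supplies.

```python
def join_rows(left, right, key, how="inner"):
    """Join two lists of dicts on `key`. `how` is 'inner' or 'left'."""
    # Index the right-hand rows by key so each left row can find its matches.
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)

    out = []
    for l in left:
        matches = index.get(l[key], [])
        if matches:
            for m in matches:
                out.append({**l, **m})
        elif how == "left":
            # Unmatched left row is kept; right-hand columns stay absent (null).
            out.append(dict(l))
    return out


# Hypothetical data: 10 rows on the left link, only 4 matching keys on the right.
dataset_link = [{"id": i, "ref": f"r{i}"} for i in range(10)]
db2_link = [{"id": i, "val": i * 10} for i in range(4)]

inner = join_rows(dataset_link, db2_link, "id", how="inner")
left_outer = join_rows(dataset_link, db2_link, "id", how="left")

print(len(dataset_link), len(inner), len(left_outer))  # prints: 10 4 10
```

The inner join drops the six unmatched left rows, while the left outer join outputs one row per left input row.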
A few points to troubleshoot:
1. Ensure runtime column propagation (RCP) isn't causing clashes.
2. If your dataset was created with a non-keyed partitioning method, ensure that you don't preserve partitioning while performing the join. Instead, hash partition and sort on the join keys.
3. Check whether the issue persists with a Lookup stage instead of a Join stage.
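Point 2 can be illustrated with a small simulation in plain Python (again, not DataStage code). It assumes two partitions, uses `key % n` as a stand-in for a real hash partitioner, and the data is made up. Each partition joins only the rows it holds, so if one link is round-robin partitioned and the other is hash partitioned, matching keys can land in different partitions and never meet.

```python
def round_robin(rows, n):
    """Non-keyed partitioning: rows are dealt out in arrival order."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_partition(rows, n, key):
    """Keyed partitioning: equal keys always land in the same partition.
    `row[key] % n` stands in for a real hash function here."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[key] % n].append(row)
    return parts


def partitioned_inner_join(left_parts, right_parts, key):
    """Each partition joins only its own left and right rows."""
    out = []
    for lp, rp in zip(left_parts, right_parts):
        index = {}
        for r in rp:
            index.setdefault(r[key], []).append(r)
        for l in lp:
            for r in index.get(l[key], []):
                out.append({**l, **r})
    return out


# Hypothetical data: every key exists on both links, so 8 matches are expected.
dataset_link = [{"id": k, "ref": f"r{k}"} for k in (0, 2, 4, 6, 1, 3, 5, 7)]
db2_link = [{"id": k, "val": k * 10} for k in range(8)]

# Dataset keeps its non-keyed partitioning; DB2 link is hash partitioned.
mismatched = partitioned_inner_join(
    round_robin(dataset_link, 2), hash_partition(db2_link, 2, "id"), "id")

# Both links hash partitioned on the join key.
aligned = partitioned_inner_join(
    hash_partition(dataset_link, 2, "id"), hash_partition(db2_link, 2, "id"), "id")

print(len(mismatched), len(aligned))  # prints: 4 8
```

Half of the matches are silently lost in the first case, with no error or warning, which is the kind of symptom worth ruling out.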
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn
Life is really simple, but we insist on making it complicated.
A left outer join stops reading data from the right link once the last row from the left link has been read. Remember that the data on both input links to the Join stage needs to be sorted by the join key and is read synchronously.
If DataStage determines that your right link contains no data matching any row from your left link, it will stop reading the right link after the first row.
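The behaviour described above can be sketched as a generic sort-merge left outer join in plain Python. This is only an illustration of the idea, not DataStage's actual implementation; it assumes both inputs are sorted on the key and that keys are unique on the right, and the data is made up. The function also counts how many right-link rows were actually read.

```python
def left_outer_merge_join(left, right):
    """left and right are lists of (key, value) sorted by key; right keys unique.
    Returns (output_rows, number_of_right_rows_read)."""
    out = []
    right_iter = iter(right)
    current = next(right_iter, None)
    right_read = 0 if current is None else 1

    for lkey, lval in left:
        # Advance the right side only while its key is behind the left key.
        while current is not None and current[0] < lkey:
            current = next(right_iter, None)
            if current is not None:
                right_read += 1
        if current is not None and current[0] == lkey:
            out.append((lkey, lval, current[1]))
        else:
            out.append((lkey, lval, None))  # no match: right value is null
    # Once the left side is exhausted, the rest of the right side is never read.
    return out, right_read


left_link = [(1, "a"), (2, "b"), (3, "c")]

# Right link has 10 rows, but only keys 1..3 are needed.
right_link = [(k, f"db{k}") for k in range(1, 11)]
rows, read = left_outer_merge_join(left_link, right_link)
print(len(rows), read)  # prints: 3 3

# Right link has no matching keys at all (all keys are larger).
no_match_link = [(k, f"db{k}") for k in range(100, 110)]
rows2, read2 = left_outer_merge_join(left_link, no_match_link)
print(len(rows2), read2)  # prints: 3 1
```

In both cases the output row count equals the left row count, but the row count on the right link is lower than the number of rows available, which could explain a monitor showing fewer rows read than the dataset contains.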
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon