Sort stage outputs Fewer Records

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
satishbalantrapu
Participant
Posts: 12
Joined: Wed Jul 25, 2007 6:56 am
Location: Hyd

Sort stage outputs Fewer Records

Post by satishbalantrapu »

Hi,
We are using a federated join between 2 DB2 systems. and joining the output. Before joining we are using explicit sort stages to sort the data.

Following is the job design (RCP is used)

1DB2 Stage ------------ 1Sort Stage
######################## ----- Join Stage --Dataset
2DB2 Stage ------------ 2Sort Stage


Following are the record counts
Output 1DB2 Stage and Output 1 Sort Stage - 5000
Output 2DB2 stage is 5000 BUT Output 2 Sort Stage is 2500 only. (This is the problem. This should have given out 5000 records)

The job ran fine without any warnings. But we could find the following informational message (not warning) in the log

[code]/u001/Ascential/DataStage/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /u001/Ascential/DataStage/PXEngine/bin/osh -APT_PMsectionLeaderFlag edwdev 10003 0 600 edwdev edwdev-sw1 1214385257.105183.607018
/u001/Ascential/DataStage/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /u001/Ascential/DataStage/PXEngine/bin/osh -APT_PMsectionLeaderFlag edwdev 10003 1 600 edwdev-2 edwdev-sw1a 1214385257.105183.607018
/u001/Ascential/DataStage/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /u001/Ascential/DataStage/PXEngine/bin/osh -APT_PMsectionLeaderFlag edwdev 10003 2 600 edwdev-3 edwdev-sw2 1214385257.105183.607018
/u001/Ascential/DataStage/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /u001/Ascential/DataStage/PXEngine/bin/osh -APT_PMsectionLeaderFlag edwdev 10003 3 600 edwdev-4 edwdev-sw2a 1214385257.105183.607018
/u001/Ascential/DataStage/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /u001/Ascential/DataStage/PXEngine/bin/osh -APT_PMsectionLeaderFlag edwdev 10003 4 600 edwdev-5 edwdev-sw1 1214385257.105183.607018
/u001/Ascential/DataStage/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /u001/Ascential/DataStage/PXEngine/bin/osh -APT_PMsectionLeaderFlag edwdev 10003 5 600 edwdev-6 edwdev-sw1a 1214385257.105183.607018
/u001/Ascential/DataStage/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /u001/Ascential/DataStage/PXEngine/bin/osh -APT_PMsectionLeaderFlag edwdev 10003 6 600 edwdev-7 edwdev-sw2 1214385257.105183.607018
/u001/Ascential/DataStage/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /u001/Ascential/DataStage/PXEngine/bin/osh -APT_PMsectionLeaderFlag edwdev 10003 7 600 edwdev-8 edwdev-sw2a 1214385257.105183.607018
/u001/Ascential/DataStage/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /u001/Ascential/DataStage/PXEngine/bin/osh -APT_PMsectionLeaderFlag edwdev 10003 8 600 edwddb1 edwddb1-sw1a 1214385257.105183.607018[/code]

Some observations:
I made sure that all the partitioning keys are right.
We have similar jobs and they are giving the same issue at the 2Sort Stage.
The other jobs which do not have federated join (and have 2 sort stages followed by join stage) run fine.
Also the user running the job has access to all the nodes.

Please help me identify whats going wrong.

Thanks
Satish Balantarapu
poornima.chandramouli
Participant
Posts: 42
Joined: Wed Dec 26, 2007 1:58 am
Location: India

Re: Sort stage outputs Fewer Records

Post by poornima.chandramouli »

Hi Satish,

what kind of partitioning ur applying?
Instead of aplying a explicit sort stage,why don't u try sorting in the partition.
How you are telling that the second sort is gving only 2500 records,if u r seeing through the perormance statistics,then u hv to give more info of what kind of join u r doing in the join stage..........


Thanks

satishbalantrapu wrote:Hi,
We are using a federated join between 2 DB2 systems. and joining the output. Before joining we are using explicit sort stages to sort the data.

Following is the job design (RCP is used)

1DB2 Stage ------------ 1Sort Stage
######################## ----- Join Stage --Dataset
2DB2 Stage ------------ 2Sort Stage


Following are the record counts
Output 1DB2 Stage and Output 1 Sort Stage - 5000
Output 2DB2 stage is 5000 BUT Output 2 Sort Stage is 2500 only. (This is the problem. This should have given out 5000 records)

The job ran fine without any warnings. But we could find the following informational message (not warning) in the log

Code: Select all

/u001/Ascential/DataStage/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /u001/Ascential/DataStage/PXEngine/bin/osh -APT_PMsectionLeaderFlag edwdev 10003 0 600 edwdev edwdev-sw1 1214385257.105183.607018
/u001/Ascential/DataStage/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /u001/Ascential/DataStage/PXEngine/bin/osh -APT_PMsectionLeaderFlag edwdev 10003 1 600 edwdev-2 edwdev-sw1a 1214385257.105183.607018
/u001/Ascential/DataStage/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /u001/Ascential/DataStage/PXEngine/bin/osh -APT_PMsectionLeaderFlag edwdev 10003 2 600 edwdev-3 edwdev-sw2 1214385257.105183.607018
/u001/Ascential/DataStage/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /u001/Ascential/DataStage/PXEngine/bin/osh -APT_PMsectionLeaderFlag edwdev 10003 3 600 edwdev-4 edwdev-sw2a 1214385257.105183.607018
/u001/Ascential/DataStage/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /u001/Ascential/DataStage/PXEngine/bin/osh -APT_PMsectionLeaderFlag edwdev 10003 4 600 edwdev-5 edwdev-sw1 1214385257.105183.607018
/u001/Ascential/DataStage/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /u001/Ascential/DataStage/PXEngine/bin/osh -APT_PMsectionLeaderFlag edwdev 10003 5 600 edwdev-6 edwdev-sw1a 1214385257.105183.607018
/u001/Ascential/DataStage/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /u001/Ascential/DataStage/PXEngine/bin/osh -APT_PMsectionLeaderFlag edwdev 10003 6 600 edwdev-7 edwdev-sw2 1214385257.105183.607018
/u001/Ascential/DataStage/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /u001/Ascential/DataStage/PXEngine/bin/osh -APT_PMsectionLeaderFlag edwdev 10003 7 600 edwdev-8 edwdev-sw2a 1214385257.105183.607018
/u001/Ascential/DataStage/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /u001/Ascential/DataStage/PXEngine/bin/osh -APT_PMsectionLeaderFlag edwdev 10003 8 600 edwddb1 edwddb1-sw1a 1214385257.105183.607018
Some observations:
I made sure that all the partitioning keys are right.
We have similar jobs and they are giving the same issue at the 2Sort Stage.
The other jobs which do not have federated join (and have 2 sort stages followed by join stage) run fine.
Also the user running the job has access to all the nodes.

Please help me identify whats going wrong.

Thanks
Satish Balantarapu
Regards
John Smith
Charter Member
Charter Member
Posts: 193
Joined: Tue Sep 05, 2006 8:01 pm
Location: Australia

Post by John Smith »

Maybe you have duplicate records and have turn on the option to remove dupes?
satishbalantrapu
Participant
Posts: 12
Joined: Wed Jul 25, 2007 6:56 am
Location: Hyd

Re: Sort stage outputs Fewer Records

Post by satishbalantrapu »

Hi Poornima,
I believe you meant inline sorting in join stage. I used explicit sort stage because we had existing jobs. I just wanted to know why there is a difference in the record counts.
Since an inline sort is equivalent to explicit sort, I thought this should work.
Also I am using an inner join.

Partitioning:
Hash partitioning in both sort stages
Same partitioning in both input links of join stage. (Tried Auto too )
satishbalantrapu
Participant
Posts: 12
Joined: Wed Jul 25, 2007 6:56 am
Location: Hyd

Post by satishbalantrapu »

Hi John,
I dont have duplicates and am not removing duplicates.
I checked the query results in DB2 client and there are no duplicates.
sunayan_pal
Participant
Posts: 49
Joined: Fri May 11, 2007 12:24 am
Location: kolkata

Post by sunayan_pal »

hi,

As per the job design i would suggest you why do not you perform the join in the DataBase it will reduce the time and the overhead of the system.

get the output of the join from the database into the required dataset


in the DB2 stage
do the following

select a.clo1, a.col2 , b.col1,b.col3
from table1 a , table2 b
where a.col1=b.col1

now get the data into the Dataset.
regards
sunayan
keshav0307
Premium Member
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

what is the record count in dataset?
which link is master?
OddJob
Participant
Posts: 163
Joined: Tue Feb 28, 2006 5:00 am
Location: Sheffield, UK

Post by OddJob »

keshav0307 -> It's been mentionned that the join was 'Inner', so no master link.

To check if it's a partitioning issue, use a single node config file to run the job.
keshav0307
Premium Member
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

sometime the monitor behave crazy.
probably the link is showing fewer records because of the join stage.

can you create another job with just the
2DB2 Stage ------------ 2Sort Stage ----------Sequential file Stage

to check if the sort stage is still giving fewer records.
satishbalantrapu
Participant
Posts: 12
Joined: Wed Jul 25, 2007 6:56 am
Location: Hyd

Post by satishbalantrapu »

keshav0307 wrote:sometime the monitor behave crazy.
probably the link is showing fewer records because of the join stage.

can you create another job with just the
2DB2 Stage ------------ 2Sort Stage ----------Sequential file Stage

to check if the sort stage is still giving fewer records.
Keshav,
You are right. The monitor did not show me the correct results. I got the actual record counts using the env variable APT_RECORD_COUNTS. So my problem about the record count is solved.
Moral: Do not completely rely the on job performance statistics

Now I have a problem with the join stage. Even though there are no duplicates at the source level (both the sources), join stage's output has duplicates :roll: Any ideas

Thanks,
Satish
satishbalantrapu
Participant
Posts: 12
Joined: Wed Jul 25, 2007 6:56 am
Location: Hyd

Post by satishbalantrapu »

sunayan_pal wrote:hi,

As per the job design i would suggest you why do not you perform the join in the DataBase it will reduce the time and the overhead of the system.

get the output of the join from the database into the required dataset


in the DB2 stage
do the following

select a.clo1, a.col2 , b.col1,b.col3
from table1 a , table2 b
where a.col1=b.col1

now get the data into the Dataset.

Hi sunayan
Thanks. That is my last resort. (Because I am expected to develop jobs similar to existing ones.)

Thanks,
Satish
Post Reply