APT_DUMP_SCORE?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


Raamc
Premium Member
Posts: 87
Joined: Mon Aug 20, 2007 9:08 am

APT_DUMP_SCORE?

Post by Raamc »

Hi,

My job has three input datasets, one Join stage, one Transformer stage and a sequential file. I added the environment variable APT_DUMP_SCORE to the job and got the following information in the log:
......
ds9: {op6[2p] (parallel buffer(0))
[pp] eSame=>eCollectAny
op9[2p] (parallel APT_JoinSubOperator(0) in jnDli56Data)}
ds10: {op7[2p] (parallel buffer(1))
[pp] eSame=>eCollectAny
op9[2p] (parallel APT_JoinSubOperator(0) in jnDli56Data)}
ds11: {op8[2p] (parallel buffer(2))
[pp] eSame=>eCollectAny
op11[2p] (parallel APT_JoinSubOperator(1) in jnDli56Data)}
ds12: {op9[2p] (parallel APT_JoinSubOperator(0) in jnDli56Data)
[pp] eSame=>eCollectAny
op10[2p] (parallel buffer(3))}
ds13: {op10[2p] (parallel buffer(3))
[pp] eSame=>eCollectAny
op11[2p] (parallel APT_JoinSubOperator(1) in jnDli56Data)}
ds14: {op11[2p] (parallel APT_JoinSubOperator(1) in jnDli56Data)
[pp] eSame=>eCollectAny
op12[2p] (parallel APT_CombinedOperatorController(3):stJnKeyCngCol)}
.........

From the above log, my question is: why have so many data sets been assigned to a single Join stage (jnDli56Data)?

Generally each stage is assigned a single data set, but this Join stage has been assigned several. Why?

Could anyone please explain?

Thanks,
Raamc
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Do you have a 4-node configuration file?
dsusr
Premium Member
Posts: 104
Joined: Sat Sep 03, 2005 11:30 pm

Post by dsusr »

It seems you are running the job on a 2-node configuration, and since you have 3 input files to join, it results in 6 entries.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

That's not how it works, folks. Irrespective of the degree of parallelism you only get one data set between each pair of operators. Remember that the Score is distributed separately to the section leader process on each node.

What you need to look at here is the fact that DataStage has inserted some buffer operators (to account for differences in processing speeds on the input links to the Join stage), and has generated two Join sub-operators in a composite - which is indicated by the "in" keyword.

The Operators section of the Score will make this clearer.
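
To make the "one data set between each pair of operators" point concrete, here is a minimal Python sketch that reads the data set entries quoted in the original post and lists each data set with its producing and consuming operator. The regular expressions and the `parse_datasets` helper are illustrative assumptions based only on the fragment shown in this thread; they are not a DataStage utility, and a full score dump may contain entries this sketch does not handle.

```python
import re

# Data set entries copied from the score dump in the original post.
SCORE = """\
ds9: {op6[2p] (parallel buffer(0))
[pp] eSame=>eCollectAny
op9[2p] (parallel APT_JoinSubOperator(0) in jnDli56Data)}
ds10: {op7[2p] (parallel buffer(1))
[pp] eSame=>eCollectAny
op9[2p] (parallel APT_JoinSubOperator(0) in jnDli56Data)}
ds11: {op8[2p] (parallel buffer(2))
[pp] eSame=>eCollectAny
op11[2p] (parallel APT_JoinSubOperator(1) in jnDli56Data)}
ds12: {op9[2p] (parallel APT_JoinSubOperator(0) in jnDli56Data)
[pp] eSame=>eCollectAny
op10[2p] (parallel buffer(3))}
ds13: {op10[2p] (parallel buffer(3))
[pp] eSame=>eCollectAny
op11[2p] (parallel APT_JoinSubOperator(1) in jnDli56Data)}
ds14: {op11[2p] (parallel APT_JoinSubOperator(1) in jnDli56Data)
[pp] eSame=>eCollectAny
op12[2p] (parallel APT_CombinedOperatorController(3):stJnKeyCngCol)}
"""

# Assumed line shapes, inferred from the fragment above only.
DS_LINE = re.compile(r"^(ds\d+): \{(op\d+)\[\d+p\] \((.*)\)$")   # producer line
OP_LINE = re.compile(r"^(op\d+)\[\d+p\] \((.*)\)\}$")            # consumer line


def parse_datasets(score_text):
    """Return (dataset, producer_op, producer_desc, consumer_op, consumer_desc) tuples."""
    links = []
    pending = None  # data set whose consumer line has not been seen yet
    for raw in score_text.splitlines():
        line = raw.strip()
        ds_match = DS_LINE.match(line)
        if ds_match:
            pending = ds_match.groups()
            continue
        op_match = OP_LINE.match(line)
        if op_match and pending is not None:
            links.append(pending + op_match.groups())
            pending = None
    return links


links = parse_datasets(SCORE)

# Operators that belong to the composite Join stage carry "in jnDli56Data".
join_ops = set()
for ds, prod_op, prod_desc, cons_op, cons_desc in links:
    print(f"{ds}: {prod_op} ({prod_desc}) -> {cons_op} ({cons_desc})")
    if prod_desc.endswith(" in jnDli56Data"):
        join_ops.add(prod_op)
    if cons_desc.endswith(" in jnDli56Data"):
        join_ops.add(cons_op)

print("Operators inside jnDli56Data:", join_ops)
```

Read this way, each data set has exactly one producer and one consumer. The Join stage appears in several entries because it expands into two sub-operators (op9 and op11), each with its own input and output links, and the inserted buffer operators add further links.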
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.