Query on score dump

LD
Premium Member
Posts: 32
Joined: Thu Oct 21, 2010 9:03 am

Post by LD »

Hi,

I am trying to understand the score dump of my PX job. My question is about buffering. I read in the Advanced guide that the score dump shows where data is buffered.
But I could not understand what the data sets and operators for buffering mean in the score dump, e.g.:

Data Set example:
ds18: {op18[4p] (parallel APT_TransformOperatorImplV0S3_PatientAccountStdFileStgPX_File_Tra_DischargeDate in Tra_DischargeDate)
eAny=>eCollectAny
op20[4p] (parallel buffer(0))}

Another example,

ds29: {op20[4p] (parallel buffer(0))
eSame=>eCollectAny
op21[4p] (parallel APT_LUTProcessOp in Lookup_EffDate)}


Operator Example:
op19[1p] {(parallel APT_LUTCreateOp in Lookup_EffDate)
on nodes (
node1[op19,p0]
)}
op20[4p] {(parallel buffer(0))
on nodes (
node1[op20,p0]
node2[op20,p1]
node3[op20,p2]
node4[op20,p3]
)}


op23[4p] {(parallel APT_TransformOperatorImplV0S22_PatientAccountStdFileStgPX_File_Tra_GetEffDate in Tra_GetEffDate)
on nodes (
node1[op23,p0]
node2[op23,p1]
node3[op23,p2]
node4[op23,p3]
)}
op24[4p] {(parallel buffer(1))
on nodes (
node1[op24,p0]
node2[op24,p1]
node3[op24,p2]
node4[op24,p3]
)}


Queries:

1) What do these data sets and operators for buffering mean?
2) What impact do they have on performance?
3) Do these operators execute in the order of the serial numbers assigned to them?

Thanks,

Shashank
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Data Sets whose descriptor file name ends in ".v" are virtual Data Sets, which correspond to links in the job. Buffer operators are inserted by the Orchestrate framework to handle cases where flows from multiple inputs are likely to arrive at different rates and, without the buffer operators, would likely cause a deadlock.

The "serial numbers" as you call them exist purely to provide for unique generic names. They do not have any effect on the order of execution. Execution is parallel on as many nodes as the operator executes on, and data are spread over those nodes in accordance with the partitioning information given in the Data Sets section of the score.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
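
For anyone who wants to pick the inserted buffer operators out of a long score, here is a minimal Python sketch. It only assumes the operator layout quoted in the first post (an `opN[Mp] {(...)` header followed by `nodeX[opN,pK]` lines); the names `parse_operators` and `buffer_operators` are made up for illustration and are not part of DataStage, and other score layouts may need different patterns.

```python
import re

# Excerpt of the operator section quoted in the first post of this thread.
SCORE = """\
op19[1p] {(parallel APT_LUTCreateOp in Lookup_EffDate)
on nodes (
node1[op19,p0]
)}
op20[4p] {(parallel buffer(0))
on nodes (
node1[op20,p0]
node2[op20,p1]
node3[op20,p2]
node4[op20,p3]
)}
"""

# Assumed shapes, taken only from the excerpt above.
OP_HEADER = re.compile(r"^(op\d+)\[(\d+)p\]\s*\{\((\w+) (.+)\)$")
NODE_LINE = re.compile(r"^(\w+)\[(op\d+),p(\d+)\]$")


def parse_operators(score_text):
    """Map each operator name to its partition count, mode, description and nodes."""
    ops = {}
    current = None
    for raw in score_text.splitlines():
        line = raw.strip()
        header = OP_HEADER.match(line)
        if header:
            name, partitions, mode, desc = header.groups()
            current = {"partitions": int(partitions), "mode": mode,
                       "desc": desc, "nodes": []}
            ops[name] = current
            continue
        node = NODE_LINE.match(line)
        if node and current is not None:
            current["nodes"].append(node.group(1))
    return ops


def buffer_operators(ops):
    """Keep only the operators whose description is buffer(n)."""
    return {name: info for name, info in ops.items()
            if info["desc"].startswith("buffer(")}


if __name__ == "__main__":
    for name, info in buffer_operators(parse_operators(SCORE)).items():
        print(name, info["desc"], info["partitions"], info["nodes"])
```

Run against the excerpt, this reports op20 as the only buffer operator, running four ways on node1 to node4, which matches what the score itself says.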
LD

Post by LD »

Hi Ray,

Thanks a lot. After carefully examining the job score, I am able to relate what you said to the actual score.

By sequence I meant that any particular record in a given partition is passed from operator to operator in the given sequence only. But that is obvious, because we put the stages in that order.

Thanks,

Shashank
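
As a rough illustration of the point that flow order comes from the Data Sets section and not from the operator numbers, the sketch below reads the two data set entries from the first post and lists which operator feeds which. It assumes only the three-line shape shown there (producer, partitioning line, consumer); `parse_datasets` is a hypothetical helper, and partitioning lines in other scores may look different from the `eAny=>eCollectAny` form handled here.

```python
import re

# The two data set entries quoted in the first post of this thread.
DATASETS = """\
ds18: {op18[4p] (parallel APT_TransformOperatorImplV0S3_PatientAccountStdFileStgPX_File_Tra_DischargeDate in Tra_DischargeDate)
eAny=>eCollectAny
op20[4p] (parallel buffer(0))}
ds29: {op20[4p] (parallel buffer(0))
eSame=>eCollectAny
op21[4p] (parallel APT_LUTProcessOp in Lookup_EffDate)}
"""

# Assumed shapes, taken only from the excerpt above.
DS_START = re.compile(r"^(ds\d+): \{(op\d+)\[")
PARTITION = re.compile(r"^(\w+)=>(\w+)$")
CONSUMER = re.compile(r"^(op\d+)\[")


def parse_datasets(text):
    """Return one dict per data set: producer, partitioning, consumer."""
    edges = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        start = DS_START.match(line)
        if start:
            current = {"dataset": start.group(1), "producer": start.group(2)}
            continue
        if current is None:
            continue
        part = PARTITION.match(line)
        if part:
            current["partitioning"] = part.group(1) + "=>" + part.group(2)
            continue
        consumer = CONSUMER.match(line)
        if consumer:
            current["consumer"] = consumer.group(1)
            edges.append(current)
            current = None
    return edges


if __name__ == "__main__":
    for edge in parse_datasets(DATASETS):
        print(edge["dataset"], edge["producer"], "->", edge["consumer"],
              edge["partitioning"])
```

For the excerpt this gives op18 -> op20 via ds18 and op20 -> op21 via ds29, so the buffer sits on the link between Tra_DischargeDate and Lookup_EffDate.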