Few doubts (PS not interview questions)

tostay2003
Participant
Posts: 97
Joined: Tue Feb 21, 2006 6:45 am


Post by tostay2003 »

Hi,

I am trying to learn Parallel Extender from the documentation (no software installed).

Q1) I didn't find any difference between the DIFF stage and the CDC stage. Functionally both look 100% the same. Did I miss anything in the documentation?
Q2) I tried to read about scores and the situations that help in identifying redundant operators, sorts, and repartitioning inserted by DataStage, but I couldn't visualize the occurrence of these three scenarios. If anyone has come across such a scenario and solved it by modifying the job, could you please let me know when and how that was done?
Q3) Is there any situation where using the BASIC Transformer becomes obligatory when designing Parallel Extender jobs?

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

A1) Look more carefully at the output structure of each.
A2) Please advise in which manual you found the scenarios.
A3) No.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
tostay2003
Participant
Posts: 97
Joined: Tue Feb 21, 2006 6:45 am

Post by tostay2003 »

ray.wurlod wrote:A1) Look more carefully at the output structure of each.
A2) Please advise in which manual you found the scenarios.
A3) No. ...
A1) I see the difference between DiffCode() and ChangeCode(), but I couldn't find a difference in functionality.
A2) I found the passage below in the Advanced Developer's Guide:
The score dump is particularly useful in showing you where DataStage is
inserting additional components in the job flow. In particular DataStage
will add partition and sort operators where the logic of the job demands it.
Sorts in particular can be detrimental to performance and a score dump
can help you to detect superfluous operators and amend the job design to
remove them.
Does anyone have a real example of a score dump where such scenarios occurred (superfluous operators, detrimental sorts, added partitioning) and where performance was improved by changing the job? If possible, please also share the score dump taken after amending the job. I am having a hard time visualizing this from textual information alone.

A3) Sorry if I phrased the question badly. What I meant was: "Are there any tasks that the parallel Transformer cannot perform but the BASIC/Server Transformer can?" I guess the answer is still no; I am just rephrasing my question properly.
balajisr
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

3) Parallel transformer cannot call server routines.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

A3) The big one: the parallel Transformer can execute on any node in a clustered/grid configuration, whether or not the DataStage server engine is visible. You don't need any others.
tostay2003
Participant
Posts: 97
Joined: Tue Feb 21, 2006 6:45 am

Post by tostay2003 »

Thanks for the responses. Could you please provide an example score dump from before and after performance tuning?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

An example without a score dump (these tend to be governed by non-disclosure agreements signed with clients). DataStage inserted tsort operators on both inputs to the Join stage even though the data were already sorted by the SQL query. So the data were re-sorted unnecessarily (and it was a large number of rows). Using explicit Sort stages, with sort mode set to "don't sort (previously sorted)", obviated execution of the unnecessary sort. If I recall correctly, the overall elapsed time was reduced by approximately 15%.
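
To help visualize why the inserted sort in this example was pure overhead, here is a minimal Python sketch. It is not DataStage code and not a score dump: the `merge_join` function, the sample rows, and the assumption of unique keys on each input are all hypothetical, chosen only to show that a sort-merge join over inputs already ordered on the join key gives the same result with or without an extra sort in front of it.

```python
from typing import Iterator

Row = tuple[int, str]  # (join key, payload); hypothetical row layout


def is_sorted(rows: list[Row]) -> bool:
    """Return True if rows are in non-decreasing key order."""
    return all(rows[i][0] <= rows[i + 1][0] for i in range(len(rows) - 1))


def merge_join(left: list[Row], right: list[Row]) -> Iterator[tuple[int, str, str]]:
    """Inner join of two inputs that are both sorted on the key.

    Assumes keys are unique within each input (a simplification).
    """
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            yield (left_key, left[i][1], right[j][1])
            i += 1
            j += 1


# Both inputs arrive already ordered, as if by an ORDER BY in the source SQL.
orders = [(1, "o-a"), (2, "o-b"), (4, "o-c")]
customers = [(1, "c-x"), (3, "c-y"), (4, "c-z")]

# Joining directly and joining after a redundant re-sort give the same rows;
# the re-sort only adds work.
joined_direct = list(merge_join(orders, customers))
joined_resorted = list(merge_join(sorted(orders), sorted(customers)))
```

In this sketch the `sorted(...)` calls play the role of the inserted tsort operators: they change nothing about the output, which is why telling the job that the data is previously sorted can save the time they would take.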
tostay2003
Participant
Posts: 97
Joined: Tue Feb 21, 2006 6:45 am

Post by tostay2003 »

Thanks for the example of sort.

Could you describe a scenario in which DataStage introduces detrimental partitioning, and how to overcome it?

(A verbal example, as earlier, would be sufficient.)
jcthornton
Premium Member
Posts: 79
Joined: Thu Mar 22, 2007 4:58 pm
Location: USA

Post by jcthornton »

As indicated above, the default behavior of DS is to insert a sort and a repartition on any input that is set to 'Auto' partitioning whenever it thinks the inbound data does not match the stage's requirement.

There are two ways to overcome this.
1. Turn off automatic partition and sort insertion using the environment variables (APT_NO_PART_INSERTION and APT_NO_SORT_INSERTION).
The caveat here is that you have to know what you are doing with partitioning and explicitly set it everywhere you want it, because 'Auto' partitioning will no longer do anything. The advantage of this over the other solution is that the combinability of stages is preserved.

2. Explicitly set the partitioning using 'Same'.
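
As a rough illustration of why an inserted repartition can be wasted work, the Python sketch below models partitions as plain lists. It is not DataStage code: the `hash_partition` and `repartition` helpers, the two-node setup, and the use of a simple modulus on integer keys as a stand-in for a hash partitioner are all assumptions made for the example.

```python
def hash_partition(rows, key_index, nodes):
    """Distribute rows across `nodes` partitions by key modulus."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        parts[row[key_index] % nodes].append(row)
    return parts


def repartition(parts, key_index):
    """Re-distribute existing partitions on a key.

    Returns the new partitions and the number of rows that changed node.
    """
    nodes = len(parts)
    new_parts = [[] for _ in range(nodes)]
    moved = 0
    for source, part in enumerate(parts):
        for row in part:
            target = row[key_index] % nodes
            new_parts[target].append(row)
            if target != source:
                moved += 1
    return new_parts, moved


# Hypothetical rows: (customer_id, product_id)
rows = [(1, 10), (2, 11), (3, 12), (4, 13), (5, 10), (6, 11)]

# Upstream stage already partitioned the data on customer_id (column 0).
upstream = hash_partition(rows, key_index=0, nodes=2)

# Repartitioning on the same key touches every row but moves none of them.
same_key_parts, moved_same_key = repartition(upstream, key_index=0)

# Repartitioning on a different key really does shuffle rows between nodes.
other_key_parts, moved_other_key = repartition(upstream, key_index=1)
```

Setting 'Same' corresponds to skipping the `repartition` call and passing `upstream` through untouched, which is only safe when the existing partitioning already satisfies the downstream stage, as it does for the same-key case here.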