A few doubts (PS: not interview questions)

Post by **tostay2003** » Wed Jun 17, 2009 12:35 pm

Hi,

I am trying to learn Parallel Extender from the documentation (I don't have the software installed).

Q1) I didn't find any difference between the Difference stage and the CDC stage. As far as I can tell, they are functionally identical. Did I miss something in the documentation?
Q2) I read about scores and how they can help identify redundant operators, sorts, or repartitioning inserted by DataStage, but I couldn't visualize how these three scenarios occur. If anyone has come across such a scenario and solved it by modifying the job, could you please tell me when and how that was done?
Q3) Is there any situation where using the BASIC Transformer became obligatory when designing a Parallel Extender job?

Thanks

Post by **ray.wurlod** » Wed Jun 17, 2009 4:24 pm

A1) Look more carefully at the output structure of each.
A2) Please advise in which manual you found the scenarios.
A3) No.

Post by **tostay2003** » Wed Jun 17, 2009 10:06 pm

ray.wurlod wrote:A1) Look more carefully at the output structure of each.
A2) Please advise in which manual you found the scenarios.
A3) No. ...

A1) I see the difference between DiffCode() and ChangeCode(), but I couldn't find any difference in functionality.
A2) I found the following in the Advanced Developer's Guide:

The score dump is particularly useful in showing you where DataStage is
inserting additional components in the job flow. In particular DataStage
will add partition and sort operators where the logic of the job demands it.
Sorts in particular can be detrimental to performance and a score dump
can help you to detect superfluous operators and amend the job design to
remove them.

Does anyone have a real example of a score dump where such scenarios occurred (superfluous operators, detrimental sorts, added partitioning) and performance was improved by changing the job? If possible, please also share the score dump after the job was amended. I am having a hard time visualizing this from text alone.

A3) Sorry, I worded the question badly. It should have been: "Are there any tasks that the parallel Transformer can't perform but the BASIC/Server Transformer can?" I guess the answer is still no; I'm just rephrasing my question properly.

Post by **balajisr** » Wed Jun 17, 2009 10:16 pm

3) The parallel Transformer cannot call server routines.

Post by **ray.wurlod** » Wed Jun 17, 2009 10:28 pm

A3) The big one: the parallel Transformer can execute on any node in a clustered/grid configuration, whether or not the DataStage server engine is visible there. You don't need any others.

Post by **tostay2003** » Wed Jun 17, 2009 11:36 pm

Thanks for the responses. Could you please provide an example score dump from before and after performance tuning?

Post by **ray.wurlod** » Wed Jun 17, 2009 11:39 pm

An example without a score dump (these tend to be governed by non-disclosure agreements signed with clients): DataStage inserted tsort operators on both inputs to a Join stage even though the data were already sorted by the SQL query, so the data (a large number of rows) were re-sorted unnecessarily. Using explicit Sort stages with the sort mode set to "Don't Sort (Previously Sorted)" prevented the unnecessary sort from executing. If I recall correctly, the overall elapsed time was reduced by approximately 15%.
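To make the cost in that example concrete, here is a minimal Python sketch of a sort-merge join, the kind of algorithm a key-ordered join relies on. It is an illustration of the idea, not DataStage's tsort or Join implementation: the function names `is_sorted_by` and `merge_join` are hypothetical, and the "check order in one linear pass, sort only if needed" behavior is an assumption of the sketch, standing in for declaring the inputs as previously sorted so the full O(n log n) sort can be skipped.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def is_sorted_by(rows: List[Row], key: str) -> bool:
    """One linear pass: True if rows are already in ascending key order."""
    return all(rows[i][key] <= rows[i + 1][key] for i in range(len(rows) - 1))


def merge_join(left: List[Row], right: List[Row], key: str) -> Tuple[List[Row], int]:
    """Inner sort-merge join on `key` (hypothetical helper for illustration).

    Returns the joined rows and how many inputs actually had to be sorted.
    """
    sorts_done = 0
    # Sort an input only if it is not already in key order.
    if not is_sorted_by(left, key):
        left = sorted(left, key=lambda r: r[key])
        sorts_done += 1
    if not is_sorted_by(right, key):
        right = sorted(right, key=lambda r: r[key])
        sorts_done += 1

    out: List[Row] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, then pair it with
            # every left row carrying the same key (handles duplicates).
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out, sorts_done


# Both inputs arrive ordered by "id", e.g. from a query with ORDER BY id.
customers = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
orders = [{"id": 1, "amt": 10}, {"id": 3, "amt": 5}, {"id": 3, "amt": 7}]
rows, sorts = merge_join(customers, orders, "id")
print(sorts)      # 0: no re-sort was needed
print(len(rows))  # 3
```

With already-ordered inputs the join does a single merge pass; an extra sort in front of it would only repeat work, which on a large number of rows is where the elapsed time goes.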

Post by **tostay2003** » Fri Jul 03, 2009 10:49 pm

Thanks for the example of sort.

Could you describe a scenario in which DataStage introduces detrimental partitioning, and how to overcome it?

(A verbal example, as before, would be sufficient.)

Post by **jcthornton** » Sat Jul 04, 2009 9:00 am

As indicated above, DataStage's default behavior is to insert a sort and a repartition on a stage's inputs whenever partitioning is set to 'Auto' and it thinks the inbound data does not match the stage's requirements.

There are two ways to overcome this:
1. Turn off automatic partition and sort insertion using the environment variables (APT_NO_PART_INSERTION and APT_NO_SORT_INSERTION).
The caveat is that you have to know what you are doing with partitioning and explicitly set it everywhere you want it, because 'Auto' partitioning will no longer do anything. The advantage of this approach over the other one is that the combinability of stages is preserved.

2. Explicitly set the partitioning using 'Same'.
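Why a repartition can be pure overhead is easier to see with a toy model. The minimal Python sketch below is not DataStage's partitioner; the function names, the use of `zlib.crc32` as the hash, and the four-partition layout are all assumptions for illustration. It shows that a key-based stage needs every row with a given key in the same partition, and that once data is already hash-partitioned on that key, re-hashing it on the same key with the same partition count just puts every row back where it was, which is the situation where keeping the existing partitioning ('Same') loses nothing.

```python
import zlib
from typing import Any, Dict, List

Row = Dict[str, Any]


def hash_partition(rows: List[Row], key: str, nodes: int) -> List[List[Row]]:
    """Assign each row to a partition by hashing its key (deterministic)."""
    parts: List[List[Row]] = [[] for _ in range(nodes)]
    for r in rows:
        p = zlib.crc32(str(r[key]).encode("utf-8")) % nodes
        parts[p].append(r)
    return parts


def is_key_partitioned(parts: List[List[Row]], key: str) -> bool:
    """True if no key value appears in more than one partition."""
    seen: Dict[Any, int] = {}
    for p, part in enumerate(parts):
        for r in part:
            if seen.setdefault(r[key], p) != p:
                return False
    return True


def repartition(parts: List[List[Row]], key: str) -> List[List[Row]]:
    """Re-hash every row: the data movement an inserted repartition costs."""
    flat = [r for part in parts for r in part]
    return hash_partition(flat, key, len(parts))


rows = [{"id": i % 5, "v": i} for i in range(20)]

# Round-robin spreads equal keys across partitions, so a key-based stage
# downstream would genuinely need a repartition.
round_robin = [rows[p::4] for p in range(4)]
print(is_key_partitioned(round_robin, "id"))  # False

# Once hashed on "id", re-hashing on the same key with the same partition
# count reproduces the same layout, so the repartition moves nothing.
hashed = hash_partition(rows, "id", 4)
print(is_key_partitioned(hashed, "id"))       # True
print(repartition(hashed, "id") == hashed)    # True
```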