Some questions and clarifications

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

splayer
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

1) Is there something called "sort aggregator"?

2) Is there a difference between hash partitioning and key partitioning? I would think not.

3) Apparently, data sets use the same data type as the parallel framework.
What is the parallel framework data type called?

4) Can parallel routines be written that are called before or after a stage (not job) runs?

Thanks.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Are these interview questions?

If not, where do the questions (and the terminology) come from?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
splayer
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

I came across these issues while studying for the certification exam.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Care to be more specific? I am concerned that at least one of these terms reflects misunderstanding, or worse, in the composer of the question.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Here's how I might go about answering these. (Usual disclaimer about all my own work, opinions not necessarily those of IBM or my employer, etc., etc.)

(1) Depends what is meant by "something".

(2) Yes. There is a difference. (The question seeks no more than that.)

(3) Usually "Data Sets". Or "virtual Data Sets". Sometimes, though inaccurately, "record schemas".

(4) Depends on whether the question means "called directly" or "called indirectly".
splayer
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

(1) Depends what is meant by "something".
--I meant, is there a stage? I know that there is an Aggregator stage, but I have never heard of a sort aggregator.

(2) Yes. There is a difference. (The question seeks no more than that.)
--Can you please tell me what the difference is?

(3) Usually "Data Sets". Or "virtual Data Sets". Sometimes, though inaccurately, "record schemas".
--"Data Sets" or "virtual Data Sets" are not really data types like integer or varchar, right? What I was really asking is: does DataStage convert all data types to an internal parallel framework data type and then convert them back, like Informatica does?

(4) Depends on whether the question means "called directly" or "called indirectly".
--Can you tell me what you mean by calling a routine "directly" or "indirectly"? I don't see how a parallel routine can be called before execution enters a stage or after it leaves. I have never heard of that.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Not until you reveal where these questions came from.
jerome_rajan
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Post by jerome_rajan »

splayer wrote: (1) Depends what is meant by "something".
--I meant, is there a stage? I know that there is an Aggregator stage, but I have never heard of a sort aggregator.
*"Sort aggregator" probably refers to the Sort method of the Aggregator stage. Along the same lines, there is also a Hash method.
splayer wrote: (2) Yes. There is a difference. (The question seeks no more than that.)
--Can you please tell me what the difference is?
*Hash partitioning is just one of the key-based partitioning methods. Another is modulus, which is specific to numeric (integer) keys.
splayer wrote: (3) Usually "Data Sets". Or "virtual Data Sets". Sometimes, though inaccurately, "record schemas".
--"Data Sets" or "virtual Data Sets" are not really data types like integer or varchar, right? What I was really asking is: does DataStage convert all data types to an internal parallel framework data type and then convert them back, like Informatica does?
*Persistent Data Sets consist of a descriptor file and a set of data files. The actual data is held in those data files, which are in the internal DataStage format. They can be read either through orchadmin or through the Data Set Management utility.
splayer wrote: (4) Depends on whether the question means "called directly" or "called indirectly".
--Can you tell me what you mean by calling a routine "directly" or "indirectly"? I don't see how a parallel routine can be called before execution enters a stage or after it leaves. I have never heard of that.
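To make the answer under (2) a little more concrete, here is a minimal Python sketch of the two key-based methods mentioned there. It is only an illustration of the idea, not DataStage code: the function names are made up for this post, and zlib.crc32 is a stand-in for whatever hash function the parallel engine actually uses.

```python
import zlib


def hash_partition(key, num_partitions):
    """Pick a partition from a hash of the key; works for any key type."""
    # Assumption: crc32 over the key's string form stands in for the engine's hash.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def modulus_partition(key, num_partitions):
    """Pick a partition as key mod partition count; integer keys only."""
    if not isinstance(key, int):
        raise TypeError("modulus partitioning needs an integer key")
    return key % num_partitions


keys = [101, 102, 103, 104, 101]
mod_parts = [modulus_partition(k, 4) for k in keys]
hash_parts = [hash_partition(k, 4) for k in keys]

# Both methods are key-based: equal keys always land in the same partition.
print(mod_parts)
print(hash_parts[0] == hash_parts[4])
```

Either way, rows with the same key value end up in the same partition, which is what a key-based method has to guarantee. The difference is only in how the partition number is derived from the key.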
Jerome
Data Integration Consultant at AWS

Life is really simple, but we insist on making it complicated.
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

"Sort Aggregator" is one of the methods the Aggregator stage can use to perform aggregation. There are basically two methods:
Sort Aggregator
Hash Aggregator
Sort Aggregator is useful when you have a large number of groups. With the sort method, the data must already be sorted and partitioned on all grouping columns, and the stage outputs a record for each group as soon as it reaches the end of that group or the end of the data.
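A rough Python sketch of the two approaches may help. This is an illustration of the general technique only, not how the Aggregator stage is implemented; the function names and the sample rows are made up for this post.

```python
from itertools import groupby
from operator import itemgetter


def hash_aggregate(rows):
    """Hash method: keep one running total per group; input order does not matter."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals


def sort_aggregate(sorted_rows):
    """Sort method: input must be sorted on the key; emit a group when the key changes."""
    for key, group in groupby(sorted_rows, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


rows = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]
hash_result = hash_aggregate(rows)
# The sort method only gives correct totals if the rows arrive sorted on the key.
sort_result = dict(sort_aggregate(sorted(rows, key=itemgetter(0))))
print(hash_result == sort_result)
```

The hash version holds a total for every group until the input is exhausted, while the sort version only ever works on the current group. That is why the sort approach copes better with a large number of groups, at the cost of needing presorted input.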
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You see what I mean? The posted answers have made assumptions about what is meant by terms in the questions. I preferred to wait until those ambiguities had been resolved. I'm still waiting. And none of the given answers is complete.