Some questions and clarifications

splayer · Post by **splayer** » Wed Jan 28, 2015 4:17 pm

1) Is there something called "sort aggregator"?

2) Is there a difference between hash partitioning and key partitioning? I would think not.

3) Apparently, data sets use the same data type as the parallel framework.
What is the parallel framework data type called?

4) Can parallel routines be written that are called before or after a stage (not job) runs?

Thanks.

ray.wurlod · Post by **ray.wurlod** » Wed Jan 28, 2015 4:33 pm

Are these interview questions?

If not, where do the questions (and the terminology) come from?

splayer · Post by **splayer** » Wed Jan 28, 2015 5:23 pm

I came across these issues while studying for the certification exam.

ray.wurlod · Post by **ray.wurlod** » Wed Jan 28, 2015 6:25 pm

Care to be more specific? I am concerned that at least one of these terms reflects misunderstanding, or worse, in the composer of the question.

ray.wurlod · Post by **ray.wurlod** » Wed Jan 28, 2015 6:36 pm

Here's how I might go about answering these. (Usual disclaimer about all my own work, opinions not necessarily those of IBM or my employer, etc., etc.)

(1) Depends what is meant by "something".

(2) Yes. There is a difference. (The question seeks no more than that.)

(3) Usually "Data Sets". Or "virtual Data Sets". Sometimes, though inaccurately, "record schemas".

(4) Depends on whether the question means "called directly" or "called (indirectly)".

splayer · Post by **splayer** » Wed Jan 28, 2015 9:38 pm

(1) Depends what is meant by "something".
--I meant, is there a stage? I know that there is an aggregator but I have never heard of a sort aggregator.

(2) Yes. There is a difference. (The question seeks no more than that.)
--Can you please tell me what the difference is?

(3) Usually "Data Sets". Or "virtual Data Sets". Sometimes, though inaccurately, "record schemas".
--"Data Sets" or "virtual Data Sets" are not really data types like integer or varchar, right? What I was really asking is whether DataStage converts all data types to an internal parallel framework data type and then converts them back, as Informatica does.

(4) Depends on whether the question means "called directly" or "called (indirectly)".
--Can you tell me what you mean by calling a routine "directly" or "indirectly"? I don't see how a parallel routine can be called before execution enters a stage or after it leaves one. I have never heard of that.

ray.wurlod · Post by **ray.wurlod** » Wed Jan 28, 2015 11:29 pm

Not until you reveal where these questions came from.

jerome_rajan · Post by **jerome_rajan** » Thu Jan 29, 2015 12:16 am

splayer wrote: (1) Depends what is meant by "something".
--I meant, is there a stage? I know that there is an aggregator but I have never heard of a sort aggregator.

*"Sort aggregator" probably refers to the aggregation method used by the Aggregator stage. Along the same lines, there would also be a hash aggregator.

splayer wrote: (2) Yes. There is a difference. (The question seeks no more than that.)
--Can you please tell me what the difference is?

*Hash partitioning is just one of the key-based partitioning methods. Another is modulus, which is specific to numeric keys.
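
To illustrate the distinction, here is a minimal Python sketch of the two keyed methods. It is not how the parallel engine implements them: the function names are made up for this example, and CRC32 is only a stand-in for whatever hash function the engine really uses. The point it shows is that both methods send equal keys to the same partition, but modulus works on the numeric key value itself while hash accepts any key type.

```python
import zlib


def hash_partition(key, num_partitions):
    """Keyed partitioning by hash: any key type works once it is turned into bytes.

    CRC32 is an assumption used for illustration, not the engine's real hash.
    """
    digest = zlib.crc32(str(key).encode("utf-8"))
    return digest % num_partitions


def modulus_partition(key, num_partitions):
    """Keyed partitioning by modulus: the key itself must be an integer."""
    if not isinstance(key, int):
        raise TypeError("modulus partitioning needs an integer key")
    return key % num_partitions


# Hypothetical sample rows, partitioned four ways.
rows = [
    {"cust_id": 10, "region": "EMEA"},
    {"cust_id": 11, "region": "APAC"},
    {"cust_id": 14, "region": "EMEA"},
]

by_modulus = [modulus_partition(r["cust_id"], 4) for r in rows]
by_hash = [hash_partition(r["region"], 4) for r in rows]
```

With these rows, `cust_id` values 10 and 14 land in the same modulus partition because both leave remainder 2 when divided by 4, and the two "EMEA" rows land in the same hash partition because equal keys always hash to the same value.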

splayer wrote: (3) Usually "Data Sets". Or "virtual Data Sets". Sometimes, though inaccurately, "record schemas".
--"Data Sets" or "virtual Data Sets" are not really data types like integer or varchar, right? What I was really asking is whether DataStage converts all data types to an internal parallel framework data type and then converts them back, as Informatica does.

*Persistent Data Sets consist of a descriptor file and a set of data files. The actual data is held in those data files, which are in the internal DS format. They can be read either with orchadmin or with the Data Set Management utility.

splayer wrote: (4) Depends on whether the question means "called directly" or "called (indirectly)".
--Can you tell me what you mean by calling a routine "directly" or "indirectly"? I don't see how a parallel routine can be called before execution enters a stage or after it leaves one. I have never heard of that.

prasson_ibm · Post by **prasson_ibm** » Thu Jan 29, 2015 5:43 am

Sort aggregator is one of the methods of doing aggregation in the Aggregator stage. Basically there are two methods:
Sort Aggregator
Hash Aggregator
The sort method is useful if you have a large number of groups. With the sort method, the input must be partitioned and presorted on all grouping columns, and the stage outputs the result for a group as soon as that group ends (or at end of data).
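
As a rough illustration of the difference between the two methods, here is a small Python sketch. It is an analogy only, not the Aggregator stage's actual implementation, and the function and column names are invented for the example. The sort version relies on presorted input and holds only the current group, emitting a total each time the key changes. The hash version accepts rows in any order but keeps one running total per group in memory until end of data.

```python
from itertools import groupby
from operator import itemgetter


def sort_aggregate(sorted_rows, key, value):
    """Stream over rows already sorted on `key`; emit a total when each group ends."""
    for group_key, group in groupby(sorted_rows, key=itemgetter(key)):
        yield group_key, sum(row[value] for row in group)


def hash_aggregate(rows, key, value):
    """Keep one running total per group in memory; results are complete at end of data."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return totals


# Hypothetical input rows, deliberately not in group order.
rows = [
    {"dept": "A", "amt": 10},
    {"dept": "B", "amt": 5},
    {"dept": "A", "amt": 7},
]

# The sort method needs its input presorted on the grouping column.
presorted = sorted(rows, key=itemgetter("dept"))
sort_result = list(sort_aggregate(presorted, "dept", "amt"))

# The hash method takes the rows as they arrive.
hash_result = hash_aggregate(rows, "dept", "amt")
```

Both calls produce the same totals here. The trade-off the sketch is meant to show is memory: the hash version's dictionary grows with the number of distinct groups, which is why the sort method is the usual suggestion when there are many groups.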

ray.wurlod · Post by **ray.wurlod** » Thu Jan 29, 2015 3:37 pm

You see what I mean? The posted answers have made assumptions about what is meant by terms in the questions. I preferred to wait until those ambiguities had been resolved. I'm still waiting. And none of the given answers is complete.