Some questions and clarifications
1) Is there something called a "sort aggregator"?
2) Is there a difference between hash partitioning and key partitioning? I would think not.
3) Apparently, data sets use the same data type as the parallel framework.
What is the parallel framework data type called?
4) Can parallel routines be written that are called before or after a stage (not job) runs?
Thanks.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Here's how I might go about answering these. (Usual disclaimer about all my own work, opinions not necessarily those of IBM or my employer, etc., etc.)
(1) Depends what is meant by "something".
(2) Yes. There is a difference. (The question seeks no more than that.)
(3) Usually "Data Sets". Or "virtual Data Sets". Sometimes, though inaccurately, "record schemas".
(4) Depends on whether the question means "called directly" or "called indirectly".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
(1) Depends what is meant by "something".
--I meant, is there a stage? I know that there is an aggregator but I have never heard of a sort aggregator.
(2) Yes. There is a difference. (The question seeks no more than that.)
--Can you please tell me what the difference is?
(3) Usually "Data Sets". Or "virtual Data Sets". Sometimes, though inaccurately, "record schemas".
--"Data Sets" or "virtual Data Sets" are not really data types, right, like integer or varchar? What I was really asking is whether DataStage converts all data types to an internal parallel framework data type and converts them back, as Informatica does.
(4) Depends on whether the question means "called directly" or "called indirectly".
--Can you tell me what you mean by calling a routine "directly" or "indirectly"? I don't see how a parallel routine can be called before execution enters a stage and after it leaves. I have never heard of that.
-
- Premium Member
- Posts: 376
- Joined: Sat Jan 07, 2012 12:25 pm
- Location: Piscataway
splayer wrote: (1) Depends what is meant by "something".
--I meant, is there a stage? I know that there is an aggregator but I have never heard of a sort aggregator.
*Sort Aggregator probably refers to the aggregator method. On similar lines, there could be a hash aggregator.
splayer wrote: (2) Yes. There is a difference. (The question seeks no more than that.)
--Can you please tell me what the difference is?
*Hash partitioning is just one of the key partitioning methods. Another is modulus, which is specific to numeric keys.
splayer wrote: (3) Usually "Data Sets". Or "virtual Data Sets". Sometimes, though inaccurately, "record schemas".
--"Data Sets" or "virtual Data Sets" are not really data types, right, like integer or varchar? What I was really asking is whether DataStage converts all data types to an internal parallel framework data type and converts them back, as Informatica does.
*Persistent datasets consist of a descriptor and a set of data files. The actual data is contained within these data files, which are in the internal DS format. They can be read either through orchadmin or through the dataset management utility.
splayer wrote: (4) Depends on whether the question means "called directly" or "called indirectly".
--Can you tell me what you mean by calling a routine "directly" or "indirectly"? I don't see how a parallel routine can be called before execution enters a stage and after it leaves. I have never heard of that.
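To make the hash versus modulus point concrete, here is a rough Python sketch of two keyed partitioners. It is only an illustration of the idea, not how the parallel engine implements it: the function names are made up, and `zlib.crc32` is a stand-in for whatever hash function the engine really uses.

```python
import zlib


def hash_partition(key, num_partitions):
    """Keyed partitioning by hashing.

    Works for any key that can be rendered as text. crc32 is an
    assumed stand-in hash, chosen because it is deterministic.
    """
    digest = zlib.crc32(str(key).encode("utf-8"))
    return digest % num_partitions


def modulus_partition(key, num_partitions):
    """Keyed partitioning by modulus.

    The integer key value itself selects the partition, so this
    only makes sense for integer keys.
    """
    if not isinstance(key, int):
        raise TypeError("modulus partitioning needs an integer key")
    return key % num_partitions


# Hypothetical rows, invented for illustration.
rows = [
    {"cust_id": 10, "region": "EU"},
    {"cust_id": 11, "region": "US"},
    {"cust_id": 14, "region": "EU"},
]

# Partition on a numeric key with modulus, and on a string key with hash.
by_modulus = [modulus_partition(r["cust_id"], 4) for r in rows]
by_hash = [hash_partition(r["region"], 4) for r in rows]
```

Either way, rows with the same key value always land in the same partition, which is what makes both of them "key" partitioning methods.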
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn
Life is really simple, but we insist on making it complicated.
-
- Premium Member
- Posts: 536
- Joined: Thu Oct 11, 2007 1:48 am
- Location: Bangalore
Sort Aggregator is one of the methods of doing aggregation in the Aggregator stage. Basically there are two methods:
Sort Aggregator
Hash Aggregator
Sort Aggregator is useful if you have a large number of groups. In the case of the sort aggregator, the data needs to be presorted and partitioned on all grouping columns, and the stage outputs a record at the end of each group of data or at end of data.
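The difference between the two methods can be sketched in plain Python. This is only an analogy with made-up function names and sample rows, not the Aggregator stage itself: the sort method streams over presorted input and emits a result each time the grouping key changes, while the hash method keeps one entry per group in memory until end of data.

```python
from itertools import groupby
from operator import itemgetter


def sort_aggregate(sorted_rows, key, value):
    """Sort method: input must already be sorted on `key`.

    Only the current group is held at any time; a result is
    emitted as soon as the group ends.
    """
    for group_key, group in groupby(sorted_rows, key=itemgetter(key)):
        yield group_key, sum(row[value] for row in group)


def hash_aggregate(rows, key, value):
    """Hash method: input order does not matter.

    One table entry per distinct group is held until end of data.
    """
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return totals


# Hypothetical rows, invented for illustration.
rows = [
    {"dept": "A", "amt": 10},
    {"dept": "B", "amt": 5},
    {"dept": "A", "amt": 7},
]

presorted = sorted(rows, key=itemgetter("dept"))
sort_result = list(sort_aggregate(presorted, "dept", "amt"))
hash_result = hash_aggregate(rows, "dept", "amt")

# Feeding unsorted input to the sort method splits a group in two.
unsorted_result = list(sort_aggregate(rows, "dept", "amt"))
```

With many distinct groups the hash table grows with the number of groups, whereas the sort method's memory use stays flat, which is why the sort method suits a large number of groups provided the input is presorted.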
Thanks
Prasoon
ETL Consultant
LinkedIn :- http://www.linkedin.com/profile/view?id ... ab_pro_top
Blog:- http://dsshar.blogspot.com/
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
You see what I mean? The posted answers have made assumptions about what is meant by terms in the questions. I preferred to wait until those ambiguities had been resolved. I'm still waiting. And none of the given answers is complete.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.