logic

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
iskapalli
Participant
Posts: 76
Joined: Wed Jul 25, 2007 12:36 am

logic

Post by iskapalli »

hi,
I have to implement below logic in datastage(not with custom query)

SELECT COL1 FROM DBASE GROUP BY COL1 HAVING COUNT(*) > 1 .

as of now iam following below approach.

Use a copy stage to split the incoming records into two streams. One stream goes to an Aggregator stage that groups the records by key field(s) and counts the number of records in each group and outputs the results to the COUNT field. The output from Aggregator stage is then joined to the other stream using a Join stage on key field(s) and the results are then passed to Transformer stage. In Transformer, you could put a constraint like COUNT>2.



Is there any best way to do this?
iskapalli
Participant
Posts: 76
Joined: Wed Jul 25, 2007 12:36 am

Post by iskapalli »

sorry ,Small correction


Use a copy stage to split the incoming records into two streams. One stream goes to an Aggregator stage that groups the records by key field(s) and counts the number of records in each group and outputs the results to the COUNT field. The output from Aggregator stage is then joined to the other stream using a Join stage on key field(s) and the results are then passed to Transformer stage. In Transformer, you could put a constraint like COUNT>1
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Can you think of any other way to do this? If not, what you have is the best (only) way.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chetan.c
Participant
Posts: 112
Joined: Tue Jan 17, 2012 2:09 am
Location: Bangalore

Post by chetan.c »

Had posted an incorrect thought...Deleting it.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Your solution is probably the best way for versions of IS prior to 8.5. Make certain that you partition and sort correctly prior to the copy stage.

For 8.5 and above, a sort stage with keyChange enabled, followed by a transformer is the simplest. Set a stage variable to @FALSE when keyChange=1 and LastRowInGroup()=1 (this means the group has only one row), otherwise set it to @TRUE. Then in the constraint for your output link only allow the rows where the stage variable is @TRUE.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
iskapalli
Participant
Posts: 76
Joined: Wed Jul 25, 2007 12:36 am

Post by iskapalli »

Hi jwiles ,

thank for your mail

Is this approach will work on two nodes or four nodes also?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It is always the case that a solution that works on two nodes will work on four, or 2000. Such is guaranteed by the way that the parallel execution engine works.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vamsi.4a6
Participant
Posts: 334
Joined: Sun Jan 22, 2012 7:06 am
Contact:

Post by vamsi.4a6 »

jwiles wrote:Your solution is probably the best way for versions of IS prior to 8.5. Make certain that you partition and sort correctly prior to the copy stage.

For 8.5 and above, a sort stage with keyChange enabled, followed by a transformer is the simplest. Set a stage variable to @FALSE when keyChange=1 and LastRowInGroup()=1 (this means the group has only one row), otherwise set it to @TRUE. Then in the constraint for your output link only allow the rows where the stage variable is @TRUE.

Regards,
@Jwiles

I have below clarification .Could you please clarify

when i gone through the below link

http://publib.boulder.ibm.com/infocente ... eaks2.html

we need to use LastRowInGroup() function with argument but u used without argument[ LastRowInGroup()=1 ].Please explain how it works without argument and correct me if i am wrong?
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

My example was not a literal example of how to use the function, it was an example with the intent of indicating that LastRowInGroup is the function you should use. Use it as the documentation indicates (i.e. with the required arguments).

"function_name()" is a common method of writing the name of a function in a text conversation. Sorry for the confusion...

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
vamsi.4a6
Participant
Posts: 334
Joined: Sun Jan 22, 2012 7:06 am
Contact:

Post by vamsi.4a6 »

@jwiles

Thanks for clarifying the doubt and your help is really appreciated.
Post Reply