Interview Question

varsha16785
Participant
Posts: 4
Joined: Wed May 15, 2013 1:25 am

Post by varsha16785 »

I was asked this question in an interview and could not work out the answer. There is a file with the following data:
1
1
2
3
4
4

The output should be two files. The first contains:
1
1
4
4

and the second contains:
2
3

The reply I gave was to use an Aggregator and then a Filter, but that outputs each duplicated key only once. I also tried the key change value from a Sort stage, but got the same output.

Please help!!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

There are several ways this might be accomplished. In a real life example it would depend on the wider picture - the rule that determines which rows (which key values) go into which files. In most cases you'd be looking at constraint expressions in a Transformer stage, or you might be looking at using a Filter or Switch stage. For the given example you might even try something funky with partitioning.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

The answer you gave is also correct: calculate the count in an Aggregator stage, then in a Filter stage use the conditions count > 1 and count = 1.
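
As the original poster notes, an Aggregator emits one row per key, so the counts have to be joined back to the original rows before filtering. A minimal Python sketch of that count, join back, and filter logic follows. The function name `split_by_count` is hypothetical, and the sketch illustrates the logic only, not DataStage stage configuration.

```python
from collections import Counter


def split_by_count(rows):
    """Route every input row by how often its key occurs in the whole input."""
    counts = Counter(rows)  # Aggregator step: one count per key value
    duplicates, uniques = [], []
    for row in rows:  # join-back step: look up the count for each original row
        if counts[row] > 1:
            duplicates.append(row)  # filter condition: count > 1
        else:
            uniques.append(row)  # filter condition: count = 1
    return duplicates, uniques


duplicates, uniques = split_by_count([1, 1, 2, 3, 4, 4])
print(duplicates)  # [1, 1, 4, 4]
print(uniques)     # [2, 3]
```

This version needs no sorting, but it reads the data twice: once to count and once to route.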
Thanx and Regards,
ETL User
sendmkpk
Premium Member
Posts: 97
Joined: Mon Apr 02, 2007 2:47 am

Post by sendmkpk »

I think there should be a more cost-effective way of doing this, but I am not able to figure it out.

Experts, please give it a try.
Praveen
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

I assume the scenario is to sort the data into a duplicates file and a unique row file. I would use the Transformer LastRowInGroup function and sort and partition the data by the key field. Whenever this function returns a value of FALSE you have a duplicate key value and the row that follows belongs in the same group. You may need a Stage Variable counter and a constraint to output all rows in a group down one link and single rows down another link.

This was a good addition to DataStage 8.5: it effectively lets you peer into the future and compare the current row to the next incoming row, something you couldn't do in earlier versions.
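
The look-ahead idea can be sketched in Python as below. This is only an illustration under stated assumptions: the input is already sorted by key, `last_in_group` stands in for what LastRowInGroup would return, `position` plays the role of the stage variable counter, and the function name `split_sorted` is hypothetical.

```python
def split_sorted(rows):
    """Split key-sorted rows into (duplicates, uniques) using a one-row look-ahead."""
    duplicates, uniques = [], []
    position = 0  # counter: position of the current row within its group
    for i, row in enumerate(rows):
        position += 1
        # Stand-in for LastRowInGroup: true when there is no next row
        # or the next row carries a different key.
        last_in_group = i + 1 == len(rows) or rows[i + 1] != row
        if position == 1 and last_in_group:
            uniques.append(row)  # first and last row of its group: a single row
        else:
            duplicates.append(row)  # the group has more than one row
        if last_in_group:
            position = 0  # reset the counter for the next group
    return duplicates, uniques


print(split_sorted([1, 1, 2, 3, 4, 4]))  # ([1, 1, 4, 4], [2, 3])
```

A row is unique exactly when it is both the first and the last row of its group, so a single pass over sorted data is enough.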
vamsi.4a6
Participant
Posts: 334
Joined: Sun Jan 22, 2012 7:06 am

Post by vamsi.4a6 »

Define a stage variable:

Stv1: If keychange = 1 And LastRowInGroup(InputColumn) Then 1 Else 0

Then use the following two constraints:

Stv1 = 1 for the unique-row file
Stv1 = 0 for the duplicates file

The data must be properly partitioned and sorted.

Please correct me if I am wrong.
Thanks and Regards
Vamsi krishna.v
http://datastage-vamsi.blogspot.in/
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Too much code.

Constraint 1: InLink.KeyChange And LastRowInGroup(InLink.keycol)

Constraint 2: Otherwise/Log

No need to generate 1s and 0s - you're just wasting CPU cycles by doing so.

I now invite you to inspect your code and YOU answer the question whether it meets the original poster's requirements.
vamsi.4a6
Participant
Posts: 334
Joined: Sun Jan 22, 2012 7:06 am

Post by vamsi.4a6 »

Thanks for the reply. As far as I know, it will give the correct output as per the original post.
Prince_Hyd
Participant
Posts: 35
Joined: Mon May 06, 2013 5:59 am

Post by Prince_Hyd »

Hi Ray,

Could you clarify your solution? Where can I find the KeyChange function in the Transformer? Please explain the solution clearly.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In the design I had in mind, KeyChange comes from the upstream Sort stage.

You can generate key change detection within the Transformer stage using two stage variables, one to detect the change and another to "remember" the key from the previous row.
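
The two-variable pattern can be sketched in Python as follows. It assumes the rows arrive sorted by key. The names `key_change_flags`, `previous_key`, and `key_changed` are hypothetical and are not DataStage identifiers. The ordering matters: the change is detected before the remembered key is updated.

```python
def key_change_flags(rows):
    """Return (row, key_changed) pairs for key-sorted rows."""
    flagged = []
    previous_key = None  # "remembers" the key from the previous row
    first_row = True
    for row in rows:
        # Detect the change first, comparing against the remembered key.
        key_changed = first_row or row != previous_key
        flagged.append((row, key_changed))
        # Only then update the remembered key for the next row.
        previous_key = row
        first_row = False
    return flagged


print(key_change_flags([1, 1, 2, 3, 4, 4]))
# [(1, True), (1, False), (2, True), (3, True), (4, True), (4, False)]
```

The explicit `first_row` flag avoids misreading a genuine first key that happens to equal the initial value of `previous_key`.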