
Interview Question

Posted: Thu May 16, 2013 2:24 am
by varsha16785
This is a question I was asked in an interview, and I cannot quite figure out the answer. There is a file with this data:
1
1
2
3
4
4

The output should be two files. The first with data
1
1
4
4

and the other as
2
3

The answer I gave was to use an Aggregator and then a Filter, but that populates the duplicate records only once. I also tried the key change value in the Sort stage, but got the same output.

Please help!!
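For reference, the requested split can be sketched outside DataStage. A minimal Python sketch, assuming the input is already sorted on the key (as in the sample):

```python
from itertools import groupby

rows = [1, 1, 2, 3, 4, 4]  # the sample file, already sorted on the key

dup_file, uniq_file = [], []
for _, grp in groupby(rows):
    grp = list(grp)
    # groups of size > 1 keep every occurrence in the duplicates file
    (dup_file if len(grp) > 1 else uniq_file).extend(grp)

print(dup_file)   # [1, 1, 4, 4]
print(uniq_file)  # [2, 3]
```

Note that every occurrence of a duplicated key is kept, which is exactly the part a plain Aggregator loses.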

Posted: Thu May 16, 2013 4:06 am
by ray.wurlod
There are several ways this might be accomplished. In a real life example it would depend on the wider picture - the rule that determines which rows (which key values) go into which files. In most cases you'd be looking at constraint expressions in a Transformer stage, or you might be looking at using a Filter or Switch stage. For the given example you might even try something funky with partitioning.

Posted: Thu May 16, 2013 4:10 am
by chandra.shekhar@tcs.com
The answer you gave is also correct: calculate the count in an Aggregator, join it back to the input rows, and then in the Filter stage use the constraints count > 1 and count = 1.

Posted: Thu May 16, 2013 4:39 am
by sendmkpk
I think there should be a more cost-effective way of doing this, but I am not able to figure it out.

Experts, please do give it a try.

Posted: Sun May 19, 2013 7:11 pm
by vmcburney
I assume the scenario is to sort the data into a duplicates file and a unique row file. I would use the Transformer LastRowInGroup function and sort and partition the data by the key field. Whenever this function returns a value of FALSE you have a duplicate key value and the row that follows belongs in the same group. You may need a Stage Variable counter and a constraint to output all rows in a group down one link and single rows down another link.

This was a good addition to DataStage 8.5, it effectively lets you peer into the future and compare the current row to the next incoming row, something you couldn't do in earlier versions.
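A rough Python stand-in for that look-ahead, assuming rows are sorted on the key (the function name mirrors the Transformer function but is only illustrative):

```python
rows = [1, 1, 2, 3, 4, 4]  # sorted on the key

def last_row_in_group(rows, i):
    """True when row i is the final row of its key group
    (a stand-in for the Transformer's LastRowInGroup)."""
    return i == len(rows) - 1 or rows[i + 1] != rows[i]

flags = [last_row_in_group(rows, i) for i in range(len(rows))]
print(flags)  # [False, True, True, True, False, True]
```

The FALSE values at positions 0 and 4 are the "peer into the future" signal: the next incoming row shares the same key.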

Posted: Mon May 20, 2013 12:24 am
by vamsi.4a6
Stage variable Stv1: If keyChange = 1 And LastRowInGroup(InputColumn) Then 1 Else 0

and use following two constraints

Constraint Stv1 = 1 for the unique-row file
Constraint Stv1 = 0 for the duplicates file

Proper partitioning and sorting should be done.

Please correct me if I am wrong.

Posted: Mon May 20, 2013 3:23 am
by ray.wurlod
Too much code.

Constraint 1: InLink.KeyChange And LastRowInGroup(InLink.keycol)

Constraint 2: Otherwise/Log

No need to generate 1s and 0s - you're just wasting CPU cycles by doing so.
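Translated into plain code, those two constraints might look like this Python sketch (sorted input assumed; the `prev` variable plays the role of the key-change detection):

```python
rows = [1, 1, 2, 3, 4, 4]  # sorted and partitioned on the key

uniq_file, dup_file = [], []
prev = object()  # sentinel: the first row always counts as a key change
for i, row in enumerate(rows):
    key_change = row != prev
    last_in_group = i == len(rows) - 1 or rows[i + 1] != row
    # Constraint 1: KeyChange And LastRowInGroup -> unique-row link
    if key_change and last_in_group:
        uniq_file.append(row)
    else:
        # Constraint 2: Otherwise -> duplicates link
        dup_file.append(row)
    prev = row

print(uniq_file)  # [2, 3]
print(dup_file)   # [1, 1, 4, 4]
```

A row that is both the first of its key (key change) and the last of its key (last row in group) is alone in its group, hence unique; everything else is part of a multi-row group.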

I now invite you to inspect your code and YOU answer the question whether it meets the original poster's requirements.

Posted: Mon May 20, 2013 4:05 am
by vamsi.4a6
Thanks for the reply. As far as I know, it will give the correct output as per the original post.

Posted: Mon May 20, 2013 12:05 pm
by Prince_Hyd
Hi Ray,

Can you clarify your solution? Where can I find the KeyChange function in the Transformer? Please explain the solution clearly.

Thanks

Posted: Mon May 20, 2013 1:24 pm
by ray.wurlod
In what I was thinking of, KeyChange comes from the upstream Sort stage (its Create Key Change Column property).

You can generate key change detection within the Transformer stage using two stage variables, one to detect the change and another to "remember" the key from the previous row.
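A minimal sketch of those two stage variables in Python (the names `sv_key_change` and `sv_prev_key` are illustrative, not DataStage identifiers):

```python
rows = [1, 1, 2, 3, 4, 4]  # sorted on the key

# Two "stage variables", evaluated in order for every row:
#   sv_key_change - did the key change versus the previous row?
#   sv_prev_key   - remember this row's key for the next row
sv_prev_key = None
out = []
for row in rows:
    sv_key_change = (sv_prev_key is None) or (row != sv_prev_key)
    sv_prev_key = row
    out.append((row, sv_key_change))

print(out)
# [(1, True), (1, False), (2, True), (3, True), (4, True), (4, False)]
```

The evaluation order matters: the change test must run before the "remember" variable is overwritten, mirroring how Transformer stage variables are evaluated top to bottom.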