Interview Question

varsha16785 · Post by **varsha16785** » Thu May 16, 2013 2:24 am

I was asked this question in an interview and could not work out the answer. There is a file with the following data:
1
1
2
3
4
4

The output should be two files. The first contains:
1
1
4
4

and the second contains:
2
3

My answer was to use an Aggregator stage followed by a Filter stage, but that outputs each duplicate key only once. I also tried the key change column in the Sort stage, but got the same output.

Please help!!

ray.wurlod · Post by **ray.wurlod** » Thu May 16, 2013 4:06 am

There are several ways this might be accomplished. In a real life example it would depend on the wider picture - the rule that determines which rows (which key values) go into which files. In most cases you'd be looking at constraint expressions in a Transformer stage, or you might be looking at using a Filter or Switch stage. For the given example you might even try something funky with partitioning.

chandra.shekhar@tcs.com · Thu May 16, 2013 4:10 am

The answer you gave is also correct: calculate the count per key in an Aggregator stage, then in a Filter stage use the conditions count > 1 and count = 1.
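To illustrate the count-then-filter idea outside DataStage, here is a minimal Python sketch. It assumes the per-key counts are looked up again for every original row, a step this thread does not spell out; without it, each duplicate key would appear only once, as the original poster observed. The function name `split_by_count` is made up for this example.

```python
from collections import Counter

def split_by_count(keys):
    """Route every input row by how often its key occurs in the whole input."""
    counts = Counter(keys)  # plays the role of the Aggregator: one count per key
    # Look the count up for each original row so that all copies are kept.
    duplicates = [k for k in keys if counts[k] > 1]
    uniques = [k for k in keys if counts[k] == 1]
    return duplicates, uniques

if __name__ == "__main__":
    dups, uniq = split_by_count([1, 1, 2, 3, 4, 4])
    print(dups)
    print(uniq)
```

This needs two passes over the data (one to count, one to route), which is probably why later replies look for a single-pass alternative.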

sendmkpk · Post by **sendmkpk** » Thu May 16, 2013 4:39 am

I think there should be a more cost-effective way of doing this, but I can't figure it out.

Experts, please give it a try.

vmcburney · Post by **vmcburney** » Sun May 19, 2013 7:11 pm

I assume the scenario is to sort the data into a duplicates file and a unique row file. I would use the Transformer LastRowInGroup function and sort and partition the data by the key field. Whenever this function returns a value of FALSE you have a duplicate key value and the row that follows belongs in the same group. You may need a Stage Variable counter and a constraint to output all rows in a group down one link and single rows down another link.

This was a good addition in DataStage 8.5: it effectively lets you peer into the future and compare the current row to the next incoming row, something you couldn't do in earlier versions.
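The look-ahead plus counter idea can be sketched in Python as follows. This is only an emulation under assumptions: the input is already sorted by key, `is_last` stands in for what LastRowInGroup would report, and `rows_in_group` stands in for the stage variable counter. None of these names come from DataStage.

```python
def split_with_lookahead(sorted_keys):
    """Single pass over key-sorted rows using a one-row look-ahead."""
    duplicates, uniques = [], []
    rows_in_group = 0  # hypothetical stand-in for a stage variable counter
    for i, key in enumerate(sorted_keys):
        rows_in_group += 1
        # Emulates LastRowInGroup: true when the next row has a different key
        # or there is no next row.
        is_last = i + 1 == len(sorted_keys) or sorted_keys[i + 1] != key
        if is_last and rows_in_group == 1:
            uniques.append(key)      # a group of exactly one row
        else:
            duplicates.append(key)   # any row of a multi-row group
        if is_last:
            rows_in_group = 0        # reset the counter for the next group
    return duplicates, uniques

if __name__ == "__main__":
    print(split_with_lookahead([1, 1, 2, 3, 4, 4]))
```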

vamsi.4a6 · Post by **vamsi.4a6** » Mon May 20, 2013 12:24 am

Stage variable Stv1: If keychange = 1 And LastRowInGroup(InputColumn) Then 1 Else 0

Then use the following two constraints:

Stv1 = 1 for the unique row file
Stv1 = 0 for the duplicates file

The data must be properly partitioned and sorted.

Please correct me if I am wrong.

ray.wurlod · Post by **ray.wurlod** » Mon May 20, 2013 3:23 am

Too much code.

Constraint 1: InLink.KeyChange And LastRowInGroup(InLink.keycol)

Constraint 2: Otherwise/Log

No need to generate 1s and 0s - you're just wasting CPU cycles by doing so.
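As a rough check of the two-constraint logic, here is a Python sketch that computes both flags for each row of a key-sorted input and routes on them directly. It is an emulation, not DataStage code: `key_change` mimics the Sort stage's key change column, `last_in_group` mimics LastRowInGroup, and the link names are invented for the example.

```python
def route(sorted_keys):
    """Send a row to the unique link only if it both starts and ends its group."""
    unique_link, otherwise_link = [], []
    n = len(sorted_keys)
    for i, key in enumerate(sorted_keys):
        # True on the first row of each key group.
        key_change = i == 0 or sorted_keys[i - 1] != key
        # True on the last row of each key group.
        last_in_group = i == n - 1 or sorted_keys[i + 1] != key
        if key_change and last_in_group:   # Constraint 1
            unique_link.append(key)
        else:                              # Constraint 2: everything else
            otherwise_link.append(key)
    return unique_link, otherwise_link

if __name__ == "__main__":
    print(route([1, 1, 2, 3, 4, 4]))
```

A row that is both the first and the last of its group is the only row with that key, so the Boolean test alone is enough and no intermediate 1/0 value is needed.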

I now invite you to inspect your code and YOU answer the question whether it meets the original poster's requirements.

vamsi.4a6 · Post by **vamsi.4a6** » Mon May 20, 2013 4:05 am

Thanks for the reply. As far as I know, it will give the correct output as per the original post.

Prince_Hyd · Post by **Prince_Hyd** » Mon May 20, 2013 12:05 pm

Hi Ray,

Could you clarify your solution? Where can I find the KeyChange function in the Transformer? Please explain the solution clearly.

Thanks

ray.wurlod · Post by **ray.wurlod** » Mon May 20, 2013 1:24 pm

In the design I had in mind, KeyChange comes from the upstream Sort stage.

You can generate key change detection within the Transformer stage using two stage variables, one to detect the change and another to "remember" the key from the previous row.
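The two-variable pattern looks roughly like this in Python. The names `changed` and `prev_key` are hypothetical stand-ins for the two stage variables, and the sketch assumes the input is sorted by key. The ordering matters: the comparison has to run before the remembered key is overwritten.

```python
def key_change_flags(sorted_keys):
    """Return 1 for the first row of each key group and 0 otherwise."""
    flags = []
    prev_key = object()  # sentinel that compares unequal to any real key
    for key in sorted_keys:
        changed = key != prev_key   # first variable: detect the change
        flags.append(1 if changed else 0)
        prev_key = key              # second variable: remember this row's key
    return flags

if __name__ == "__main__":
    print(key_change_flags([1, 1, 2, 3, 4, 4]))
```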