Page 1 of 1

change data capture - How can I preserve certain fields?

Posted: Thu Mar 31, 2016 3:07 pm
by pibe86
Hello,

I have two flat files with this format.


file 1
id|desc|value|
1|test1|123
2|test|256
3|test2|951

file 2
id|desc|value|
1|test1|999
4|test|256
6|test2|951

I join these 2 files in one. Where ID is the same i put them together in one line. I need to know how can i preserve the value for field "value" from the file 1 and not the file 2

This is my result

1|test1|123
2|test|256
3|test2|951
4|test|256
6|test2|951

this is what i expect

1|test1|999
2|test|256
3|test2|951
4|test|256
6|test2|951


Can I use join or lookup? or maybe CDC?

thanks for your help.

Posted: Thu Mar 31, 2016 4:17 pm
by Mike
Use the join stage with a full outer join. You will have access to all of the columns from the left link as well as all of the columns from the right link. Keep the ones that you want.

Mike

Posted: Fri Apr 01, 2016 9:06 am
by pibe86
Thanks. I tried it but didnt work, i am getting random info, sometimes it shows table 1 values, other times it shows table 2 values.

Posted: Fri Apr 01, 2016 9:17 am
by chulett
Make sure you run your job on a single node, that should take any 'randomness' out of the picture.

Posted: Mon Apr 04, 2016 9:29 am
by UCDI
A full outer run sequentially sounds slow but would work.

If these are large files, you should be able to add an extra column by renaming the column in file2 to value2 (for example) so that when merged with a regular inner join you get (id, desc, value, value2) in your output, preserving both values. If you do something like that and then use a partitioning that splits by keys you can do it in parallel as well.

A transformer on the back end could make a decision on which column to keep (custom merge of value and value2) or however you want to actually handle the issue?

Or, actually, it may just be that you need to swap the "left and right" and try it again as you already have it?