change data capture - How can I preserve certain fields?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
pibe86
Participant
Posts: 2
Joined: Mon Mar 16, 2015 4:45 pm

change data capture - How can I preserve certain fields?

Post by pibe86 »

Hello,

I have two flat files with this format.


file 1
id|desc|value|
1|test1|123
2|test|256
3|test2|951

file 2
id|desc|value|
1|test1|999
4|test|256
6|test2|951

I join these 2 files in one. Where ID is the same i put them together in one line. I need to know how can i preserve the value for field "value" from the file 1 and not the file 2

This is my result

1|test1|123
2|test|256
3|test2|951
4|test|256
6|test2|951

this is what i expect

1|test1|999
2|test|256
3|test2|951
4|test|256
6|test2|951


Can I use join or lookup? or maybe CDC?

thanks for your help.
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Use the join stage with a full outer join. You will have access to all of the columns from the left link as well as all of the columns from the right link. Keep the ones that you want.

Mike
pibe86
Participant
Posts: 2
Joined: Mon Mar 16, 2015 4:45 pm

Post by pibe86 »

Thanks. I tried it but didnt work, i am getting random info, sometimes it shows table 1 values, other times it shows table 2 values.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Make sure you run your job on a single node, that should take any 'randomness' out of the picture.
-craig

"You can never have too many knives" -- Logan Nine Fingers
UCDI
Premium Member
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

A full outer run sequentially sounds slow but would work.

If these are large files, you should be able to add an extra column by renaming the column in file2 to value2 (for example) so that when merged with a regular inner join you get (id, desc, value, value2) in your output, preserving both values. If you do something like that and then use a partitioning that splits by keys you can do it in parallel as well.

A transformer on the back end could make a decision on which column to keep (custom merge of value and value2) or however you want to actually handle the issue?

Or, actually, it may just be that you need to swap the "left and right" and try it again as you already have it?
Post Reply