Compare fields in two different flat files

Arpanakhade2009 · Post by **Arpanakhade2009** » Mon Dec 02, 2013 4:37 am

We need to compare two different files as below and output should be the record no. and the field name which does not match with the before file.
Also deleted records from before file need to be captured.

All fields are Nvarchar & we cannot sort the fields based on key field col1 as sequence is important and comparison should be done in the sequence order only.

before file:
col1, col2, col3
1,2,3
4,5,6
7,8,9
1,1,1
2,2,2
3,3,3

after file:
col1, col2, col3
1,2,3
4,9,6
7,8,8
5,2,2
3,3,3
4,4,4

expected o/p

record 2 , col2 , update
record 3, col3, update
record 4, col1, delete
record 4, col2, delete
record 4, col3, delete

Tried Join, Difference but no luck. Also tried introducing a new col surr key but then will need to make a new join for every field. Or are we left with use of transformer.

Please suggest a design logic.

ray.wurlod · Post by **ray.wurlod** » Mon Dec 02, 2013 5:01 am

What does "no luck" mean?

Have you tried Change Capture stage?

Arpanakhade2009 · Post by **Arpanakhade2009** » Mon Dec 02, 2013 5:04 am

Have tried that but change capture did not give the expected results.

chulett · Post by **chulett** » Mon Dec 02, 2013 7:49 am

Well... how did you set up the stage? What results did you get?

ray.wurlod · Post by **ray.wurlod** » Mon Dec 02, 2013 2:51 pm

How are you partitioning your data? Does it work (whatever that means) if you run on a single node configuration?

bart12872 · Post by **bart12872** » Tue Dec 03, 2013 10:42 am

you are missing information,

you should create a surrogate key in the before file based on the line number
it will be your key

but in your exemple, why do you say "record 4, col1, delete"
I see it as an update
record 4 before 1,1,1 after 5,2,2
the output could be
record 4, col1, update
record 4, col2, update
record 4, col3, update

Let's say you have
before file:
col1, col2, col3
1,2,3
4,5,6
7,8,9
1,1,1
2,2,2
3,3,3

after file:
col1, col2, col3
1,2,3
4,5,6
7,8,9
5,2,2
1,1,1
2,2,2
3,3,3

how do you know that 1,1,1 is an update or a delete