Compare fields in two different flat files

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Arpanakhade2009
Participant
Posts: 30
Joined: Tue Sep 08, 2009 4:24 am

Compare fields in two different flat files

Post by Arpanakhade2009 »

We need to compare two different files as below and output should be the record no. and the field name which does not match with the before file.
Also deleted records from before file need to be captured.

All fields are Nvarchar & we cannot sort the fields based on key field col1 as sequence is important and comparison should be done in the sequence order only.

before file:
col1, col2, col3
1,2,3
4,5,6
7,8,9
1,1,1
2,2,2
3,3,3

after file:
col1, col2, col3
1,2,3
4,9,6
7,8,8
5,2,2
3,3,3
4,4,4

expected o/p

record 2 , col2 , update
record 3, col3, update
record 4, col1, delete
record 4, col2, delete
record 4, col3, delete

Tried Join, Difference but no luck. Also tried introducing a new col surr key but then will need to make a new join for every field. Or are we left with use of transformer.

Please suggest a design logic.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What does "no luck" mean?

Have you tried Change Capture stage?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Arpanakhade2009
Participant
Posts: 30
Joined: Tue Sep 08, 2009 4:24 am

Post by Arpanakhade2009 »

Have tried that but change capture did not give the expected results.
Arpana
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well... how did you set up the stage? What results did you get?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

How are you partitioning your data? Does it work (whatever that means) if you run on a single node configuration?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bart12872
Participant
Posts: 82
Joined: Fri Jan 19, 2007 5:38 pm

Post by bart12872 »

you are missing information,

you should create a surrogate key in the before file based on the line number
it will be your key

but in your exemple, why do you say "record 4, col1, delete"
I see it as an update
record 4 before 1,1,1 after 5,2,2
the output could be
record 4, col1, update
record 4, col2, update
record 4, col3, update


Let's say you have
before file:
col1, col2, col3
1,2,3
4,5,6
7,8,9
1,1,1
2,2,2
3,3,3

after file:
col1, col2, col3
1,2,3
4,5,6
7,8,9
5,2,2
1,1,1
2,2,2
3,3,3

how do you know that 1,1,1 is an update or a delete
Post Reply