Change Capture with sort stage/key partition and sorted

Post by **karthi_gana** » Sun Mar 25, 2012 1:47 am

All,

I have designed a simple job to capture the changed data.

Content of File 1:

col1 col2
1,a
2,b
3,c
4,d
5,e
6,f
7,g
8,h
9,i

Content of File 2:

col1 col2
1,a

3,c
4,d
10,e
6,f
7,g
12,h
9,i
1111,k

File1 ---------->Sort -------> 
                                        Change Capture ---------------> Output File
File2 ---------->Sort ------->

Sort stage: sort key = col1

Change Capture stage:
Key = col1
Change value = col2

Output:

col1 col2 change_code
2 b 2
5 e 2
8 h 2
10 e 1
12 h 1
1111 k 1

This is correct, right?

I then tried the "alternative way" of doing the same thing.

The stage assumes that the incoming data is key-partitioned and sorted in ascending order. The columns the data is hashed on should be the key columns used for the data compare. You can achieve the sorting and partitioning using the Sort stage or by using the built-in sorting and partitioning abilities of the Change Capture stage.
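As a rough illustration of what "key-partitioned and sorted" means in the passage above, here is a minimal Python sketch. It is not DataStage code: `hash_partition_and_sort` is a hypothetical helper, and the modulo "hash" is only a stand-in for whatever hash function the engine actually uses. The point is that rows with the same key from both inputs end up in the same partition, and each partition is in ascending key order.

```python
from collections import defaultdict

def hash_partition_and_sort(rows, num_partitions=2):
    """Group (key, value) rows by a toy hash of the key, then sort each group by key."""
    partitions = defaultdict(list)
    for key, value in rows:
        # Toy hash (assumption): the real engine uses its own hash function.
        partitions[key % num_partitions].append((key, value))
    # Sort every partition in ascending key order.
    return {p: sorted(part, key=lambda r: r[0]) for p, part in partitions.items()}

# Small made-up inputs, deliberately unsorted.
before = hash_partition_and_sort([(3, "c"), (1, "a"), (2, "b"), (4, "d")])
after = hash_partition_and_sort([(4, "d"), (10, "e"), (1, "a"), (3, "c")])

for p in sorted(before):
    print(p, before[p], after.get(p, []))
```

Because both inputs are partitioned on the same key with the same function, key 4 lands in the same partition on both sides, so a per-partition compare can see both versions of the row.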

In the Change Capture stage, I used:

a) Hash partitioning
b) col1 as the key
c) Perform Sort with the Stable option

I removed the Sort stages, ran the job, and got the output below. It is the reverse of the first method's output, which is not correct.

Output:

col1 col2 change_code
2 b 1
5 e 1
8 h 1
10 e 2
12 h 2
1111 k 2

I don't know what is happening. Experts' input is welcome!

Post by **Mike** » Sun Mar 25, 2012 9:15 am

You probably messed up your link order. Inserts and deletes are reversed when you reverse the before/after datasets.

Mike
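The effect of link order can be reproduced with a small Python sketch. This is only a model of the comparison, not the stage itself: `change_capture` is a hypothetical function, and the change codes (0 copy, 1 insert, 2 delete, 3 edit) are assumed to be the stage defaults, which is consistent with the two outputs posted in this thread. Copy rows are dropped, as in those outputs.

```python
COPY, INSERT, DELETE, EDIT = 0, 1, 2, 3  # assumed default change codes

def change_capture(before, after):
    """Compare two key -> value mappings and return (key, value, change_code) rows.

    Unchanged (copy) rows are dropped, matching the outputs shown in the thread.
    """
    out = []
    for key in sorted(before.keys() | after.keys()):
        if key not in after:
            out.append((key, before[key], DELETE))   # only in the before data
        elif key not in before:
            out.append((key, after[key], INSERT))    # only in the after data
        elif before[key] != after[key]:
            out.append((key, after[key], EDIT))      # same key, changed value
    return out

# File 1 and File 2 from the original post (col1 -> col2).
file1 = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e", 6: "f", 7: "g", 8: "h", 9: "i"}
file2 = {1: "a", 3: "c", 4: "d", 10: "e", 6: "f", 7: "g", 12: "h", 9: "i", 1111: "k"}

expected_order = change_capture(before=file1, after=file2)
swapped_order = change_capture(before=file2, after=file1)

print(expected_order)
print(swapped_order)
```

With File 1 as the before data, keys 2, 5 and 8 come out as deletes and 10, 12 and 1111 as inserts, which matches the first output. With the inputs swapped, the same rows come out with codes 1 and 2 exchanged, which matches the second output.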