I have designed a simple job to capture the changed data.
Content of File 1:
col1 col2
1,a
2,b
3,c
4,d
5,e
6,f
7,g
8,h
9,i
Content of File 2:
col1 col2
1,a
3,c
4,d
10,e
6,f
7,g
12,h
9,i
1111,k
File1 ----> Sort ----+
                     +----> Change Capture ----> Output File
File2 ----> Sort ----+
Key = col1
change value = col2
Output:
col1 col2 change_code
2 b 2
5 e 2
8 h 2
10 e 1
12 h 1
1111 k 1
This is correct, right?
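To double-check the expected result by hand, here is a minimal Python sketch of the comparison. It is not DataStage code. It assumes File 1 is the before data set, File 2 is the after data set, and the change codes are 0 = copy, 1 = insert, 2 = delete, 3 = edit (the stage defaults as far as I know). The function name `change_capture` is made up for illustration.

```python
# Hand-rolled comparison for checking the expected output; not DataStage code.
# Assumed change codes (believed to be the stage defaults):
COPY, INSERT, DELETE, EDIT = 0, 1, 2, 3


def change_capture(before, after):
    """Compare two {key: value} dicts and return (key, value, change_code) rows."""
    rows = []
    for key in sorted(before.keys() | after.keys()):
        if key not in before:
            rows.append((key, after[key], INSERT))   # only in the after data
        elif key not in after:
            rows.append((key, before[key], DELETE))  # only in the before data
        elif before[key] != after[key]:
            rows.append((key, after[key], EDIT))     # same key, changed value
        # Unchanged rows (COPY) are dropped, as in the job output.
    return rows


# col1 is the key, col2 is the change value.
file1 = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e", 6: "f", 7: "g", 8: "h", 9: "i"}
file2 = {1: "a", 3: "c", 4: "d", 10: "e", 6: "f", 7: "g", 12: "h", 9: "i", 1111: "k"}

for row in change_capture(file1, file2):
    print(*row)
```

Under those assumptions the sketch prints the same six rows as the job: keys 2, 5 and 8 with code 2, and keys 10, 12 and 1111 with code 1.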
I then tried the "alternative way" of doing the same thing, based on this description of the stage:
The stage assumes that the incoming data is key-partitioned and sorted in ascending order. The columns the data is hashed on should be the key columns used for the data compare. You can achieve the sorting and partitioning using the Sort stage or by using the built-in sorting and partitioning abilities of the Change Capture stage.
I used
a) Hash partition
b) col1 as the key
c) Perform Sort with Stable option
in the Change Capture stage, and I removed the Sort stages. I ran the job and got the output below. It is the reverse of the output from the first method, which is not correct.
Output:
col1 col2 change_code
2 b 1
5 e 1
8 h 1
10 e 2
12 h 2
1111 k 2
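For comparison, a hand calculation with the two inputs swapped gives exactly this reversed result. The sketch below is plain Python, not DataStage code, and the helper name `diff_keys` is made up. It only shows that treating File 2 as the before data and File 1 as the after data would explain the codes. Whether the links were actually exchanged in the job is a guess I have not verified.

```python
# Plain-Python check of which keys count as inserts and deletes.
# Hypothetical helper; assumes 1 = insert and 2 = delete as change codes.
def diff_keys(before, after):
    """Return (inserted_keys, deleted_keys) going from before to after."""
    inserted = sorted(after.keys() - before.keys())  # would get change_code 1
    deleted = sorted(before.keys() - after.keys())   # would get change_code 2
    return inserted, deleted


file1 = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e", 6: "f", 7: "g", 8: "h", 9: "i"}
file2 = {1: "a", 3: "c", 4: "d", 10: "e", 6: "f", 7: "g", 12: "h", 9: "i", 1111: "k"}

# File 1 as before, File 2 as after: matches the first output.
print(diff_keys(file1, file2))

# File 2 as before, File 1 as after: matches the reversed output.
print(diff_keys(file2, file1))
```

With File 2 as the before input, keys 2, 5 and 8 come out as inserts and keys 10, 12 and 1111 as deletes, which is the pattern in the second output.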
I don't know what is happening. Expert input is welcome!