Posted: Fri Jun 03, 2005 7:55 am
Thanks a lot.
http://img240.echo.cx/img240/7100/crc7ih.jpg
Please click on the link above for my job design. It works this way:
I'll assume each file has col1 and col2, and that col1 is the key column.
1. I'll feed AfterSet through a Transformer (TRX) and add a new column, col3, with the value "1" for every row.
2. I'll do the same for BeforeSet, but with the value "2".
3. From TRXAfter, I'll store all the key column values in the HshAfter hashed file.
4. In TRXBefore, using a lookup against HshAfter, I'll store the BeforeSet keys that also exist in AfterSet. The output of this is the HshCommon file.
5. I'll merge the rows from TRXAfter and TRXBefore into one file.
6. I'll feed the rows from the merge file into an Aggregator, grouping by col1 and col2 and selecting min(col3) (call it mincol3) and max(col3) (maxcol3).
7. In the Transformer, I'll do a lookup against the HshCommon file created earlier and identify the changed records with the following constraints (remember, HshCommon holds the keys common to both files):
a. Lookup not found and mincol3=1 (inserts)
b. Lookup found and maxcol3=1 (updates)
c. Lookup not found and mincol3=2 (deletes)
d. Lookup found and mincol3<>maxcol3 (unchanged)
This gives us four files: inserts, updates, deletes and unchanged. (The before-image of an updated row, e.g. 40/ZY below, matches none of the four constraints and is simply dropped.) It does get tedious when your source files have many columns, because the Aggregator takes that much longer to produce its output. I could have used a Link Collector (row collector) instead of the merge, but that would create a cycle in the job design. I hope you understood what I explained above.
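The whole flow above can be sketched in plain Python. This is only a stand-in for the DataStage stages: the `classify` helper is hypothetical, plain dicts play the role of the flat and hashed files, and the in-memory grouping replaces the Aggregator.

```python
from collections import defaultdict

def classify(after_set, before_set):
    """after_set / before_set: dicts mapping col1 -> col2."""
    # Steps 1-2: tag every AfterSet row with col3=1, every BeforeSet row with col3=2.
    tagged = [(k, v, 1) for k, v in after_set.items()] + \
             [(k, v, 2) for k, v in before_set.items()]

    # Steps 3-4: HshAfter holds the after keys; HshCommon the keys in both sets.
    hsh_after = set(after_set)
    hsh_common = {k for k in before_set if k in hsh_after}

    # Steps 5-6: merge, then group by (col1, col2), taking min/max of col3.
    groups = defaultdict(list)
    for col1, col2, col3 in tagged:
        groups[(col1, col2)].append(col3)

    # Step 7: apply the four constraints against the HshCommon lookup.
    result = {"inserts": [], "updates": [], "deletes": [], "unchanged": []}
    for (col1, col2), vals in groups.items():
        mn, mx = min(vals), max(vals)
        found = col1 in hsh_common
        if not found and mn == 1:
            result["inserts"].append(col1)
        elif found and mx == 1:
            result["updates"].append(col1)
        elif not found and mn == 2:
            result["deletes"].append(col1)
        elif found and mn != mx:
            result["unchanged"].append(col1)
        # anything else is the before-image of an updated row: dropped
    return result
```

Running it on the sample data below yields row 60 as the insert, row 40 as the update, row 50 as the delete and rows 10/20/30 as unchanged.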
I'll explain with data.
Assume my Afterset has
col1 col2
--------------
10 XX
20 YY
30 ZZ
40 ZX
60 AA
My BeforeSet is
col1 col2
----------------
10 XX
20 YY
30 ZZ
40 ZY
50 AB
If you look in AfterSet, row 40 changed from ZY to ZX, row 60 is an insert, and row 50 from BeforeSet is not in AfterSet, which means it's a delete.
Data in TRXAfter
col1 col2 col3
---------------------
10 XX 1
20 YY 1
30 ZZ 1
40 ZX 1
60 AA 1
Data in TRXBefore
col1 col2 col3
---------------------
10 XX 2
20 YY 2
30 ZZ 2
40 ZY 2
50 AB 2
Data in HshAfter
col1
----
10
20
30
40
60
Data in HshCommon
col1
-----
10
20
30
40
Data in Merge
col1 col2 col3
---------------------
10 XX 1
20 YY 1
30 ZZ 1
40 ZX 1
60 AA 1
10 XX 2
20 YY 2
30 ZZ 2
40 ZY 2
50 AB 2
Data After Aggregator
col1 col2 mincol3 maxcol3
-------------------------------------
10 XX 1 2
20 YY 1 2
30 ZZ 1 2
40 ZX 1 1
60 AA 1 1
40 ZY 2 2
50 AB 2 2
In TRXIdentify, when we apply the constraints:
the insert is row 60, the delete is row 50, and the update is row 40. The rest are unchanged.
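The same constraint check can be sketched in Python against the aggregator output above. Here `hsh_common` and `agg_rows` are hand-copied from the tables, and "dropped" is my own label for the before-image of an updated row, which matches none of the four constraints:

```python
# Keys present in both sets (stand-in for the HshCommon lookup).
hsh_common = {10, 20, 30, 40}

agg_rows = [  # (col1, col2, mincol3, maxcol3) from the aggregator output
    (10, "XX", 1, 2), (20, "YY", 1, 2), (30, "ZZ", 1, 2),
    (40, "ZX", 1, 1), (60, "AA", 1, 1), (40, "ZY", 2, 2), (50, "AB", 2, 2),
]

actions = []
for col1, col2, mn, mx in agg_rows:
    found = col1 in hsh_common
    if not found and mn == 1:
        action = "insert"
    elif found and mx == 1:
        action = "update"
    elif not found and mn == 2:
        action = "delete"
    elif found and mn != mx:
        action = "unchanged"
    else:
        action = "dropped"  # before-image of an updated row; goes to no file
    actions.append((col1, col2, action))
```

Note how 40/ZX comes out as the update while its old image 40/ZY falls through as dropped.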
What do you think of this? I know it isn't very customizable, but I just wanted to show an approach.