Insert/Update Job Design: Comments?
Moderators: chulett, rschirm, roy
Insert/Update Job Design: Comments?
I am experimenting with the Compare Stage (trying to determine its usefullness). I am exploring the following scenario.
Requirement:
A. Insert of all New records to DataWarehouse
B. Update DW when changes are made to specific columns ONLY.
Any thoughts or comments?
Step 1. Obtain seq file containing TODAY transactions.
Step 2.
Job1: DBMS lookup to determine INSERTS (key not found)
Source Link= Today.seq
Ref Link = DW
<<LookupStage>>
Output Link1 [NOT.FOUND]= Inserts.ds
Output Link2 [FOUND]= CouldBeUpdate.ds
Output Link3 [FOUND]= FoundInDw.ds
Job2: Compare-Determine which rows have a change to the required columns
Source Link1= CouldBeUpdate.ds
Source Link2= FoundInDw.ds
<<CompareStage>>
Output Link =CompareResults.ds
Job3: Seperate updates from no change rows
Source Link1= CompareResults.ds
<<SwitchStage>>
Output Link1 [Result Code<>0] = Updates.ds
Output Link2 [rej] = NoChange.ds
Step 3.
Do Inserts and Updates to the DW
Inserts.ds
Updates.ds
Requirement:
A. Insert of all New records to DataWarehouse
B. Update DW when changes are made to specific columns ONLY.
Any thoughts or comments?
Step 1. Obtain seq file containing TODAY transactions.
Step 2.
Job1: DBMS lookup to determine INSERTS (key not found)
Source Link= Today.seq
Ref Link = DW
<<LookupStage>>
Output Link1 [NOT.FOUND]= Inserts.ds
Output Link2 [FOUND]= CouldBeUpdate.ds
Output Link3 [FOUND]= FoundInDw.ds
Job2: Compare-Determine which rows have a change to the required columns
Source Link1= CouldBeUpdate.ds
Source Link2= FoundInDw.ds
<<CompareStage>>
Output Link =CompareResults.ds
Job3: Seperate updates from no change rows
Source Link1= CompareResults.ds
<<SwitchStage>>
Output Link1 [Result Code<>0] = Updates.ds
Output Link2 [rej] = NoChange.ds
Step 3.
Do Inserts and Updates to the DW
Inserts.ds
Updates.ds
Lance Short
"infinite diversity in infinite combinations"
***
"The absence of evidence is not evidence of absence."
"infinite diversity in infinite combinations"
***
"The absence of evidence is not evidence of absence."
Hi lshort,
Look into the CDC stage. The documentation in parjdev.pdf gives additional information on CDC stage. It is used to identify new inserts, updates as well as deletes.
You can finish it in one job. The CDC stage requires 2 inputs. Use the flat file as one input and the DBMS data as the other input. Define the keys and values in CDC stage. The CDC stage provides an additional column known as change_code which is used to identify an insert, update, delete or an exact copy.
HTH
Rich
Look into the CDC stage. The documentation in parjdev.pdf gives additional information on CDC stage. It is used to identify new inserts, updates as well as deletes.
You can finish it in one job. The CDC stage requires 2 inputs. Use the flat file as one input and the DBMS data as the other input. Define the keys and values in CDC stage. The CDC stage provides an additional column known as change_code which is used to identify an insert, update, delete or an exact copy.
HTH
Rich
-
- Participant
- Posts: 75
- Joined: Tue May 13, 2003 4:14 am
- Location: California
- Contact:
Pardon a potentially dumb question. What is the CDC Stage? I couldn't find references to it in the parjdev manual. Is this the change capture stage?richdhan wrote:Hi lshort,
Look into the CDC stage. The documentation in parjdev.pdf gives additional information on CDC stage. It is used to identify new inserts, updates as well as deletes.
Rich
We are looking into a similar situation (insert new/update only if certain fields change) and I found this posting in a search.
-
- Participant
- Posts: 48
- Joined: Thu Mar 11, 2004 10:32 pm
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
I always felt you were better off doing surrogate key lookup first and then doing change data capture on the rows that already exist and sending the new rows directly down an insert path. A surrogate key lookup only retrieves a couple columns from the database while CDC returns every column. So lookup followed by CDC should be faster then CDC followed by lookup.
You can't get away with a CDC alone unless you have built dimensions without surrogate keys. A CDC stage can bring in rows from the dimension for comparison but it will not let you take or use a dimension field. You can only output rows from the "After" link, not the "Before" link.
We had shocking performance of upserts on DB2 and had to make sure we used seperate insert and update paths. The initial load of a table is the worst performer, DB2 has to perform full table scans for every row as it cannot use an index until stats have been run on the table.
You can't get away with a CDC alone unless you have built dimensions without surrogate keys. A CDC stage can bring in rows from the dimension for comparison but it will not let you take or use a dimension field. You can only output rows from the "After" link, not the "Before" link.
We had shocking performance of upserts on DB2 and had to make sure we used seperate insert and update paths. The initial load of a table is the worst performer, DB2 has to perform full table scans for every row as it cannot use an index until stats have been run on the table.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
a different type of question on this
i was working on some thing like the same job that you guys were discussing and in the end we want to write to the data base only there records that have a same data type and the ones that dont have the data types should be dropped out and these records should be written out to the reject file.
Is there any function out there like get stage info or some thin that can give us some details about the record be dropped since there is a different data type out there?
Thanks
-babloo
Is there any function out there like get stage info or some thin that can give us some details about the record be dropped since there is a different data type out there?
Thanks
-babloo