Data staging is supposed to give me a restart point, but...

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


Inquisitive
Charter Member
Posts: 88
Joined: Tue Jan 13, 2004 3:07 pm

Data staging is supposed to give me a restart point, but...

Post by Inquisitive »

For the approach described below, let's say I extract and create a sequential file of 65 million records. I do not have a Process Lookup. This file is the input to my Process Transform, with the limitation that I do not have a staging area, so my Process Transform needs to create a load file. When processing the 64 millionth record, my job aborts. How can I ensure that when my job restarts, it starts processing from where it left off?

I understand that data staging is done to provide restart points. But if the segment that restarts still has to process millions of records, what do I do? Looking forward to your suggestions.
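One common way to get that restart behaviour without re-reading all 65 million rows is to have the transform job write the number of rows it has committed so far to a small checkpoint file, then on restart skip every input row up to that count (for example with a transformer constraint such as @INROWNUM > RestartPoint). A minimal DataStage BASIC transform routine along those lines is sketched below; the routine name, the checkpoint path argument, and its single-line layout are illustrative assumptions, not something from the original post.

Code:

      * GetRestartPoint(CheckpointPath) - hypothetical transform routine.
      * Returns the row count saved by the previous run, or 0 when no
      * checkpoint file exists, so a fresh run starts at row 1.
      RestartPoint = 0
      OpenSeq CheckpointPath To CkptFile Then
         ReadSeq CkptLine From CkptFile Then
            If Num(CkptLine) Then RestartPoint = CkptLine
         End
         CloseSeq CkptFile
      End
      Ans = RestartPoint

The companion piece (not shown) would rewrite the checkpoint with the running row count at each commit interval, for instance from job control code or an after-job step, so an abort near the 64 millionth row leaves a checkpoint close to that figure.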
**********************************************************
1. Process Source: extract from a source system table into a sequential file. Cleanse the source data and conform it to the warehouse standards. Eliminate junk data, cleanse invalid columns, and audit incoming data. Differentiate the current data snapshot from the previous snapshot for rudimentary changed data detection (a checksum-based sketch of this comparison follows this list). The resulting data sets should be in sequential files.
2. Process Lookup: create custom lookups that will facilitate transformation. These lookups should be in either relational tables or sequential files, for easy loading into in-memory arrays.
3. Process Transform: transform the collected source data into a target table sequential file. Eliminate junk data, cleanse invalid columns, and audit the post-transformation data. (Note: cleansing occurs during both sourcing and transforming, because transformation may cause a resulting value to violate the definition of the target column contents, from both the database and business rule aspects.) Differentiate the final transformed data from the current warehouse data for rudimentary changed data detection. (Note: changed data detection can happen in two places: sourcing and transforming. A row may only be determined to have changed once transformation is complete, because the change may occur on a referenced table row.) The resulting data should be in relational tables and/or sequential files.
4. Process Recover: create a “before update” image of target rows within the data warehouse to provide database recovery capability. The data should be in sequential files. This “before update” image can be leveraged when incrementally updating aggregate tables, as it allows you to “delta” the influence of a base fact row in an aggregate row that undergoes an update.
5. Process Load: load the target table sequential files into the target database.
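Items 1 and 3 above both lean on rudimentary changed data detection. One common way to implement the comparison in a Server job is to carry a checksum of the non-key columns with each snapshot and compare it to the checksum held for the same key in the previous snapshot (looked up from a hashed file or reference file). A minimal DataStage BASIC sketch follows; the routine name, the column delimiter, and the I/U/N flag values are illustrative assumptions, not part of the methodology above.

Code:

      * IsChangedRow(CurrentCols, PreviousChecksum) - hypothetical transform routine.
      * CurrentCols is the non-key columns concatenated in the transformer,
      * e.g. Col1 : "|" : Col2 : "|" : Col3. PreviousChecksum comes from a
      * lookup against the previous snapshot and is "" when the key is new.
      NewSum = CHECKSUM(CurrentCols)
      Begin Case
         Case PreviousChecksum = ""
            Ans = "I"          ;* key not seen before: insert
         Case NewSum # PreviousChecksum
            Ans = "U"          ;* contents changed: update
         Case 1
            Ans = "N"          ;* unchanged: drop from the delta
      End Case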
Once the processes have been developed into the five segments described, the execution order of these segments must be addressed. There are two basic ways: horizontal and vertical banding. Since the microscopic view of ETL is to Source, Lookup, Transform, and Load all in one scripted process, you end up with a whole series of horizontally banded scripts.
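A horizontal band for one source table can be expressed directly in Server job control code: a small controlling job that runs the Source, Lookup, Transform, Recover and Load jobs in sequence and stops at the first failure. The sketch below uses the standard DataStage job control API; the job names are placeholders for illustration only.

Code:

      * Job control sketch: horizontal band for one source table.
      $INCLUDE DSINCLUDE JOBCONTROL.H
      JobList = "SrcCustomer,LkpCustomer,XfmCustomer,RcvCustomer,LoadCustomer"
      NumJobs = Dcount(JobList, ",")
      For J = 1 To NumJobs
         JobName = Field(JobList, ",", J)
         hJob = DSAttachJob(JobName, DSJ.ERRFATAL)
         ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
         ErrCode = DSWaitForJob(hJob)
         JobStat = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
         ErrCode = DSDetachJob(hJob)
         If JobStat <> DSJS.RUNOK And JobStat <> DSJS.RUNWARN Then
            Call DSLogFatal("Band stopped: " : JobName : " did not finish OK", "HorizontalBand")
         End
      Next J

Vertical banding would group the work the other way, running every table's Source job first, then every Lookup job, and so on; the controlling loop looks the same, only the job list changes.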
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Hey, how many places are you going to post this message? I replied over at viewtopic.php?p=105433#105433
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle