Page 1 of 1

Data Integrity

Posted: Mon Nov 20, 2006 2:40 am
by Munish
Hi There,

We are loading data from sequential file into Oracle DB using OCI-Load,
If we use Upsert, it provides us facility of Reject file.
However, Load property does not provide that functionality.
Since the data is massive, we can not use Upsert.

Could any one please advice about the different scenarios for

1. To confirm the data integrity vis a vis sequential file and Database post load (If job runs successfully).

2. If the job crashes midway,
how to check if the data which is read by datastage has actually gone inside the database ie can there be a possibility that Datastage might have read 100 records but only 99 have been actually loaded.

3. What should be the restart/ recover mechanism on crash, so that we do not have rerun the whole batch process.

It is a 8-10 hours of batch window and 200 gig of sequential file load.

I look forward to have your expertise comments.
Thanks
MK

Posted: Mon Nov 20, 2006 6:07 am
by aakashahuja
Not sure about the way Oracle technically works. But in similar scenarios with Db2 that we use.

1. Not sure about confirming the integrity but as far as I know, after a load (in Db2), it points that the integrity check on the table is pending.

2. To tackle this scenario, before loading, we do a lookup on the same table (which is being loaded), to see if the record already exists or not. If not then we load else we leave it out.

3. Ans 2 covers this.

Hope this helps.

Posted: Mon Nov 20, 2006 8:54 am
by Sreedhar
The best way would be

DataBase
|

ChangeCapture ---------------> NewDataSet
^
|
DataSet


DataBase would be the one where we are tyring to load the data.

Dataset is the file which you are tyring to load.

New dataset will have the values which are present on the DataSet but not loaded into the DataBase

so just make use of the NewDataset to load it again so that only the once which have failed loading in the initial stage will be loaded in the next attempt by this you can avoid the duplicate insertion i.e.
if in the initial case, out of 100, only 90, have been loaded then... in the next attempt the 90 will be discarded, only the 10 will be loaded.

hope this help's you.

Posted: Mon Nov 20, 2006 4:09 pm
by Munish
Thanks Buddies,
Your replies are informative and in line with our thoughts.

Does Datastage gives any sort of confirmation in log or in any other way of successful load
or
is there any property that confirms that Load is successful
without going to Database for rechecking.

Since, in our case data from one sequential file is splitted and loaded to 15 different tables and
out of those 15, one table may have more than 100 mills records by the end of successful batch.
We do not want to go back to DB for confirmation.

Any thoughts !!!

Cheers,
MK

Posted: Mon Nov 20, 2006 4:20 pm
by thebird
Munish wrote:Thanks Buddies,
Does Datastage gives any sort of confirmation in log or in any other way of successful load
or
is there any property that confirms that Load is successful
without going to Database for rechecking.
I would assume that the Job log gives you the information regarding the records loaded, when the Oracle Enterprise stage is used.

The Bird