Data Integrity

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Munish
Participant
Posts: 89
Joined: Sun Nov 19, 2006 10:34 pm

Data Integrity

Post by Munish »

Hi There,

We are loading data from sequential file into Oracle DB using OCI-Load,
If we use Upsert, it provides us facility of Reject file.
However, Load property does not provide that functionality.
Since the data is massive, we can not use Upsert.

Could any one please advice about the different scenarios for

1. To confirm the data integrity vis a vis sequential file and Database post load (If job runs successfully).

2. If the job crashes midway,
how to check if the data which is read by datastage has actually gone inside the database ie can there be a possibility that Datastage might have read 100 records but only 99 have been actually loaded.

3. What should be the restart/ recover mechanism on crash, so that we do not have rerun the whole batch process.

It is a 8-10 hours of batch window and 200 gig of sequential file load.

I look forward to have your expertise comments.
Thanks
MK
MK
aakashahuja
Premium Member
Premium Member
Posts: 210
Joined: Wed Feb 16, 2005 7:17 am

Post by aakashahuja »

Not sure about the way Oracle technically works. But in similar scenarios with Db2 that we use.

1. Not sure about confirming the integrity but as far as I know, after a load (in Db2), it points that the integrity check on the table is pending.

2. To tackle this scenario, before loading, we do a lookup on the same table (which is being loaded), to see if the record already exists or not. If not then we load else we leave it out.

3. Ans 2 covers this.

Hope this helps.
Sreedhar
Participant
Posts: 187
Joined: Mon Oct 30, 2006 12:16 am

Post by Sreedhar »

The best way would be

DataBase
|

ChangeCapture ---------------> NewDataSet
^
|
DataSet


DataBase would be the one where we are tyring to load the data.

Dataset is the file which you are tyring to load.

New dataset will have the values which are present on the DataSet but not loaded into the DataBase

so just make use of the NewDataset to load it again so that only the once which have failed loading in the initial stage will be loaded in the next attempt by this you can avoid the duplicate insertion i.e.
if in the initial case, out of 100, only 90, have been loaded then... in the next attempt the 90 will be discarded, only the 10 will be loaded.

hope this help's you.
Regards,
Shree
785-816-0728
Munish
Participant
Posts: 89
Joined: Sun Nov 19, 2006 10:34 pm

Post by Munish »

Thanks Buddies,
Your replies are informative and in line with our thoughts.

Does Datastage gives any sort of confirmation in log or in any other way of successful load
or
is there any property that confirms that Load is successful
without going to Database for rechecking.

Since, in our case data from one sequential file is splitted and loaded to 15 different tables and
out of those 15, one table may have more than 100 mills records by the end of successful batch.
We do not want to go back to DB for confirmation.

Any thoughts !!!

Cheers,
MK
MK
thebird
Participant
Posts: 254
Joined: Thu Jan 06, 2005 12:11 am
Location: India
Contact:

Post by thebird »

Munish wrote:Thanks Buddies,
Does Datastage gives any sort of confirmation in log or in any other way of successful load
or
is there any property that confirms that Load is successful
without going to Database for rechecking.
I would assume that the Job log gives you the information regarding the records loaded, when the Oracle Enterprise stage is used.

The Bird
Post Reply