Each input record is considered a separate logical transaction; there is no requirement to group records together into a logical transaction. However, to optimize the performance of the DataStage jobs, there may not be a physical database commit after each record: multiple records may be grouped together in a single physical transaction. If that is the case, then there must be a checkpoint/restart mechanism to ensure that all records of a group can be reprocessed in the event of a failure. How do I apply this checkpoint mechanism? Are checkpoints created by default in DataStage, or is it a manual procedure? Is it meant to be done in the sequencer or in the job itself?
All the jobs are meant to be run in parallel using a sequencer.
Checkpoint mechanism?
At that level, this is something you have to design into your individual jobs. This typically involves a 'breadcrumb' system - something that lets you know how far you got into the processing and a mechanism to skip back to that point if the last run did not process 100% of the input.
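The breadcrumb idea above can be sketched in a few lines. This is a hypothetical file-based illustration, not a built-in DataStage feature: persist the key of the last committed record, and on restart skip forward past that point. The file name, record shape, and batch size are all assumptions for the example.

```python
import os

CHECKPOINT_FILE = "job_checkpoint.txt"  # hypothetical breadcrumb location

def read_checkpoint():
    """Return the last committed record key, or None on a fresh run."""
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return f.read().strip() or None
    return None

def write_checkpoint(key):
    """Record the last committed key atomically (write temp file, then rename)."""
    tmp = CHECKPOINT_FILE + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(key))
    os.replace(tmp, CHECKPOINT_FILE)

def process(records, commit_batch):
    """Process records in commit-sized batches, resuming past any checkpoint.

    Returns the checkpoint key found at the start of the run (None if fresh).
    """
    last = read_checkpoint()
    skipping = last is not None
    batch = []
    for rec in records:
        if skipping:
            if rec["key"] == last:
                skipping = False  # resume with the NEXT record
            continue
        batch.append(rec)
        if len(batch) == commit_batch:
            # ... the physical database commit of `batch` would happen here ...
            write_checkpoint(batch[-1]["key"])
            batch = []
    if batch:
        # ... commit the final partial batch ...
        write_checkpoint(batch[-1]["key"])
    return last
```

The essential property is that the breadcrumb is only advanced *after* a successful commit, so a crash mid-batch leaves the checkpoint pointing at the last fully committed group.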
-craig
"You can never have too many knives" -- Logan Nine Fingers
As well as keeping track of how far you got, you also need to decide your restart strategy. If you want to restart from the beginning then you need to be able to identify the rows committed to the database on the previous run. If you want to be able to re-start from a "known clean point" then you have to design for this - what does "clean" mean in your context?
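One common way to make committed rows identifiable, as described above, is to tag every row with a run identifier, so a failed run's partial commits can be found and removed before restarting from the beginning. The sketch below is illustrative only (sqlite3 stands in for the real target database; the `run_id` column is an assumption, not part of any DataStage design):

```python
import sqlite3

# In-memory database stands in for the real target table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (key TEXT PRIMARY KEY, run_id TEXT)")

def load(rows, run_id):
    """Insert rows tagged with this run's id; keys already committed are kept."""
    for key in rows:
        conn.execute(
            "INSERT OR IGNORE INTO target (key, run_id) VALUES (?, ?)",
            (key, run_id),
        )
    conn.commit()

def rollback_run(run_id):
    """Restore a 'known clean point' by removing only the failed run's rows."""
    conn.execute("DELETE FROM target WHERE run_id = ?", (run_id,))
    conn.commit()
```

With this scheme, "clean" means "the target contains only rows from runs that completed": after a failure you call `rollback_run` for the failed run's id and replay the whole input.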
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ray, since I'm not a premium member I'm not able to view your comments.
I'm using an MQ as source, that does a destructive read for all records. I'm copying all the incoming records in a dataset as a backup.
Later the job is routed to different links based on a particular column value. After the validations, records from all the links are fed into corresponding TPump stages, which load them into Teradata. Now, if there is a failure during commit or for any other reason, how would I handle the checkpoint mechanism here? The backup dataset would contain all the input records, but since the process is continuous, its size would keep increasing, as would that of the target. A lookup stage against it would therefore hamper performance considerably. Is there a better, performance-oriented solution to handle checkpoints within the job?
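One alternative to looking up an ever-growing backup dataset is to keep only a small high-water mark per output link: the highest message identifier already committed to the target. This sketch assumes messages carry a monotonically increasing id per link (all names here are illustrative, not MQ or TPump API calls):

```python
# High-water mark per output link: link name -> highest committed message id.
watermarks = {}

def should_load(link, msg_id):
    """True if this message has not yet been committed on this link."""
    return msg_id > watermarks.get(link, -1)

def mark_committed(link, msg_id):
    """Advance the link's high-water mark after a successful commit."""
    if msg_id > watermarks.get(link, -1):
        watermarks[link] = msg_id
```

Because only one id per link is stored, the restart check is a constant-time comparison instead of a lookup against the full backup dataset; the approach only works if ordering per link is guaranteed.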
For less than 30c per day you CAN be a premium member, contribute to the bandwidth costs that have to be paid to keep DSXchange alive, and benefit professionally by being able to read all the premium posters' posts (plus some other soon-to-be-announced goodies).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.