Restart the job where it aborts
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 7
- Joined: Tue Jul 28, 2009 6:09 am
- Location: Chennai
Hi,
I am running a DataStage job that has a huge number of input records, and I don't want to truncate and reload the target if the job fails. In other words, I want to restart the job from where it aborted.
Is there any way to design this?
Phani Kumar
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
You need to keep track of the rows that have been committed, and/or query the target table at the beginning of your restart run, ideally with every row identified by the run number that most recently changed it. If you're deleting rows, even more tracking (keeping a before-image) will be necessary.
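A minimal sketch of the idea above, using sqlite3 so it is self-contained; the table and column names (`target`, `run_id`) are illustrative, not anything DataStage-specific. Each committed row carries the run number, so a restart can query the target and skip what was already loaded:

```python
# Sketch: tag every target row with the run number that last changed it;
# on restart, query the target to find what was already committed.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (pk INTEGER PRIMARY KEY, val TEXT, run_id INTEGER)")

source = [(i, f"row-{i}") for i in range(1, 11)]  # pretend source extract
run_id = 42

# Simulate a failed first run that committed only the first 4 rows.
for pk, val in source[:4]:
    conn.execute("INSERT INTO target VALUES (?, ?, ?)", (pk, val, run_id))
conn.commit()

# Restart: find what this run already committed, then load only the rest.
done = {pk for (pk,) in conn.execute(
    "SELECT pk FROM target WHERE run_id = ?", (run_id,))}
for pk, val in source:
    if pk not in done:
        conn.execute("INSERT INTO target VALUES (?, ?, ?)", (pk, val, run_id))
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(count)  # all 10 rows present, none inserted twice
```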
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 730
- Joined: Tue Nov 04, 2008 10:14 am
- Location: Bangalore
1. Sort the data on the primary key, and have a column in the table to record the update date.
2. Fetch max(primary key) from the table rows having max(update date).
3. Join with the source data over a dummy column (Cartesian product).
4. Use a filter to pass only those source rows whose primary key is greater than the one fetched in step 2.
A typical interview question, asked in 90% of the interviews I attended. :D
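The steps above can be sketched as follows (again with sqlite3 so it runs standalone; table and column names are illustrative). In DataStage, steps 3 and 4 would be a Cartesian join on a dummy column followed by a Filter stage; in plain code it reduces to a comparison:

```python
# Sketch of the max-key restart filter: fetch the highest primary key
# already loaded (among rows carrying the latest update date), then pass
# only source rows with a greater key.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (pk INTEGER PRIMARY KEY, upd_date TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)",
                 [(1, "2009-07-27"), (2, "2009-07-28"), (3, "2009-07-28")])
conn.commit()

# Step 2: max(pk) among the rows with max(update date)
(max_pk,) = conn.execute(
    "SELECT MAX(pk) FROM target "
    "WHERE upd_date = (SELECT MAX(upd_date) FROM target)").fetchone()

# Steps 3-4: filter the (key-sorted) source down to unloaded rows.
source = [3, 4, 5, 6]
to_load = [pk for pk in source if pk > max_pk]
print(max_pk, to_load)  # 3 [4, 5, 6]
```

Note this only works because step 1 guarantees the source is loaded in primary-key order; without that ordering, "greater than the max loaded key" does not identify the unprocessed rows.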
- Zulfi
-
- Premium Member
- Posts: 730
- Joined: Tue Nov 04, 2008 10:14 am
- Location: Bangalore
Someone tested me with the same question and I gave the views above, but the other person had a different solution. He referred to a term called "serialisation", where data entries are consistently written into memory. Not sure what he was talking about; I never came across such a thing in the documentation.
Any expert view on the above?
- Zulfi
Serialisation would go hand in hand with what you described. Sorting by primary key would not guarantee that you insert in primary-key order unless you serialise the input prior to sorting (and loading).
If you were going to insert millions of rows, I would process the records to a sequential file/data set ready for load; that way, I would only begin the insert once all processing was successful. Then either bulk load (Oracle provides a way to restart a failed bulk load; I'm not sure about others, but surely they must) or, if insert actually proves faster, serialise and have a mechanism (an audit field) to determine how many rows were successfully loaded, skipping those rows on restart.
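A hedged sketch of that audit-field approach, in plain Python (the staging file and audit counter are stand-ins for whatever the real job would use): rows are fully staged first, an audit count advances with every committed insert, and a rerun skips the rows already counted.

```python
# Sketch: serial insert from a staged file with an audit count,
# so a restart resumes instead of reloading from the top.
staged = [f"rec-{i}" for i in range(10)]   # file fully staged, ready to load
target, audit = [], {"loaded": 0}

def load(rows, fail_after=None):
    """Insert serially; the audit count advances with every commit."""
    for row in rows[audit["loaded"]:]:
        if fail_after is not None and audit["loaded"] >= fail_after:
            raise RuntimeError("simulated abort")
        target.append(row)
        audit["loaded"] += 1

try:
    load(staged, fail_after=6)    # first run dies after 6 rows
except RuntimeError:
    pass
load(staged)                      # restart skips the 6 already loaded
print(len(target))                # 10, with no duplicates
```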
I like writing output to a dataset in append mode, if this is absolutely necessary.
1st job in the sequence creates a dataset.
2nd job: left outer join the input against the dataset to flag what has already been processed, and continue to append to the dataset within the job (so if it fails multiple times the dataset will continue to grow); the first run is against an empty dataset, so it is quick.
3rd job outputs to the table from the dataset.
4th job removes the dataset.
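The four-job pattern above can be sketched file-for-file in Python; the paths and the join-on-key detail are assumptions for illustration, and the table-load step (job 3) is elided:

```python
# Sketch: an append-mode "dataset" of processed keys; a rerun flags and
# skips keys already present, and the dataset is removed after success.
import os, tempfile

ds = os.path.join(tempfile.mkdtemp(), "processed.ds")
open(ds, "w").close()                       # job 1: create the dataset

def run_batch(source, fail_at=None):
    """Job 2: skip keys already in the dataset, append new ones."""
    with open(ds) as f:
        seen = set(f.read().split())        # left-outer-join-style lookup
    with open(ds, "a") as f:
        for key in source:
            if key in seen:
                continue                    # flagged as already processed
            if key == fail_at:
                raise RuntimeError("simulated abort")
            f.write(key + "\n")             # dataset grows across retries

source = ["a", "b", "c", "d"]
try:
    run_batch(source, fail_at="c")          # first run aborts midway
except RuntimeError:
    pass
run_batch(source)                           # rerun processes only c and d

with open(ds) as f:
    processed = f.read().split()
# (job 3, loading the table from the dataset, would go here)
os.remove(ds)                               # job 4: remove the dataset
print(processed)
```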
Andrew
Think outside the Datastage you work in.
There is no True Way, but there are true ways.
-
- Premium Member
- Posts: 730
- Joined: Tue Nov 04, 2008 10:14 am
- Location: Bangalore
Did anyone come across "serialization" in DataStage?
Does any database stage have a serialization facility?
I was surprised when he spoke about serialization over just sequential files (a file-to-file process).
I am running version 8.1; not sure if this has been added in later versions.
- Zulfi