Restart the job where it aborts

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

PhaniKonagalla
Participant
Posts: 7
Joined: Tue Jul 28, 2009 6:09 am
Location: Chennai

Restart the job where it aborts

Post by PhaniKonagalla »

Hi,

I am running a DataStage job with a huge number of input records, and I don't want to truncate and reload the target if the job fails. In other words, I want the job to restart from the point where it aborted.

Is there any way to design for this?
Phani Kumar
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Why does every question you post look like you are fishing for interview answers? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
PhaniKonagalla
Participant
Posts: 7
Joined: Tue Jul 28, 2009 6:09 am
Location: Chennai

Post by PhaniKonagalla »

That is not the answer to my question anyway :lol: Just kidding.

If a person is not working in a technology but wants to learn it, then interview questions are the best source of knowledge, I believe. :lol:
Phani Kumar
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Then the answer to your question is "Yes".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
PhaniKonagalla
Participant
Posts: 7
Joined: Tue Jul 28, 2009 6:09 am
Location: Chennai

Post by PhaniKonagalla »

May I know how to design it?
Phani Kumar
vamsi.4a6
Participant
Posts: 334
Joined: Sun Jan 22, 2012 7:06 am

Post by vamsi.4a6 »

I am not sure whether you are asking about a job sequence or a parallel job.
1. If it is a job sequence, it can be done using checkpoints.
2. If it is a parallel job, you need to identify the records already loaded into the target by means of a join condition and load only the remaining records (a sketch of that logic follows below).
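Outside of DataStage, option 2 is essentially an anti-join against the target. A minimal Python sketch of that logic, assuming an invented table target_tbl, key order_id, and an in-memory SQLite database standing in for the real target (in a real parallel job the same thing would be a Join or Lookup stage against the target):

# Illustrative sketch only: load just the rows whose key is not already in the target.
# Table, column and key names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")                        # stand-in for the real target database
conn.execute("CREATE TABLE target_tbl (order_id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO target_tbl VALUES (?, ?)",
                 [(i, i * 10.0) for i in range(1, 401)])  # rows committed before the abort

def remaining_rows(source_rows, key="order_id"):
    """Return only the source rows whose key is not already present in the target."""
    loaded = {k for (k,) in conn.execute("SELECT order_id FROM target_tbl")}
    return [row for row in source_rows if row[key] not in loaded]

source = [{"order_id": i, "amount": i * 10.0} for i in range(1, 1001)]
to_load = remaining_rows(source)
print(f"{len(to_load)} rows still to load")               # 600 on this toy data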
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You need to keep track of the rows that have been committed, and/or query the target table at the beginning of your restart run, ideally with every row identified by the run number that most recently changed it. If you're deleting rows, even more tracking (keeping before-images) will be necessary.
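A small sketch of that idea, with invented names (target_tbl, run_id): every target row carries the run number that last changed it, so a restarted run can ask the target which keys the failed run already committed:

# Hedged sketch: stamp each row with the run that wrote it, then filter on restart.
# Table and column names are hypothetical; the real design depends on the target.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target_tbl (order_id INTEGER PRIMARY KEY, amount REAL, run_id INTEGER)")

def load(rows, run_id):
    """Insert rows, recording which run wrote them."""
    conn.executemany("INSERT OR REPLACE INTO target_tbl VALUES (?, ?, ?)",
                     [(r["order_id"], r["amount"], run_id) for r in rows])
    conn.commit()

def already_done(run_id):
    """On restart, ask the target which keys the failed run already committed."""
    return {k for (k,) in conn.execute(
        "SELECT order_id FROM target_tbl WHERE run_id = ?", (run_id,))}

source = [{"order_id": i, "amount": i * 1.5} for i in range(1, 11)]
load(source[:4], run_id=7)                  # pretend the job aborted after four rows
redo = [r for r in source if r["order_id"] not in already_done(7)]
load(redo, run_id=7)                        # the restart loads only the remainder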
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

1. Sort the data on the primary key, and have a column in the table to record the update date.
2. Fetch the max(primary key) from the target rows that have the max(update date).
3. Join that value with the source data over a dummy column (a cartesian product).
4. Use a Filter to pass only those source rows whose primary key is greater than the value fetched in step 2 (a rough sketch follows below).
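A rough Python sketch of those four steps, assuming a single ascending key and an update-date column (order_id, update_dt and target_tbl are all invented); the cartesian join of step 3 collapses into a simple comparison once the high-water value has been fetched:

# Sketch of the max(key)-at-max(update-date) restart filter. Names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target_tbl (order_id INTEGER PRIMARY KEY, update_dt TEXT)")
conn.executemany("INSERT INTO target_tbl VALUES (?, ?)",
                 [(i, "2024-01-01") for i in range(1, 501)])   # rows committed before the abort

# Step 2: highest key written by the most recent (failed) run
(high_water,) = conn.execute(
    "SELECT MAX(order_id) FROM target_tbl "
    "WHERE update_dt = (SELECT MAX(update_dt) FROM target_tbl)").fetchone()

# Steps 1, 3 and 4: sort the source on the key and pass only rows above the high-water mark
source = sorted([{"order_id": i} for i in range(1, 1001)], key=lambda r: r["order_id"])
to_load = [r for r in source if r["order_id"] > high_water]
print(f"restart resumes at key {high_water + 1}, {len(to_load)} rows remaining")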

A typical interview question, asked in 90% of the interviews I attended. :D
- Zulfi
zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

Someone tested me with the same question and I gave the views above, but the other person had a different solution. He referred to a term called "serialisation", where data entries are written to memory in a consistent order. I am not sure what he was talking about; I have never come across such a thing in the documentation.

Any expert view on the above?
- Zulfi
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

Serialisation would go hand in hand with what you described: sorting by primary key would not guarantee that you insert in primary-key order unless you have serialised the input prior to sorting (and loading).

If I were going to insert millions of rows, I would process the records to a sequential file or data set ready for load; that way, the insert only begins once all processing has succeeded. Then either bulk load (Oracle provides a way to restart a failed bulk load; I'm not sure about the others, but surely they must too) or, if insert actually proves faster, serialise and keep a mechanism to determine how many rows were successfully loaded (an audit field), skipping those rows on restart.
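A toy sketch of the "count what was loaded and skip that many staged rows on restart" idea; it only works because the load happens serially in staging-file order, which is the point about serialisation above. The file, table and column names are made up:

# Stage everything first, then on (re)load skip the rows the target already holds.
import csv, sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target_tbl (order_id INTEGER PRIMARY KEY, amount REAL)")

# Stage the processed records so the insert only starts once processing has succeeded.
with open("stage.csv", "w", newline="") as f:
    csv.writer(f).writerows((i, i * 2.0) for i in range(1, 101))

# Pretend a previous run died after committing the first 40 rows.
conn.executemany("INSERT INTO target_tbl VALUES (?, ?)",
                 [(i, i * 2.0) for i in range(1, 41)])

def load_from_stage(stage_path):
    """Append staged rows in order, skipping however many the target already holds."""
    (already,) = conn.execute("SELECT COUNT(*) FROM target_tbl").fetchone()
    with open(stage_path, newline="") as f:
        for n, row in enumerate(csv.reader(f)):
            if n < already:                  # audit by row count: these made it before the abort
                continue
            conn.execute("INSERT INTO target_tbl VALUES (?, ?)", (int(row[0]), float(row[1])))
    conn.commit()

load_from_stage("stage.csv")
print(conn.execute("SELECT COUNT(*) FROM target_tbl").fetchone()[0])   # 100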
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

I like the "write output to a dataset in append mode" approach, if this is absolutely necessary (a toy sketch follows below):

1st job in the sequence creates the dataset.

2nd job: left outer join the input against the dataset to flag what has already been processed, and continue to append to the dataset within the job (so if it fails multiple times the dataset will continue to grow); the first run is against an empty dataset, so it is quick.

3rd job writes to the table from the dataset.

4th job removes the dataset.
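A toy version of that flow, with the append-mode dataset played by a local file and the left outer join reduced to a set lookup; the file name (processed.ds.jsonl) and key are invented:

# Anything already present in the "dataset" is flagged as processed and skipped.
import json, os

DATASET = "processed.ds.jsonl"      # stand-in for the DataStage dataset

def already_processed():
    if not os.path.exists(DATASET):
        return set()                # first run: empty dataset, so the join is cheap
    with open(DATASET) as f:
        return {json.loads(line)["order_id"] for line in f}

def process(source_rows):
    done = already_processed()
    with open(DATASET, "a") as out:             # append mode, grows across retries
        for row in source_rows:
            if row["order_id"] in done:         # "left outer join" hit: already processed, skip
                continue
            # ... transform the row here, ready for the 3rd job to load ...
            out.write(json.dumps(row) + "\n")

process([{"order_id": i} for i in range(1, 6)])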
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

Did anyone come across "serialization" in DataStage?

Does any database stage have a serialization facility?

I was surprised when he spoke about serialization for just sequential files (a file-to-file process).

I am running version 8.1; I am not sure whether this has been added in later versions.
- Zulfi