
Restart the job where it aborts

Posted: Tue May 22, 2012 4:42 am
by PhaniKonagalla
Hi,

I am running a DataStage job that has a huge number of input records, and I don't want to truncate and reload the target if the job fails. That is, I want to restart the job from the point where it aborted.

Is there any way to design it?

Posted: Tue May 22, 2012 6:18 am
by chulett
Why does every question you post look like you are fishing for interview answers? :?

Posted: Tue May 22, 2012 8:32 am
by PhaniKonagalla
That is not the answer to my question anyway :lol: Just kidding

If a person is not working in a technology but wants to learn it, then interview questions are the best source of knowledge, I believe. :lol:

Posted: Tue May 22, 2012 4:36 pm
by ray.wurlod
Then the answer to your question is "Yes".

Posted: Tue May 29, 2012 2:24 am
by PhaniKonagalla
May I know how to design it?

Posted: Tue May 29, 2012 3:30 am
by vamsi.4a6
I am not sure what you are asking about: is it a job sequence or a parallel job?
1. If it is a job sequence, it can be done using checkpoints (a restartable sequence).
2. If it is a parallel job, you need to identify the records already loaded in the target by means of a join condition and load only the remaining records (see the sketch below).
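
A minimal Python sketch of the idea in point 2, with invented column names (id, amount); in a DataStage parallel job this would typically be a Lookup/Join against the target followed by a filter on the unmatched rows:

Code:
# Anti-join sketch: load only the source rows whose key is not already in the target.
source_rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
target_keys = {1, 2}        # keys already committed before the abort

remaining = [row for row in source_rows if row["id"] not in target_keys]
print(remaining)            # [{'id': 3, 'amount': 30}] -- only this row is loaded on restart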

Posted: Tue May 29, 2012 3:37 am
by ray.wurlod
You need to keep track of the rows that have been committed, and/or query the target table at the beginning of your restart run, ideally with every row identified with the run number that most recently changed the row. If you're deleting rows, even more tracking (keeping before-image) will be necessary.
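
To make the run-number idea concrete, here is a rough Python sketch with invented names (key, run_id); the real tracking would of course live in the target table or an audit table:

Code:
# Each target row remembers the run that last changed it, so a restart of the
# same run can skip the keys that run already touched.
current_run = 42
target = {101: 41, 102: 42, 103: 42}        # key -> run_id that last changed the row

source_keys = [101, 102, 103, 104]
to_process = [k for k in source_keys if target.get(k) != current_run]
print(to_process)                           # [101, 104] -- 102 and 103 were already done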

Posted: Tue May 29, 2012 8:18 am
by zulfi123786
1. Sort the data on the primary key and add a column in the target table to record the update date.
2. Fetch max(primary key) from the target rows having max(update date).
3. Join that value with the source data on a dummy column (Cartesian product).
4. Use a filter to pass only those source rows whose primary key is greater than the value fetched in step 2 (a sketch of steps 2-4 follows below).

A typical interview question; it came up in 90% of the interviews I attended. :D
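
A small Python sketch of steps 2-4, assuming an integer primary key and a load-date column (both names invented); in DataStage, step 3 would be a Join/Lookup on a dummy column and step 4 a Transformer/Filter constraint:

Code:
# Rows already in the target, each stamped with the date it was loaded.
target = [(100, "2012-05-28"), (101, "2012-05-29"), (102, "2012-05-29")]

# Step 2: highest primary key among the rows with the latest update date.
latest = max(ts for _, ts in target)
max_loaded_key = max(k for k, ts in target if ts == latest)     # 102

# Steps 3-4: the source is sorted by primary key, so pass only keys above the cutoff.
source_keys = [100, 101, 102, 103, 104]
to_load = [k for k in source_keys if k > max_loaded_key]
print(to_load)                                                  # [103, 104]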

Posted: Thu Jun 21, 2012 11:22 am
by zulfi123786
Someone tested me with the same question and I gave the views mentioned above, but the other person had a different solution. He referred to a term called "serialisation", where data entries are made into memory consistently. I'm not sure what he was talking about; I never came across such a thing in the documentation.

Any expert view on the above?

Posted: Thu Jun 21, 2012 4:40 pm
by Kryt0n
Serialisation would go hand in hand with what you described. Sorting by primary key would not guarantee that you insert in primary key order unless you have serialised the input prior to sorting (and loading).

If I were going to insert millions of rows, I would process the records to a sequential file/data set ready for load; that way, I would only begin the insert once all processing was successful. Then either bulk load (Oracle provides a way to restart a failed bulk load; not sure about other databases, but surely they must), or, if insert actually proves faster, serialise and have a mechanism to determine how many rows were successfully loaded (an audit field), skipping those rows on restart.
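
A hedged Python sketch of that last point, assuming the staged data sits in a flat file and a separate audit count records how many rows were committed before the failure (the file contents and count are invented):

Code:
import io

staged = io.StringIO("row1\nrow2\nrow3\nrow4\n")    # stand-in for the sequential file
rows_already_loaded = 2                             # from an audit field / load log

for line_no, line in enumerate(staged):
    if line_no < rows_already_loaded:
        continue                                    # already committed before the abort
    print("loading:", line.strip())                 # placeholder for the real insert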

Posted: Thu Jun 21, 2012 7:56 pm
by aartlett
I like writing the output to a dataset in append mode, if this is absolutely necessary.

1st job in the sequence creates the dataset.

2nd job: left outer join the input against the dataset to flag what has already been processed, and continue to append to the dataset in the job (so if it fails multiple times the dataset will continue to grow); the first run is against an empty dataset, so it is quick. (A sketch of this step follows below.)

3rd job outputs to the table from the dataset.

4th job removes the dataset.
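
For the 2nd job, a small Python sketch of the left-outer-join-and-append idea, with an in-memory set standing in for the append-mode dataset (names are invented):

Code:
processed = set()                           # the append-mode dataset, empty on the first run

def run_job(input_keys):
    # Process only keys not yet in the dataset, then append them for the next restart.
    todo = [k for k in input_keys if k not in processed]    # left join + filter on "not processed"
    processed.update(todo)                                  # append to the dataset
    return todo

print(run_job([1, 2, 3]))       # first run: [1, 2, 3]
print(run_job([1, 2, 3, 4]))    # restart:   [4] -- earlier keys are flagged as already done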

Posted: Fri Jun 22, 2012 4:56 am
by zulfi123786
Did anyone come across "serialization" in DataStage?

Does any database stage have a serialization facility?

I was surprised when he spoke about serialization rather than just sequential files (a file-to-file process).

I am running version 8.1; not sure if this has been added in later versions.