Hi,
Suppose a job is processing billions of records and it fails in the middle of execution for some runtime reason. We need to reset the job and run it again. What happens in this situation?
Most of the data has already been moved to the target. Is all the data in the target deleted so the job starts again from the beginning, or does it resume from the previous checkpoint? How does the job identify the previous run's checkpoint / transaction commit record location in the target?
I have been on a career break and have forgotten many real-time situations, so kindly give me a clear picture of this scenario. It will be helpful. Thanks in advance.
This was an interview question I could not answer; kindly guide me.
Thanks for sharing your response. Usually when a job fails with a runtime error, I first reset the job in Director and then restart it. But I am concerned about situations where part of the data has already been loaded into the target (say, a database). Do we need to delete those rows externally, or will DataStage do it internally? If so, how does it do it?
Not going to give you the answer (see KDUKE above).
But think about how DataStage handles the data on a restart.
Think about your target system (DBMS) or other repository (FILE, XML, MQ, special build op, etc...). How do you think IT deals with what DataStage does during a restart?
If you know how DataStage works, and you know how your Target System reacts... maybe you can craft code to overcome certain error scenarios?
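As one example of "crafting code" for this, here is a minimal sketch in plain Python over a generic DB-API driver (not DataStage itself): stamp every loaded row with a batch id, and run a before-job cleanup step that deletes the leftovers of a failed run so the reload is idempotent. The names target_table and batch_id are invented for illustration.
Code:
import sqlite3  # stand-in for any DB-API 2.0 driver (DB2, Oracle, ...)

def cleanup_failed_run(conn, batch_id):
    """Remove rows left behind by a failed run so the restart starts clean."""
    cur = conn.cursor()
    cur.execute("DELETE FROM target_table WHERE batch_id = ?", (batch_id,))
    conn.commit()
    return cur.rowcount  # how many partial rows were thrown away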
Usually a database stage has a transaction-commit setting where you can specify that the transaction should commit, for example, every 3,000 records. If this count is set to a low integer value, it will reduce the performance of the job.
I think there is also an environment variable for it.
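As a rough illustration of what that commit interval trades off (plain Python, not the DB2 stage itself; target_table and its two columns are assumptions for the example): rows are inserted continuously, but COMMIT is issued only every N rows.
Code:
import sqlite3  # stand-in for any DB-API 2.0 driver

def load_with_commit_interval(conn, rows, interval=3000):
    cur = conn.cursor()
    for i, row in enumerate(rows, start=1):
        cur.execute("INSERT INTO target_table VALUES (?, ?)", row)
        if i % interval == 0:  # commit point: rows 1..i are now durable
            conn.commit()
    conn.commit()              # commit the final partial batch

A smaller interval gives finer-grained restart points at the cost of more commit overhead; a larger one runs faster but leaves more rows to roll back and re-process after a failure.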
My understanding goes up to a point.
Now take a DBMS stage, like DB2: suppose the transaction commit is set to every 3,000 records and the job fails at record 6,040. Then 6,000 records are committed, and the 40 records after that are rolled back, not saved in the database. When we restart the job, it would need to pick up again from the 6,001st record.
What I need to recollect is how this complexity is handled in the second run with partitioning and parallel pipelines, because I cannot work out which data from the source will flow in the second run... wish me good luck.
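To make that restart arithmetic concrete, here is a hedged sketch (plain Python, not DataStage internals) of one way a rerun can find its resume point: ask the target for the highest committed key and filter the source feed accordingly. The record_id key and target_table are assumptions for the example; real jobs often track this in a separate audit/checkpoint table instead.
Code:
import sqlite3  # stand-in for any DB-API 2.0 driver

def resume_point(conn):
    """Highest committed key in the target, or 0 if the target is empty."""
    cur = conn.cursor()
    cur.execute("SELECT COALESCE(MAX(record_id), 0) FROM target_table")
    return cur.fetchone()[0]

def rows_to_reload(source_rows, last_committed):
    # In the 6,040-record example: last_committed is 6,000, so rows
    # 6,001 onward (including the 40 rolled-back ones) are re-extracted.
    return (r for r in source_rows if r["record_id"] > last_committed)

Note that this simple picture assumes a single ordered stream. In a parallel job each partition commits on its own, so the resume point generally has to be tracked per partition, or the load made idempotent so that re-running the whole uncommitted tail is harmless.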