
Posted: Mon Mar 28, 2011 12:46 pm
by chulett
Restartability of an ETL job can also mean it picks up where it left off. And it's a workflow in Informatica, not a workload. The other difference is that a workflow is required even to run a single mapping; a sequence job isn't.

Posted: Mon Mar 28, 2011 1:35 pm
by olgc
Yes, it's a workflow in Informatica; thanks for the correction.

Does "it picks up where it left off" mean the same as "it restarts from the failed point"? It's only a part of control job restartability, another part is "it restarts from the designated point". This one is harder to implement than the failed point. Both are only applied to control job, but not E.T.L. job.

Thanks,

Posted: Mon Mar 28, 2011 3:02 pm
by chulett
My "pick up where it left off" comment was specifically directed to ETL jobs, not at the job control level. It may not be typical but it can certainly be done.

Posted: Tue Mar 29, 2011 6:44 am
by olgc
chulett wrote:My "pick up where it left off" comment was specifically directed to ETL jobs, not at the job control level. It may not be typical but it can certainly be done. ...
That's interesting, very interesting. Let's look at an example so I can understand how you implement a "pick up where it left off" ETL job. If I understand right, it's about a concrete ETL job. If a load job fails while loading the 123rd record and the transaction size is 50, can you show me how you "pick up" at the right record and continue the job, loading the rest of the records? Say the entire load contains 100,134 records.

Thanks,

Posted: Tue Mar 29, 2011 6:58 am
by chulett
High level... first you need a static source. After that it is a matter of marking your progress in the job, typically at each commit point, so you know the last successful one. That 'marker' row count gets set to zero at the end of a successful run. Each time the job runs, the marker is passed in as a parameter and that number of rows is read but constrained / filtered from passing to the output.
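
Roughly, the idea might look like this in Python (a sketch only; the table and job names are assumptions and SQLite stands in for the real target). With the numbers from the question above, a failure on record 123 with a commit size of 50 leaves the marker at 100, so the next run skips rows 1-100 and resumes committing from row 101:

import sqlite3

COMMIT_SIZE = 50  # the transaction size

def load(rows, marker, conn):
    # marker: row count of the last successful commit, passed in as a
    # job parameter; rows up to it are read but filtered from the output.
    cur = conn.cursor()
    pending = 0
    for n, row in enumerate(rows, start=1):
        if n <= marker:
            continue                       # already committed in the failed run
        cur.execute("INSERT INTO target VALUES (?, ?)", row)
        pending += 1
        if pending == COMMIT_SIZE:
            cur.execute("UPDATE etl_marker SET last_row = ? WHERE job = 'load'", (n,))
            conn.commit()                  # data and marker commit together
            pending = 0
    cur.execute("UPDATE etl_marker SET last_row = 0 WHERE job = 'load'")
    conn.commit()                          # success: marker reset to zero

Updating the marker inside the same transaction as the data batch is what makes the restart point trustworthy.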

Multi-node PX jobs severely complicate this, as you could imagine.

Posted: Tue Mar 29, 2011 9:15 am
by olgc
Okay, that sounds complicated. Absolutely, designing a restartable ETL job is a sophisticated and difficult problem; it's worth an entire chapter of a book, if not a book of its own. Here is an article on it: www.uiis.net/etl/index.php. Any comments and feedback are appreciated.

Thanks,

Posted: Wed Mar 30, 2011 2:03 pm
by ray.wurlod
I disagree with the assertion about "most" important. I believe that prevention is better than cure.

Posted: Wed Mar 30, 2011 5:58 pm
by vmcburney
You can get restartability in a DataStage job against a dynamic (not static) source if you combine DataStage with InfoSphere CDC. The CDC bookmark functions let you compare a source table to a target table to keep them in sync, with DataStage as the engine for transforming and writing the data. This takes care of the complications of the DataStage parallel engine. The combination benefits both products: CDC can be slow at initially synching a table or at large volumes, so DataStage makes CDC more scalable, while CDC gives DataStage the restart and delta capabilities.

Posted: Thu Mar 31, 2011 7:01 am
by olgc
Very good point, vmcburney; I like this and will add it to the article as another approach to restartable ETL jobs, many thanks. But CDC is usually only used to handle slowly changing dimension tables. If it's used for other tables, such as fact tables, the performance could be unbearable unless your fact table is small. And for some shops, CDC may be too pricey.
Please check www.uiis.net/etl/index.php for Design Restartable ETL Jobs.

Posted: Thu Mar 31, 2011 7:24 am
by olgc
Thanks, ray.wurlod. Are we talking about the same thing here? I have a gut feeling we're not.

Posted: Thu Mar 31, 2011 3:31 pm
by ray.wurlod
Probably not. I'm talking about eliminating the need for restartability within jobs.

Posted: Fri Apr 01, 2011 2:00 pm
by ray.wurlod
Avoid timeout errors by controlling the number (actually the workload) of jobs that can run simultaneously, taking heed of other workload on the machine.
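
One simple way to cap concurrency, sketched in Python (the job names and run_job.sh launcher are assumptions): a fixed-size worker pool sets the ceiling, however many jobs are queued.

from concurrent.futures import ThreadPoolExecutor
import subprocess

MAX_CONCURRENT = 4  # assumed cap; tune to the machine's other workload

def run_job(name):
    subprocess.run(["run_job.sh", name], check=True)  # hypothetical launcher

jobs = ["load_orders", "load_customers", "load_products", "load_invoices"]
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    for future in [pool.submit(run_job, j) for j in jobs]:
        future.result()  # re-raise any job failure here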

Avoid locking errors by good design.

I agree that network down or database down look hard, but they're easily handled before a job starts (a small job to "test the connection" before the main job starts). Losing power/network/database while the job is running is handled by the usual high-availability techniques such as uninterruptible power supplies, redundant components, and so on.
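
A minimal sketch of that pre-flight connection test, assuming an ODBC source (the DSN name is made up; any database client library works the same way): fail fast before the main job starts rather than partway through a run.

import sys
import pyodbc  # assumed driver

def connection_ok(dsn, timeout=10):
    try:
        conn = pyodbc.connect(f"DSN={dsn}", timeout=timeout)
        conn.cursor().execute("SELECT 1")  # trivial round trip
        conn.close()
        return True
    except pyodbc.Error:
        return False

if not connection_ok("WAREHOUSE_DSN"):
    sys.exit("Database unreachable; aborting before the main job starts")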

Posted: Sat Apr 02, 2011 4:58 pm
by ray.wurlod
Neither of those affected any of my Information Server installations. Even one in Tokyo (which has alternate servers in Switzerland and Australia) was able to keep going, even with some staff relocating to other cities farther west in Japan and working remotely.

And no doomsaying will affect my belief that prevention is better than cure.

Most of the sites in which I'm involved have had no unscheduled downtime in that period. We always set up communication channels with DBAs, system administrators, etc., so that we're advised about their plans for downtime. So we don't do any processing in those times.