Restartable ETL Jobs

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Restartability of an ETL job can also mean it picks up where it left off. And it's a workflow in Informatica, not a workload. The other difference is that a workflow is required to run a single mapping; a sequence job isn't.
-craig

"You can never have too many knives" -- Logan Nine Fingers
olgc
Participant
Posts: 145
Joined: Tue Nov 18, 2003 9:00 am

Post by olgc »

Yes, it's a workflow in Informatica; thanks for the correction.

Does "it picks up where it left off" mean the same as "it restarts from the failed point"? That is only one part of control job restartability; the other part is "it restarts from a designated point", which is harder to implement than restarting from the failed point. Both apply only to control jobs, not to ETL jobs.
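A minimal Python sketch of the two control-job restart modes described in this post. It assumes a sequence is an ordered list of named steps and that the set of completed step names survives between runs (kept in memory here; a real controller would persist it). `run_sequence` and its parameters are hypothetical names, not DataStage functions.

```python
# Hypothetical sketch: run_sequence, steps, and completed are illustrative
# names, not DataStage APIs. `completed` would normally be persisted
# (file or table) between runs; here it is an in-memory set.

def run_sequence(steps, completed, start_at=None):
    """Run (name, callable) steps in order and return the names run this time.

    start_at=None: restart from the failed point by skipping every step
    already recorded in `completed`.
    start_at="name": restart from a designated point; that step and all
    later steps are rerun, and earlier steps are assumed to be done.
    """
    names = [name for name, _ in steps]
    start = 0
    if start_at is not None:
        start = names.index(start_at)  # ValueError for an unknown step
        completed.difference_update(names[start:])
    ran = []
    for name, step in steps[start:]:
        if name in completed:
            continue  # checkpoint says this step finished in an earlier run
        step()  # an exception propagates and leaves `completed` as the restart point
        completed.add(name)
        ran.append(name)
    completed.clear()  # whole sequence succeeded: next run starts from scratch
    return ran


# Usage: the "load" step fails on the first run, then the sequence is restarted.
state = {"fail": True}

def load():
    if state["fail"]:
        raise RuntimeError("database down")

steps = [("extract", lambda: None), ("transform", lambda: None), ("load", load)]
done = set()
try:
    run_sequence(steps, done)
except RuntimeError:
    pass
print(sorted(done))                                      # ['extract', 'transform']
state["fail"] = False
print(run_sequence(steps, done))                         # ['load']
print(run_sequence(steps, set(), start_at="transform"))  # ['transform', 'load']
```

Restarting from the failed point only needs the checkpoint; restarting from a designated point additionally has to decide which recorded steps to invalidate, which this sketch does by discarding everything from the designated step onward.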

Thanks,

Post by chulett »

My "pick up where it left off" comment was specifically directed to ETL jobs, not at the job control level. It may not be typical but it can certainly be done.
-craig

Post by olgc »

chulett wrote: My "pick up where it left off" comment was specifically directed to ETL jobs, not at the job control level. It may not be typical but it can certainly be done. ...
That's interesting, very interesting. Let's look at an example to help me understand how you implement "pick up where it left off" ETL jobs. If I understand correctly, this is about a concrete ETL job. If a loading job fails while loading the 123rd record and the transaction size is 50, can you show me how you "pick up" at the right record, continue the job, and finish loading the rest of the records? Let's say the entire load contains 100,134 records.

Thanks,

Post by chulett »

High level... first you need a static source. After that it is a matter of marking your progress in the job, typically at each commit point, so you know the last successful one. That 'marker' row count gets set to zero at the end of a successful run. Each time the job runs, the marker is passed in as a parameter and that number of rows is read but constrained / filtered from passing to the output.
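The marker technique described above, sketched in Python with the numbers from the earlier question (100,134 rows, transaction size 50, failure at row 123). It assumes a single process, a static source read in the same order on every run, a Python list standing in for the target table, and a hypothetical JSON marker file; `run_load`, `read_marker`, and `write_marker` are illustrative names only.

```python
# Minimal sketch of a marker-based restartable load. Assumptions: the source
# is static and read in the same order on every run, a Python list stands in
# for the target table, and the marker lives in a hypothetical JSON file.
import json
import os
import tempfile


class LoadFailure(Exception):
    """Simulated failure (for example, the database going away mid-load)."""


def read_marker(path):
    # Marker = number of source rows committed by earlier runs (0 if none).
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["committed_rows"]


def write_marker(path, committed_rows):
    # Write to a temp file and rename so a crash cannot leave a half-written marker.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"committed_rows": committed_rows}, f)
    os.replace(tmp, path)


def run_load(source, target, marker_path, commit_size=50, fail_at=None):
    marker = read_marker(marker_path)  # plays the role of the job parameter
    pending = []  # rows in the current, uncommitted transaction
    for row_num, row in enumerate(source, start=1):
        if row_num <= marker:
            continue  # row is read but filtered out: it was already committed
        if row_num == fail_at:
            raise LoadFailure(f"failed at row {row_num}")  # pending rows are lost
        pending.append(row)
        if len(pending) == commit_size:
            target.extend(pending)  # "commit"
            pending.clear()
            # In a real database, update the marker in the same transaction.
            write_marker(marker_path, row_num)
    target.extend(pending)  # commit the final partial batch
    write_marker(marker_path, 0)  # successful run: reset the marker


source = list(range(1, 100_135))  # 100,134 static source rows
target = []
marker_file = os.path.join(tempfile.mkdtemp(), "load_marker.json")
try:
    run_load(source, target, marker_file, fail_at=123)
except LoadFailure:
    pass
print(len(target), read_marker(marker_file))  # 100 100
run_load(source, target, marker_file)  # restart: skips the first 100 rows
print(len(target), read_marker(marker_file))  # 100134 0
```

After the failure the marker holds 100, the last commit point; rows 101 to 122 were in the uncommitted transaction, so the restart reloads them instead of duplicating them. The sketch covers one stream of rows only; with several parallel partitions, each partition would presumably need its own marker and a partitioning that is identical from run to run.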

Multi-node PX jobs severely complicate this, as you could imagine.
-craig

Post by olgc »

Okay, that sounds complicated. Designing restartable ETL jobs is certainly a sophisticated and difficult problem; it deserves an entire chapter of a book, if not a whole book. Here is an article on it: www.uiis.net/etl/index.php. Any comments and feedback are appreciated.

Thanks,
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I disagree with the assertion about "most" important. I believe that prevention is better than cure.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

You can get restartability in a DataStage job against a dynamic, not static, source if you combine DataStage with InfoSphere CDC. The CDC bookmark functions let you compare a source table to a target table to keep them in sync, and DataStage can be the engine for transforming and writing the data. This takes care of the complications of the DataStage parallel engine. It benefits CDC, which can be slow when synchronizing a table initially or at large volumes, by making it more scalable, and it benefits DataStage by providing the restart and delta capabilities.

Post by olgc »

Very good point, vmcburney; I like this and will add it to the article as one approach to restartable ETL jobs. Many thanks. But CDC is only used to handle slowly changing dimension tables. If it's used for other tables, such as fact tables, the performance could be unbearable unless the fact table is small. And for some sites, CDC may simply be too pricey.
Please see www.uiis.net/etl/index.php for "Design Restartable ETL Jobs".

Post by olgc »

Thanks, ray.wurlod. Are we talking about the same thing here? I have a gut feeling we aren't.

Post by ray.wurlod »

Probably not. I'm talking about eliminating the need for restartability within jobs.

Post by ray.wurlod »

Avoid timeout errors by controlling the number (actually the workload) of jobs that can run simultaneously, taking into account the other workload on the machine.
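One way to cap workload rather than just job count is a weighted semaphore, sketched here in Python. The per-job weights and the capacity value are assumptions you would set per machine; `WorkloadLimiter` and `run_job` are hypothetical names, and this is a generic concurrency pattern, not a DataStage scheduling feature.

```python
# Sketch of a weighted limit on concurrently running jobs. Assumptions: each
# job gets a hypothetical integer "weight" estimating its load on the
# machine, and `capacity` is a number you tune for that machine (leaving
# headroom for other workload). Generic pattern, not a DataStage feature.
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class WorkloadLimiter:
    def __init__(self, capacity):
        self.capacity = capacity
        self.in_use = 0
        self.peak = 0  # highest total weight seen running at once
        self._cond = threading.Condition()

    def acquire(self, weight):
        if weight > self.capacity:
            raise ValueError("job weight exceeds total capacity")
        with self._cond:
            # Block until this job fits alongside the jobs already running.
            self._cond.wait_for(lambda: self.in_use + weight <= self.capacity)
            self.in_use += weight
            self.peak = max(self.peak, self.in_use)

    def release(self, weight):
        with self._cond:
            self.in_use -= weight
            self._cond.notify_all()  # waiting jobs re-check whether they fit


def run_job(limiter, weight, work):
    limiter.acquire(weight)
    try:
        return work()
    finally:
        limiter.release(weight)


limiter = WorkloadLimiter(capacity=10)
weights = [4, 3, 5, 2, 6, 1, 3]
with ThreadPoolExecutor(max_workers=len(weights)) as pool:
    for w in weights:
        pool.submit(run_job, limiter, w, lambda: time.sleep(0.05))
print(limiter.peak <= limiter.capacity)  # True
```

A job waits until its weight fits under the remaining capacity, so a few heavy jobs and many light ones can share the machine. This simple version does not guarantee fairness: a heavy job can wait a long time while lighter jobs keep fitting in ahead of it.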

Avoid locking errors by good design.

I agree that a network outage or a database outage looks hard to deal with, but both are easily handled before a job starts (a small job to "test the connection" before the main job starts). Losing power, network, or database while the job is running is handled by the usual high availability techniques, such as uninterruptible power supplies, redundant components, and so on.
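The "test the connection" job can be as small as the following Python sketch. It assumes the database is reachable over TCP at a known host and port and treats a successful connect as success; `can_connect` and `run_with_preflight` are hypothetical names, and a production check would more likely run a trivial query through the database client.

```python
# Sketch of a "test the connection" step run before the main job.
# Assumptions: the database accepts TCP connections on a known host/port,
# and a successful TCP connect counts as "reachable". A more thorough check
# would run a trivial query through the database's own client or driver.
import socket


def can_connect(host, port, timeout=5.0):
    # True if a TCP connection opens within `timeout` seconds.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False


def run_with_preflight(check, main_job):
    # Start the main job only if the pre-flight check passes, so an outage
    # is caught before any rows are written and nothing needs restarting.
    if not check():
        return "skipped: connection test failed"
    main_job()
    return "completed"


# Usage (hypothetical host, port, and job function):
#   run_with_preflight(lambda: can_connect("dbserver.example.com", 50000), run_main_load)
```

Keeping the check separate from the main job means a failed check leaves the target untouched, which is the point of handling the outage before the job starts.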

Post by ray.wurlod »

Neither of those affected any of my Information Server installations. Even one in Tokyo (which has alternate servers in Switzerland and Australia) was able to keep going, even with some staff relocating to other cities farther west in Japan and working remotely.

And no doomsaying will affect my belief that prevention is better than cure.

Most of the sites in which I'm involved have had no unscheduled downtime in that period. We always set up communication channels with DBAs, system administrators, etc., so that we're advised about their plans for downtime. So we don't do any processing in those times.