Separating Extract, Transform and Load to three or more jobs

olgc
Participant
Posts: 145
Joined: Tue Nov 18, 2003 9:00 am

Post by olgc »

Hi guys,

There is a practice in ETL coding:

Separate Extract, Transform and Load into three or more jobs, so that if one step fails, the output of the previous steps can be reused. That sounds good in theory. Could anyone tell me where this practice is documented?

Have a nice summer day,
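As an illustration of the practice described in the question, here is a minimal Python sketch. It is not DataStage code (in DataStage this would normally live in a Job Sequence or DS Basic job control). It assumes a hypothetical JSON checkpoint file `etl_checkpoint.json` and hypothetical phase functions. Each phase is recorded only after it succeeds, so a rerun after a failure skips the completed phases and reuses their output.

```python
import json
from pathlib import Path
from typing import Callable

# Hypothetical checkpoint file listing the phases that have completed.
CHECKPOINT = Path("etl_checkpoint.json")


def load_done(path: Path) -> set[str]:
    """Return the names of phases completed in an earlier (failed) run."""
    if path.exists():
        return set(json.loads(path.read_text()))
    return set()


def run_phases(phases: list[tuple[str, Callable[[], None]]],
               path: Path = CHECKPOINT) -> list[str]:
    """Run phases in order, skipping any already recorded as done.

    A phase is recorded only after it finishes, so an exception leaves
    the earlier phases marked complete and the next run starts at the
    failed phase. Returns the names of the phases executed in this run.
    """
    done = load_done(path)
    executed = []
    for name, step in phases:
        if name in done:
            continue  # reuse the output of a phase that already succeeded
        step()  # if this raises, the checkpoint is left as it was
        done.add(name)
        path.write_text(json.dumps(sorted(done)))
        executed.append(name)
    # Whole cycle succeeded: clear the checkpoint for the next cycle.
    path.unlink(missing_ok=True)
    return executed
```

The checkpoint is only useful if each phase leaves its output somewhere durable (a staging file or table) that the next phase can pick up on a rerun.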
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Training class DataStage Best Practices for one.

Freezing in an Australian winter (32C in Darwin!)
Last edited by ray.wurlod on Tue Jul 26, 2005 1:37 am, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

It is well documented in the archives of this site. Do a search for failover or recovery or banding to find threads on the subject.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi,
Maybe the way your scripts call the jobs can be split up vertically.

For example, you could run Extraction from a separate script, Transformation from another, and likewise for the rest, so that each phase gets its own restart point.

This assumes the traceability matrix across the jobs is complete.

regards
kumar
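A rough Python sketch of the per-phase script idea, assuming hypothetical scripts `extract.sh`, `transform.sh` and `load.sh` (one per phase). The driver starts at a named phase, so after a failure the operator can restart from that phase without rerunning the earlier ones.

```python
import subprocess

# Hypothetical per-phase scripts; each phase is its own restart point.
PHASES = [
    ("extract", ["sh", "extract.sh"]),
    ("transform", ["sh", "transform.sh"]),
    ("load", ["sh", "load.sh"]),
]


def run_from(start: str,
             phases: list[tuple[str, list[str]]] = PHASES) -> list[str]:
    """Run each phase's script in order, beginning at `start`.

    Raises ValueError for an unknown phase name, and
    subprocess.CalledProcessError if a script exits non-zero, which
    leaves the failed phase as the next restart point.
    Returns the names of the phases that completed.
    """
    names = [name for name, _ in phases]
    if start not in names:
        raise ValueError(f"unknown phase {start!r}; expected one of {names}")
    ran = []
    for name, cmd in phases[names.index(start):]:
        subprocess.run(cmd, check=True)  # stop at the first failing script
        ran.append(name)
    return ran
```

For example, `run_from("transform")` would rerun only the transformation and load scripts. Recording which phase failed (the traceability mentioned above) is left to the caller in this sketch.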
DaleK
Premium Member
Posts: 68
Joined: Fri Jun 27, 2003 8:33 am
Location: Orlando

Post by DaleK »

If I understand the problem, it isn't a best practice that is the problem, but the tool or method you use to schedule and run your jobs.

We use our mainframe scheduling tool to run our DataStage jobs. This tool allows us to rerun jobs. I guess I just have it a little easier than some of you.

Either way it sounds like you have one heck of a mess.
Best of luck.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Sounds like some more thought needs to go into the design of your control structures and restart points. It's all doable, and gracefully. But it must be designed with care.
olgc
Participant
Posts: 145
Joined: Tue Nov 18, 2003 9:00 am

Post by olgc »

To split ETL into E, T and L, an integrated ETL job can always be built first. Once that job is tested, it can be split into 3, 4 or even 20-odd E, T and L jobs to conform to the practice, making maintenance and support more challenging. Is that the best practice?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

We normally advocate four phases.
  • Extraction to first staging area.
  • Jobs presupposed by Transformation phase (e.g. loading lookups).
  • Transformation into second staging area.
  • Loading from staging area into target.
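To make the four phases concrete, here is a small Python sketch under stated assumptions: the two staging areas are hypothetical CSV files, the lookup is a hypothetical country-code table, and the target is just an in-memory list. Because each phase reads only from the previous phase's output on disk, a failed load can be rerun from the second staging area without repeating extraction or transformation.

```python
import csv
from pathlib import Path


def extract(source_rows: list[dict], stage1: Path) -> None:
    """Phase 1: copy source rows unchanged into the first staging area."""
    with stage1.open("w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=["id", "country_code", "amount"])
        w.writeheader()
        w.writerows(source_rows)


def build_lookup(reference: dict[str, str], lookup: Path) -> None:
    """Phase 2: prepare a lookup that the transformation phase presupposes."""
    with lookup.open("w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["code", "name"])
        w.writerows(sorted(reference.items()))


def transform(stage1: Path, lookup: Path, stage2: Path) -> None:
    """Phase 3: read staging area 1 plus the lookup, write staging area 2."""
    with lookup.open(newline="") as f:
        names = {r["code"]: r["name"] for r in csv.DictReader(f)}
    with stage1.open(newline="") as src, stage2.open("w", newline="") as dst:
        w = csv.DictWriter(dst, fieldnames=["id", "country", "amount"])
        w.writeheader()
        for r in csv.DictReader(src):
            w.writerow({
                "id": r["id"],
                "country": names.get(r["country_code"], "UNKNOWN"),
                "amount": f"{float(r['amount']):.2f}",
            })


def load(stage2: Path, target: list[dict]) -> None:
    """Phase 4: load staging area 2 into the target.

    Replaces rather than appends, so rerunning this phase is idempotent.
    """
    with stage2.open(newline="") as f:
        target[:] = list(csv.DictReader(f))
```

A real target table would need an equivalent truncate-and-load or upsert strategy so that rerunning only the load phase does not duplicate rows.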