Designing Jobs

Post questions here relating to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

bala_135
Premium Member
Posts: 156
Joined: Fri Oct 28, 2005 1:00 am
Location: Melbourne,Australia

Designing Jobs

Post by bala_135 »

Hello All,

A clarification on designing jobs.

I have a job that extracts from a source table, does some transformations (separating the new records from the updates), and then does an insert (separate link) and an update (separate link) into the same table.

My question is: can I do the insert and the update in the same job, or should I separate them into an insert job and an update job? Which is the ideal approach?

I am following this approach:
Extract the data from the table and dump it onto a dataset.
Read the dataset, do the transformations, and load it onto another dataset.
Load the new inserts separately.
Load the updates separately.

Problem: with this approach I am increasing the number of jobs. And what happens with high data volumes, or with low data volumes?

Any inputs would be most appreciated.

Regards,
Bala.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

A greater number of jobs (modularity) will give you ease of debugging, along with restartability. A single huge job will make debugging a nightmare, and you can forget about restartability. Weigh your options.
You can probably create a single job to extract, transform and load to staging datasets, and two more jobs: one for the inserts and the other for the updates.
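If you drive those three jobs from a script rather than a sequence, the restart point comes almost for free. A rough sketch only - the project and job names here are made up, and you should check the exit-code mapping documented for the dsjob release you are on:

Code: Select all

# sequence_jobs.py - run the three modular jobs in order with the dsjob CLI.
# PROJECT and the job names are made-up placeholders; substitute your own.
import subprocess
import sys

PROJECT = "DWPROJ"
JOBS = ["ExtractTransform", "LoadInserts", "LoadUpdates"]

for job in JOBS:
    # -jobstatus waits for the job to finish and returns the job status
    # as the exit code: 1 = ran OK, 2 = ran with warnings (verify this
    # mapping against the dsjob documentation for your release).
    rc = subprocess.call(["dsjob", "-run", "-jobstatus", PROJECT, job])
    if rc not in (1, 2):
        # Restartability: fix the failed job and rerun from this point.
        sys.exit("%s failed (dsjob exit code %d)" % (job, rc))
    print("%s finished with exit code %d" % (job, rc))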
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

DSguru2B wrote: You can probably create a single job to extract, transform and load to staging datasets
I would split these jobs as well. That way, all my extraction jobs can use whatever small window of time I have to source the rows, and then free the source databases.
gateleys
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Another good point by gateleys. Modularization has lots of benefits as opposed to its counterpart, the monolithic design.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
bala_135
Premium Member
Posts: 156
Joined: Fri Oct 28, 2005 1:00 am
Location: Melbourne,Australia

Post by bala_135 »

Hi All,

Thanks for the inputs. So I guess I am going ahead with the right approach:

Extract the data from the table and dump it onto a dataset.
Read the dataset, do the transformations, and load it onto another dataset.
Load the new inserts separately.
Load the updates separately.


Another question: how can I decide on the size of the project directory? If I am creating many datasets as intermediate targets, each roughly 50 MB in size, is there a proportional formula, or can I determine the project directory size from the number and size of the datasets, apart from the space for the installables?

Regards,
Bala.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In a word, "MORE".

Data Sets, particularly with unbounded strings, consume rather more disk space than you would expect. There is also an 80-bit (10-byte) per-record storage overhead to be considered. The Parallel Job Developer's Guide (page 2-32) helps you to calculate the storage requirement for each data type.

In addition to space on your resource disk, where Data Set data files reside, you also need to configure lots of space on scratch disk. How much is really a function of what kind of processing you are doing and how much physical memory can be allocated to those processes - any extra spills to scratch disk.
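As a back-of-envelope illustration only - the per-field byte costs below are assumptions, so take the real figures for your data types from the Guide:

Code: Select all

# dataset_size.py - rough estimate of parallel Data Set disk usage.
RECORD_OVERHEAD_BYTES = 10          # the 80-bit per-record overhead

def estimate_mb(record_count, field_bytes):
    """(sum of field storage + per-record overhead) x record count."""
    record_size = sum(field_bytes) + RECORD_OVERHEAD_BYTES
    return record_count * record_size / (1024.0 ** 2)

# Example: 1,000,000 records of int32 (4 bytes), date (4 bytes) and a
# bounded varchar(50) (50 bytes plus an assumed length prefix).
print("%.1f MB" % estimate_mb(1000000, [4, 4, 54]))
# Unbounded strings are the wildcard - budget generously for them.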
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bala_135
Premium Member
Posts: 156
Joined: Fri Oct 28, 2005 1:00 am
Location: Melbourne,Australia

Post by bala_135 »

Hi,

Thanks for the response.
Are there any performance implications in loading the data to the database directly from a dataset, versus passing it through a Copy stage and then to the database?
My business requirement has future enhancements coming, so I am keeping a Copy stage. Kindly give your inputs on this, and on my design approach as well.

Regards,
Bala.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

A Copy stage that does nothing will be optimized out. You won't see it in the score.
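You can verify this for yourself: set APT_DUMP_SCORE to True (as a job parameter, or project-wide through the Administrator) and the score is written to the job log in Director.

Code: Select all

APT_DUMP_SCORE=True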
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.