Hello All,
A clarification on designing jobs.
1. I have a job that extracts from a source table, does some transformations (separating the new records from the updates), and then does an insert (separate link) and an update (separate link) into the same table.
My doubt is: can I do the insert and update in the same job, or should I split them into a separate insert job and update job? Which is the ideal approach?
I am following this approach:
Extract the data from the table and dump it onto a dataset.
Read the dataset, do the transformations, and load it onto another dataset.
Load the new inserts separately.
Load the updates separately.
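In code terms, the insert/update split amounts to something like the following (a minimal Python illustration only, not DataStage code; it assumes the set of keys already present in the target table is available, e.g. via a lookup):

```python
# Minimal sketch of the insert/update split: records whose key already
# exists in the target table go down the update link, the rest go down
# the insert link.  (Illustrative only -- in a parallel job this would
# typically be a Transformer or Change Capture stage with two outputs.)

def split_records(records, existing_keys, key_field="id"):
    inserts, updates = [], []
    for rec in records:
        if rec[key_field] in existing_keys:
            updates.append(rec)   # key exists -> update link
        else:
            inserts.append(rec)   # new key -> insert link
    return inserts, updates

# Example: keys 1 and 2 already exist in the target table
existing = {1, 2}
incoming = [{"id": 1, "val": "a"}, {"id": 3, "val": "b"}]
ins, upd = split_records(incoming, existing)
```

Whether the two output links land in one job or two separate load jobs is then purely a restartability/modularity decision.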
Problem: with this approach I am increasing the number of jobs.
What happens with high data volumes or low data volumes?
Any inputs would be most appreciated.
Regards,
Bala.
Designing Jobs
More number of jobs (modularity) will give you ease of debugging, along with restartability. A single huge job will make debugging a nightmare. Forget about restartability. Weigh your options.
You can probably create a single job to extract, transform and load to staging datasets. Two more jobs, one for insert and the other for updates.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Hi All,
Thanks for the inputs. So I guess I am going ahead with the right approach:
Extract the data from the table and dump it onto a dataset.
Read the dataset, do the transformations, and load it onto another dataset.
Load the new inserts separately.
Load the updates separately.
Another doubt: how can I decide on the size of the project directory? If I am creating many datasets as intermediate targets, say each dataset is roughly 50 MB, is there a proportional formula, or can I determine the project directory size from the number and size of the datasets, apart from the space for the installables?
Regards,
Bala.
In a word, "MORE".
Data Sets, particularly with unbounded strings, consume rather more disk space than you would expect. There is an 80-bit-per-record storage overhead at the record level that also needs to be considered. The Parallel Job Developer's Guide (page 2-32) helps you to calculate the storage requirements for each data type.
In addition to space on your resource disk, where Data Set data files reside, you also need to configure lots of space on scratch disk. How much is really a function of what kind of processing you are doing and how much physical memory can be allocated to those processes - any extra spills to scratch disk.
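As a rough back-of-the-envelope check, the per-record calculation can be sketched as follows. This is an illustrative sketch only: it uses the 80-bit (10-byte) per-record overhead noted above, but the per-type byte counts are assumptions that should be verified against the Parallel Job Developer's Guide for your release.

```python
# Rough Data Set size estimate: sum of per-field storage plus a fixed
# per-record overhead (80 bits = 10 bytes per record, as noted above).
# The per-type byte counts below are illustrative assumptions -- check
# the Parallel Job Developer's Guide for the exact figures.

FIELD_BYTES = {
    "int32": 4,
    "int64": 8,
    "sfloat": 4,
    "dfloat": 8,
}

RECORD_OVERHEAD_BYTES = 10  # 80 bits per record

def estimate_dataset_bytes(schema, record_count):
    """schema: list of (name, type, length) tuples; bounded strings use
    their declared length, other types look up FIELD_BYTES."""
    per_record = RECORD_OVERHEAD_BYTES
    for name, dtype, length in schema:
        per_record += length if dtype == "string" else FIELD_BYTES[dtype]
    return per_record * record_count

# Example: 1 million records of (int32 key, 50-character bounded string)
# per record: 10 + 4 + 50 = 64 bytes, so about 64 MB in total
size = estimate_dataset_bytes(
    [("id", "int32", None), ("name", "string", 50)], 1_000_000
)
```

Unbounded (varchar with no declared length) fields are exactly where such an estimate breaks down, which is why they consume more space than expected.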
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Hi,
Thanks for the response.
Are there any performance issues with loading the data to the database directly from a dataset, versus passing it through a Copy stage and then to the database?
My business requirement has future enhancements, so I am keeping a Copy stage. Kindly give your inputs on this and also on my design approach.
Regards,
Bala.