how many tables can be loaded using single job
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 23
- Joined: Tue May 14, 2013 6:34 am
how many tables can be loaded using single job
Hello
Just need an idea of how many tables can be loaded using a single job.
- Tables are independent
- Approx. each table may receive thousands of records per run
- Inserts only
In my case, I have source data of different types that needs to be loaded into more than 10 tables based on type.
-
- Premium Member
- Posts: 120
- Joined: Thu Oct 28, 2004 4:24 pm
Well, as previously stated, DataStage has no limit. That being said, I have never loaded, nor would I ever load, X number of tables within the same job unless it was a one-time load, and even then it would have to be advantageous to me.
For instance, we load 180 * 2 dimensions every morning (two different companies using different instance IDs). That's not counting all the facts, but they are in a different batch. To do that I have 180 LOAD jobs. These LOAD jobs are run from a batch job, and I can set a parameter to run as many of them in parallel as my system's resources will allow. Scalability. Within this process, if 1 of the load jobs fails, the remaining 179 jobs will still run, and then and only then will the batch fail. During my restart I need only run the 1 job that failed. The one thing to bear in mind is that the simpler the job, the better off you are, from a debugging and testing standpoint as well as a running standpoint. Your job will only run as fast as its slowest object.
(For example, we are running 8,261 job instances every night in our data warehouse build.)
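The batch pattern described above (run the independent load jobs in parallel, let failures accumulate, fail the batch only after every job has run, then restart just the failures) can be sketched outside DataStage. This is a minimal Python sketch; the table names and the `run_load_job` stand-in are invented for illustration, and a real batch would shell out to `dsjob` instead:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for "run one DataStage LOAD job".
def run_load_job(table):
    if table == "DIM_BAD":            # simulate one failing load
        raise RuntimeError(f"load of {table} failed")
    return f"{table} loaded"

def run_batch(tables, max_parallel=4):
    failed = []
    # A parameter controls how many load jobs run in parallel.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(run_load_job, t): t for t in tables}
        for fut, table in futures.items():
            try:
                fut.result()
            except Exception:
                failed.append(table)   # remember the failure, keep going
    # The batch "fails" only after every job has had its chance;
    # on restart you rerun just this list, not all the jobs.
    return failed

failed = run_batch(["DIM_A", "DIM_BAD", "DIM_C"])
```

The point of the design is the return value: one failure does not stop the other loads, and the restart set is exactly the failed jobs.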
"Don't let the bull between you and the fence"
Thanks
Gregg J Knight
"Never Never Never Quit"
Winston Churchill
-
- Participant
- Posts: 23
- Joined: Tue May 14, 2013 6:34 am
It is much better to do 10 tables in one job, and 10 tables is a very small number.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I should have asked what exactly you mean by "loading a table" before responding.
I just finished writing a job this morning that uses 20 database stages to execute twenty different internal SQL table copies, and without thinking I responded with that job in mind; not a good answer if you are doing DataStage processing in your job.
If your tables use different sources and have different processing rules, then I would recommend, as does Ray, one target table per job.
If, for example, you had one large source file/table that gets processed and written to your 10 different targets, you would save time by reading your source data only once, making a single job more efficient.
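The single-job case above boils down to "read the source once, route each record to its target by type". A minimal Python sketch of that split, with invented type values and an in-memory stand-in for the target tables:

```python
from collections import defaultdict

# Invented sample source rows; in the job this would be the one
# shared read of the large source file/table.
source = [
    {"type": "A", "val": 1},
    {"type": "B", "val": 2},
    {"type": "A", "val": 3},
]

targets = defaultdict(list)   # stands in for the 10 target tables
for rec in source:            # single pass over the source data
    targets[rec["type"]].append(rec)

# targets["A"] now holds the rows destined for table A, and so on;
# each target received its rows from one read of the source.
```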
-
- Participant
- Posts: 23
- Joined: Tue May 14, 2013 6:34 am
-
- Participant
- Posts: 21
- Joined: Sat Sep 08, 2007 12:04 am
- Location: bangalore
It's ideal to use ten different jobs (or a single job with multiple invocations) to load the 10 different tables, and you can trigger these 10 jobs in parallel from a sequence (again, you can have load balancing).
This provides more flexibility and control over each table for any future modifications or tracking.
It's not a good idea to open up 10 DB connections in a single job.
In this case it would seem that your best approach is to do everything in one job. If your source table can be SELECTed based on that column, you could write 10 jobs (or one generic one called 10 times with different parameters), since each job won't read the whole table.
But I would suggest doing one job with 11 connections to a database; that is well within the capabilities of even the smallest DataStage installation.
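The "one generic job called 10 times with different parameters" idea can be sketched as a parameterised SELECT, so that no invocation reads the whole table. This Python sketch uses an in-memory SQLite table; the table, column, and type values are invented for illustration:

```python
import sqlite3

# Invented source table with a "type" discriminator column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (type TEXT, val INTEGER)")
conn.executemany("INSERT INTO src VALUES (?, ?)",
                 [("A", 1), ("B", 2), ("A", 3)])

def load_one_type(rec_type):
    # The parameterised SELECT a generic job would run per invocation:
    # each run touches only the rows for its own type.
    rows = conn.execute(
        "SELECT val FROM src WHERE type = ?", (rec_type,)).fetchall()
    return [v for (v,) in rows]

# One invocation per type, like calling the generic job 10 times
# with a different parameter each time.
loaded = {t: load_one_type(t) for t in ("A", "B")}
```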
Still, I would advocate separate jobs. The file can be read simultaneously and (as noted) an issue with one load won't affect or cause reloads of the other tables.
However, I'm going to disagree with one of shiva's points:
shiva_reddys447 wrote: Its not a good idea to open up 10 db connections in a single job
That would be an opinion. I for one don't see any issue with it, in general. Ultimately that depends entirely on your infrastructure, database capabilities and ETL job load.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
shiva_reddys447 wrote: Its not a good idea to open up 10 db connections in a single job
Why not?
I have a job that joins data from 20 different sources. It runs quite happily.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
If I were in your position, I would create a master job that does all the transformations for all the target tables, with Data Set files as the final targets. Then have one multiple-instance job with RCP (runtime column propagation) to load the data.
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.