dataset read problem
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 42
- Joined: Fri Oct 20, 2006 1:58 am
dataset read problem
Hi all,
I have been assigned to tune some parallel jobs.
I have observed that in some jobs, reading from a Data Set takes a very long time.
Please advise me on how to increase the number of rows/sec when reading from a Data Set.
Thanks and regards,
Avik Dasgupta
-
- Participant
- Posts: 83
- Joined: Sat Oct 28, 2006 6:25 am
Re: dataset read problem
Also look in the configuration file and check the filesystems and mounts for the directories listed there as scratch disk and resource disk.
Then we can talk.
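To make that check concrete, here is a minimal sketch (not from the thread) that pulls the resource disk and scratchdisk paths out of a parallel configuration file and flags any path listed under more than one node. The sample configuration, node names, host name and paths are invented for illustration; in practice you would read the file that $APT_CONFIG_FILE points to. The helper names (parse_resources, shared_paths, mount_point) are hypothetical.

```python
import os
import re

# Hypothetical sample in the parallel configuration file layout.
# Node names, host name and paths are invented for illustration.
SAMPLE_CONFIG = '''
{
  node "node1"
  {
    fastname "etlhost"
    pools ""
    resource disk "/data/ds1" {pools ""}
    resource scratchdisk "/scratch/s1" {pools ""}
  }
  node "node2"
  {
    fastname "etlhost"
    pools ""
    resource disk "/data/ds2" {pools ""}
    resource scratchdisk "/scratch/s1" {pools ""}
  }
}
'''

NODE_RE = re.compile(r'\s*node\s+"([^"]+)"')
RESOURCE_RE = re.compile(r'\s*resource\s+(disk|scratchdisk)\s+"([^"]+)"')


def parse_resources(config_text):
    """Return {node_name: {"disk": [paths], "scratchdisk": [paths]}}."""
    nodes = {}
    current = None
    for line in config_text.splitlines():
        node_match = NODE_RE.match(line)
        if node_match:
            current = node_match.group(1)
            nodes[current] = {"disk": [], "scratchdisk": []}
            continue
        res_match = RESOURCE_RE.match(line)
        if res_match and current is not None:
            kind, path = res_match.groups()
            nodes[current][kind].append(path)
    return nodes


def shared_paths(nodes, kind):
    """Return {path: [node names]} for paths listed under more than one node."""
    users = {}
    for node_name, resources in nodes.items():
        for path in resources[kind]:
            users.setdefault(path, []).append(node_name)
    return {path: names for path, names in users.items() if len(names) > 1}


def mount_point(path):
    """Walk up from path until a mount point is reached (run on the engine host)."""
    path = os.path.abspath(path)
    while not os.path.ismount(path):
        parent = os.path.dirname(path)
        if parent == path:
            break
        path = parent
    return path


if __name__ == "__main__":
    parsed = parse_resources(SAMPLE_CONFIG)
    for kind in ("disk", "scratchdisk"):
        print(kind, "paths shared between nodes:", shared_paths(parsed, kind))
```

Running mount_point on each listed path on the engine host shows which filesystem it really lives on. Two nodes whose paths resolve to the same mount are competing for the same I/O, even if the directory names differ.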
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Participant
- Posts: 42
- Joined: Fri Oct 20, 2006 1:58 am
Datastage read problem
ray.wurlod wrote: What are your parallel job tuning credentials? That is, why did they give you the task? How much experience do you have in this area?
I am very new to DataStage. I have developed some parallel jobs in the last two months.
-
- Participant
- Posts: 42
- Joined: Fri Oct 20, 2006 1:58 am
-
- Participant
- Posts: 83
- Joined: Sat Oct 28, 2006 6:25 am
It's not about usage! Try to find the mount. Also, when you say 8 nodes, are all 8 nodes actually being used, and is the data in the Data Set well distributed across them (check the source for this)?
balajisr wrote: How many rows do you have in each partition? What is your partition count? Post your job design. You need to give more details.
adasgupta123 wrote: Hi, the partition count is 8 and I have checked the filesystem; the memory usage is OK.
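One way to put a number on "well distributed" is a simple skew ratio over the per-partition row counts. This is only a sketch: the counts below are made up, the function names are hypothetical, and you would substitute whatever per-partition figures you can get for your own Data Set (for example from the Data Set Management tool).

```python
def partition_skew(row_counts):
    """Ratio of the largest partition to the mean partition size.

    1.0 means perfectly even; 8.0 on 8 partitions means all rows sit in one.
    """
    if not row_counts or sum(row_counts) == 0:
        raise ValueError("need at least one non-empty partition")
    mean = sum(row_counts) / len(row_counts)
    return max(row_counts) / mean


def empty_partitions(row_counts):
    """Indexes of partitions that hold no rows at all."""
    return [index for index, count in enumerate(row_counts) if count == 0]


if __name__ == "__main__":
    # Hypothetical row counts for an 8-partition Data Set.
    counts = [1_250_000, 1_240_000, 1_260_000, 1_255_000,
              1_245_000, 1_250_000, 0, 3_500_000]
    print("skew ratio:", round(partition_skew(counts), 2))
    print("empty partitions:", empty_partitions(counts))
```

Because the job cannot finish before its largest partition has been read, a skew ratio well above 1.0 means one node is doing most of the work while the others wait.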
-
- Participant
- Posts: 42
- Joined: Fri Oct 20, 2006 1:58 am
Hi,
I have checked the mount points. Data is well distributed across all 8 nodes.
One thing I should mention is that the runtime column propagation option is enabled. Is it delaying the read process?
-
- Participant
- Posts: 83
- Joined: Sat Oct 28, 2006 6:25 am
RCP should not affect performance. If the data is well distributed and the file mounts are proper (i.e. an individual filesystem mount for each node), are you sure that the issue is in reading the Data Set?
The performance issue may be caused by some other processing in your job. How exactly did you conclude that the Data Set read is to blame? Can you elaborate, please?
-
- Participant
- Posts: 42
- Joined: Fri Oct 20, 2006 1:58 am
Basically, we are handling a huge amount of data every day (around 300 GB!)
and it is getting larger every month.
In most of the jobs the Data Set is both the first stage and the final output stage, i.e.
the output Data Set of one job acts as the input to the next job.
The jobs mainly contain Join and Transformer stages. In some
cases there are Funnel and Filter stages.
I suspect a Data Set read problem because on the output links of all the other stages
the number of rows per second is much higher than on the Data Set's link.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Etiquette Note
It is not necessary to overquote all previous replies - they're there in the thread. Also, using Quote severely restricts your ability to earn points.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 83
- Joined: Sat Oct 28, 2006 6:25 am
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Rows/sec is an almost completely meaningless metric. Various factors influence it, usually negatively, such as row width, network bottlenecks, the clock still running after all rows have been processed, and so on. I have posted before on this. There can be no such thing as an answer to the question "what is a typical rows/sec?". The main way to increase the read rate from a Data Set is to increase buffer sizes and not to have any slower stage types downstream of it. But sometimes you just have to. All else being equal, minimize the time taken by ensuring that rows are distributed equally across all partitions when the Data Set is populated.
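A minimal sketch of why rows/sec misleads, using made-up numbers: two links can report very different rows/sec while moving the same volume of data, and the reported rate drops if the clock keeps running after the last row has arrived. The row widths, rates and function names here are hypothetical and chosen only to illustrate the arithmetic.

```python
def megabytes_per_second(rows_per_sec, avg_row_bytes):
    """Convert a rows/sec figure into data volume per second."""
    return rows_per_sec * avg_row_bytes / 1_000_000


def reported_rows_per_sec(rows, busy_seconds, idle_seconds=0.0):
    """Rate as a monitor would show it if the clock keeps running while idle."""
    return rows / (busy_seconds + idle_seconds)


if __name__ == "__main__":
    # Hypothetical: wide rows leaving the Data Set, narrow rows further downstream.
    print("Data Set link:", megabytes_per_second(5_000, 2_000), "MB/s")
    print("Downstream link:", megabytes_per_second(50_000, 200), "MB/s")

    # Same 1,000,000 rows, but the clock ran 100 s longer on one link.
    print("No idle time:", reported_rows_per_sec(1_000_000, 100), "rows/sec")
    print("With idle time:", reported_rows_per_sec(1_000_000, 100, 100), "rows/sec")
```

Comparing elapsed time for the whole job, or data volume per second, gives a fairer picture than comparing rows/sec between links with different row widths.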