Dataset Read is slow

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Ravi.K
Participant
Posts: 209
Joined: Sat Nov 20, 2010 11:33 pm
Location: Bangalore

Dataset Read is slow

Post by Ravi.K »

Hi,

We have a job,

Source (Dataset) --> CopyStage --> Database (Oracle Connector)

The dataset contains 500 columns and 200,000 records. Reading from the dataset is very slow, which is why I added the APT_DEFAULT_TRANSPORT_BLOCK_SIZE and APT_MAXIMUM_TRANSPORT_BLOCK_SIZE parameters to improve performance. With those set, the dataset read now takes around 8 to 10 minutes...
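
(As a point of reference, here is a minimal sketch of how those variables could be set when launching the job from the command line. The values and the project/job names are illustrative only, not taken from this post; on most installations the variables are instead added as $APT_... job parameters through the Administrator client.)

    # Illustrative values only -- check your environment's defaults before changing anything
    export APT_DEFAULT_TRANSPORT_BLOCK_SIZE=262144      # transport block size in bytes
    export APT_MAXIMUM_TRANSPORT_BLOCK_SIZE=1048576     # upper limit for the transport block

    # Or pass them as job parameters when starting the job with dsjob
    # (MyProject and MyLoadJob are placeholders)
    dsjob -run \
          -param \$APT_DEFAULT_TRANSPORT_BLOCK_SIZE=262144 \
          -param \$APT_MAXIMUM_TRANSPORT_BLOCK_SIZE=1048576 \
          MyProject MyLoadJob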

Please advise if you have any suggestions. Thanks...
Cheers
Ravi K
thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

How big is the dataset (have a look in Data Set Management, or check from the command line as sketched below)?
What is the schema of the dataset, i.e. the column definitions, and are the columns bounded with declared sizes?
Which 8.x version are you using?
How many partitions was the dataset created on, and how many partitions does the job that reads it run with?
How did you determine that it was the read of the dataset that was slow, rather than the load into the database?
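
(If you would rather check this from the engine tier than from the Data Set Management tool, a rough sketch using the standard orchadmin utility follows. The dataset path is just a placeholder and the exact options vary by version, so treat this as an outline rather than exact syntax.)

    # Source the engine environment so orchadmin can find its libraries; you may
    # also need $APT_ORCHHOME/bin on your PATH and APT_CONFIG_FILE set
    . $DSHOME/dsenv

    # Descriptor summary: partitions, segments and related information
    orchadmin describe /data/work/big_file.ds

    # Dump a few records as text to sanity-check the contents
    orchadmin dump /data/work/big_file.ds | head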
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Never mind the fact that it looks like you already 'solved' your problem. What kind of suggestions are you looking for?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Try it with this job design.

Source (Dataset) --> CopyStage

Set the Force property to True in the Copy stage, so that a Copy stage with no output is not optimized away and the dataset is actually read.

I think you'll find that the Data Set is not the culprit here.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ravi.K
Participant
Posts: 209
Joined: Sat Nov 20, 2010 11:33 pm
Location: Bangalore

Post by Ravi.K »

The data types are bounded lengths, with a maximum of 150 characters.
Version: DataStage 8.5
Both jobs run on 4 nodes.
Here are the Data Set Management stats...

Total Records: 176618
Total 32K Blocks: 7360
Total Bytes: 942998906
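
(Quick arithmetic on those figures, not part of the original post: 942,998,906 bytes / 176,618 records ≈ 5,340 bytes per record, i.e. roughly 5 KB per row spread across the 500 columns, or about 10 bytes per column on average. That is consistent with the "around 1 GB" size quoted later in the thread.)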
Cheers
Ravi K
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Another way to verify that it isn't the read that is slow: replace the Oracle Connector stage with a Peek stage. The rows-per-second (RPS) figure will then show the maximum rate at which the dataset can possibly be read.
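
(If you want the same numbers from the command line after running the Peek test, dsjob can report them as well; the project and job names below are placeholders.)

    # DETAIL should include per-stage and per-link statistics from the last run
    dsjob -report MyProject MyPeekTestJob DETAIL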
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
Ravi.K
Participant
Posts: 209
Joined: Sat Nov 20, 2010 11:33 pm
Location: Bangalore

Post by Ravi.K »

chulett wrote: What kind of suggestions are you looking for?
The data volume will increase day by day. Are there any optimized ways of dealing with bigger datasets? I am also going to be dealing with many larger datasets soon.

Thanks in Advance...
Cheers
Ravi K
Ravi.K
Participant
Posts: 209
Joined: Sat Nov 20, 2010 11:33 pm
Location: Bangalore

Post by Ravi.K »

Ray, I tried it as you advised, and it is an issue with the dataset only...
Cheers
Ravi K
Ravi.K
Participant
Posts: 209
Joined: Sat Nov 20, 2010 11:33 pm
Location: Bangalore

Post by Ravi.K »

Andy, I have already followed Ray's steps, and it is confirmed that the issue is with the dataset.

Dataset size: around 1 GB
Cheers
Ravi K
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia

Reply

Post by ssnegi »

I would suggest that instead of generating one 1 GB dataset, you divide it into smaller datasets, then read them together and funnel them. This should speed up the process.
Ravi.K
Participant
Posts: 209
Joined: Sat Nov 20, 2010 11:33 pm
Location: Bangalore

Re: Reply

Post by Ravi.K »

We need the data to be in a single dataset somehow, in order to prepare the hierarchies and pass them on to the next level...
Cheers
Ravi K
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia

Re: Reply

Post by ssnegi »

You are only dividing the data into smaller datasets at the time they are generated. Anything you can do with one large dataset can be done with smaller datasets by funneling them into one stream before using them anywhere (building the hierarchies, etc.). You can divide the smaller datasets based on any condition, and you can use the Sequence funnel option to keep them in the proper order if need be.
You can also go with the option of using temporary tables instead of datasets if the data size is too big. You can drop those tables in the After SQL statement once processing has completed.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

How many partitions do you have, and is the data evenly distributed between them?
Ravi.K
Participant
Posts: 209
Joined: Sat Nov 20, 2010 11:33 pm
Location: Bangalore

Re: Reply

Post by Ravi.K »

ssnegi, I will try it and see how the table performs. I am a bit concerned about the I/O between the app server and the DB server when carrying these big records...
Cheers
Ravi K
Ravi.K
Participant
Posts: 209
Joined: Sat Nov 20, 2010 11:33 pm
Location: Bangalore

Post by Ravi.K »

Paul, it runs on 4 nodes and the records are equally distributed. :)
Cheers
Ravi K