
Dataset Read is slow

Posted: Tue Mar 04, 2014 12:52 am
by Ravi.K
Hi,

We have a job,

Source (Dataset) --> CopyStage --> Database (Oracle Connector)

The dataset contains 500 columns and 200,000 records. Reading from the dataset is very slow, which is why I added the APT_DEFAULT_TRANSPORT_BLOCK_SIZE and APT_MAXIMUM_TRANSPORT_BLOCK_SIZE parameters to improve performance. Even with those, the dataset read still takes around 8 to 10 minutes...
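
I set them as environment variables on the job, along these lines (the values here are just illustrative examples, not tuned recommendations):

APT_DEFAULT_TRANSPORT_BLOCK_SIZE=262144    (bytes per transport block passed between operators)
APT_MAXIMUM_TRANSPORT_BLOCK_SIZE=1048576   (upper bound allowed for a transport block)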

Please advise; any suggestions are welcome. Thanks...

Posted: Tue Mar 04, 2014 5:46 am
by thompsonp
How big is the dataset (have a look in Dataset Management)?
What is the schema of the dataset, i.e. the column definitions, and are the columns bounded with declared sizes?
Which version are you using, 8.x?
How many partitions was the dataset created on, and how many does the job that reads it run with?
How did you determine that it was the read of the dataset that was slow, rather than the load to the database?
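
If you have command-line access to the engine, orchadmin can answer most of these; for example (the dataset path is hypothetical, and the exact options may vary by version):

orchadmin ll /data/project/big_input.ds          (long listing: data files and partitions)
orchadmin describe /data/project/big_input.ds    (summary including the record schema)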

Posted: Tue Mar 04, 2014 8:14 am
by chulett
Never mind the fact that it looks like you already 'solved' your problem. What kind of suggestions are you looking for?

Posted: Tue Mar 04, 2014 7:54 pm
by ray.wurlod
Try it with this job design.

Source (Dataset) --> CopyStage

Set the Force property to True in the Copy stage; otherwise the compiler will optimize the stage away and the job won't actually read the data.

I think you'll find that the Data Set is not the culprit here.

Posted: Wed Mar 05, 2014 4:24 pm
by Ravi.K
The data types are bounded length, with a maximum of 150 characters.
Version: DataStage 8.5
Both jobs run on 4 nodes.
Here are the Dataset Management stats...

Total Records: 176618
Total 32K Blocks: 7360
Total Bytes: 942998906
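
For scale, that works out to 942998906 / 176618 ≈ 5,339 bytes per record (plausible for 500 bounded-length columns), or roughly 900 MB in total.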

Posted: Wed Mar 05, 2014 4:26 pm
by asorrell
Another way to verify whether or not it's the "read" that is slow: replace the Oracle stage with a Peek stage. The rows-per-second (RPS) figures will then show the maximum rate at which the dataset could possibly be read.
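
That is, a test design along these lines:

Source (Dataset) --> CopyStage --> Peek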

Posted: Wed Mar 05, 2014 4:40 pm
by Ravi.K
chulett wrote:What kind of suggestions are you looking for?
The data volume will increase day by day. Are there any optimized approaches for dealing with bigger datasets? I am also going to be dealing with many more big datasets soon.

Thanks in Advance...

Posted: Wed Mar 05, 2014 4:42 pm
by Ravi.K
Ray, I tried it as you advised, and it is indeed an issue with the dataset...

Posted: Wed Mar 05, 2014 4:47 pm
by Ravi.K
Andy, I have already followed Ray's steps, and it's confirmed that the issue is with the dataset.

Dataset size: around 1 GB

Posted: Wed Mar 05, 2014 5:09 pm
by ssnegi
I would suggest that instead of generating one 1 GB dataset, you divide it into smaller datasets, then read them together and funnel them, as in the sketch below. This should speed up the process.
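
Something along these lines (dataset names are placeholders):

Part_1 (Dataset) -->
Part_2 (Dataset) --> Funnel --> downstream processing
Part_3 (Dataset) -->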

Posted: Thu Mar 06, 2014 9:05 am
by Ravi.K
We need the data to be in a single dataset somehow, in order to prepare hierarchies and pass them on to the next level...

Posted: Thu Mar 06, 2014 4:35 pm
by ssnegi
You are only dividing the data into smaller datasets at the time of generation. Anything you can do with a large dataset can be done with the smaller datasets by funneling them into one stream before using them anywhere (building hierarchies, etc.). You can divide the smaller datasets based on any condition, and you can use the Sequence funnel option to keep them in the proper order if need be.
You could also use temporary tables instead of datasets if the data size is too big, and drop those tables in the After SQL statement once processing has completed.
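
For example, in the Oracle Connector the cleanup can go in the After SQL statement property; a minimal sketch, with a made-up table name:

DROP TABLE tmp_hierarchy_stage PURGE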

Posted: Thu Mar 06, 2014 9:30 pm
by PaulVL
How many partitions do you have, and is the data evenly distributed between them?

Posted: Fri Mar 07, 2014 4:01 am
by Ravi.K
ssnegi, I will try it and see how the table performs. I am a bit concerned about the I/O between the app server and the DB server when carrying these big records....

Posted: Fri Mar 07, 2014 5:48 am
by Ravi.K
Paul, it runs on 4 nodes and the records are evenly distributed. :)