Dataset Read is slow

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Ravi.K
Participant
Posts: 209
Joined: Sat Nov 20, 2010 11:33 pm
Location: Bangalore

Dataset Read is slow

Post by Ravi.K »

Hi,

We have a job,

Source (Dataset) --> CopyStage --> Database (Oracle Connector)

The dataset contains 500 columns and 200,000 records. Reading from the dataset is very slow, which is why I added the APT_DEFAULT_TRANSPORT_BLOCK_SIZE and APT_MAXIMUM_TRANSPORT_BLOCK_SIZE parameters to improve performance. With those set, the dataset read now takes around 8 to 10 minutes...
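
(As a point of reference, here is a minimal sketch of how those variables could be set when launching the job from the command line. The values and the project/job names are illustrative only, not taken from this post; on most installations the variables are instead added as $APT_... job parameters through the Administrator client.)

    # Illustrative values only -- check your environment's defaults before changing anything
    export APT_DEFAULT_TRANSPORT_BLOCK_SIZE=262144      # transport block size in bytes
    export APT_MAXIMUM_TRANSPORT_BLOCK_SIZE=1048576     # upper limit for the transport block

    # Or pass them as job parameters when starting the job with dsjob
    # (MyProject and MyLoadJob are placeholders)
    dsjob -run \
          -param \$APT_DEFAULT_TRANSPORT_BLOCK_SIZE=262144 \
          -param \$APT_MAXIMUM_TRANSPORT_BLOCK_SIZE=1048576 \
          MyProject MyLoadJob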

Please advise if you have any suggestions. Thanks...
Cheers
Ravi K
thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

How big is the dataset (have a look in Data Set Management, or check from the command line as sketched below)?
What is the schema of the dataset, i.e. the column definitions, and are the columns bounded with declared sizes?
Which 8.x version are you using?
How many partitions was the dataset created on, and how many partitions does the job that reads it run with?
How did you determine that it was the read of the dataset that was slow, rather than the load into the database?
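
(If you would rather check this from the engine tier than from the Data Set Management tool, a rough sketch using the standard orchadmin utility follows. The dataset path is just a placeholder and the exact options vary by version, so treat this as an outline rather than exact syntax.)

    # Source the engine environment so orchadmin can find its libraries; you may
    # also need $APT_ORCHHOME/bin on your PATH and APT_CONFIG_FILE set
    . $DSHOME/dsenv

    # Descriptor summary: partitions, segments and related information
    orchadmin describe /data/work/big_file.ds

    # Dump a few records as text to sanity-check the contents
    orchadmin dump /data/work/big_file.ds | head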
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Never mind the fact that it looks like you already 'solved' your problem. What kind of suggestions are you looking for?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Try it with this job design.

Source (Dataset) --> CopyStage

Set the Force property to True in the Copy stage, so that a Copy stage with no output is not optimized away and the dataset is actually read.

I think you'll find that the Data Set is not the culprit here.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ravi.K
Participant
Posts: 209
Joined: Sat Nov 20, 2010 11:33 pm
Location: Bangalore

Post by Ravi.K »

The data types are bounded lengths, with a maximum of 150 characters.
Version: DataStage 8.5
Both jobs run on 4 nodes.
Here are the Data Set Management stats...

Total Records: 176618
Total 32K Blocks: 7360
Total Bytes: 942998906
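
(Quick arithmetic on those figures, not part of the original post: 942,998,906 bytes / 176,618 records ≈ 5,340 bytes per record, i.e. roughly 5 KB per row spread across the 500 columns, or about 10 bytes per column on average. That is consistent with the "around 1 GB" size quoted later in the thread.)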
Cheers
Ravi K
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Another way to verify that it isn't the read that is slow: replace the Oracle Connector stage with a Peek stage. The rows-per-second (RPS) figure will then show the maximum rate at which the dataset can possibly be read.
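
(If you want the same numbers from the command line after running the Peek test, dsjob can report them as well; the project and job names below are placeholders.)

    # DETAIL should include per-stage and per-link statistics from the last run
    dsjob -report MyProject MyPeekTestJob DETAIL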
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
Ravi.K
Participant
Posts: 209
Joined: Sat Nov 20, 2010 11:33 pm
Location: Bangalore

Post by Ravi.K »

chulett wrote: What kind of suggestions are you looking for?
The data volume will increase day by day. Are there any optimized ways of dealing with bigger datasets? I am also going to be dealing with many larger datasets soon.

Thanks in Advance...
Cheers
Ravi K
Ravi.K
Participant
Posts: 209
Joined: Sat Nov 20, 2010 11:33 pm
Location: Bangalore

Post by Ravi.K »

Ray, I tried it as you advised, and it is an issue with the dataset only...
Cheers
Ravi K
Ravi.K
Participant
Posts: 209
Joined: Sat Nov 20, 2010 11:33 pm
Location: Bangalore

Post by Ravi.K »

Andy, I have already followed Ray's steps, and it is confirmed that the issue is with the dataset.

Dataset size: around 1 GB
Cheers
Ravi K
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia

Reply

Post by ssnegi »

I would suggest that instead of generating one 1 GB dataset, you divide it into smaller datasets, then read them together and funnel them. This should speed up the process.
Ravi.K
Participant
Posts: 209
Joined: Sat Nov 20, 2010 11:33 pm
Location: Bangalore

Re: Reply

Post by Ravi.K »

We need the data to be in a single dataset somehow, in order to prepare the hierarchies and pass them on to the next level...
Cheers
Ravi K
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia

Re: Reply

Post by ssnegi »

You are only dividing the data into smaller datasets at the time they are generated. Anything you can do with one large dataset can be done with smaller datasets by funneling them into one stream before using them anywhere (building the hierarchies, etc.). You can divide the smaller datasets based on any condition, and you can use the Sequence funnel option to keep them in the proper order if need be.
You can also go with the option of using temporary tables instead of datasets if the data size is too big. You can drop those tables in the After SQL statement once processing has completed.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

How many partitions do you have, and is the data evenly distributed between them?
Ravi.K
Participant
Posts: 209
Joined: Sat Nov 20, 2010 11:33 pm
Location: Bangalore

Re: Reply

Post by Ravi.K »

ssnegi, I will try it and see how the table performs. I am a bit concerned about the I/O between the app server and the DB server when carrying these big records...
Cheers
Ravi K
Ravi.K
Participant
Posts: 209
Joined: Sat Nov 20, 2010 11:33 pm
Location: Bangalore

Post by Ravi.K »

Paul, it runs on 4 nodes and the records are equally distributed. :)
Cheers
Ravi K