Dataset Read is slow
Hi,
We have a job,
Source (Dataset) --> CopyStage --> Database (Oracle Connector)
The dataset contains 500 columns and 200,000 records. Reading from the dataset is very slow, which is why I added the APT_DEFAULT_TRANSPORT_BLOCK_SIZE and APT_MAXIMUM_TRANSPORT_BLOCK_SIZE environment variables to improve performance. With those set, the dataset read now takes around 8 to 10 minutes.
Please advise; any suggestions would be appreciated. Thanks.
Cheers
Ravi K
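For context on the block-size tuning mentioned above: a transport block has to hold at least one whole record, and with 500 columns the record can be fairly wide. Below is a rough back-of-the-envelope sketch in plain Python (not anything DataStage-specific); the average column width and the block size are assumptions, so substitute figures from your own schema and your own APT_DEFAULT_TRANSPORT_BLOCK_SIZE setting.

```python
# Rough sizing sketch, not a DataStage API. Estimates the record width and how
# many records fit in one transport block. AVG_BYTES_PER_COLUMN and
# BLOCK_SIZE_BYTES are assumptions -- substitute values from your own schema
# and environment.

NUM_COLUMNS = 500
NUM_RECORDS = 200_000
AVG_BYTES_PER_COLUMN = 20        # assumed average width per column
BLOCK_SIZE_BYTES = 131_072       # assumed transport block size (128 KB)

record_bytes = NUM_COLUMNS * AVG_BYTES_PER_COLUMN
records_per_block = BLOCK_SIZE_BYTES // record_bytes
total_mb = NUM_RECORDS * record_bytes / (1024 * 1024)

print(f"Estimated record width : {record_bytes} bytes")
print(f"Records per block      : {records_per_block}")
print(f"Estimated total volume : {total_mb:.0f} MB")
```

If records_per_block comes out very small (or zero), a larger block size may help; if it is already comfortable, the transport block size is unlikely to be the bottleneck.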
How big is the dataset (have a look in Data Set Management)?
What is the schema of the dataset, i.e. the column definitions, and are the columns bounded with declared sizes?
Which version are you using, 8.x?
How many partitions was the dataset created on, and how many partitions does the job that reads it run with?
How did you determine that it was the read of the dataset that was slow, rather than the load to the database?
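On the first question: the Data Set Management tool in the Designer client lists the data files behind a dataset and their sizes. If you would rather total them from the command line, a minimal sketch follows; the glob pattern is a placeholder, so use the file locations the management tool (or the dataset descriptor) actually reports for your environment.

```python
# Total the on-disk size of a dataset's data files. DATA_FILE_PATTERN is a
# placeholder -- take the real data-file locations from the Data Set Management
# tool (or the dataset descriptor) for your environment.

import glob
import os

DATA_FILE_PATTERN = "/data/Datasets/mydataset.ds*"   # placeholder

files = glob.glob(DATA_FILE_PATTERN)
total_bytes = sum(os.path.getsize(f) for f in files)

print(f"{len(files)} data files, {total_bytes / (1024 ** 2):.1f} MB in total")
```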
Try it with this job design.
Source (Dataset) --> CopyStage
Set the Force property in the Copy stage to True, so the Copy operator is not optimized out of the flow.
I think you'll find that the Data Set is not the culprit here.
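One way to put numbers on that isolation test is to time the cut-down job from the command line and compare it against the full job. A minimal sketch, assuming dsjob is on your PATH; the project and job names below are placeholders.

```python
# Minimal timing harness for the isolation test. Assumptions: dsjob is on the
# PATH, and PROJECT / JOB are placeholders for your own project and the
# cut-down "Data Set --> Copy (Force)" test job. Run it for the test job and
# again for the full job, then compare elapsed times.

import subprocess
import time

PROJECT = "MY_PROJECT"           # placeholder
JOB = "ReadDatasetOnly"          # placeholder

cmd = ["dsjob", "-run", "-jobstatus", PROJECT, JOB]   # -jobstatus waits for the job to finish

start = time.monotonic()
result = subprocess.run(cmd, capture_output=True, text=True)
elapsed = time.monotonic() - start

print(f"dsjob exit code : {result.returncode}")
print(f"Elapsed time    : {elapsed:.1f} s")
```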
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Re: Reply
It only divides the data into smaller datasets at the time they are generated. Whatever you can do with one large dataset can be done with several smaller ones by funnelling them into a single stream before using them anywhere. You can split the smaller datasets on any condition, and you can use the Sequence funnel type to keep the records in their proper order if need be.
You can also use temporary tables instead of datasets if the data volume is too big, and drop those tables in the After SQL statement once processing has completed.
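As a language-neutral illustration of the split-and-funnel idea (plain Python, not DataStage code; the records and chunk size are made up): split the data into smaller pieces, process them independently, and funnel them back together in generation order, which is what the Sequence funnel type preserves.

```python
# Analogy only: divide the data into smaller "datasets", process each one,
# then funnel them back into a single ordered stream -- the same idea as
# splitting into smaller datasets and recombining them with a Sequence funnel.
# The records and CHUNK_SIZE are made up for illustration.

from itertools import chain

records = [{"seq": i, "value": i * 10} for i in range(20)]   # stand-in data
CHUNK_SIZE = 6

# split into smaller chunks at "generation" time
chunks = [records[i:i + CHUNK_SIZE] for i in range(0, len(records), CHUNK_SIZE)]

# process each chunk independently (placeholder transformation)
processed = [[{**r, "value": r["value"] + 1} for r in chunk] for chunk in chunks]

# funnel the chunks back together in generation order, preserving record order
funneled = list(chain.from_iterable(processed))

assert [r["seq"] for r in funneled] == list(range(len(records)))
print(f"{len(chunks)} chunks funneled into {len(funneled)} ordered records")
```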