
Performance of Sequential File Stage

Posted: Thu Jan 16, 2014 2:55 am
by IBMDS4KK
Hi,

Job Design : Sequential File --> Transformer --> Oracle Connector

Description : The source file is around 7 GB and contains 9 million records. These records have to be inserted into the target table.

Issue Description : The job runs for around 4 hrs, which the business says is too long.

Hint : The job runs on one node and the records are not fixed length.

Trial and Error : In the Sequential File stage I tried the option 'Number of Readers per Node = 2', and in the Oracle Connector stage 'Record count = 10000 and Array size = 20000' (the original job uses 'Record count = 2000 and Array size = 5000'). Neither change helped; the modified job takes almost the same time as the original.
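(For reference, these two options control batching on the Oracle side: 'Array size' is roughly the number of rows sent per server round trip, and 'Record count' roughly governs how often the connector commits. A minimal Python sketch of the same batching idea, using cx_Oracle as a client-side stand-in for the connector; the table, columns, and connection details are placeholders, and the real table would have all 266 columns:)

import cx_Oracle

ARRAY_SIZE = 5000        # rows per executemany() round trip (~ connector 'Array size')
COMMIT_INTERVAL = 20000  # rows between commits (~ connector 'Record count')

def load(rows, user, password, dsn):
    """Insert an iterable of row tuples in batches, committing periodically."""
    conn = cx_Oracle.connect(user, password, dsn)
    cur = conn.cursor()
    batch, uncommitted = [], 0
    for row in rows:
        batch.append(row)
        if len(batch) == ARRAY_SIZE:
            cur.executemany("INSERT INTO target_tbl VALUES (:1, :2)", batch)
            uncommitted += len(batch)
            batch = []
            if uncommitted >= COMMIT_INTERVAL:
                conn.commit()
                uncommitted = 0
    if batch:
        cur.executemany("INSERT INTO target_tbl VALUES (:1, :2)", batch)
    conn.commit()
    conn.close()

Raising the batch sizes only helps if the round trips were the bottleneck in the first place, which is why measuring each part of the job first matters.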

Please suggest something to reduce the job run time (Performance).

Re: Performance of Sequential File Stage

Posted: Thu Jan 16, 2014 7:54 am
by thompsonp
How long do "the business" think the job should take? :lol:

You can performance tune for a long time; best to have a realistic target in mind.

You should establish which part(s) of the job are causing the perceived bottleneck. Judging by the subject line you have assumed it is the Sequential File Stage.

How long does the job take to run if you remove the Oracle Connector (and then the Transformer)?
Is the job CPU bound or I/O bound?

What options are you using to load the data: inserts, upserts, or a bulk load?

Posted: Thu Jan 16, 2014 12:13 pm
by srinivas.g
Why are you running on one node? If possible, run with 8 nodes and see.

Posted: Thu Jan 16, 2014 12:14 pm
by srinivas.g
The number of columns in your job also matters here.

Posted: Thu Jan 16, 2014 3:59 pm
by ray.wurlod
Why don't you measure the baseline performance? How fast can your server read this volume of data without DataStage? Create a DataStage job consisting only of a Sequential File stage and a Copy stage with no output. How fast does that read the file?
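For a rough number outside DataStage entirely, you can time a raw sequential read of the file. A minimal Python sketch (the file path is a placeholder):

import time

CHUNK = 8 * 1024 * 1024  # read in 8 MB slices

def raw_read_mbps(path):
    # Time a straight sequential read with no parsing at all.
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(CHUNK)
            if not buf:
                break
            total += len(buf)
    elapsed = time.perf_counter() - start
    return total / (1024 * 1024) / elapsed

print(f"{raw_read_mbps('/data/source_file.dat'):.1f} MB/s")

If that raw read already takes a large fraction of the 4 hours, no amount of connector tuning will help; if it finishes in minutes, the bottleneck is downstream.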

Multiple readers may help. Play with that certainly.

Array size and record count are irrelevant to sequential file processing, which uses STREAMS I/O.

Posted: Thu Jan 16, 2014 4:30 pm
by chulett
Exactly. I wager 400 Quatloos on Oracle being the sole 'bottleneck' here.

Posted: Thu Jan 16, 2014 11:29 pm
by IBMDS4KK
Hi,

Thanks, all. I solved it by running the job as multiple instances; a sketch of that approach follows the answers below.

Answers to Queries :

Oracle Table Columns = 266
Business Requirements = Job has to run on one node
Job has to finish within 1 hr
Loading Data = Insert Mode
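For anyone finding this later: a multi-instance-enabled job can be started several times in parallel under different invocation IDs, each working on its own slice of the input. A rough Python sketch driving the dsjob command line; the project name, job name, parameter name, and file paths are all hypothetical:

import subprocess

INSTANCES = 4  # parallel invocations of the multi-instance job

def run_instances():
    procs = []
    for i in range(INSTANCES):
        # Each invocation gets its own ID and its own pre-split input file.
        cmd = [
            "dsjob", "-run", "-jobstatus",
            "-param", f"SourceFile=/data/part_{i}.dat",
            "MyProject", f"LoadJob.inst{i}",
        ]
        procs.append(subprocess.Popen(cmd))
    # Wait for every invocation; a non-zero code means that run failed.
    return [p.wait() for p in procs]

if __name__ == "__main__":
    print(run_instances())

Each instance still runs on one node, so the stated one-node requirement holds while the instances divide the work between them.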

Posted: Fri Jan 17, 2014 2:46 pm
by daignault
Another way to speed up processing when reading sequential files is to read each record as a single column, since reading the CSV is inherently a sequential operation, and then split the records into columns downstream. The file access stays sequential, but the parsing into columns is done in parallel.
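A minimal Python sketch of that pattern, keeping the read single-threaded and fanning the column-splitting out to worker processes (it assumes simple CSV rows with no embedded newlines; the file name is a placeholder):

import csv
from multiprocessing import Pool

def parse_chunk(lines):
    # CPU-bound work: split raw lines into column lists.
    return list(csv.reader(lines))

def read_chunks(path, chunk_size=50_000):
    # I/O-bound work: one sequential pass over the file.
    with open(path, "r", newline="") as f:
        chunk = []
        for line in f:
            chunk.append(line)
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk

if __name__ == "__main__":
    total = 0
    with Pool(processes=4) as pool:
        for rows in pool.imap(parse_chunk, read_chunks("big_file.csv")):
            total += len(rows)  # stand-in for handing rows downstream
    print(total, "rows parsed")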

Regards

Ray D