Sequential file performance issue

hemant
Participant
Posts: 67
Joined: Mon Dec 15, 2003 6:43 am

Post by hemant »

Dear All

I have a server job in which I split the data from one sequential file into many sequential files.
The records are also looked up against several hash files (4 hash files).
The throughput in rows/sec is very low, so the job's total run time increases and it takes a long time to complete.
1. What are the ways I can increase the speed of this job in terms of rows/sec?
2. What are the performance tuning parameters of the Sequential File stage that affect speed? Is there any document on this that I can refer to?

Please suggest.



Regards:

Hemant Krishnatrey
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You could be overloading the machine. How many CPUs are there?
What is the basis for splitting the rows? Is this being done in a single Transformer stage? How complex are the business rules? Are the rows in the sequential file improbably large?

There are no tunables for the Sequential File stage.

The Sequential File stage is VERY fast. To prove this, construct the following job.

Code:

SeqFile  ----->  Transformer  ----->  SeqFile
Make the output constraint on the Transformer stage the system variable @FALSE. Now run the job. This will give you some idea of how well the Sequential File stage can perform. Then go and discover where the real problem lies.
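
As a rough cross-check outside DataStage, the raw read speed of the disk and file system can be timed from the shell. This is only a sketch under assumptions: it is not the job described above, the sample file is generated here as a stand-in for the real source (point `SRC` at your own file instead), and whole-second timing from `date +%s` is coarse.

```shell
#!/bin/sh
# Sketch: time a plain sequential read of a file and report rows/sec.
# Assumption: SRC is a stand-in generated here; replace it with the real source file.
SRC=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 200000; i++) print i ",sample,row,data" }' > "$SRC"

start=$(date +%s)
rows=$(( $(wc -l < "$SRC") ))   # reads every row and writes nothing, like the @FALSE constraint
end=$(date +%s)

elapsed=$(( end - start ))
[ "$elapsed" -gt 0 ] || elapsed=1   # avoid dividing by zero when the read takes under a second
rate=$(( rows / elapsed ))
echo "rows=$rows seconds=$elapsed rows_per_sec=$rate"
```

If this figure is far above what the job reports, the bottleneck is probably not the file read itself.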
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Post by ogmios »

A hunch: the culprit is in the Transformer. Let me guess: you have 1 Transformer and 20 to 40 output files. For every input row, all of the constraints have to be evaluated, and this takes time.

In some cases it even makes sense to split such a job in a couple of jobs and have the file processed several times by different jobs.

DataStage is a tool, not a magic bullet. Most jobs I write have at most 4 to 6 stages (including maybe 1 lookup) and no fancy stages (only database stages and sequential files).

With DataStage less is more speed, and you need all the speed you can get.

Ogmios
mandyli
Premium Member
Posts: 898
Joined: Wed May 26, 2004 10:45 pm
Location: Chicago

Post by mandyli »

Hi,

First of all, the Sequential File stage itself performs well. At the same time, you need to check your file data as well. Splitting into 4 files is also a good idea.

If you split into more files, it can sometimes reduce performance.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

Just some ideas: a single job writing out to four sequential files is no faster than a job writing to a single file. You achieve extra performance if you process the source data using four parallel jobs and you have the CPUs available to service each parallel job.

One thing you can try with sequential files is writing out to files on different disk partitions.

There are a few tunables on hash files which have been mentioned on quite a few previous threads. Have a look at the memory settings on your hash file stages.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Just to clarify, Vince's point about using separate disks really means different drives (spindles). Different partitions that are slices on the same spindle don't result in any gain; indeed, they can result in contention and therefore reduced throughput.

Four parallel streams all reading the same source file will mean that three of those streams will read from cache. If they are writing to separate output files, this will be very fast. The separate output files can later be combined very fast with cat (UNIX) or type or copy (DOS). For example:

Code:

cat file2 file3 file4 >> file1
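
The parallel-streams idea can be tried from the shell before building the jobs. A minimal sketch, assuming a round-robin split on row number (a stand-in for the real business rule), generated sample data, and hypothetical file names `part1` to `part4`; in DataStage each stream would be a separate job rather than an `awk` process.

```shell
#!/bin/sh
# Sketch: four streams read the same source in parallel, each writes its own file,
# then the outputs are combined with cat.
WORK=$(mktemp -d)
SRC="$WORK/source.txt"
awk 'BEGIN { for (i = 1; i <= 1000; i++) print i ",sample" }' > "$SRC"

for n in 1 2 3 4; do
  # Each background process keeps only its own quarter of the rows.
  awk -v n="$n" 'NR % 4 == n % 4' "$SRC" > "$WORK/part$n" &
done
wait   # block until all four streams have finished

# Append the other three outputs to the first, as in the cat command above.
cat "$WORK/part2" "$WORK/part3" "$WORK/part4" >> "$WORK/part1"
total=$(( $(wc -l < "$WORK/part1") ))
echo "combined rows: $total"
```

Note that the combined file does not preserve the original row order; sort it afterwards if order matters.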