Sequential file vs dataset in performance

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

vij
Participant
Posts: 131
Joined: Fri Nov 17, 2006 12:43 am

Post by vij »

Hi all,

I have two jobs that both use the same sequential file (about 100 million records) as input. Since both jobs read the same file, I thought performance might be better if I first loaded the file into a dataset and then used that dataset as the input to both jobs. Am I right? Please advise.

Thanks in advance!
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

You are trying to optimize two sequential reads. If you convert the file to a dataset and read the generated dataset instead, you still have to do one sequential read (to convert the file into a dataset). The sequential read you save is offset by the two dataset reads. Dataset access won't be 100% efficient either; it also consumes some I/O. The rate of access depends on the number of partitions, CPU utilization at the time of the read, network congestion, etc.
So it would be more realistic to do a test run at your site and determine the difference yourself.
And you can post the stats here if they are interesting.
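
As a rough illustration of the trade-off described above, here is a back-of-the-envelope sketch in Python. It is not DataStage code: the `Rates` class, the function names, and every throughput figure are hypothetical placeholders, and the model simply adds up I/O work (it ignores pipelining, partitioning and contention). The real answer still has to come from a test run on your own system.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical throughput in records per second; replace with measured values."""
    seq_read: float  # reading the sequential file
    ds_write: float  # writing the persistent dataset
    ds_read: float   # reading the dataset back


def direct_cost(records: int, jobs: int, r: Rates) -> float:
    """Total read time in seconds when every job reads the sequential file itself."""
    return jobs * records / r.seq_read


def staged_cost(records: int, jobs: int, r: Rates) -> float:
    """Total time in seconds for one conversion pass plus one dataset read per job."""
    convert = records / r.seq_read + records / r.ds_write
    return convert + jobs * records / r.ds_read


def break_even_jobs(r: Rates) -> float:
    """Number of consuming jobs at which staging stops costing more (inf if never)."""
    saved_per_job = 1 / r.seq_read - 1 / r.ds_read
    if saved_per_job <= 0:
        return float("inf")  # dataset reads are no faster, so staging never pays off
    return (1 / r.seq_read + 1 / r.ds_write) / saved_per_job


if __name__ == "__main__":
    # Made-up rates purely for illustration.
    rates = Rates(seq_read=200_000, ds_write=400_000, ds_read=800_000)
    records = 100_000_000
    print("direct :", direct_cost(records, 2, rates), "s")
    print("staged :", staged_cost(records, 2, rates), "s")
    print("break-even jobs:", break_even_jobs(rates))
```

With these invented rates the two approaches cost about the same for two jobs, which is the point being made: the dataset only wins if dataset reads are much faster than sequential reads, or if more jobs end up sharing it.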
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If the data are in a sequential file, you have to read the sequential file (even to get the data into a persistent Data Set). So there's no "either/or" about it.

Investigate the "multiple readers per node" property of the Sequential File stage.
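
To show the general idea behind having several readers on one file, here is a conceptual Python sketch. It is only an analogy and makes no claim about how the Sequential File stage is implemented: the function names are hypothetical, and it assumes newline-delimited records. Each reader gets its own byte range, with range boundaries moved forward to the next record boundary so that no record is split between readers.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_ranges(path: str, readers: int) -> list[tuple[int, int]]:
    """Cut the file into at most `readers` byte ranges that start on a record boundary."""
    size = os.path.getsize(path)
    cuts = [0]
    with open(path, "rb") as f:
        for i in range(1, readers):
            target = size * i // readers
            if target <= cuts[-1]:
                continue  # file too small for this many distinct ranges
            f.seek(target - 1)
            f.readline()  # move to the end of the record containing byte target-1
            pos = f.tell()
            if cuts[-1] < pos < size:
                cuts.append(pos)
    cuts.append(size)
    return list(zip(cuts[:-1], cuts[1:]))


def count_records(path: str, start: int, end: int) -> int:
    """Read one byte range record by record and return how many records it holds."""
    n = 0
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            f.readline()
            n += 1
    return n


def parallel_count(path: str, readers: int) -> int:
    """Let each reader process its own range concurrently, then combine the results."""
    ranges = split_ranges(path, readers)
    with ThreadPoolExecutor(max_workers=readers) as pool:
        counts = pool.map(lambda r: count_records(path, r[0], r[1]), ranges)
    return sum(counts)
```

Calling `parallel_count("input.txt", 4)` (the file name is a placeholder) would read the file with up to four concurrent readers. Whether this actually helps depends on whether the disk, rather than the CPU, is the bottleneck, so it is worth measuring before relying on it.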
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.