Improving Performance

Posted: Fri Apr 04, 2008 9:26 pm
by soporte
Hi,

I need to join a Data Set (left link) with 66 million records to a Sequential File (right link) with 28 million records.

I tried reading the Sequential File using the Sequential File stage with 1, 2 and 4 readers per node, but the import into the virtual Data Set is taking a lot of time.

1) Is there any tip to improve the importing process of sequential files?
2) If I need to join / merge two big sequential files (>20M records), is it possible to join / merge them without importing them into a virtual Data Set in DataStage EE? If not, what is the best way to do this?

Thx

Posted: Fri Apr 04, 2008 9:42 pm
by ray.wurlod
1. Not really. Multiple readers is about all you have unless you chop up the file first.

2. No. Sparse lookups are only available for the DB2 and Oracle Enterprise stages. There is a virtual Data Set associated with every other link between non-combined operators.

You *may* get some gain by preventing the Join stage from combining, but only if you have spare CPU and memory capacity. You might also consider increasing the memory consumed by the Join stage.
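
As an illustration of the "chop up the file first" option in point 1, here is a minimal sketch in Python, run outside DataStage before the job starts. It assumes one record per line; the function name, the paths and the `.partN` naming are hypothetical and not from this thread. It deals the lines round-robin into several smaller files so that each piece can be read separately.

```python
import os

def split_round_robin(src_path, out_dir, parts):
    """Write the lines of src_path round-robin into `parts` files.

    Assumes newline-delimited records. Returns the list of output paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.basename(src_path)
    out_paths = [os.path.join(out_dir, f"{base}.part{i}") for i in range(parts)]
    # Binary mode keeps the record bytes exactly as they are in the source.
    outs = [open(p, "wb") for p in out_paths]
    try:
        with open(src_path, "rb") as src:
            for n, line in enumerate(src):
                outs[n % parts].write(line)
    finally:
        for f in outs:
            f.close()
    return out_paths

# Hypothetical usage: four pieces, one per reader.
# split_round_robin("/data/in/right_link.txt", "/data/in/split", 4)
```

Whether this helps depends on where the pieces land: if they all sit on the same disk as the original file, the readers will probably still compete for the same I/O.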

Posted: Sun Apr 06, 2008 5:34 pm
by John Smith
What do you mean by slow? How long does it take for your server to import to a virtual Data Set? Does your sequential file reside on a filesystem that spans multiple disks?
Make sure your scratch disk is not on the same filesystem as your sequential files. There is no point tuning anything in DS when you have disk contention at the OS level!
Just make sure that you are not getting a lot of hits on a single disk; if you are, then tuning in DS is not going to help much.
It may be worth getting your AIX admin to check.
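
A quick first check for the scratch-disk point above can be sketched in Python. This is only an assumption-laden illustration: the function name and the paths are hypothetical, and it compares device IDs as reported by `os.stat`, which tells you whether two paths are on the same mounted filesystem. It cannot tell you whether two different filesystems share the same physical disks, so the admin check is still worthwhile.

```python
import os

def same_filesystem(path_a, path_b):
    """Return True if both paths are on the same mounted filesystem.

    Compares st_dev (the device ID) of the two paths. Different device IDs
    do not guarantee different physical disks.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev

# Hypothetical usage: compare the scratch directory with the source file.
# if same_filesystem("/scratch/ds", "/data/in/right_link.txt"):
#     print("Scratch and source file share a filesystem")
```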