view read

bryan · Post by **bryan** » Wed Mar 09, 2005 2:07 pm

We are sourcing from views that hold about 15 million rows, and we do 5 validations against hash file lookups. The output is written to a sequential file.
Read throughput: 300 rows/sec

On average, each million rows takes about one hour.

What can I do to make better use of the DataStage/AIX power?

5 CPUs
2 GB memory
AIX 5.0

kcbland · Post by **kcbland** » Wed Mar 09, 2005 3:32 pm

Convert your job to write to a parameterized output file and run 5 instances of it. Have the select statement partition the data into 1/5th chunks, perhaps using WHERE MOD(somekeycolumn, 5) = N, where N (0 through 4) is passed to each instance as a parameter value.
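To make the split concrete, here is a rough sketch in Python (used only for illustration, it is not DataStage code). It builds the five WHERE clauses and checks that every key lands in exactly one instance. The helper names are made up, `somekeycolumn` is the placeholder from above, and it assumes the key is a non-negative integer, since the sign of MOD on negative values differs between databases and Python.

```python
def partition_predicates(key_column, instances):
    """Build one WHERE clause per job instance (hypothetical helper)."""
    return [
        f"WHERE MOD({key_column}, {instances}) = {n}"
        for n in range(instances)
    ]


def assign_instance(key, instances):
    """Mirror MOD(key, instances) for a non-negative integer key."""
    return key % instances


predicates = partition_predicates("somekeycolumn", 5)
for predicate in predicates:
    print(predicate)

# Sanity check on sample keys: each key goes to exactly one instance,
# and sequential keys spread evenly across the five instances.
counts = [0] * 5
for key in range(1, 1001):
    counts[assign_instance(key, 5)] += 1
print(counts)
```

The split is only even if the key values are spread evenly across the remainders, so pick a column such as a sequence-generated key rather than something skewed.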

Hopefully, your source database can handle you hitting that view 5 times simultaneously. If it can, you should finish 5X faster.
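For a back-of-the-envelope figure, the sketch below works through the numbers from the original post (15 million rows at 300 rows/sec). It assumes each instance keeps reading at the same 300 rows/sec, which is the best case and depends on the database and the box keeping up.

```python
TOTAL_ROWS = 15_000_000   # from the original post
ROWS_PER_SEC = 300        # observed read throughput of one job
INSTANCES = 5             # one instance per MOD remainder


def hours_to_read(rows, rows_per_sec):
    """Elapsed hours to read `rows` at a steady `rows_per_sec`."""
    return rows / rows_per_sec / 3600


single = hours_to_read(TOTAL_ROWS, ROWS_PER_SEC)
# Best case: every instance sustains the full single-job rate.
parallel = hours_to_read(TOTAL_ROWS / INSTANCES, ROWS_PER_SEC)

print(f"single job:  {single:.1f} hours")    # 13.9 hours
print(f"5 instances: {parallel:.1f} hours")  # 2.8 hours
```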

This technique is the only technique that can scale in multiples of performance, which is what you want. I'm fairly confident that no amount of hash file tweaking and tuning will give you a five-fold increase. But, tune them anyway so that each instance is optimally doing reference lookups.

T42 · Post by **T42** » Wed Mar 09, 2005 6:02 pm

Or buy DataStage EE and let the code do the work for you most of the time.

kcbland · Post by **kcbland** » Wed Mar 09, 2005 7:35 pm

Smart aleck young whippersnapper.

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Thu Mar 10, 2005 4:22 am

Set inter-process buffer.

Use IPC-Stage.

Partition your hash files.

Use multi-instance jobs.

You can do a lot within the Server edition itself.