How to Select All records at once?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
DCSD
Premium Member
Premium Member
Posts: 39
Joined: Tue Apr 12, 2011 8:36 am

How to Select All records at once?

Post by DCSD »

I am reading 600 million records from the database and I don't want it to read 2000 records at a time because that would be too many hits against the database. I want to read all records and store it into a sequential file so I don't have to keep going back to the database.

How do I get it to 'Select All' rather than chunks at a time?

Thanks.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Assuming your database server supports it (which I doubt) you could declare an array size of 600000000.

There's nothing wrong with array size of 2000, it's quite efficient, particularly if the database server handles arrays correctly.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
djbarham
Participant
Posts: 34
Joined: Wed May 07, 2003 4:39 pm
Location: Brisbane, Australia

Re: How to Select All records at once?

Post by djbarham »

DCSD wrote:I am reading 600 million records from the database and I don't want it to read 2000 records at a time because that would be too many hits against the database.
I think you misunderstand what ths assay size is doing.

It is NOT generating 300,000 separate queries against the database where each retrieves 2000 rows.

It is one query against the database.

What the array size does is bundle the results of the query into packets of 2000 records for transmission from the database server to the DataStage server.

As you increase the array size, your throughput will increase. However, I think you will find that past about 2000 rows, are further performance gains will be barely measureable ... and there will be physical limitations that you run into.

Keep in mind too that ALL software that interacts with a database must interact in this same way. (Although some primative ODBC drivers have an effective array size of 1).

As Ray said, 2000 is fine.
Post Reply