best practice to handle billion records

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


DSbox61
Premium Member
Posts: 62
Joined: Mon May 22, 2006 10:18 am

best practice to handle billion records

Post by DSbox61 »

Hi ppl,

I am being asked about DataStage's loading capability.

I have a billion records in a file and I need to insert, update, and delete rows in a DB2 table. On top of that, I need to make it quick. Can you please suggest what I can do to make this as fast as possible with DataStage?

I initially thought of the bulk loader, but it can only be used for inserts, not updates or deletes. I will be doing some transformation, but it will be very minimal; in most cases I would just feed the file to the database for inserts/updates/deletes.

Thanks for your help!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

1. Search DSXchange for ideas about throughput performance.

2. Create a job that identifies which rows are to be inserted and which have to be updated or deleted, and bulk load the inserts. (A rough sketch of the classification idea follows this list.)

3. Create multiple instances of your job, each of which processes a proportion of the records (a partitioning sketch also follows this list). Be very careful with this if there are likely to be duplicate keys in the source.

4. Employ inter-process row buffering.
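
A rough sketch (plain Python, outside DataStage) of the classification idea in point 2, assuming a comma-delimited source file, the key in the first column, and a separately extracted list of the keys already in the target table. With a billion rows you would do this on sorted files or inside the database rather than in memory, so treat it only as an illustration of the logic:

    import csv

    # Keys already present in the DB2 target table (extracted beforehand).
    with open("target_keys.txt") as f:
        existing = {line.strip() for line in f}

    seen = set()
    with open("source.del", newline="") as src, \
         open("inserts.del", "w", newline="") as ins, \
         open("updates.del", "w", newline="") as upd:
        ins_writer = csv.writer(ins)
        upd_writer = csv.writer(upd)
        for row in csv.reader(src):
            key = row[0]                    # assumed key column
            seen.add(key)
            if key in existing:
                upd_writer.writerow(row)    # existing key -> update
            else:
                ins_writer.writerow(row)    # new key -> candidate for bulk load

    # Keys in the target that never appeared in the source are delete candidates.
    with open("deletes.txt", "w") as dels:
        dels.writelines(key + "\n" for key in existing - seen)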
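And a rough sketch of the modulo-style split behind point 3: each job instance keeps only the rows whose key hashes to its own instance number. The instance count of 4 is an assumption, and zlib.crc32 is used only because it is deterministic across separate processes:

    import zlib

    PARTITION_COUNT = 4   # assumed number of job instances

    def belongs_to_instance(key: str, instance_number: int) -> bool:
        """True if this row should be processed by the given instance (0-based)."""
        return zlib.crc32(key.encode("utf-8")) % PARTITION_COUNT == instance_number

    # e.g. instance 2 would apply a constraint equivalent to belongs_to_instance(key, 2)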
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sathishkumarins
Premium Member
Posts: 41
Joined: Tue Jul 08, 2008 5:45 am
Location: Columbus

Re: best practice to handle billion records

Post by Sathishkumarins »

Hi,

In this case you can load the data initially with a bulk loader, and then use a normal job design for the incremental load (updates and deletes). You can also use the Link Partitioner stage to load the target table across parallel links, which can also reduce the load time.

Along with that, you can use a primary key index, useful indexes on other columns, and parallel hints in the query.
infranik
Participant
Posts: 20
Joined: Tue Oct 11, 2005 8:11 am

Re: best practice to handle billion records

Post by infranik »

Of course, you should use the DB2 EE stage for extracting, updating, and loading the records.
I have used the DB2 EE stage extensively and it is extremely fast, provided you set it up properly. Also, as someone suggested, use appropriate indexes and do a query-tuning exercise before you put your code into production.
I suggest you use the DB2 bulk load utility outside of DataStage if you want to bulk load heavy volumes into DB2; the input file can be created by the DataStage jobs. (A rough sketch of driving the load utility from a script follows.)
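
A rough sketch of driving the DB2 LOAD utility from a script once the DataStage job has written the insert file. The table name, file path, and LOAD options are placeholders, and it assumes a DB2 command-line environment with a connection already established, so check the exact syntax against your DB2 version:

    import subprocess

    # LOAD FROM <file> OF DEL INSERT INTO <table> is the basic DB2 LOAD form;
    # add MODIFIED BY / SAVECOUNT / NONRECOVERABLE options as your site requires.
    load_cmd = "LOAD FROM /data/inserts.del OF DEL INSERT INTO MYSCHEMA.MYTABLE"
    subprocess.run(["db2", load_cmd], check=True)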