
Best practice to handle a billion records

Posted: Mon Dec 15, 2008 5:11 pm
by DSbox61
Hi all,

I am being asked about DataStage's loading capability.

I have a billion records in a file, and I need to insert into, update, and delete from a DB2 table, and I need to do it quickly. Can you please suggest what I can do to make this as fast as possible with DataStage?

I initially thought of the bulk loader, but it can only be used for inserts, not updates or deletes. I would be doing some transformation, but it would be very minimal; in most cases I would just have to feed the file to the database for inserts/updates/deletes.

Thanks for your help!

Posted: Mon Dec 15, 2008 9:37 pm
by ray.wurlod
1. Search DSXchange for ideas about throughput performance.

2. Create a job that identifies which rows are to be inserted and which have to be updated or deleted, then bulk load the inserts (see the sketch after this list).

3. Create multiple instances of your job each of which processes a proportion of the records. (Be very careful with this if there are likely to be duplicate keys in the source.)

4. Employ inter-process row buffering.
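
A minimal sketch of the split in point 2, assuming a pipe-delimited change file whose first column is the key and a flat file of the target table's existing keys (all file names and the layout are placeholders; inside DataStage you would do the same thing with a Lookup or Change Capture stage rather than a script):

Code:
    # Split a delimited change file into an "insert" stream for the bulk loader
    # and an "update/delete" stream for a normal job. Names are placeholders.
    existing_keys = set()
    with open("target_keys.txt") as f:           # keys unloaded from the DB2 table
        for line in f:
            existing_keys.add(line.strip())

    with open("changes.del") as src, \
         open("inserts.del", "w") as ins, \
         open("upserts.del", "w") as upd:
        for line in src:
            key = line.split("|", 1)[0]          # assume the key is the first column
            (upd if key in existing_keys else ins).write(line)

At a billion rows you would not hold the keys in an in-memory set like this; the point is only that the inserts can be separated up front and fed to the bulk loader, while everything else goes through the update/delete path.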

Re: Best practice to handle a billion records

Posted: Tue Dec 16, 2008 12:42 am
by Sathishkumarins
Hi,

In this case you can load the data initially with a bulk loader and then use a normal job design for the incremental load (updates and deletes). You can also use the Link Partitioner stage to load the target table, which helps reduce the load time.

Along with that, you can use a primary key index, useful indexes on other columns, and parallel hints in the query.
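
As a rough illustration of the index point (table and column names are placeholders, not from this thread), the statements the incremental job issues should be keyed on the indexed primary key so DB2 can use that index for every update and delete:

Code:
    # Placeholder SQL for the incremental (update/delete) job. MYSCHEMA.MYTABLE,
    # the columns and PK_COL are assumptions; the WHERE clause stays on the
    # primary key so each statement is an index lookup rather than a scan.
    UPDATE_SQL = (
        "UPDATE MYSCHEMA.MYTABLE "
        "SET COL1 = ?, COL2 = ? "
        "WHERE PK_COL = ?"
    )
    DELETE_SQL = "DELETE FROM MYSCHEMA.MYTABLE WHERE PK_COL = ?"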

Re: Best practice to handle a billion records

Posted: Tue Dec 16, 2008 1:46 am
by infranik
Of course, you should use the DB2 EE stage for extracting, updating, and loading the records. I have used the DB2 EE stage extensively and it is extremely fast; you just need to set it up properly. Also, as someone suggested, use the appropriate indexes and do a query-tuning exercise before you put your code in production.

If you want to bulk load heavy volumes into DB2, I suggest you use the DB2 bulk loader utility outside of DataStage; the input file can be created by the DataStage jobs.
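
A minimal sketch of that last step, assuming the DataStage job has already written a delimited file and the DB2 command-line client is installed on the same machine (database, schema, table, and path are all placeholders):

Code:
    import subprocess

    # Run CONNECT, LOAD and TERMINATE in one shell so the DB2 CLP keeps its
    # connection across the three commands. All names below are placeholders.
    cmds = "; ".join([
        'db2 CONNECT TO MYDB',
        'db2 "LOAD FROM /data/inserts.del OF DEL '
        'INSERT INTO MYSCHEMA.MYTABLE NONRECOVERABLE"',
        'db2 TERMINATE',
    ])
    subprocess.run(cmds, shell=True, check=True)

Whether you drive the utility from a script like this or from an after-job subroutine is a separate choice; the point is that LOAD takes the insert volume while the updates and deletes go through a normal job.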