Hi all,
I am being asked about DataStage's loading capability.
I have a billion records in a file and need to insert into, update, and delete from a DB2 table. On top of that, it needs to be fast. Can you please suggest what I can do to make this as quick as possible with DataStage?
I initially thought of the bulk loader, but it can only be used for inserts, not updates or deletes. I would be doing some transformation, but it would be very minimal; in most cases I would just feed the file to the database for inserts/updates/deletes.
Thanks for your help!
Best practice to handle a billion records
1. Search DSXchange for ideas about throughput performance.
2. Create a job that identifies which rows are to be inserted and which have to be updated/deleted. Bulk load the inserts.
3. Create multiple instances of your job each of which processes a proportion of the records. (Be very careful with this if there are likely to be duplicate keys in the source.)
4. Employ inter-process row buffering.
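The multiple-instance approach in step 3 can be sketched as follows. This is a minimal Python sketch, not DataStage-specific: the "key|payload" record layout and the instance count are illustrative assumptions. A stable hash routes every occurrence of a key to the same instance, which addresses the duplicate-key caveat above.

```python
import zlib

def instance_for(key: str, num_instances: int) -> int:
    # Stable hash (crc32) so the same key always routes to the same
    # instance, even across runs; Python's built-in hash() is not stable.
    return zlib.crc32(key.encode("utf-8")) % num_instances

def partition(lines, num_instances=4):
    # One bucket of input records per job instance.
    # Record layout "key|payload" is an illustrative assumption.
    buckets = [[] for _ in range(num_instances)]
    for line in lines:
        key = line.split("|", 1)[0]
        buckets[instance_for(key, num_instances)].append(line)
    return buckets
```

Each bucket would then be fed to its own instance of the job, so no two instances ever touch the same key.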
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Re: best practice to handle billion records
Hi,
In this case you can initially load the data with a bulk loader and then use a normal job design for the incremental load (updates and deletes). You can also use a Link Partitioner to load the target table, which can further reduce the time.
Along with that, you can use a primary-key index, useful indexes on other columns, and parallel hints in the query.
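Classifying the incremental records into inserts, updates, and deletes can be sketched like this. A minimal Python sketch under illustrative assumptions: the source and target are given as key-to-payload dictionaries (e.g. built from the input file and a keys-only extract of the DB2 table).

```python
def classify(source: dict, target: dict):
    # source/target map business key -> row payload (illustrative structures).
    # Keys only in source are inserts; keys in both with changed payloads
    # are updates; keys only in target are deletes.
    inserts = {k: v for k, v in source.items() if k not in target}
    updates = {k: v for k, v in source.items() if k in target and target[k] != v}
    deletes = [k for k in target if k not in source]
    return inserts, updates, deletes
```

The insert set can then go to the bulk loader while the update and delete sets go through the normal job design.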
Re: best practice to handle billion records
Of course, you should use the DB2 EE stage for extracting, updating, and loading the records.
I have used the DB2 EE stage extensively and it is extremely fast; you just need to set it up properly. Also, as someone suggested, use all the indexes and do a query-tuning exercise before you put your code in production.
I suggest you use the DB2 bulk loader utility outside of DataStage if you want to bulk load heavy volumes into DB2; the input file can be created by the DataStage jobs.
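Producing that input file can be sketched as below. A minimal Python sketch, with the caveat that the delimiter and column order are illustrative assumptions and must match whatever file-type and modifiers the DB2 load command is given.

```python
import csv

def write_load_file(rows, path, delimiter="|"):
    # Write rows as a delimited file for a bulk-load utility.
    # The "|" delimiter and the column order of `rows` are illustrative;
    # they must agree with the load command's specification.
    with open(path, "w", newline="") as f:
        csv.writer(f, delimiter=delimiter).writerows(rows)
```

In practice the DataStage job would write this file, and the load utility would be invoked on it as a separate step.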