Efficient way to load millions of records in PX
Moderators: chulett, rschirm, roy
Hi ,
I have hundreds of millions of records in table "A", and daily maintenance for this table arrives as a text file. This text file also has millions of records every day. The file contains insert/update (upsert) data, and the affected rows can fall anywhere in the table's history.
What is the efficient way to load this text file using a DataStage PX job?
thanks,
RP
rajpatel wrote: any other method? CDC is not that fast.

That's true, but look at the trade-off. Of the 100 million records, most won't be changed records. If you do a blind update, it will take far more time than Change Capture will take to prune out the unchanged records. And as DsGuru suggested, use the bulk loader for the insert records while using updates for the changed records.
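The trade-off above can be sketched in plain Python (this is an illustration of what the PX Change Capture stage does for you, not DataStage code; the key/row shapes are assumptions for the example):

```python
# Minimal sketch of change-capture pruning: only rows that are new or
# actually different survive, so the expensive UPDATEs touch far fewer
# rows than a blind update of the whole daily file would.

def split_changes(existing, incoming):
    """existing/incoming: dicts mapping primary key -> row tuple."""
    inserts, updates = [], []
    for key, row in incoming.items():
        if key not in existing:
            inserts.append(row)      # new key: candidate for the bulk loader
        elif existing[key] != row:
            updates.append(row)      # changed row: needs an UPDATE
        # identical rows are pruned and never hit the database
    return inserts, updates

existing = {1: (1, "a"), 2: (2, "b"), 3: (3, "c")}
incoming = {2: (2, "b"), 3: (3, "x"), 4: (4, "d")}
inserts, updates = split_changes(existing, incoming)
# inserts == [(4, "d")], updates == [(3, "x")]
```

Here only one of the three incoming rows actually needs an UPDATE; the unchanged row is dropped entirely, which is exactly the saving Change Capture gives you at scale.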
The nice thing about PX is that you can simply throw more processing nodes at it. Unless there are specific order issues in the text file, you can use multiple readers (each reading a portion of the file) or pre-process the file to split it into multiple files. Research the Sequential File stage properties.
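The pre-processing option mentioned above can be sketched as follows (a hypothetical stand-alone splitter, assuming the file has no ordering requirements; in practice the Sequential File stage's multiple-readers property achieves the same parallelism without an extra step):

```python
# Sketch: split one large text file into N smaller files so N parallel
# readers (or PX processing nodes) can each take one. Round-robin by
# line keeps the pieces roughly equal in size.

def split_file(path, n_parts):
    outs = [open(f"{path}.part{i}", "w") for i in range(n_parts)]
    try:
        with open(path) as src:
            for i, line in enumerate(src):
                outs[i % n_parts].write(line)   # round-robin distribution
    finally:
        for f in outs:
            f.close()
```

Round-robin only works when record order doesn't matter; if upserts for the same key must be applied in file order, you would hash-partition on the key instead so all records for one key land in the same piece.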
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Is there anything in the text file that enables you to determine if it relates to an insert or an update? Perhaps a record id from a sequence for example.
I agree that you probably need to separate the inserts from the updates, but only if there are a significant number of inserts. You should be able to load inserts faster than updates by using the Load Append option (i.e. SQL*Loader) in the Oracle Enterprise stage. If there aren't many inserts this won't help much, as the updates will slow you down.
Is the table you are updating indexed on just the primary key or is it heavily indexed on other fields as well?
If you end up splitting the data and loading the inserts you will have to decide how to maintain the indexes. The load append will require the indexes to be rebuilt. If you are doing this you will want to minimise the number of partitions you rebuild indexes on by determining which ones have been loaded (I assume your Oracle table with 100's of millions of rows is partitioned).
Of course until the indexes are valid you can't apply the updates with an update statement.
Can you be a little more precise with regard to volumes of data in the table and the number of inserts / updates?
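The split-and-track idea in the post above can be sketched like this (plain Python for illustration; `partition_of` and the row fields are assumptions standing in for whatever partition key the real table uses):

```python
# Sketch: while splitting the daily feed into inserts and updates,
# record which table partitions receive inserts, so after the bulk
# load only those partitions' indexes need rebuilding.

def partition_of(row):
    return row["txn_date"][:7]        # assume partitioning by year-month

def split_and_track(rows, existing_keys):
    inserts, updates, touched = [], [], set()
    for row in rows:
        if row["id"] in existing_keys:
            updates.append(row)
        else:
            inserts.append(row)
            touched.add(partition_of(row))   # this partition gets new data
    return inserts, updates, touched

rows = [
    {"id": 1, "txn_date": "2007-01-15"},
    {"id": 9, "txn_date": "2007-02-03"},
    {"id": 8, "txn_date": "2007-02-20"},
]
ins, upd, touched = split_and_track(rows, existing_keys={1, 2})
# both inserts land in the "2007-02" partition, so only that
# partition's indexes need rebuilding before the updates can run
```

As the post notes, the updates can only be applied once the rebuilt indexes are valid again, so knowing the touched partitions up front shortens that window.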