Processing Large Volume of Data
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 273
- Joined: Wed Oct 18, 2006 12:20 pm
- Location: Porto
Processing Large Volume of Data
Hi,
Suppose I have to process some 200 million rows, which have to undergo some transformations and validations and then land in a database table.
Currently I am doing all the validations and lookups and inserting directly (insert without clearing) into the table.
What I want to consider now is: why not change my target to a flat file and then use the DB2 bulk load utility to load the data into the table?
I don't know the DB2 bulk load utility yet and still have to explore it, but I just wanted to have your comments on it.
Thanks.
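For context, a minimal sketch of what that flat-file-plus-load approach could look like from the DB2 command line; the database, file, and table names here are made up:

```sh
# Load a delimited flat file produced by the DataStage job into the target
# table; LOAD writes formatted pages directly rather than issuing row inserts.
db2 connect to EDWDB
db2 "LOAD FROM /staging/daily_feed.del OF DEL INSERT INTO edw.fact_sales"
db2 connect reset
```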
Are you getting that amount of data on a daily basis?
If yes, then it's going to involve updates too, right?
If this is for the historical load, then I would say yes, go for the DB2 load stage. Explore it, and search here for how to set its properties correctly.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
-
- Premium Member
- Posts: 273
- Joined: Wed Oct 18, 2006 12:20 pm
- Location: Porto
DSguru2B wrote: Are you getting that amount of data on a daily basis?
If yes, then it's going to involve updates too, right?
If this is for the historical load, then I would say yes, go for the DB2 load stage. Explore it, and search here for how to set its properties correctly.
I am going to get data on a daily basis, but it won't be updated by my process; I will just get new data. Updates are handled separately.
Also, can you please comment on the scenario that involves updating existing data? Could I still go forward with my newly planned approach in that case?
I highly doubt you can update data using the load utility. As the name suggests, it can just load; in other words, LOAD, INSERT, REPLACE, etc., not update. Updates will be handled separately as a logged activity (DML).
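For illustration, the modes being described look roughly like this from the command line (file and table names are made up); LOAD can append or replace, but it has no update mode:

```sh
# Append the new rows to whatever is already in the table
db2 "LOAD FROM new_rows.del OF DEL INSERT INTO edw.fact_sales"

# Or wipe the table and reload it from scratch
db2 "LOAD FROM full_extract.del OF DEL REPLACE INTO edw.fact_sales"

# There is no UPDATE mode; changed rows still need logged SQL (UPDATE/MERGE)
```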
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
-
- Premium Member
- Posts: 273
- Joined: Wed Oct 18, 2006 12:20 pm
- Location: Porto
DSguru2B wrote: I highly doubt you can update data using the load utility. As the name suggests, it can just load; in other words, LOAD, INSERT, REPLACE, etc., not update. Updates will be handled separately as a logged activity (DML).
Well, can you comment on the performance of DB2 bulk load?
What is the rate at which it can process rows?
Will there be any difference between (i) using bulk load and (ii) loading directly into the table?
Well, can you comment on the performance of DB2 bulk load?
Performance is a relative term. Very relative: proportional to the size of the data, the engine power of your CPUs (thinking in terms of cars), the number of CPUs, network traffic, etc.
What is the rate at which it can process rows?
I have gotten speeds of about 40K rows per second. Others have gotten more. Again, it's totally relative to what you have under the hood.
Will there be any difference between (i) using bulk load and (ii) loading directly into the table?
Most definitely. Load is not a logged activity. Insert is, which means that an entry is logged for every inserted row. Try it for yourself; you will be amazed.
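As a rough illustration of that logging difference (file and table names assumed again): IMPORT issues ordinary logged inserts, while LOAD builds pages directly and can bypass logging the data:

```sh
# IMPORT performs SQL inserts: every row generates a transaction log entry
db2 "IMPORT FROM daily.del OF DEL INSERT INTO edw.fact_sales"

# LOAD writes formatted pages directly; NONRECOVERABLE marks the load as
# not requiring a backup or log records for rollforward recovery
db2 "LOAD FROM daily.del OF DEL INSERT INTO edw.fact_sales NONRECOVERABLE"
```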
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
-
- Premium Member
- Posts: 273
- Joined: Wed Oct 18, 2006 12:20 pm
- Location: Porto
[quote="DSguru2B"]Waht is rate at which it can process rows?
I have gotton speeds of about 40k rows per second. Others have gotton more. Again totally relative to what you have under the hood.
quote]
Gr8, can u please sahre some more of ur experience in the case of '40k rows per second'? What was ur case? I wud really appreciate if u can share the case study.
Re: Processing Large Volume of Data
OK well, are you able to load data into the flat file successfully?
Well, my case was just a general case: lots of transformations, lookups, etc. I did all that and then captured the results in a flat file. Then my final job would pick up that file and load it into the UDB table using the load utility.
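As a sketch, that final load job could boil down to a small script along these lines (database, file, and table names are hypothetical):

```sh
#!/bin/sh
# After-transform step: push the staged flat file into the UDB table.
DB=EDWDB
FILE=/staging/fact_sales.del
TABLE=edw.fact_sales

db2 connect to $DB || exit 1
db2 "LOAD FROM $FILE OF DEL INSERT INTO $TABLE NONRECOVERABLE"
rc=$?               # CLP returns 0-2 for success/warnings, 4+ for errors
db2 connect reset
exit $rc
```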
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
-
- Premium Member
- Posts: 273
- Joined: Wed Oct 18, 2006 12:20 pm
- Location: Porto
DSguru2B wrote: Well, my case was just a general case: lots of transformations, lookups, etc. I did all that and then captured the results in a flat file. Then my final job would pick up that file and load it into the UDB table using the load utility.
It would be a great help if you could share the following info:
1. What was the DS version/edition?
2. What was the hardware configuration?
3. How many rows did you have to process, and how frequently did the job run?
The version of DataStage was 7.5, Server Edition.
The Unix system was SunOS 5.9 with 8 CPUs at our disposal.
We were processing about 10M rows. That was just the historical load; after that I had separate jobs that used the DB2 API stage for the inserts, as they were just a few thousand.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
-
- Participant
- Posts: 83
- Joined: Sat Oct 28, 2006 6:25 am
DSguru2B wrote: The version of DataStage was 7.5, Server Edition.
The Unix system was SunOS 5.9 with 8 CPUs at our disposal.
We were processing about 10M rows. That was just the historical load; after that I had separate jobs that used the DB2 API stage for the inserts, as they were just a few thousand.
DB2 bulk load is awesome; it works really well and at high speed. But a word of caution: it does leave the table in a "disintegrated" form (that's how I frame it for my people here), i.e., you may need to restore table integrity!
People working with inserts who haven't seen the power of the UDB utilities will be amazed by the speed, for sure.
tagnihotri wrote: But a word of caution: it does leave the table in a "disintegrated" form (that's how I frame it for my people here), i.e., you may need to restore table integrity!
What do you mean by that? I agree that if the load job is killed or terminated abruptly, it leaves the table in load pending state, but that's about as far as "disintegrated form" goes. That can easily be cleared by loading from /dev/null to bring the table back to a normal state. If you have had more problems, please enlighten us.
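The /dev/null trick mentioned above looks roughly like this (the table name is made up):

```sh
# A LOAD ... TERMINATE from an empty source takes the table out of
# load pending state without adding any rows
db2 "LOAD FROM /dev/null OF DEL TERMINATE INTO edw.fact_sales"

# Verify the table's load status afterwards
db2 "LOAD QUERY TABLE edw.fact_sales"
```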
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
-
- Participant
- Posts: 83
- Joined: Sat Oct 28, 2006 6:25 am
DSguru2B wrote: What do you mean by that? I agree that if the load job is killed or terminated abruptly, it leaves the table in load pending state, but that's about as far as "disintegrated form" goes. That can easily be cleared by loading from /dev/null to bring the table back to a normal state. If you have had more problems, please enlighten us.
I ran into this one back in 2004 on DS 7.0. There were a few snap tables (tables which store data snapshots for each month) that had to store more than 10M rows every month. There was an obvious performance issue because these were straight inserts, so, never using much of my brains (although I am still not sure whether I have any), I asked them to use the bulk utilities. But surprisingly, every time they ran those loads they got an inconsistency error.
I know the first thing that comes to mind is that the data may be wrong, but I checked it all. The solution: run SET INTEGRITY after every run!
And yes, it was a DB2 server.
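For reference, that fix would look something like this (the snap table name is made up). LOAD leaves a table with constraints in set integrity (check) pending state, and SET INTEGRITY revalidates it:

```sh
# Revalidate constraints after the bulk load completes
db2 "SET INTEGRITY FOR edw.snap_monthly IMMEDIATE CHECKED"

# List any tables still stuck in check pending state
db2 "SELECT tabschema, tabname FROM syscat.tables WHERE status = 'C'"
```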