Processing Large Volumes of Data

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Processing Large Volumes of Data

Post by asitagrawal »

Hi,

Suppose I have to process some 200 million rows which have to undergo some transformations and validations and then land in a database table.

Currently I am doing all the validations and lookups and then inserting directly (insert without clearing) into the table.

What I want to consider now is: why not change my target to a flat file and then use the DB2 bulk load utility to load the data into the table?

I don't know much about the DB2 bulk load utility yet and still have to explore it,
but I just wanted to have your comments on it.

Thanks.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Are you getting that amount of data on a daily basis?
If yes, then it is going to involve updates too, right?
If this is for the historical load then I would say yes, go for the DB2 load stage. Explore it, and search here for how to set its properties correctly.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

DSguru2B wrote:Are you getting that amount of data on a daily basis?
If yes, then it is going to involve updates too, right?
If this is for the historical load then I would say yes, go for the DB2 load stage. Explore it, and search here for how to set its properties correctly.
I am going to get data on a daily basis, but it won't be updated using my process. I will just get new data. Updates are handled separately.

Also, can you please comment on the scenario that involves updating existing data? Can I go forward with my newly planned approach in that case?
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

I highly doubt you can update data using the load utility. As the name suggests, it can just load; in other words, LOAD INSERT, LOAD REPLACE, etc., and not update. Updates will have to be handled separately as a logged activity (DML).
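
For illustration, a rough sketch of how that split might look from the command line (the database, schema, table, column and file names here are all made up):

    # bulk-append the new rows from a delimited flat file; the load utility cannot update
    db2 "CONNECT TO MYDB"
    db2 "LOAD FROM /staging/new_rows.del OF DEL INSERT INTO MYSCHEMA.MYTABLE"

    # updates stay outside the load, as ordinary, fully logged DML
    db2 "UPDATE MYSCHEMA.MYTABLE SET STATUS = 'PROCESSED' WHERE LOAD_DATE = CURRENT DATE"
    db2 "CONNECT RESET"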
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

DSguru2B wrote:I highly doubt you can update data using the load utility. As the name suggests, it can just load; in other words, LOAD INSERT, LOAD REPLACE, etc., and not update. Updates will have to be handled separately as a logged activity (DML).
Well, can you comment on the performance of the DB2 bulk load?
What is the rate at which it can process rows?
Will there be any difference between (i) using the bulk load and (ii) inserting directly into the table?
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Well, can you comment on the performance of the DB2 bulk load?

Performance is a relative term. Very relative: proportional to the size of the data, the engine power of your CPUs (thinking in terms of cars :twisted: ), the number of CPUs, network traffic, etc.


What is the rate at which it can process rows?

I have gotten speeds of about 40K rows per second. Others have gotten more. Again, totally relative to what you have under the hood.

Will there be any difference between (i) using the bulk load and (ii) inserting directly into the table?

Most definitely. Load is not a logged activity; insert is, which means that for every inserted row an entry is written to the transaction log. Try it for yourself. You will be amazed.
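
If you want to see the logging difference for yourself, a quick sketch (database, table and file names are placeholders): IMPORT drives ordinary inserts, so every row goes through the transaction log, while LOAD builds and writes data pages directly and skips per-row logging.

    # logged path: each row is an ordinary insert and is written to the log
    db2 "IMPORT FROM /staging/big_file.del OF DEL COMMITCOUNT 10000 INSERT INTO MYSCHEMA.MYTABLE"

    # load path: pages are written directly, no per-row logging
    db2 "LOAD FROM /staging/big_file.del OF DEL INSERT INTO MYSCHEMA.MYTABLE NONRECOVERABLE"

Run both against the same file and compare the elapsed time and log usage.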
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

[quote="DSguru2B"]Waht is rate at which it can process rows?

I have gotton speeds of about 40k rows per second. Others have gotton more. Again totally relative to what you have under the hood.
quote]


Great, can you please share some more of your experience in the '40K rows per second' case? What was your scenario? I would really appreciate it if you could share the case study.
shilpa79
Participant
Posts: 131
Joined: Thu Jan 20, 2005 5:59 pm
Location: Virginia

Re: Processing Large Volumes of Data

Post by shilpa79 »

OK, well, are you able to load the data into the flat file successfully?
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Well, my case was just a general case: lots of transformations, lookups, etc. I did all that and then captured the results in a flat file. Then my final job would pick up that file and load it into the UDB table using the load utility.
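
As a sketch of that last step (paths, database and table names are invented; the DB2 UDB Load stage can be used to the same effect), the final job can simply hand the staged delimited file to the load utility, for example from an after-job ExecSH call:

    # invoked after the flat file has been written by the transformation job
    db2 "CONNECT TO MYDB"
    db2 "LOAD FROM /staging/transformed_rows.del OF DEL MESSAGES /staging/load.msg INSERT INTO MYSCHEMA.FACT_TABLE"
    db2 "CONNECT RESET"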
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

DSguru2B wrote:Well, my case was just a general case: lots of transformations, lookups, etc. I did all that and then captured the results in a flat file. Then my final job would pick up that file and load it into the UDB table using the load utility.
It would be a great help if you could share the following info:
1. What was the DS version/edition?
2. What was the hardware configuration?
3. How many rows did you have to process? What was the frequency of running the job?
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

The DataStage version was 7.5, Server edition.
The Unix system was SunOS 5.9 with 8 CPUs at our disposal.
We were processing about 10M rows. This was just the historical load, and then I had separate jobs that used the DB2 API stage for the inserts, as they were just a few thousand rows.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
tagnihotri
Participant
Posts: 83
Joined: Sat Oct 28, 2006 6:25 am

Post by tagnihotri »

DB2 bulk load is awesome; it works really well and at high speed. But a word of caution: it does leave the table in a disintegrated form (that's how I frame it for my people here :wink: ), i.e. you may need to restore table integrity!

People who have been working with inserts and haven't seen the power of the UDB utilities will be amazed by the speed for sure :)

DSguru2B wrote:The DataStage version was 7.5, Server edition.
The Unix system was SunOS 5.9 with 8 CPUs at our disposal.
We were processing about 10M rows. This was just the historical load, and then I had separate jobs that used the DB2 API stage for the inserts, as they were just a few thousand rows.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

tagnihotri wrote: But a word of caution: it does leave the table in a disintegrated form (that's how I frame it for my people here :wink: ), i.e. you may need to restore table integrity!
What do you mean by that?
I agree that if the load job is killed or terminated abruptly, it leaves the table in a load pending state, but that's about as far as "disintegrated form" goes. That can easily be cleared by loading from /dev/null to bring the table back to a normal state. If you have had more problems, please enlighten us.
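
For reference, clearing the load pending state is a one-liner (schema and table name are placeholders):

    # a LOAD ... TERMINATE from an empty source takes the table out of load pending
    db2 "LOAD FROM /dev/null OF DEL TERMINATE INTO MYSCHEMA.MYTABLE"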
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
tagnihotri
Participant
Posts: 83
Joined: Sat Oct 28, 2006 6:25 am

Post by tagnihotri »

I ran into this one back in 2004 on DS 7.0. There were a few snap tables (tables which store data snapshots for each month) that had to store more than 10M rows every month. The performance issue was obviously there because these were straight inserts; it never took much brainpower (although I am still not sure whether I have any :) ) to ask them to use the bulk utilities. But surprisingly, every time they ran these they got an inconsistency error.

I know the first thing that strikes our minds is that the data may be wrong, but I checked it all. The solution: run SET INTEGRITY after every run!
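
For anyone who hits the same thing: after a load, tables with constraints can be left in check pending state until integrity is re-checked, and the command looks roughly like this (schema and table name are placeholders):

    # re-check constraints and take the table out of check pending after the load
    db2 "SET INTEGRITY FOR MYSCHEMA.SNAP_TABLE IMMEDIATE CHECKED"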

And yes, it was a DB2 server.

DSguru2B wrote:
tagnihotri wrote: But a word of caution: it does leave the table in a disintegrated form (that's how I frame it for my people here :wink: ), i.e. you may need to restore table integrity!
What do you mean by that?
I agree that if the load job is killed or terminated abruptly, it leaves the table in a load pending state, but that's about as far as "disintegrated form" goes. That can easily be cleared by loading from /dev/null to bring the table back to a normal state. If you have had more problems, please enlighten us.