Dealing with huge data

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


datastagedw
Participant
Posts: 53
Joined: Fri Mar 07, 2008 1:17 am

Dealing with huge data

Post by datastagedw »

Hi All,

We have a requirement to run a query on a huge DB table (100 million records or more) to get the required information. We do not have any filter criteria to reduce the amount of data brought into DS. We also need to join this data with a smaller set (fewer than a million records) from a different database table.

Is it advisable to run this join in DS using DB connector and Join stages? We are running on a 4-node SMP server. Let me know if more details are required.
ETL DEVELOPER
shawn.k
Premium Member
Posts: 7
Joined: Wed Oct 06, 2010 10:47 am

Re: Dealing with huge data

Post by shawn.k »

You should be fine doing this join in DS. I suggest you use correct partitioning (both inputs partitioned on the join keys) and, if possible, sort the data in the DB before it gets to DS. Test your job with a smaller volume of data and see how it performs before running it against the full set.
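
To illustrate why partitioning on the join key matters, here is a minimal Python sketch, not DataStage code. It assumes a hypothetical join column called `id` and four partitions to mirror the 4-node setup described above. Both inputs are hash-partitioned on the key so that matching rows always land in the same partition, and each partition can then be joined on its own. Note that this sketch uses an in-memory lookup inside each partition for brevity, whereas a Join stage would typically work on sorted inputs.

```python
from collections import defaultdict

# Assumed to mirror the 4-node configuration mentioned in the thread.
NUM_PARTITIONS = 4


def hash_partition(rows, key, num_partitions=NUM_PARTITIONS):
    """Split rows into buckets so equal key values always share a bucket."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


def join_partition(big_rows, small_rows, key):
    """Inner-join one partition of the big input with the matching small partition."""
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)
    return [{**b, **s} for b in big_rows for s in lookup.get(b[key], [])]


def partitioned_join(big, small, key):
    """Partition both inputs on the same key, then join partition by partition."""
    big_parts = hash_partition(big, key)
    small_parts = hash_partition(small, key)
    result = []
    for big_part, small_part in zip(big_parts, small_parts):
        result.extend(join_partition(big_part, small_part, key))
    return result


# Hypothetical sample data standing in for the two tables.
big = [{"id": i, "amount": i * 10} for i in range(10)]
small = [{"id": 2, "name": "a"}, {"id": 7, "name": "b"}]
joined = partitioned_join(big, small, "id")
```

If the two inputs were partitioned on different keys, or with different methods, matching rows could end up in different partitions and silently drop out of the join, which is why the small test run suggested above is worth doing.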
Shawn K
--------------------------------------------------------
"What is right is not always popular and what is popular is not always right."
— Albert Einstein
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Dealing with huge data

Post by chulett »

datastagedw wrote:Also we need to join this huge data with little less(less than a million records) data from a different database table.
A different table... in the same database?
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Is that join going to reduce the data?
If yes, then you can load the data from the "other" database into the database that houses your source, pass a SQL join there, and extract only the result.
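
As a rough sketch of that approach, the Python snippet below uses the standard `sqlite3` module as a stand-in for the real source database. The table names (`big_table`, `small_stage`), the columns, and the sample rows are all hypothetical. The idea is to stage the smaller data set next to the large table, let the database do the join, and pull back only the joined rows.

```python
import sqlite3


def join_in_database(conn, small_rows):
    """Stage the small data set beside the big table and join inside the database."""
    # Temporary staging table for the rows that came from the "other" database.
    conn.execute(
        "CREATE TEMP TABLE small_stage (id INTEGER PRIMARY KEY, label TEXT)"
    )
    conn.executemany(
        "INSERT INTO small_stage (id, label) VALUES (?, ?)", small_rows
    )
    # Only the joined result leaves the database, not the full big table.
    cursor = conn.execute(
        "SELECT b.id, b.amount, s.label "
        "FROM big_table b JOIN small_stage s ON s.id = b.id "
        "ORDER BY b.id"
    )
    return cursor.fetchall()


# In-memory database standing in for the source system (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO big_table VALUES (?, ?)", [(i, i * 10) for i in range(1000)]
)

rows = join_in_database(conn, [(3, "x"), (500, "y")])
```

Whether this pays off depends on the answer to the question above: if the join does not cut the row count much, the extract is still close to the full table and the staging step buys little.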
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.