Loading Huge Data
Moderators: chulett, rschirm, roy
Loading Huge Data
Hi
I am getting around 40-50 million records from an oracle table. I need to do some join and simple transformations and load the data into another table.
ODBC-->Transformer-->ODBC
I went through the forum which discused the use of IPC, Multiple instances etc.
I am thinking of creating multiple instances of my job.
I know to create multiple instances in DS. But the concept of how to partition is something new to me.
Could anyone explain me in little bit more detail, how to accomplish this.
Thanks
I am getting around 40-50 million records from an oracle table. I need to do some join and simple transformations and load the data into another table.
ODBC-->Transformer-->ODBC
I went through the forum which discused the use of IPC, Multiple instances etc.
I am thinking of creating multiple instances of my job.
I know to create multiple instances in DS. But the concept of how to partition is something new to me.
Could anyone explain me in little bit more detail, how to accomplish this.
Thanks
Pradeep Kumar
Yuck... that will take for-ev-a. Look into making it a two step process... first extract and transform, landing it to a flat file. This job can be 'multi-instanced' rather easily, using mod in Oracle and a 'partition number' as the Invocation ID. Make sure the output files have a unique name as well by including the Invocation ID as part of the filename.
Then suggest you bulk load the data if at all possible. That could get your load portion of this little passion play down to minutes rather than many, many (many) hours.
Then suggest you bulk load the data if at all possible. That could get your load portion of this little passion play down to minutes rather than many, many (many) hours.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Premium Member
- Posts: 210
- Joined: Wed Feb 16, 2005 7:17 am
I am thinking to create a job sequence by allowing multiple instances of the job. I decided to create 4 instances of the original job.
I am thinking to partition the input data based on a period (as ia m loading only 2005 data). Periods are different. So this should be given as a parameter and in where clause. If so, could you give some guidance on how to do it
I am thinking to partition the input data based on a period (as ia m loading only 2005 data). Periods are different. So this should be given as a parameter and in where clause. If so, could you give some guidance on how to do it
Pradeep Kumar
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
You have it exactly.
Use the job parameter references to replace constants in the WHERE clause of the extraction SQL. For example:
You can do this on the Selection tab - you do not need to resort to user-defined SQL.
Use the job parameter references to replace constants in the WHERE clause of the extraction SQL. For example:
Code: Select all
WHERE sel_date >= '#StartDate#' AND sel_date < '#EndDate#'
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Thx for the reply.
So we need to give different ranges of satrt date and end date in each job sequence.
I need to do partition first on year and then on period (1-12 for 12 months). My Where clause in main job is
"WHERE tablLE_NAME.PERIOD_NAME = '#PERIOD#'.
PERIOD_Name is a column in my source table. has values like(Jan-05,Feb-05.........,Jan-04......) Each year is having around 5 million records. So first we decided to load data year wise and use 4 instances in each year. Each instance will load three periods of data.
So now this PERIOD is appearing in each and every joba ctivity in sequence. Now I need to pass the values to these 4 different joba ctivities for PERIOD. Liek First Joba ctivity should get Jan-05,Feb-05,Mar-05 and second one hsould get Apr,May,Jun and so on..
How these values can be passed from sequencer to main job. I am not sure how to specify it in sequencer.
How to accomplish this
So we need to give different ranges of satrt date and end date in each job sequence.
I need to do partition first on year and then on period (1-12 for 12 months). My Where clause in main job is
"WHERE tablLE_NAME.PERIOD_NAME = '#PERIOD#'.
PERIOD_Name is a column in my source table. has values like(Jan-05,Feb-05.........,Jan-04......) Each year is having around 5 million records. So first we decided to load data year wise and use 4 instances in each year. Each instance will load three periods of data.
So now this PERIOD is appearing in each and every joba ctivity in sequence. Now I need to pass the values to these 4 different joba ctivities for PERIOD. Liek First Joba ctivity should get Jan-05,Feb-05,Mar-05 and second one hsould get Apr,May,Jun and so on..
How these values can be passed from sequencer to main job. I am not sure how to specify it in sequencer.
How to accomplish this
Pradeep Kumar
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
If you generate YEAR upstream in the job sequence (maybe as a sequence parameter) you can use a Routine activity to generate the four date ranges, perhaps pass them back as delimited strings, and unpack them. For example the routine could return a dynamic array containing four fields each with three PERIOD values. You would then load three job parameters with these and specify then in a WHERE clause of the form
Example Routine:
In instance number 4, which handles the fourth quarter, job parameter values are derived as
Code: Select all
WHERE tablename.PERIOD_NAME IN ('#jpPeriod1#', '#jpPeriod2#', '#jpPeriod3')
Code: Select all
FUNCTION Periods(Year)
* Returns four fields containing three period names each
Year2 = Right(Year,2) ; * rightmost two digits of year
Ans = ""
Ans<1> = "Jan-" : Year2 : @VM : "Feb-" : Year2 : @VM : "Mar-" : Year2
Ans<2> = "Apr-" : Year2 : @VM : "May-" : Year2 : @VM : "Jun-" : Year2
Ans<3> = "Jul-" : Year2 : @VM : "Aug-" : Year2 : @VM : "Sep-" : Year2
Ans<4> = "Oct-" : Year2 : @VM : "Nov-" : Year2 : @VM : "Dec-" : Year2
RETURN(Ans)
Code: Select all
jpPeriod1 RoutineActivityName.$ReturnValue<4,1>
jpPeriod2 RoutineActivityName.$ReturnValue<4,2>
jpPeriod3 RoutineActivityName.$ReturnValue<4,3>
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.