Running Multiple Instances-Partition By Date
Hi
I am trying to run my job as multiple instances. I developed a job sequence with six instances of the job.
I am trying to partition the data on a timestamp; I have a create_date column in my source data.
So I am using a job parameter to which I can assign a year, and in the main job's WHERE clause I use this year as "WHERE creation_date = #YEAR#".
This looks promising for a FULL LOAD.
But for an incremental load, I think this is not going to work. Please let me know any ideas or comments on this.
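To make the year-partitioned approach concrete, here is a hypothetical Python sketch (not DataStage code; the template string and function names are illustrative only) of how a driver could build one WHERE clause per instance by substituting a year for the #YEAR# job parameter. Note it compares the *year* of creation_date rather than the raw date, which is what the original clause presumably intends:

```python
# Hypothetical illustration: build one WHERE clause per job instance,
# substituting a year value for the #YEAR# job parameter placeholder.

TEMPLATE = "WHERE YEAR(creation_date) = #YEAR#"

def where_clause_for_instance(year: int, template: str = TEMPLATE) -> str:
    """Substitute the #YEAR# placeholder the way a job parameter would be."""
    return template.replace("#YEAR#", str(year))

# Six instances, one per year, as in the six-instance sequence described above.
clauses = [where_clause_for_instance(y) for y in range(2001, 2007)]
for c in clauses:
    print(c)
```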
Pradeep Kumar
Do you want to have the same dates (date ranges) in each file? E.g. 20070101-20070105 into file1, 20070106-20070110 into file2, etc.?
If not, then simply use a Link Partitioner stage, put the data into sequential files, and then use them in your multiple-instance job.
If yes, then this would require some calculation to get the proper splitting parameter(s) (according to your needs), and then using them in a Transformer stage to put rows into the proper files.
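The Transformer-with-constraints routing can be pictured in plain Python. This is a hypothetical stand-in for the stage logic (the row layout and field names are assumptions): each row goes to a bucket keyed by the year of its create_date, one bucket per output link/file.

```python
# Hypothetical sketch of routing rows to per-year outputs, the way
# Transformer constraints (link1 year=2006, link2 year=2005, ...) would.
from collections import defaultdict

def split_by_year(rows):
    """Group rows by the year of their create_date ('YYYY-MM-DD' strings)."""
    buckets = defaultdict(list)
    for row in rows:
        year = row["create_date"][:4]   # the constraint key: one output per year
        buckets[year].append(row)
    return buckets

rows = [
    {"id": 1, "create_date": "2006-03-01"},
    {"id": 2, "create_date": "2005-07-15"},
    {"id": 3, "create_date": "2006-11-30"},
]
buckets = split_by_year(rows)
# Each bucket would be landed as its own sequential file for a parallel instance.
```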
Regards,
Wojciech Nogalski
If your incremental load contains creation dates from all years, then your approach (assuming it works for full load) should work OK for incremental load.
I'm dubious about comparing something called "...date" against a job parameter called "#YEAR#" - maybe I'm more finicky about meaningful names than you are.
If your incremental loads contain only dates from the current year, then five of your six job invocations will process zero rows. While not wrong, it seems inefficient to start them at all.
You may want to clone the job for incremental loads, and make the division of work a finer level of granularity - perhaps months, perhaps date ranges. Whatever you choose you would implement through job parameters just as you have with the #YEAR# parameter for the full load.
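The finer-grained division suggested above can be pictured as generating one (start, end) pair per month and feeding each pair to an invocation as its date-range job parameters. A hypothetical sketch, using only the standard library:

```python
# Hypothetical sketch: generate (first-day, last-day) month ranges for a year,
# one pair per job invocation, to pass as date-range job parameters.
import calendar
from datetime import date

def month_ranges(year):
    """Return [(start_date, end_date)] for each month of the given year."""
    ranges = []
    for month in range(1, 13):
        last_day = calendar.monthrange(year, month)[1]  # days in this month
        ranges.append((date(year, month, 1), date(year, month, last_day)))
    return ranges

for start, end in month_ranges(2006):
    print(f"LastLoadDate={start} LoadDate={end}")
```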
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
What about

Code: Select all
creation_date = #YEAR# and creation_date > #Last Run Date#
#Last Run Date# can be another parameter passed from the Sequence, and you need to maintain it for all the partitions. You need to write routine(s) to write/fetch the last run date for each instance/partition.
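The write/fetch routines mentioned above might look like this sketch (file names, format, and location are all hypothetical; a real DataStage routine would typically be written in BASIC against a hashed file or similar): each instance/partition keeps its own last-run-date checkpoint.

```python
# Hypothetical sketch of per-instance last-run-date bookkeeping:
# each instance/partition reads and writes its own checkpoint file.
import os
import tempfile

CHECKPOINT_DIR = tempfile.mkdtemp()  # stand-in for a real project directory

def fetch_last_run_date(instance, default="1900-01-01"):
    """Read the last run date recorded for this instance, or a default."""
    path = os.path.join(CHECKPOINT_DIR, f"lastrun_{instance}.txt")
    if os.path.exists(path):
        with open(path) as f:
            return f.read().strip()
    return default

def write_last_run_date(instance, run_date):
    """Record the run date so the next incremental load starts from it."""
    path = os.path.join(CHECKPOINT_DIR, f"lastrun_{instance}.txt")
    with open(path, "w") as f:
        f.write(run_date)

write_last_run_date("instance1", "2006-06-30")
print(fetch_last_run_date("instance1"))   # the stored date
print(fetch_last_run_date("instance2"))   # no checkpoint yet: the default
```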
Success consists of getting up just one more time than you fall.
Thanks for all your ideas.
I am actually using two date job parameters, LoadDate and LastLoadDate.
My WHERE clause is like

Code: Select all
Creation_Date Between LastLoadDate and LoadDate

So now, using the Sequence and six Job Activities, the WHERE clause in the main job will be something like

Code: Select all
Extract year from Creation_Date and compare it to the Year from the Job Activity.

Code: Select all
JobActivity1: LoadDate = 2006-12-31; LastLoadDate = 2006-01-01; Year = '2006'
JobActivity2: LoadDate = 2005-12-31; LastLoadDate = 2005-01-01; Year = '2005'
JobActivity3: LoadDate = 2004-12-31; LastLoadDate = 2004-01-01; Year = '2004'
and so on until 2000

The above will be for the full load. By the way, I am using the LoadDate column in the final table to record the date a row was loaded, so the above is going to create a mess with LoadDate. I decided to use another column, RunDate, and update it to the date the row is loaded into the data warehouse.
Any comments on my idea, please?
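The per-activity parameter sets listed above follow a simple pattern, so they could be generated rather than typed by hand. A hypothetical sketch (the function and year bounds are illustrative, not part of any DataStage API):

```python
# Hypothetical sketch: derive the LoadDate/LastLoadDate/Year parameter set
# for each Job Activity, matching the full-load list shown above.
def activity_params(first_year=2000, last_year=2006):
    """One parameter dict per Job Activity, newest year first."""
    params = []
    for i, year in enumerate(range(last_year, first_year - 1, -1), start=1):
        params.append({
            "activity": f"JobActivity{i}",
            "LoadDate": f"{year}-12-31",
            "LastLoadDate": f"{year}-01-01",
            "Year": str(year),
        })
    return params

for p in activity_params():
    print(p["activity"], p["LoadDate"], p["LastLoadDate"], p["Year"])
```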
Pradeep Kumar
Do you want to extract the data in parallel jobs? IMHO it would be better to extract the data in one job, split it using a Transformer and constraints (e.g. link1 loadyear=2006, link2 loadyear=2005, etc.), and land the data into sequential files, which can then be transformed in parallel.
When all needed transformations, data-cleansing activities, etc. are done, use a Link Collector to put the data back together into one sequential file, and then insert it (or use a bulk loader) into the destination table in your DW.
Regards,
Wojciech Nogalski