Data Loading Based On Size of the Source

Post questions here relating to DataStage Enterprise/PX Edition, covering areas such as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

shivakumar
Participant
Posts: 31
Joined: Wed Mar 17, 2004 3:33 am

Data Loading Based On Size of the Source

Post by shivakumar »

Hi,

I have a requirement to load data into the target in portions based on the size of the source, executing the job once per portion.

For example, if my source table is 100 MB, then I have to run my job 10 times, loading 10 MB of data into the target table on each run.

If the job fails on the first run, then I have to load that first 10 MB of data again. Once the full 100 MB has been loaded, I have to start again from the beginning on the 11th day.

Can anyone help me with this?

Thanks and Regards
Siva
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

Think of a multi-instance job. Determine the range of key values in the source data and restrict each run with that range in the SQL WHERE clause, passing either the WHERE clause or the whole SQL statement in as a job parameter.
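
A minimal sketch of that approach from the command line, assuming the job's source stage uses user-defined SQL that references a parameter (here called CHUNK_WHERE); the project, job, column, and parameter names are placeholders, not anything from an actual project:

[code]
#!/bin/sh
# Sketch: run one chunk of the load by passing a range-restricted WHERE
# clause as a job parameter. All names here are hypothetical.

LOW=$1     # e.g. 1
HIGH=$2    # e.g. 100000

dsjob -run \
      -param CHUNK_WHERE="WHERE key_col BETWEEN ${LOW} AND ${HIGH}" \
      -jobstatus \
      MyProject LoadTargetJob
[/code]

With -jobstatus, dsjob waits for the job to finish and reflects the result in its exit status (the exact code mapping varies by release, so check your dsjob documentation), which lets a wrapper script decide whether to repeat the same range or move on to the next one.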
shivakumar
Participant
Posts: 31
Joined: Wed Mar 17, 2004 3:33 am

Post by shivakumar »

[quote="Maveric"]Think of a multi instance job. determine the range of values in the source data. Give the range in the SQL where clause. Probably pass the where clause as parameter or the SQL statement as parameter.[/quote]


Hi,

Actually, the requirement here is that I have to run the same job one run after another, because as per the requirement I have to load the first 10 MB of data, then the next 10 MB, and so on.

If I create multiple instances, then the jobs will run in parallel, not sequentially.

Regards
Siva
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Not true. Multi-instance jobs run as and when you request.

That said, you don't need a multi-instance job to run the same job over and over with only one instance running at any one time.

Why do you have this strange requirement to load only 10 MB at a time?
Is this your design, or an imposed requirement? If the latter, require them to justify it; it defeats the whole purpose of an ETL tool.
Last edited by ray.wurlod on Thu Aug 02, 2007 3:12 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well, that sure gets an honorary membership into our Hall of Very Odd Requirements. While you may be able to limit something based on a record count, I have no clue how you'd do the same for 'source size' unless you compute the average record length and turn that into a record-count limit on a case-by-case basis. :?

Sounds like a job for a Looping Sequence.
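
A rough sketch of that size-to-rows arithmetic, assuming the average record length has been obtained from DBMS statistics or a sampled query; the numbers and names below are purely illustrative:

[code]
#!/bin/sh
# Sketch: turn a 10 MB size budget into a record-count limit.
# AVG_ROW_BYTES is a hypothetical figure; in practice it would come
# from DBMS statistics or a sampled average record length.

CHUNK_BYTES=10485760        # 10 MB
AVG_ROW_BYTES=512           # assumed average record length

ROWS_PER_CHUNK=$((CHUNK_BYTES / AVG_ROW_BYTES))
echo "Limit each run to ${ROWS_PER_CHUNK} rows"

# That count could then be passed to the job as a parameter, or (if my
# memory of the dsjob options is right) imposed as a row limit directly:
# dsjob -run -rows ${ROWS_PER_CHUNK} MyProject LoadTargetJob
[/code]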
-craig

"You can never have too many knives" -- Logan Nine Fingers
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

Making the job multi-instance gives you the flexibility of running it simultaneously under different instance IDs, but you can still run it one instance after another. If you are scheduling the runs, then it is also easier to identify which instance failed and re-run just that instance.
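
To make that sequential, restartable behaviour concrete, here is a sketch of a driver script that runs ten chunk instances one after another and remembers the last chunk that succeeded; the job name, instance IDs, key ranges, and checkpoint file are all invented for illustration:

[code]
#!/bin/sh
# Sketch: run ten chunks sequentially as instances of a multi-instance job
# (job.invocationid). On failure, stop; a re-run resumes at the failed
# chunk. After all ten chunks, the checkpoint is cleared so the next run
# starts from chunk 1 again. All names are hypothetical.

CHECKPOINT=/tmp/load_checkpoint
START=$(cat "$CHECKPOINT" 2>/dev/null || echo 0)

for i in $(seq $((START + 1)) 10); do
    LOW=$(( (i - 1) * 100000 + 1 ))
    HIGH=$(( i * 100000 ))

    dsjob -run \
          -param CHUNK_WHERE="WHERE key_col BETWEEN ${LOW} AND ${HIGH}" \
          -jobstatus \
          MyProject LoadTargetJob.chunk${i}

    # Treating any non-zero exit as failure; the exact -jobstatus
    # exit-code mapping varies by release, so verify against your docs.
    if [ $? -ne 0 ]; then
        echo "Chunk ${i} failed; re-run this script to retry it." >&2
        exit 1
    fi
    echo "$i" > "$CHECKPOINT"
done

rm -f "$CHECKPOINT"
[/code]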