Running data stage jobs in parallel

Archive of postings to DataStageUsers@Oliver.com. This forum is intended only as a reference and cannot be posted to.

Moderators: chulett, rschirm

admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm


Post by admin »

Hi,

I'm fairly new to DataStage, but have worked with Informatica's PowerMart for a couple of years. In PowerMart, I had the ability to run the same mapping (code base) using different parameters (as defined in the session). Is there a way to do the same thing in DataStage? If so, can I run the same DataStage job with different parameters concurrently?

I have an AR/AP warehouse that is pulling from a large number of similar billing systems (13, eventually). What I am looking to do is define staging jobs for each source, and then create common jobs to load the actual warehouse tables (thereby only having to do the lookup logic once). Does anyone out there have experience with this?

thanks in advance,

jason

Post by admin »

It sounds like two different questions:
1. Can you assign Job Parameters to DataStage jobs?
2. Can you run DataStage jobs in parallel?

1. DataStage supports Job Parameters in many fields throughout a job. Some fields let you insert a parameter with an "Add Parameter" button; in others you simply type #[parametername]# into the field. You can often embed several parameters in a single field to build up unusual structures, and parameters can be defined with different types. To define a parameter, look under the Edit/Job Properties menu item.

If you use a parameter as a string constant in SQL, you may have to surround it with quotes, like '#[parametername]#'.

Job Parameters are explained throughout the documentation.
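To make the quoting point concrete, here is a toy Python model of textual parameter substitution. It is not DataStage code: the function substitute_params and the parameter names SourceSystem and TableName are hypothetical, and it assumes the reference is replaced as plain text before the SQL reaches the database. Under that assumption, a string value only becomes a valid SQL literal if the quotes are written around the reference in the SQL itself.

```python
import re

def substitute_params(text: str, params: dict[str, str]) -> str:
    """Replace every #Name# token with the value of parameter Name.

    Toy model of parameter references (an assumption, not DataStage's code).
    An unknown name raises KeyError instead of leaving the token in the SQL.
    """
    return re.sub(r"#(\w+)#", lambda m: params[m.group(1)], text)

# Hypothetical parameter values for one run of the job.
params = {"SourceSystem": "BILL07", "TableName": "AR_INVOICE"}

# The quotes around #SourceSystem# belong to the SQL, not to the value.
sql = "SELECT * FROM #TableName# WHERE source_cd = '#SourceSystem#'"
print(substitute_params(sql, params))
# -> SELECT * FROM AR_INVOICE WHERE source_cd = 'BILL07'
```

Without the quotes in the SQL text, the same substitution would produce source_cd = BILL07, which the database would read as a column name rather than a string.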

2. Yes, you can run DataStage jobs in parallel. Different jobs can run simultaneously with ease, and the same job can be run concurrently (in different ways in DataStage 4 and DataStage 5). I frequently multistream jobs using Job Parameters and Job Control.
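The multistream pattern can be sketched in Python as a controller that starts one run per parameter set and then waits for all of them. This is only a model: in DataStage the controller would be job control code (for example using DSAttachJob, DSSetParam, DSRunJob and DSWaitForJob), whereas here run_job is a hypothetical stand-in and the SourceSystem codes are made up. In DataStage 4 each stream would target a separate copy of the job rather than the same job.

```python
from concurrent.futures import ThreadPoolExecutor

def run_job(job_name: str, params: dict[str, str]) -> str:
    """Hypothetical stand-in for 'start one job run and wait for it to finish'.

    A real controller would start the DataStage job here; this sketch only
    reports which run it represents.
    """
    return f"{job_name} finished with SourceSystem={params['SourceSystem']}"

def run_streams(job_name: str, param_sets: list[dict[str, str]]) -> list[str]:
    """Start one run per parameter set at the same time, then wait for all of them."""
    with ThreadPoolExecutor(max_workers=max(1, len(param_sets))) as pool:
        futures = [pool.submit(run_job, job_name, p) for p in param_sets]
        # result() blocks until that run completes; results keep the input order.
        return [f.result() for f in futures]

# One parameter set per source billing system (hypothetical system codes).
streams = [{"SourceSystem": f"BILL{n:02d}"} for n in range(1, 4)]
for line in run_streams("mAllLoadArInvoice", streams):
    print(line)
```

The controller stays the same no matter how many sources are added; only the list of parameter sets grows.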

You mention staging the data. You may not need to land the data at all before loading it into the warehouse. I have worked with a couple of converts to DataStage who found much less need to land data (particularly to load it into database staging tables) when using DataStage jobs.

Hash tables (hashed files) will also speed up the lookup process for data coming from heterogeneous sources.
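In DataStage server jobs the usual tool here is a hashed file; the Python sketch below only models the idea with an in-memory dict. The reference data is loaded once and keyed on the lookup column, and every incoming row, whichever billing system it came from, then costs a single keyed probe rather than a database query. All customer numbers, surrogate keys and the -1 "unknown" value are hypothetical.

```python
# Reference data: (customer number, warehouse surrogate key). Values are made up.
customers = [("C100", 1), ("C200", 2)]

# Invoice rows from two hypothetical billing systems, already mapped to one layout.
invoices = [
    {"source": "BILL01", "customer_no": "C100", "amount": 250.0},
    {"source": "BILL02", "customer_no": "C200", "amount": 75.5},
    {"source": "BILL02", "customer_no": "C999", "amount": 10.0},
]

def build_lookup(rows: list[tuple[str, int]]) -> dict[str, int]:
    """Load the reference data once, keyed on the lookup column."""
    return dict(rows)

def resolve_keys(rows: list[dict], lookup: dict[str, int], unknown: int = -1) -> list[tuple]:
    """One hash probe per incoming row; unmatched keys get the 'unknown' surrogate."""
    return [(r["source"], lookup.get(r["customer_no"], unknown), r["amount"]) for r in rows]

print(resolve_keys(invoices, build_lookup(customers)))
# -> [('BILL01', 1, 250.0), ('BILL02', 2, 75.5), ('BILL02', -1, 10.0)]
```

Because the lookup is built once and shared, the lookup logic lives in one place even though the rows come from several sources.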

-----Original Message-----
From: Jason Mulvin [mailto:jmulvin@fuse.net]
Sent: Wednesday, October 03, 2001 8:32 AM
To: datastage-users@oliver.com
Subject: Running data stage jobs in parallel

Post by admin »

Hi Jason,

Which version of DataStage are you using? You might want to look into the latest release, DataStage 5.1, which allows you to run DSJobs concurrently with the Axcel Pack.

Pavan Marpaka


Post by admin »

How do you run the same job concurrently in DataStage 4? I would be most interested.

I assume you do not mean copying the job and changing the parameter values, since then it is no longer the same job?

Phil

-----Original Message-----
From: lamont.lockwood@ascentialsoftware.com
[mailto:lamont.lockwood@ascentialsoftware.com]
Sent: Thursday, October 04, 2001 12:09 PM
To: datastage-users@oliver.com
Subject: RE: Running data stage jobs in parallel

Post by admin »

That is correct. I have one job, named something like "mAllLoadArInvoice", that I want to run five times at once (to take more advantage of my machine), with each instance of the job using different parameters.

Currently, I am using DataStage 4.2 on NT writing to Oracle on UNIX.

jason

-----Original Message-----
From: Phil Walker [mailto:philw@gnosys.co.nz]
Sent: Wednesday, October 03, 2001 9:26 PM
To: datastage-users@oliver.com
Subject: RE: Running data stage jobs in parallel

Post by admin »

In DataStage 4, it does mean using one job with parameters, making several copies of the job, and using Job Control to run those copies in parallel.

From a development point of view (i.e., time to develop and maintain), it is one job. To modify it, you change the first job, then recopy it to the others. Job Control makes it easy to build the ranges each copy should process, or to map each copy to its own physical data structures.
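The range-building side of Job Control can be sketched in Python: split one key range into contiguous, non-overlapping pieces and pair each piece with one copy of the job. The copy names (mAllLoadArInvoice_1 to mAllLoadArInvoice_5), the KeyFrom/KeyTo parameter names and the key range are hypothetical; in DataStage the controlling job would pass each pair to its copy as job parameters.

```python
def split_key_range(low: int, high: int, streams: int) -> list[tuple[int, int]]:
    """Split the inclusive key range [low, high] into contiguous, non-overlapping pieces.

    Earlier pieces get one extra key when the range does not divide evenly.
    If there are more streams than keys, the surplus streams get no piece.
    """
    if streams < 1 or high < low:
        raise ValueError("need at least one stream and low <= high")
    base, extra = divmod(high - low + 1, streams)
    ranges = []
    start = low
    for i in range(streams):
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # no keys left for the remaining streams
        ranges.append((start, start + size - 1))
        start += size
    return ranges

# Hypothetical copies of the master job, one per stream.
copies = [f"mAllLoadArInvoice_{n}" for n in range(1, 6)]
for job, (key_from, key_to) in zip(copies, split_key_range(1, 1_000_000, len(copies))):
    print(f"{job}: KeyFrom={key_from} KeyTo={key_to}")
```

Contiguous ranges keep each copy's work independent, so no two streams ever load the same rows.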

I have found it very fast and simple to build and process a single job, then create multiple streams from it.

-----Original Message-----
From: Phil Walker [mailto:philw@gnosys.co.nz]
Sent: Wednesday, October 03, 2001 8:26 PM
To: datastage-users@oliver.com
Subject: RE: Running data stage jobs in parallel