Dynamically (re)setting job parameters at run time

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Post Reply
sbass1
Premium Member
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Dynamically (re)setting job parameters at run time

Post by sbass1 »

Hi All,

I'm new to Datastage, although familiar with other ETL tools. If I post a FAQ, feel free to just point me to a link or give me the correct search string to use in the forum.

Scenario:

Table1 (small, in SQL Server):

Date IsCurrentPeriod
2009-02-11 N
2009-02-12 Y
2009-02-13 N

Table2 (huge, in Sybase)

Date OtherStuff
2009-02-11 blah
2009-02-12 blah
2009-02-13 blah

I need to extract all records in Table2 where Date IsCurrentPeriod. I don't want to use a lookup because the performance would be terrible - all records would be extracted from Sybase, then filtered in the Datastage job. I need to dynamically set the desired date in the SQL passed to Sybase.

So, is there a way to dynamically (re)set a job parameter at run time based on data in a table, such that downstream stages can use that parameter in the generated SQL code?

Thanks,
Scott
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard.

The short answer (which a Search would have revealed) is that there is no legal way to change the value of a job parameter once that job is running.

The usual "solution" is to carry dummy columns containing the information; this approach is essentially cost-free.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If the date column in the Sybase table is indexed, lookup performance might not be as bad as you think.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sbass1
Premium Member
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Post by sbass1 »

ray.wurlod wrote:Welcome aboard.

The short answer (which a Search would have revealed) is that there is no legal way to change the value of a job parameter once that job is running.

The usual "solution" is to carry dummy columns containing the information; this approach is essentially cost-free.
Hi Ray,

Thanks for the reply. I did search "job parameter" before posting. Read a few of the 13 hits, and skimmed all subject lines. But I digress :)

Not sure what you meant by "carry dummy columns"? One thing I did think of was writing the result set from SQL Server out to Sybase, then using an inner join to do the subsetting, rather than a hard coded parameter. However, we can't create tables "willy nilly" in our production Sybase environment, and temp tables aren't available across database connections.

It would be nice if datastage 1) had the concept of dynamic parameters that could be set at run time, then used downstream, and 2) some sort of database connection pooling such that temp tables were available for the life of the job.

We're currently investigating a job sequence job and a bunch of basic code to set the parameter for the downstream job. If there's a better approach, please let me know.

Regards,
Scott
sbass1
Premium Member
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Dynamically (re)setting job parameters at run time

Post by sbass1 »

Hi,

1. Can a routine query a database and return a single row/single column result set to a parameter? If so, does anyone have example code that does this?

2. Can a routine query a database and return a single row/multiple column result set to multiple parameters? If so, does anyone have example code that does this?

3. Can a routine query a database and return a multiple row/single column result set to multiple parameters? If so, does anyone have example code that does this?

4. Finally, can a routine query a database and return a multiple row/multiple column result set and load that into a two dimensional array? If so, does anyone have example code that does this?

The purpose of all of the above is to create a Job Sequence that sets dynamic parameters, then the job control code sets multiple parameters via DSSetParam and runs the job via DSRunJob.

Thanks a lot,
Scott
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

No need to over complicate things with a bunch of hand-coded job control. Use a DataStage job to read your parameter values out of the database, and write them in the form of a delimited string to the job's UserStatus. Use the field function to extract values from the UserStatus to set your downstream job's parameters. Exact search on UserStatus should turn up quite a bit of discussion on the technique.

Mike
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Select list of required dates from the table and store in a parameter.

Use the result in an "IN" clause of second query.
sbass1
Premium Member
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Re: Dynamically (re)setting job parameters at run time

Post by sbass1 »

sbass1 wrote:Hi,

1. Can a routine query a database and return a single row/single column result set to a parameter? If so, does anyone have example code that does this?

2. Can a routine query a database and return a single row/multiple column result set to multiple parameters? If so, does anyone have example code that does this?

3. Can a routine query a database and return a multiple row/single column result set to multiple parameters? If so, does anyone have example code that does this?

4. Finally, can a routine query a database and return a multiple row/multiple column result set and load that into a two dimensional array? If so, does anyone have example code that does this?

The purpose of all of the above is to create a Job Sequence that sets dynamic parameters, then the job control code sets multiple parameters via DSSetParam and runs the job via DSRunJob.

Thanks a lot,
Scott
Mike, thanks for the tip. I searched on UserStatus and have the process working. viewtopic.php?t=117566&highlight=UserStatus was particularly useful.

Sainath, more details please :) You're being very presumptive about my expertise with DS. My understanding is that job parameters are immutable once the job has started. So please explain further what you mean by
Select list of required dates from the table and store in a parameter
and
Use the result in an "IN" clause of second query.
I certainly understand the concept, and it is a piece of cake to do in SAS, it's just coaxing DS to do this that's the issue given my limited knowledge.

Out of curiousity, I'd still be interested if the above can be done for possible use on future projects.
sbass1
Premium Member
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Post by sbass1 »

Hi,

I received a PM requesting more detail on how I got this to work (with the help of this list, of course). I thought it would be useful to disseminate this to the forum in case it gets a search hit in the future.

I created the below as a test case:

1. I created the function UserStatus and saved it in Routines/COMMON. The code is:

Call DSSetUserStatus(Arg1)
Ans = Arg1

2. I created a job GetParameters, which does a query against an SQL Server table. The SQL is:

SELECT
MonthlyCalendar.CalYearCode, MonthlyCalendar.CalYearMonthCode
FROM MonthlyCalendar
WHERE IsCurrentReportingPeriod=1;

We need to return the current Year and Month from a metadata table, and pass this into the query for a second table. It returns a single row.

In the Transform for that job, I have a stage variable:

UserStatus(Current_Month.CalYearCode:"#": Current_Month.CalMonthCode)

In that job, I write to a sequential file with the column Dummy = "". I'd prefer to just run the transformer without an output, but DS doesn't seem to allow this. The ouptut file is irrelevant, but AFAIK needed for the job to compile.

3. I created a second job UseParameters, which has the job parameters Year and Month. The SQL query for that job is:

SELECT blah
FROM DailyCalendar
WHERE Year = #Year# and Month = #Month#;

4. I created a Job Sequence job like so:

GetParameters --> Return_User_Status (linkname) --> UseParameters

5. In the Job Parameters window for UseParameters (double click that job in the Job Sequence job, then select the Job tab), I have this code:

Year Field(GetParameters.$UserStatus,"#",1)
Month Field(GetParameters.$UserStatus,"#",2)

So use the upstream job name (GetParameters) not link name (Return_User_Status) in the code to parse UserStatus.

Hope this helps,
Scott
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I'd be wary of using "#" as a delimiter character, as it can have another meaning (parameter reference). I'd prefer one of the dynamic array delimiter characters, such as field mark (@FM), since these are unlikely ever to occur in data.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply