Dynamically (re)setting job parameters at run time
Moderators: chulett, rschirm, roy
Dynamically (re)setting job parameters at run time
Hi All,
I'm new to Datastage, although familiar with other ETL tools. If I post a FAQ, feel free to just point me to a link or give me the correct search string to use in the forum.
Scenario:
Table1 (small, in SQL Server):
Date IsCurrentPeriod
2009-02-11 N
2009-02-12 Y
2009-02-13 N
Table2 (huge, in Sybase)
Date OtherStuff
2009-02-11 blah
2009-02-12 blah
2009-02-13 blah
I need to extract all records in Table2 where Date IsCurrentPeriod. I don't want to use a lookup because the performance would be terrible - all records would be extracted from Sybase, then filtered in the Datastage job. I need to dynamically set the desired date in the SQL passed to Sybase.
So, is there a way to dynamically (re)set a job parameter at run time based on data in a table, such that downstream stages can use that parameter in the generated SQL code?
Thanks,
Scott
I'm new to Datastage, although familiar with other ETL tools. If I post a FAQ, feel free to just point me to a link or give me the correct search string to use in the forum.
Scenario:
Table1 (small, in SQL Server):
Date IsCurrentPeriod
2009-02-11 N
2009-02-12 Y
2009-02-13 N
Table2 (huge, in Sybase)
Date OtherStuff
2009-02-11 blah
2009-02-12 blah
2009-02-13 blah
I need to extract all records in Table2 where Date IsCurrentPeriod. I don't want to use a lookup because the performance would be terrible - all records would be extracted from Sybase, then filtered in the Datastage job. I need to dynamically set the desired date in the SQL passed to Sybase.
So, is there a way to dynamically (re)set a job parameter at run time based on data in a table, such that downstream stages can use that parameter in the generated SQL code?
Thanks,
Scott
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Welcome aboard.
The short answer (which a Search would have revealed) is that there is no legal way to change the value of a job parameter once that job is running.
The usual "solution" is to carry dummy columns containing the information; this approach is essentially cost-free.
The short answer (which a Search would have revealed) is that there is no legal way to change the value of a job parameter once that job is running.
The usual "solution" is to carry dummy columns containing the information; this approach is essentially cost-free.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Hi Ray,ray.wurlod wrote:Welcome aboard.
The short answer (which a Search would have revealed) is that there is no legal way to change the value of a job parameter once that job is running.
The usual "solution" is to carry dummy columns containing the information; this approach is essentially cost-free.
Thanks for the reply. I did search "job parameter" before posting. Read a few of the 13 hits, and skimmed all subject lines. But I digress
Not sure what you meant by "carry dummy columns"? One thing I did think of was writing the result set from SQL Server out to Sybase, then using an inner join to do the subsetting, rather than a hard coded parameter. However, we can't create tables "willy nilly" in our production Sybase environment, and temp tables aren't available across database connections.
It would be nice if datastage 1) had the concept of dynamic parameters that could be set at run time, then used downstream, and 2) some sort of database connection pooling such that temp tables were available for the life of the job.
We're currently investigating a job sequence job and a bunch of basic code to set the parameter for the downstream job. If there's a better approach, please let me know.
Regards,
Scott
Dynamically (re)setting job parameters at run time
Hi,
1. Can a routine query a database and return a single row/single column result set to a parameter? If so, does anyone have example code that does this?
2. Can a routine query a database and return a single row/multiple column result set to multiple parameters? If so, does anyone have example code that does this?
3. Can a routine query a database and return a multiple row/single column result set to multiple parameters? If so, does anyone have example code that does this?
4. Finally, can a routine query a database and return a multiple row/multiple column result set and load that into a two dimensional array? If so, does anyone have example code that does this?
The purpose of all of the above is to create a Job Sequence that sets dynamic parameters, then the job control code sets multiple parameters via DSSetParam and runs the job via DSRunJob.
Thanks a lot,
Scott
1. Can a routine query a database and return a single row/single column result set to a parameter? If so, does anyone have example code that does this?
2. Can a routine query a database and return a single row/multiple column result set to multiple parameters? If so, does anyone have example code that does this?
3. Can a routine query a database and return a multiple row/single column result set to multiple parameters? If so, does anyone have example code that does this?
4. Finally, can a routine query a database and return a multiple row/multiple column result set and load that into a two dimensional array? If so, does anyone have example code that does this?
The purpose of all of the above is to create a Job Sequence that sets dynamic parameters, then the job control code sets multiple parameters via DSSetParam and runs the job via DSRunJob.
Thanks a lot,
Scott
No need to over complicate things with a bunch of hand-coded job control. Use a DataStage job to read your parameter values out of the database, and write them in the form of a delimited string to the job's UserStatus. Use the field function to extract values from the UserStatus to set your downstream job's parameters. Exact search on UserStatus should turn up quite a bit of discussion on the technique.
Mike
Mike
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
Re: Dynamically (re)setting job parameters at run time
Mike, thanks for the tip. I searched on UserStatus and have the process working. viewtopic.php?t=117566&highlight=UserStatus was particularly useful.sbass1 wrote:Hi,
1. Can a routine query a database and return a single row/single column result set to a parameter? If so, does anyone have example code that does this?
2. Can a routine query a database and return a single row/multiple column result set to multiple parameters? If so, does anyone have example code that does this?
3. Can a routine query a database and return a multiple row/single column result set to multiple parameters? If so, does anyone have example code that does this?
4. Finally, can a routine query a database and return a multiple row/multiple column result set and load that into a two dimensional array? If so, does anyone have example code that does this?
The purpose of all of the above is to create a Job Sequence that sets dynamic parameters, then the job control code sets multiple parameters via DSSetParam and runs the job via DSRunJob.
Thanks a lot,
Scott
Sainath, more details please You're being very presumptive about my expertise with DS. My understanding is that job parameters are immutable once the job has started. So please explain further what you mean by
andSelect list of required dates from the table and store in a parameter
I certainly understand the concept, and it is a piece of cake to do in SAS, it's just coaxing DS to do this that's the issue given my limited knowledge.Use the result in an "IN" clause of second query.
Out of curiousity, I'd still be interested if the above can be done for possible use on future projects.
Hi,
I received a PM requesting more detail on how I got this to work (with the help of this list, of course). I thought it would be useful to disseminate this to the forum in case it gets a search hit in the future.
I created the below as a test case:
1. I created the function UserStatus and saved it in Routines/COMMON. The code is:
Call DSSetUserStatus(Arg1)
Ans = Arg1
2. I created a job GetParameters, which does a query against an SQL Server table. The SQL is:
SELECT
MonthlyCalendar.CalYearCode, MonthlyCalendar.CalYearMonthCode
FROM MonthlyCalendar
WHERE IsCurrentReportingPeriod=1;
We need to return the current Year and Month from a metadata table, and pass this into the query for a second table. It returns a single row.
In the Transform for that job, I have a stage variable:
UserStatus(Current_Month.CalYearCode:"#": Current_Month.CalMonthCode)
In that job, I write to a sequential file with the column Dummy = "". I'd prefer to just run the transformer without an output, but DS doesn't seem to allow this. The ouptut file is irrelevant, but AFAIK needed for the job to compile.
3. I created a second job UseParameters, which has the job parameters Year and Month. The SQL query for that job is:
SELECT blah
FROM DailyCalendar
WHERE Year = #Year# and Month = #Month#;
4. I created a Job Sequence job like so:
GetParameters --> Return_User_Status (linkname) --> UseParameters
5. In the Job Parameters window for UseParameters (double click that job in the Job Sequence job, then select the Job tab), I have this code:
Year Field(GetParameters.$UserStatus,"#",1)
Month Field(GetParameters.$UserStatus,"#",2)
So use the upstream job name (GetParameters) not link name (Return_User_Status) in the code to parse UserStatus.
Hope this helps,
Scott
I received a PM requesting more detail on how I got this to work (with the help of this list, of course). I thought it would be useful to disseminate this to the forum in case it gets a search hit in the future.
I created the below as a test case:
1. I created the function UserStatus and saved it in Routines/COMMON. The code is:
Call DSSetUserStatus(Arg1)
Ans = Arg1
2. I created a job GetParameters, which does a query against an SQL Server table. The SQL is:
SELECT
MonthlyCalendar.CalYearCode, MonthlyCalendar.CalYearMonthCode
FROM MonthlyCalendar
WHERE IsCurrentReportingPeriod=1;
We need to return the current Year and Month from a metadata table, and pass this into the query for a second table. It returns a single row.
In the Transform for that job, I have a stage variable:
UserStatus(Current_Month.CalYearCode:"#": Current_Month.CalMonthCode)
In that job, I write to a sequential file with the column Dummy = "". I'd prefer to just run the transformer without an output, but DS doesn't seem to allow this. The ouptut file is irrelevant, but AFAIK needed for the job to compile.
3. I created a second job UseParameters, which has the job parameters Year and Month. The SQL query for that job is:
SELECT blah
FROM DailyCalendar
WHERE Year = #Year# and Month = #Month#;
4. I created a Job Sequence job like so:
GetParameters --> Return_User_Status (linkname) --> UseParameters
5. In the Job Parameters window for UseParameters (double click that job in the Job Sequence job, then select the Job tab), I have this code:
Year Field(GetParameters.$UserStatus,"#",1)
Month Field(GetParameters.$UserStatus,"#",2)
So use the upstream job name (GetParameters) not link name (Return_User_Status) in the code to parse UserStatus.
Hope this helps,
Scott
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I'd be wary of using "#" as a delimiter character, as it can have another meaning (parameter reference). I'd prefer one of the dynamic array delimiter characters, such as field mark (@FM), since these are unlikely ever to occur in data.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.