Incremental Loading based on dynamic date ranges

johm73 · Post by **johm73** » Tue Jan 18, 2005 10:25 am

Here is the deal:

I have several large tables I want to pull from. All these tables have a last_updt_dt field on them so I can determine when a record was last changed. So I could run the following SQL on the source:

Select *
From table_name
Where last_updt_dt > #ExtractLowDt#
And last_updt_dt <= #ExtractHighDt#

Simple enough.

I want to set the 2 variables listed above to dates I pull from a "ExecutionStatus" table I populate with a stored proc on Oracle at the beginning of each job. I have this proc call working fine by using a Before-job routine that calls ExecSH. Not questions there.

Now that I have my date range values in my "ExecutionStatus" table, I simply want to query that table for the current job I'm running and have the resulting date = a variable name :!:

This is a simple process in many other ETL tools I've used. However, it doesn't seem very easy with DataStage.

Can someone please help me out here. I've tried the following:

:arrow: Created functions on the Oracle schema to try and run "in-line" with the SQL generated in my OCI stage. This doesn't work for me becuase my source is not on the same schema and therefore doesn't have access to the function. The client is not open to having the functions created on the source schema.

:arrow: Created a routine that executes a query to return the date from the above mentioned table. I'm unable to run this routine within SQL generated in the OCI stage; so this won't work.

:arrow: Tried using the above mentioned routine to set job parameters; unable to execute routines within the "Default value" section of the parameter.

:cry: Please help! This should not be this difficult. I'm looking for something like this:

ExtractLow = SQL('DB_Name', 'Select extract_dt from executionstatus_table');

Thanks!

Jeff

chucksmith · Post by **chucksmith** » Tue Jan 18, 2005 1:35 pm

Are you askiing for this:

Code: Select all

select * from TableName tn, ExecutionStatus es 
where tn.last_updt_dt > es.ExtractLowDt and 
tn.last_updt_dt <= es.ExtractHighDt and
es.PrimaryKey = "TableName"

What are the columns in the ExecutionStatus tabel?

johm73 · Post by **johm73** » Tue Jan 18, 2005 2:02 pm

I'm unable to do this because my source table and the ExecutionStatus table are on different databases and I'm unable to use DBLinks.

[quote="chucksmith"]Are you askiing for this:

[code]select * from TableName tn, ExecutionStatus es
where tn.last_updt_dt > es.ExtractLowDt and
tn.last_updt_dt <= es.ExtractHighDt and
es.PrimaryKey = "TableName"[/code]

What are the columns in the ExecutionStatus tabel?[/quote]

chucksmith · Post by **chucksmith** » Tue Jan 18, 2005 2:42 pm

Alright, here is the two part quick and dirty solution:

1) Create a little job to select the ExecutionStatus table and write the results to a sequential or hash file. Do this because the results will be easier to access from a routine.

2) Create a routine to read the sequential or hash file and return the resulting row. Call this routine from a RoutineActivity stage in a sequencer, and pass the results as parameters to your real job in a JobActivity stage.

Following is the sequential file example:

Code: Select all

Routine ReadFile(FilePath)

Ans = -1

Openseq FilePath To handle Then
  Readseq OneLine From handle Then
      Ans = OneLine
  End Else
      Ans = -1
  End
End Else
  Ans = -1
End

Return(Ans)

You will probably want to tune the way the variable OneLine is assigned to the variable Ans.

There may also be a way to use DSExecute() to execute your SQL command line interpreter (e.g. SQLPLUS).

johm73 · Post by **johm73** » Tue Jan 18, 2005 3:26 pm

Thanks! Great information! I have another quick question.

If I wanted to use your logic within each job, is there a way to set a variable from the return of a routine? Then use that variable within an SQL statement in an OCI stage?

Examples would help if you have them.

Thanks again!

-------------------------------------------------------------------------------------
[quote="chucksmith"]Alright, here is the two part quick and dirty solution:

1) Create a little job to select the ExecutionStatus table and write the results to a sequential or hash file. Do this because the results will be easier to access from a routine.

2) Create a routine to read the sequential or hash file and return the resulting row. Call this routine from a RoutineActivity stage in a sequencer, and pass the results as parameters to your real job in a JobActivity stage.

Following is the sequential file example:

[code]
Routine ReadFile(FilePath)

Ans = -1

Openseq FilePath To handle Then
Readseq OneLine From handle Then
Ans = OneLine
End Else
Ans = -1
End
End Else
Ans = -1
End

Return(Ans)[/code]

You will probably want to tune the way the variable OneLine is assigned to the variable Ans.

There may also be a way to use DSExecute() to execute your SQL command line interpreter (e.g. SQLPLUS).[/quote]

chucksmith · Post by **chucksmith** » Tue Jan 18, 2005 4:08 pm

Returned values from routines called by a RoutineActivity stage in a job sequencer are visible to subsequent stages. Try this:

1) Create a job sequencer
2) Link a RoutineActivity stage to a JobActivity stage
3) Open the job tab of the JobActivity stage
4) After selecting a job with parameters, select the Value Experssion for one of the parameters
5) Press the insert parameter button
6) On the External Parameter Helper dialog, select the $ReturnValue from your RoutineActivity stage.

This is how you pass the returned value to your job.

You may want two routine activity stages, one for the low value and another for the high value. Maybe, you will want to put each of these values into a different sequential file.

In your OCI stage, in the where clause, just enclose the job parameter names in # signs, and DataStage will do the substitution.

Code: Select all

where tn.last_updt_dt > #LowDt# and tn.last_updt_dt <= #HighDt#

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Wed Jan 19, 2005 3:06 am

I think the simplest solution can be to include a dummy table replica of your ES into the source system and populate this before starting any other job.

Once this is done, all your jobs can use this for referencing in SQL.

wnogalski · Post by **wnogalski** » Wed Jan 19, 2005 4:09 am

chucksmith wrote:
Code: Select all
where tn.last_updt_dt > #LowDt# and tn.last_updt_dt <= #HighDt#

If these are dates don't forget to use the TO_DATE Oracle function (if the column type is DATE in Oracle, too).

StefL · Post by **StefL** » Tue Mar 01, 2005 5:06 am

Just wanted to notify everyone who's answered in this thread that it was precisely what I looked for as well. I had a question written on this topic but I searched the forum before posting it and reading this thread answered my question.
Thanks everyone!

StefL · Post by **StefL** » Tue Mar 08, 2005 8:27 am

With regards to the method mentioned above; it requires an intermediate storing of the parameter value on a file in order to let the Routine Activity read it and make it available to later activities in the sequence.

Would it be much harder to write a routine that instead takes reads the value from the database directly? In my case I look for one specific parameter value which I find using two in parameters to the SQL query. So what I'd like to do is have a routine with two in parameters which returns one value.

DSXchange

Incremental Loading based on dynamic date ranges

Incremental Loading based on dynamic date ranges

No DBLinks

Thanks...but another question

Is it harder to let the routine query the DB directly?