Page 1 of 1

Data Stage Scheduling Issue

Posted: Fri Nov 14, 2003 1:53 pm
by yiminghu
Hi,

I am new to DataStage, and I have a question regarding to DataStage scheduling. Currently, I am participating a project which tries to convert hard-coded ETL to Datastage, but I've had this big issue about the scheduling. In our old system, the scheduler has lots of flexibility. Every time, it starts, the scheduler checks the table in source database to decide which data (all the data are date related ) is available to be processed. Then it inserts those dates into a table called job_process_date_list. In the second step, it fetched the rows from job_process_date_list, then for each date, it executes a bunch of stored procedure with the date as parameter. (The stored procedures name is stored in a table. ) until all the dates in job_process_date_list are processed. I am wondering how to implement such mechanism in DataStage.

Thanks,

Yiming

Posted: Fri Nov 14, 2003 2:06 pm
by kduke
Yiming

I would start with a job to copy the dates to process to a hash file or a sequential file. You need to create a batch job to read these dates and pass them as parameters to a job sequence which passes them to each job that need to run.

Example:

Job1:

select start_date from whatever -> transform -> hash1

Job2:
open "hash1" to hash1 else stop
read StartRec from hash1, "StartDate" then
... start SeqJob1 ...
end

SeqJob1:
... start all jobs passing parameter "StartDate"

Kim.

Kim.

Posted: Fri Nov 14, 2003 2:07 pm
by Peytot
Sure that if you use the option Schedule, you do not have flexibility. Also, You can program in Basic a program into DataStage which do what you want.
You can also use an external tools for developing your schedule and when you have to execute a Job DataStage, you use the command DSJOB.

Pey

Posted: Fri Nov 14, 2003 9:03 pm
by kcbland
Your answer is a simple one:

You will have to build EVERYTHING. There is no inherent capability in this tool or any other tool to do what you want. This is why there are expensive consultants who specialize in this very thing. (Check out the inner circle names on this forum).

For my opinion (everyone groan)

1. Read Kimball, Inmon, TDWI etc for methodologies regarding persistent staging databases. Yours is a typical data warehousing requirement that has been covered in sooooo many books, articals, and classes.
2. Build an audit repository of process metadata that tracks all batch executions, statuses, processing dates, etc. Use this to seed each run that requires knowledge of last successful run and the dates involved.
3. Build a persistent staging database, tied to an intelligent archiving strategy to induct, cleanse, transform, and load.
4. Get an Inner Circle consultant onsite to make the magic happen. (Shameless plug)

Posted: Fri Nov 14, 2003 10:50 pm
by ray.wurlod
In what country are you operating, Yi Ming?

Posted: Mon Nov 17, 2003 7:19 am
by yiminghu
ray.wurlod wrote:In what country are you operating, Yi Ming?
Hi Ray,
I'm in Toronto Canada. I don't know whether there is any datastage branch in Toronto. Bringing in a datastage consultant from the States seems a little pricy.

Yiming

Posted: Mon Nov 17, 2003 2:29 pm
by ray.wurlod
Not as much as bringing me in from Australia! :lol:

However, this is a job that could be done "by remote control". If you were to create a specification of exactly what your control requirements were, a competent consultant could craft the code and ship it back to you, where you could inspect it, learn techniques from it, and modify it at your leisure.