
DataStage jobs folder organization

Posted: Thu Jan 14, 2016 8:33 pm
by jreddy
This is more of an architecture question, and I am seeking different perspectives to see which choice is more popular across implementations.
When a project has 1500+ jobs, with data coming in from 5-10 sources all being transformed together and loaded into an enterprise data warehouse, how would you go about organizing those jobs in an intuitive way?

We could potentially name the jobs so you would know which source each one comes from and which target it updates, but if they are grouped logically it is easier to get to them; moreover, that naming standard won't work when a single job processes multiple sources or multiple targets.

With both the initial development perspective and future maintenance in mind: the design of job folders exists only to help developers build quickly and debug quickly during production issues, new requirements, impact assessments, and so on. Being able to find the job faster is probably the biggest criterion. With this in mind, would you normally group and organize jobs by source, or by target (the data warehouse tables)?

Thanks in advance!

Posted: Thu Jan 14, 2016 9:46 pm
by ray.wurlod
We prefer to group neither by source nor by target but, instead, by major subject area and then by job type (update, insert, recovery, sequence, etc.). For example, the Allowances subject area will have sub-folders Update Jobs, Insert Jobs, Recovery Jobs and Sequence Jobs.
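A sketch of that layout (Allowances is from the example above; the second subject area is invented for illustration):

    Allowances/
        Insert Jobs/
        Update Jobs/
        Recovery Jobs/
        Sequence Jobs/
    Deductions/
        Insert Jobs/
        Update Jobs/
        Recovery Jobs/
        Sequence Jobs/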

Posted: Thu Jan 14, 2016 10:39 pm
by kduke
A project is usually one target. Jobs within that project are grouped by source, and usually one source is handled within one sequence, so in effect the jobs are grouped by sequence. A master sequence may pull from a bunch of sources, but this is the way I do it most of the time.
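For illustration, a sketch of that layout under the assumption of one warehouse target per project (all names invented):

    EDW project (one target: the warehouse)
        SourceA/    -- sequence for source A and the jobs it invokes
        SourceB/    -- sequence for source B and the jobs it invokes
        Master/     -- master sequence that runs the source sequences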

Posted: Fri Jan 15, 2016 5:49 am
by qt_ky
I like to organize the folders at a high level by execution order (E, T, then L). At the next level under Extract there are folders by source system, and at the next level under Transform, folders by subject area, if it is a data warehouse, although most of our projects are not data warehouse related.

When the folder names do not sort the way we want, we prefix them with "01 ", "02 ", and so on.
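A sketch of how that might look in the repository tree (the source system and subject area names are invented for illustration):

    01 Extract/
        01 SourceSystemA/
        02 SourceSystemB/
    02 Transform/
        01 Sales/
        02 Inventory/
    03 Load/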

Posted: Tue Jan 19, 2016 10:25 am
by FranklinE
In financials, I find the best logical grouping to be by account. However, I also needed to adjust that as the project grew larger, so my suggestion is to find the best logical starting point; that gave me the most flexibility to create sub-groups.

Low-level design choices -- common attributes like source or destination -- prepared me for finding different logical groupings which, with good documentation, made things easier to find during production incident support.

In the end, though, with our batch system being very large, grouping by job run timing helped the most.

In my application environment, the naming convention was the most critical design decision. It's the best way to find "groups" of jobs that are most likely to be affected by system issues. For example, each timeframe group has an "on-off" switch, a dummy job which, when put on hold, makes sure that an earlier incident doesn't also cause later jobs to fail.
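A minimal sketch of that idea using the dsjob command-line client -- the project name DWPROD and switch job SW_DAILY_2300 are hypothetical, and a real implementation would sit in the Control-M job definitions rather than a shell script:

    #!/bin/sh
    # Hypothetical guard: before releasing a timeframe group, check that
    # its "on-off switch" dummy job finished cleanly in the previous run.
    # DWPROD and SW_DAILY_2300 are invented names for illustration.
    STATUS=`dsjob -jobinfo DWPROD SW_DAILY_2300 | grep "Job Status"`
    case "$STATUS" in
        *"RUN OK"*)
            echo "Switch is on: releasing downstream jobs" ;;
        *)
            echo "Switch is off or an earlier run failed: holding group"
            exit 1 ;;
    esac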

I came from a mainframe development area, so I found it best to keep to that pattern. Every job entry point -- we schedule using Control-M -- is a job sequence with the same name as the CM job. The CM description points to the folder path in Director. Job sequences and the parallel jobs they invoke have the same folder paths.
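For example (all names below are invented to show the pattern):

    Control-M job:    DWFIN001
    Job sequence:     DWFIN001  (same name as the CM job)
    CM description:   Director folder /Financials/Accounts/Daily
    Parallel jobs:    stored under the same folder path as the sequence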

Nothing beats clear and concise documentation. Our support people don't need to know more than how to use Director to investigate and triage failed jobs.