DataStage best Practices doc

Posted: Sat Feb 10, 2007 3:50 pm
by pradkumar
Hi All

I am looking for a document on DataStage best practices. Is one available on this forum?

Posted: Sat Feb 10, 2007 4:21 pm
by chulett
There isn't one. Except perhaps what some consultants have put together for their clients.

Posted: Sat Feb 10, 2007 4:51 pm
by vijayrc
IBM has its own Best Practices document for DataStage. I haven't seen it, but I have heard of it, so you could get in touch with them and ask them to send it over. Hope this helps.

Posted: Sat Feb 10, 2007 5:02 pm
by chulett
I believe you'll find there's an IBM 'Best Practices' class they'd love to have you attend. I doubt there is a document they'll just send over, but who knows... can't hurt to ask. :wink:

Posted: Sat Feb 10, 2007 5:31 pm
by DSguru2B
You can get a few tips going through the posts on this forum. I believe chucksmith has a few white papers and other tips on his website.

Posted: Sun Feb 11, 2007 4:51 am
by ray.wurlod
The best practices class is no more; it disappeared before Ascential was acquired.

There is a Best Practices document produced by the IBM Information Integration Solutions Center of Excellence, but this is very strictly for internal IBM use - that is, by their consultants and internal product evaluation, and for parallel jobs only. You will not be able to get a copy of it; it is marked trade secret, confidential and proprietary.

Posted: Sun Feb 11, 2007 7:50 am
by chulett
Heck, probably marked double trade secret and Eyes Only. :wink:

Too bad about the class.

Posted: Sun Feb 11, 2007 10:20 am
by ArndW
And for those who think that this document is desirable because of its restricted status, please remember that even the U.S. government has kept documents such as "a 1962 telegram from George F. Kennan, then ambassador to Yugoslavia, containing an English translation of a Belgrade newspaper article on China's nuclear weapons program" marked top secret (see N.Y. Times article).
I've seen it and there is no magic contained therein, just a good collection of commonsense practices.

Posted: Sun Feb 11, 2007 11:08 am
by chulett
Ah, and if only common sense were more common... :wink:

And I ain't talking about Thomas Paine.

Posted: Sun Feb 11, 2007 1:55 pm
by ray.wurlod
I neglected to mention that the IBM DataStage Essentials classes have a module entitled Best Practices. It includes things like adopting a naming convention for jobs, stages and links, developing and testing incrementally and things like that. To my mind it is incomplete not just within itself but also because the modules following that one fail to heed the advice given!

Here are my top ten.
    1. Be prepared. Use your target-from-source mapping document to be aware of source(s) and target(s) and to import their table definitions before beginning design. Create your own plan, before touching the Designer.

    2. Be prepared. Investigate the contents of your project's Repository so that you can avoid re-inventing anything you don't have to, such as routines, data elements, table definitions, shared containers.

    3. Adopt and adhere to naming conventions for stages and links. If there's an in-house standard, follow that. If not, be consistent, and maybe suggest your conventions to become in-house standards. Don't make them too complex; conventions have failed at some sites because it took longer to figure out what to call the stages than to construct the job!

    4. Develop not just incrementally but also systematically. Test extraction before proceeding: there's no point proceeding unless extraction works. If possible, verify connectivity to target next, for the same reason. Bring the two sets of metadata together in the middle. Use "stub" placeholders, such as Copy or Peek stages (in server jobs, Sequential File appending to /dev/null for UNIX or to .\NUL for Windows).

    5. You are "programming" to communicate with the next developer. Use annotations to alert the next developer to anything unusual in the design, and to provide a summary of execution. Adopt a systematic convention for annotation appearance (font, color, etc.). If you wrap annotations behind stages, make sure that the justification is set so that none of the annotation text is hidden. Always use a description annotation in a standard location (I use 12pt font, top and left justified with no border and transparent background, located at the top of the design adjacent to the job type designator).

    6. Do documentation first. As soon as you create a new job, fill in its short and long description, the latter with the design intention. That means that anyone can take over the design if you can't get back to it. "Later" never happens when it comes to documentation.

    7. Use standard parameter names, with standard prompts, standard help text (never omit the help text) and standard default values always.

    8. Parameter design time defaults never refer to a production system. They either refer to development or are left empty. This avoids accidental manipulation of production data.

    9. Default log purge setting is "up to last run" to keep logs as small as possible. Override where necessary, but reset to this value when finished testing.

    10. Construct jobs that never abort. Construct job sequences that never abort. Retain control at all times.
And, as a bonus:

    11. Never do anything unnecessary. Don't process columns you don't have to. Don't process rows you don't need to. Don't execute the same expression more than once. And so on.
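
Points 3, 7 and 8 lend themselves to an automated check. What follows is only a minimal sketch in Python, not anything DataStage provides: the stage prefixes, the production markers and both function names are hypothetical, and the substring test for production values is deliberately crude. Substitute your own site's standards.

```python
import re

# Hypothetical conventions for illustration only; use your site's own standards.
STAGE_PREFIXES = {"Transformer": "xfm", "Sequential File": "seq", "Copy": "cpy"}
PRODUCTION_MARKERS = ("prod", "prd")


def check_stage_name(stage_type, name):
    """Return True if the name starts with the agreed prefix for its stage type."""
    prefix = STAGE_PREFIXES.get(stage_type)
    if prefix is None:
        return True  # no convention defined for this stage type
    return re.fullmatch(rf"{prefix}_[A-Za-z0-9_]+", name) is not None


def check_parameter(name, prompt, help_text, default):
    """Return a list of problems found in one job parameter definition."""
    problems = []
    if not prompt.strip():
        problems.append(f"{name}: missing prompt")
    if not help_text.strip():
        problems.append(f"{name}: missing help text")
    # Crude substring test: flags any default that mentions a production marker.
    if any(marker in default.lower() for marker in PRODUCTION_MARKERS):
        problems.append(f"{name}: default looks like a production value")
    return problems
```

An empty default passes the production check, which matches the advice that design-time defaults either refer to development or are left empty.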

Posted: Sun Feb 11, 2007 2:56 pm
by DSguru2B
Ray, if you had sold these ten points, the dsxians would have bought it, easily. Good pointers Ray ji :wink:

Posted: Sun Feb 11, 2007 3:40 pm
by pradkumar
Excellent Post

Thanks, Ray. Very, very good points.

Posted: Sun Feb 11, 2007 10:58 pm
by narasimha
Instructive!
Thanks for sharing it - Ray!

Posted: Tue Feb 21, 2012 3:09 am
by karthi_gana
ray.wurlod wrote: Do documentation first. As soon as you create a new job, fill in its short and long description, the latter with the design intention. That means that anyone can take over the design if you can't get back to it. "Later" never happens when it comes to documentation.
Is there any standard template available for this?

ray.wurlod wrote: Use standard parameter names, with standard prompts, standard help text (never omit the help text) and standard default values always.
Where can I find these standards?

ray.wurlod wrote: Default log purge setting is "up to last run" to keep logs as small as possible. Override where necessary, but reset to this value when finished testing.
I don't understand this point. Where do I have to set this?

Posted: Tue Feb 21, 2012 5:50 am
by chandra.shekhar@tcs.com
karthi_gana wrote: I don't understand this point. Where do I have to set this?
It means that all log entries are deleted apart from those for the last job run.
We can set this in the Director client or the Administrator client.
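
To make the effect of that setting concrete, here is a minimal Python sketch of the policy as described above: everything except the entries from the most recent run is dropped. The `(run_number, message)` tuples and the function name are hypothetical stand-ins, not the real DataStage log format or API.

```python
def purge_up_to_last_run(entries):
    """Keep only the log entries that belong to the most recent run.

    `entries` is a list of (run_number, message) tuples; this shape is a
    hypothetical stand-in, not the real DataStage log format.
    """
    if not entries:
        return []
    last_run = max(run for run, _ in entries)
    return [(run, msg) for run, msg in entries if run == last_run]
```

For example, given entries from runs 1, 2 and 3, only the run 3 entries survive the purge.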