DataStage best Practices doc

pradkumar
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Hi all,

I am looking for a document on DataStage best practices. Is one available in this forum?
Pradeep Kumar
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There isn't one, except perhaps what some consultants have put together for their clients.
-craig

"You can never have too many knives" -- Logan Nine Fingers
vijayrc
Participant
Posts: 197
Joined: Sun Apr 02, 2006 10:31 am
Location: NJ

Post by vijayrc »

IBM has its own Best Practices document for DataStage. I haven't seen it, but I have heard of it. You could get in touch with them and ask them to send it over. Hope this helps.

Post by chulett »

I believe you'll find there's an IBM 'Best Practices' class they'd love to have you attend. I doubt there is a document they'll just send over, but who knows... can't hurt to ask. :wink:
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

You can pick up a few tips by going through the posts on this forum. I believe chucksmith has a few white papers and other tips on his website.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The best practices class is no more; it disappeared before Ascential was acquired.

There is a Best Practices document produced by the IBM Information Integration Solutions Center of Excellence, but this is very strictly for internal IBM use - that is, by their consultants and internal product evaluation, and for parallel jobs only. You will not be able to get a copy of it; it is marked trade secret, confidential and proprietary.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by chulett »

Heck, probably marked double trade secret and Eyes Only. :wink:

Too bad about the class.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

And for those who think that this document is desirable because of its restricted status, please remember that even the U.S. government has kept marked top secret such documents as "a 1962 telegram from George F. Kennan, then ambassador to Yugoslavia, containing an English translation of a Belgrade newspaper article on China's nuclear weapons program" (see the N.Y. Times article).
I've seen it and there is no magic contained therein, just a good collection of commonsense practices.

Post by chulett »

Ah, and if only common sense were more common... :wink:

And I ain't talking about Thomas Paine.

Post by ray.wurlod »

I neglected to mention that the IBM DataStage Essentials classes have a module entitled Best Practices. It covers topics such as adopting a naming convention for jobs, stages and links, and developing and testing incrementally. To my mind it is incomplete, not just within itself but also because the modules following that one fail to heed the advice given!

Here are my top ten.
    1. Be prepared. Use your target-from-source mapping document to be aware of source(s) and target(s) and to import their table definitions before beginning design. Create your own plan, before touching the Designer.

    2. Be prepared. Investigate the contents of your project's Repository so that you can avoid re-inventing anything you don't have to, such as routines, data elements, table definitions, shared containers.

    3. Adopt and adhere to naming conventions for stages and links. If there's an in-house standard, follow that. If not, be consistent, and maybe suggest your conventions to become in-house standards. Don't be too complex; these have failed at some sites because it took longer to figure out what to call the stages than to construct the job!

    4. Develop not just incrementally but also systematically. Test extraction before proceeding: there's no point proceeding unless extraction works. If possible, verify connectivity to target next, for the same reason. Bring the two sets of metadata together in the middle. Use "stub" placeholders, such as Copy or Peek stages (in server jobs, Sequential File appending to /dev/null for UNIX or to .\NUL for Windows).

    5. You are "programming" to communicate with the next developer. Use annotations to alert the next developer to anything unusual in the design, and to provide a summary of execution. Adopt a systematic convention for annotation appearance (font, color, etc.). If you wrap annotations behind stages, make sure that the justification is set so that none of the annotation text is hidden. Always use a description annotation in a standard location (I use 12pt font, top and left justified with no border and transparent background, located at the top of the design adjacent to the job type designator).

    6. Do documentation first. As soon as you create a new job, fill in its short and long description, the latter with the design intention. That means that anyone can take over the design if you can't get back to it. "Later" never happens when it comes to documentation.

    7. Use standard parameter names, with standard prompts, standard help text (never omit the help text) and standard default values always.

    8. Parameter design time defaults never refer to a production system. They either refer to development or are left empty. This avoids accidental manipulation of production data.

    9. Default log purge setting is "up to last run" to keep logs as small as possible. Override where necessary, but reset to this value when finished testing.

    10. Construct jobs that never abort. Construct job sequences that never abort. Retain control at all times.
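
Point 8 lends itself to an automated check. What follows is only a minimal sketch in Python, not anything DataStage provides: it assumes the design-time parameter defaults have already been exported into a plain dictionary by some site-specific means, and the parameter names, the `PRODUCTION_MARKERS` tuple and the function `find_production_defaults` are all hypothetical.

```python
# Hypothetical check for point 8: design-time parameter defaults must be
# empty or refer to development, never to production.
# Assumption: the defaults were exported into a dict by site-specific means.

# Hypothetical substrings that identify production systems at a site.
PRODUCTION_MARKERS = ("prod", "prd")


def find_production_defaults(defaults, markers=PRODUCTION_MARKERS):
    """Return the names of parameters whose default looks like production."""
    offenders = []
    for name, value in defaults.items():
        text = (value or "").lower()  # empty or None defaults are allowed
        if any(marker in text for marker in markers):
            offenders.append(name)
    return sorted(offenders)


example_defaults = {
    "pSourceDSN": "DEV_ORA01",      # development: acceptable
    "pTargetDSN": "PROD_DWH",       # production: should be flagged
    "pTargetUser": "",              # left empty: acceptable
    "pRejectDir": "/data/dev/rej",  # development: acceptable
}

print(find_production_defaults(example_defaults))
```

Plain substring matching is deliberately crude and would also flag a value such as "product", so a real site would probably match against its own list of production hosts and DSNs instead.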
And, as a bonus:
    11. Never do anything unnecessary. Don't process columns you don't have to. Don't process rows you don't need to. Don't execute the same expression more than once. And so on.
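
The three "don'ts" in point 11 can be illustrated outside DataStage. This is a minimal sketch in plain Python, assuming rows arrive as dictionaries; the column names and the function `net_amounts` are hypothetical and stand in for whatever a real job would process.

```python
# Illustration of point 11 in plain Python (not DataStage code).
# Assumption: rows are dicts; all column names are hypothetical.

def net_amounts(rows):
    """Yield (order_id, net, tax) for open orders only."""
    for row in rows:
        # Don't process rows you don't need: filter as early as possible.
        if row["status"] != "OPEN":
            continue
        # Don't process columns you don't have to: read only what is used.
        order_id, gross, rate = row["order_id"], row["gross"], row["tax_rate"]
        # Don't execute the same expression twice: compute once, reuse.
        tax = gross * rate
        yield order_id, gross - tax, tax


orders = [
    {"order_id": 1, "status": "OPEN", "gross": 100.0, "tax_rate": 0.25,
     "comment": "never read"},
    {"order_id": 2, "status": "CLOSED", "gross": 80.0, "tax_rate": 0.25,
     "comment": "never read"},
]

print(list(net_amounts(orders)))
```

In a job design the same ideas would typically mean selecting only the required columns and rows at the source and holding a repeated expression in a stage variable rather than re-deriving it for each output column.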
Last edited by ray.wurlod on Mon Feb 12, 2007 12:14 am, edited 5 times in total.

Post by DSguru2B »

Ray, if you had sold these ten points, the dsxians would have bought them, easily. Good pointers, Ray ji :wink:

Post by pradkumar »

Excellent Post

Thanks, Ray. Very, very good points.
Pradeep Kumar
narasimha
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

Instructive!
Thanks for sharing it, Ray!
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm

Post by karthi_gana »

ray.wurlod wrote: Do documentation first. As soon as you create a new job, fill in its short and long description, the latter with the design intention. That means that anyone can take over the design if you can't get back to it. "Later" never happens when it comes to documentation.
Is there a standard template available for this?

ray.wurlod wrote: Use standard parameter names, with standard prompts, standard help text (never omit the help text) and standard default values always.
Where can I find these standards?

ray.wurlod wrote: Default log purge setting is "up to last run" to keep logs as small as possible. Override where necessary, but reset to this value when finished testing.
I don't understand this point. Where do I have to set this?
Karthik
chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

karthi_gana wrote: I don't understand this point. Where do I have to set this?
It means that all log entries are deleted except those for the last job run.
You can set this in the Director client or the Administrator client.
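
To make the effect of that setting concrete, here is a minimal sketch in Python of what a purge "up to last run" keeps. It is not the DataStage API: it assumes each log entry is a simple `(run_number, message)` tuple, and the function name `purge_up_to_last_run` is hypothetical.

```python
# Illustration of what an "up to last run" purge keeps (not the DataStage API).
# Assumption: each log entry is a (run_number, message) tuple.

def purge_up_to_last_run(entries):
    """Keep only the entries that belong to the most recent run."""
    if not entries:
        return []
    last_run = max(run for run, _ in entries)
    return [(run, msg) for run, msg in entries if run == last_run]


log = [
    (1, "Starting job"),
    (1, "Finished job"),
    (2, "Starting job"),
    (2, "Finished job"),
]

print(purge_up_to_last_run(log))
```

Everything from run 1 is discarded and only the two entries from run 2 remain, which is why the log stays small from one run to the next.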
Thanx and Regards,
ETL User