Any Good Documentation on Best Practices

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

horserider
Participant
Posts: 71
Joined: Mon Jul 09, 2007 1:12 pm

Any Good Documentation on Best Practices

Post by horserider »

Does anyone have a document that explains some of the best practices for:

- Design and implementation of ETL code
- Naming standards for ETL jobs and sequencers
- How to design restartable ETL jobs, etc.

Any help will be much appreciated.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

One of the difficulties is how the ETL is going to be used. A good practice for ETL in a data warehouse with small volume and a large processing window won't apply to one with high volume and a small processing window. You may have to factor in staff competency and ability to maintain the code. If your sources and targets are Oracle, your solutions and coding practices are going to be different than for "flatfiles/cobol copybooks/every database" into "pick a database" solutions. Some folks use ETL tools as data-bridging solutions (EII, EAI, etc.), so things could be very different, especially when talking about real-time trickling of data.

When I've put out documents in the past I tried to carefully frame the applicability to specific variables. Unfortunately, so many people would say that it doesn't apply to their environment and they would be very correct, ignoring the carefully framed wording.

Can you narrow down your focus and describe your environment and processing needs?
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
horserider
Participant
Posts: 71
Joined: Mon Jul 09, 2007 1:12 pm

Post by horserider »

I agree with you 100% that with different sources and targets, design and implementation practices may be different. It would have been ideal to come up with one document that covers some common scenarios. I am looking into some very basic areas like

(1) What is the maximum number of Transformers to have in one job?
(2) What is the maximum number of Lookup stages to use in a job?
(3) If we have to combine two sets of data, when should we use Merge and when Join?
(4) When should we use a dataset as opposed to a flat file?
(5) Any guidelines for breaking multiple sequencers into small chunks that will
work well from a restartability point of view?
(6) Is checking "Add checkpoint" in sequencers and checking "Do not
checkpoint" within the Sequencer enough to control restartability,
or is there a better way of designing restartable sequencers
that call small sequencers, etc.?

As you can see from the list above, I am more interested in a tool-specific view.

Tuning the source and target is another area, where one has to work with the DBA to make sure the queries are tuned to perform well during reads and writes, and that differs a little from database to database.

I thought that if someone had come up with some sort of guidelines covering my six points above and/or other similar topics, it would help us a lot.
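
To make points (5) and (6) concrete, here is a minimal sketch of the general checkpoint-and-restart idea, written in plain Python rather than as a DataStage sequence. It is not how the Sequencer's "Add checkpoint" option is implemented; `run_sequence`, the step names, and the JSON checkpoint file are all hypothetical and chosen only for illustration. Each step that finishes is recorded, so a rerun after a failure skips the finished steps and resumes at the one that failed.

```python
import json
import os
import tempfile


def run_sequence(steps, checkpoint_path):
    """Run (name, callable) steps in order, skipping steps recorded as done."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)

    executed = []
    for name, action in steps:
        if name in done:
            continue  # finished in an earlier run, so skip it on restart
        action()  # an exception here leaves the checkpoint file in place
        done.append(name)
        with open(checkpoint_path, "w") as fh:
            json.dump(done, fh)
        executed.append(name)

    os.remove(checkpoint_path)  # whole sequence succeeded: next run starts fresh
    return executed


def demo():
    """Fail the second step once, then restart and return what the rerun executed."""
    calls = {"load": 0}

    def extract():
        pass

    def load():
        calls["load"] += 1
        if calls["load"] == 1:
            raise RuntimeError("simulated failure")

    steps = [("extract", extract), ("load", load)]
    path = os.path.join(tempfile.mkdtemp(), "seq.ckpt")
    try:
        run_sequence(steps, path)
    except RuntimeError:
        pass  # first run aborts after "extract" has been checkpointed
    return run_sequence(steps, path)
```

Smaller steps mean less work is repeated after a failure, which is the usual argument for breaking a large sequence into small restartable chunks.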
sreddy
Participant
Posts: 144
Joined: Sun Oct 21, 2007 9:13 am

Re: Any Good Documentation on Best Practices

Post by sreddy »

Horserider


All of your questions are reasonable.

(1) Transformer usage depends on the business logic. In PX there are individual stages for many transformations, which is why the Transformer is used less.
(2) For Lookup stages, I believe the limit in Server was 16 per Transformer, but I am not sure.
(3) For Merge, Join and Lookup there is more information in the Advanced Developer Guide, including how each one uses memory.
(4) Datasets are used as a temporary storage area. We can put primary constraints on them, and parallel processing applies to them, which improves job performance.
Neither applies to flat files.
(5) Restartability can be handled within the sequence.
(6) Add checkpoint is set up only as a safety measure, for example when you are running millions of records. Ray has answered this many times.
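
On point (3), the memory trade-off can be illustrated with a small sketch of the two general techniques involved. This is plain Python and only an analogy, not the internals of the DataStage stages: a lookup-style match holds the whole reference set in memory, while a join-style match walks two inputs that are already sorted on the key and holds only the current rows. The function names and sample rows are hypothetical, and the sorted version assumes the key is unique on the reference side to keep it short.

```python
def lookup_join(stream, reference, key):
    """Lookup style: load all reference rows into a dict, then stream the rest."""
    table = {row[key]: row for row in reference}
    for row in stream:
        match = table.get(row[key])
        if match is not None:
            yield {**match, **row}


def merge_join(left, right, key):
    """Join style: both inputs are sorted on key; only current rows are held.

    Assumes the key is unique in `right` (the reference side).
    """
    right_iter = iter(right)
    current = next(right_iter, None)
    for row in left:
        # Advance the reference side until it catches up with the left key.
        while current is not None and current[key] < row[key]:
            current = next(right_iter, None)
        if current is not None and current[key] == row[key]:
            yield {**current, **row}


# Hypothetical sample data, already sorted on "cust".
orders = [{"cust": 1, "amt": 10}, {"cust": 2, "amt": 5}, {"cust": 4, "amt": 7}]
customers = [{"cust": 1, "name": "A"}, {"cust": 2, "name": "B"}, {"cust": 3, "name": "C"}]

by_lookup = list(lookup_join(orders, customers, "cust"))
by_merge = list(merge_join(orders, customers, "cust"))
```

Both calls return the same matched rows. The lookup version needs the reference data to fit in memory, and the sorted version needs both inputs sorted on the key first, so the choice usually comes down to the size of the reference data.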

  • Best practice is to follow the BRD, raise all valuable questions, and get them clarified. Only then can you understand the requirements and prepare the mapping/design document.
    Make sure you understand the complete transaction flow.
    Ask what the downstream and upstream systems are in your environment.
    Whenever you design a job, unit test each stage.

    Naming standards differ by environment and implementer.
SReddy
dwpractices@gmail.com
Analyzing Performance
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

There are a few interesting threads on restartability in the forum archives. Look for threads discussing "banding". There are some best practices sessions at the IBM IOD Conference. There are some sample jobs from IBM if you have the MDM Server or the PeopleSoft OEM for DataStage. There are some best practices and sample jobs in the 660-page IBM Redbook on DataStage Flows.
umamahes
Premium Member
Posts: 110
Joined: Tue Jul 04, 2006 9:08 pm

Post by umamahes »

Can you please send us the link to The Big Guide for Deploying IBM Information Server onto a Linux Grid?


Thanks
HI
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Whoops! I reviewed the Redbook on my blog but forgot to put in a link to it. Try this link: Deploying a Grid Solution with IBM InfoSphere Information Server. It's an 8.1MB download, so if you want an overview of what is in it you can read my review.
patil.bnk
Participant
Posts: 5
Joined: Tue Jun 24, 2008 4:45 am
Location: banglore

Post by patil.bnk »

Thanks and Regards
patil