Any Good Documentation on Best Practices

horserider · Post by **horserider** » Tue Jun 24, 2008 8:34 am

Does anyone have a document that explains some of the best practices for the following?

- Design and Implementation of ETL Code
- Naming Standards for ETL Jobs and Sequencers.
- How to design restartable ETL jobs etc.

Any help will be much appreciated.

kcbland · Post by **kcbland** » Tue Jun 24, 2008 9:00 am

One of the difficulties is that it depends on how the ETL is going to be used. A practice that is good for a data warehouse with small volume and a large processing window won't apply to one with high volume and a small processing window. You may have to factor in staff competency and ability to maintain the code. If your sources and targets are Oracle, your solutions and coding practices are going to be different than for "flatfiles/cobol copybooks/every database" into "pick a database" solutions. Some folks use ETL tools as data-bridging solutions (EII, EAI, etc.), so things could be very different, especially when talking about real-time trickling of data.

When I've put out documents in the past I tried to carefully frame the applicability to specific variables. Unfortunately, so many people would say that it doesn't apply to their environment and they would be very correct, ignoring the carefully framed wording.

Can you narrow down your focus and describe your environment and processing needs?

horserider · Post by **horserider** » Tue Jun 24, 2008 9:39 am

I agree with you 100% that design and implementation practices may differ with different sources and targets. It would have been ideal to come up with one document that covers some common scenarios. I am looking into some very basic areas like:

(1) What is the maximum number of Transformer stages to have in one job?
(2) What is the maximum number of Lookup stages to use in a job?
(3) If we have to combine two sets of data, when should we use Merge and when Join?
(4) When should a dataset be used as opposed to a flat file?
(5) Any guidelines for breaking multiple sequencers into small chunks that
work well from a restartability point of view?
(6) Is checking "Add checkpoint" in sequencers and checking "Do not
checkpoint" within a sequencer enough to control restartability,
or is there a better way of designing restartable sequencers
that call smaller sequencers?

As you can see from the above list, I am more interested in a tool-specific view.

Tuning the source and target is another area, where one has to work with the DBA to see how well the queries are tuned for reads and writes, and that differs a little from database to database.

I thought that if someone had come up with some sort of guidelines covering my six points above and/or other similar topics, that would help us a lot.
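
To make the restartability question in points (5) and (6) concrete, here is a minimal sketch in Python of what a checkpointed sequence does conceptually: each step that finishes is recorded, and a rerun after a failure skips the recorded steps and resumes at the one that failed. This is not DataStage code and uses no DataStage API; `run_sequence`, the step names, and the JSON checkpoint file are all hypothetical names chosen for illustration.

```python
import json
import os
import tempfile


def run_sequence(steps, checkpoint_path):
    """Run (name, callable) steps in order, skipping steps recorded as done.

    Hypothetical helper: the checkpoint is a JSON list of finished step names.
    Returns the names of the steps executed in this run.
    """
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)

    executed = []
    for name, action in steps:
        if name in done:
            continue  # finished in an earlier run, do not repeat it
        action()  # an exception here leaves the checkpoint file in place
        done.append(name)
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)
        executed.append(name)

    # The whole sequence finished, so the next run starts from the top.
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return executed


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "sequence.ckpt")
    attempts = {"load": 0}

    def extract():
        print("extract")

    def load():
        attempts["load"] += 1
        if attempts["load"] == 1:
            raise RuntimeError("target unavailable")
        print("load")

    steps = [("extract", extract), ("load", load)]
    try:
        run_sequence(steps, ckpt)
    except RuntimeError as exc:
        print("first run failed:", exc)
    print("rerun executed:", run_sequence(steps, ckpt))
```

The design point this illustrates is that a restart is only as fine-grained as the steps being recorded, which is one argument for breaking a large sequence into smaller units that can each be rerun safely.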

sreddy · Post by **sreddy** » Tue Jun 24, 2008 12:50 pm

Horserider

All of your questions are reasonable.

(1) Transformer usage depends on the business logic. In PX there are individual stages for specific transformations, which is why the Transformer is used less.
(2) For the Lookup stage, earlier in Server I think the limit was 16 per Transformer, but I am not sure.
(3) For Merge, Join and Lookup there is more information in the Advanced Developer Guide, including how each one uses memory.
(4) Datasets are used as a temporary storage area. We can put primary constraints on them and apply parallel processing, which improves job performance.
We cannot do that with a flat file.
(5) Restartability can be handled within the sequence.
(6) "Add checkpoint" is set up only as a safety measure, for example when you are processing millions of rows. Ray has answered this many times.

The best practice is to follow the BRD, raise all valuable questions and get them clarified. Only then can you understand the requirements and prepare the mapping/design document.
Make sure you understand the complete transaction flow.
Ask what the downstream and upstream systems are in your environment.
Whenever you design a job, unit test each stage.

Naming standards differ by environment and implementer.
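
The memory difference mentioned in point (3) can be sketched in plain Python. This is only an illustration of the general idea, not DataStage code: a lookup holds the whole reference input in memory and streams the main input past it, while a join walks two inputs that are already sorted on the key and holds only the current rows. The function names, the sample rows, and the assumption of unique keys on the reference side are hypothetical.

```python
def lookup(stream, reference, key):
    """Hash lookup: load all reference rows into memory, then stream the input."""
    table = {row[key]: row for row in reference}  # memory grows with reference size
    for row in stream:
        match = table.get(row[key])
        if match is not None:
            yield {**match, **row}


def sorted_join(left, right, key):
    """Inner merge join over two inputs already sorted by key.

    Assumes keys are unique on the right side; only one right row is held at a time.
    """
    right_iter = iter(right)
    current = next(right_iter, None)
    for row in left:
        # Advance the right side until it catches up with the left key.
        while current is not None and current[key] < row[key]:
            current = next(right_iter, None)
        if current is not None and current[key] == row[key]:
            yield {**current, **row}


if __name__ == "__main__":
    orders = [
        {"cust": 1, "amt": 10},
        {"cust": 2, "amt": 5},
        {"cust": 2, "amt": 7},
        {"cust": 4, "amt": 1},
    ]
    customers = [
        {"cust": 1, "name": "a"},
        {"cust": 2, "name": "b"},
        {"cust": 3, "name": "c"},
    ]
    print(list(lookup(orders, customers, "cust")))
    print(list(sorted_join(orders, customers, "cust")))
```

Both functions produce the same matched rows here. The trade-off is that the lookup needs the reference data to fit in memory, while the join needs both inputs sorted on the key first.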

vmcburney · Post by **vmcburney** » Tue Jun 24, 2008 6:12 pm

There are a few interesting threads on restartability in the forum archives. Look for threads discussing "banding". There are some best practices sessions at the IBM IOD Conference. There are some sample jobs from IBM if you have the MDM Server or the PeopleSoft OEM for DataStage. There are some best practices and sample jobs in the 660-page IBM Redbook on DataStage Flows.

umamahes · Post by **umamahes** » Tue Jun 24, 2008 9:02 pm

Can you please send us the link to The Big Guide for Deploying IBM Information Server onto a Linux Grid?

Thanks

vmcburney · Post by **vmcburney** » Tue Jun 24, 2008 10:07 pm

Whoops! I reviewed the Redbook on my blog but forgot to put in a link to it. Try this link: Deploying a Grid Solution with IBM InfoSphere Information Server. It's an 8.1MB download, so if you want an overview of what is in it you can read my review.

patil.bnk · Post by **patil.bnk** » Wed Jun 25, 2008 5:55 am

http://www.redbooks.ibm.com/abstracts/sg247576.html

This is the URL.
