Best Practices For ETL Development and Datastage Usage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
sunshine
Participant
Posts: 61
Joined: Wed Jan 04, 2006 10:24 am

Best Practices For ETL Development and Datastage Usage

Post by sunshine »

Hi,

can someone explain,
what are the Best Practices For ETL Development and Datastage Usage??


Thanks
sunshine
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Start with naming convetions like link names should be the table or file name. This is so Director shows link names and it makes more sense to monitor a job this way. Stage names should be database names. Transform names should start with t1, t2, t3 and so on. Project names should be DWDEV, DWTEST and DWPROD or subjectDEV, subjectTEST and so on.

Pick some standard and stick to it. Everyone on the team needs to do it the same way.
Mamu Kim
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Your question takes a book to answer. Consider reading Ralph Kimballs ETL Toolkit for starters. Ultimately, it all depends on budget, time to deliver, and complexity of solution. Best Practices for one company are different than the next. A best recommendation would be to engage a consultant who has a history of successful deployments to help a team get started in the right direction.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

I think Ken is correct. A book is a good start especially this book. Mentoring from an experience pro is also good or even better. But if you cannot afford these then take the baby steps I talked about and build on it. Be consistent as a team of developers.

I think outsourcing is a good example of what not to do. They tend to have many developers with nobody enforcing standards, nothing is automated or focused. No automated documentation. Everything is manual because they can bill more. Every ETL changed is loaded into some complex project management application which encourages people not to fix or improve a process. Don't point out flaws because it makes the team look bad. QA is non-existent because of these reasons. Large consultanting firms QA things to death. Also bad. They over inflate the cost of development by over managing therefore over billing. Change slows down to crawl. RAD tools like DataStage are supposed to make change happen faster. That is why they are called Rapid Application Development tools.

In my opionion what is the most I can get done for the least amount of money is what most companies want and need. It still has to work smooth, run smooth, be easy to change or enhance, easy to learn, good documentation. That means documentation that is useful like accurate source to target documents. The most useful documents you can have are source to target documents.

If you personally want to get out of this type of development then you need to go beyond what you are told. Do what is right or best for the customer. Automate the documentation process. Automate the metadata peice even if they tell you not to. Have you ever looked at Ken's documentation or white papers. Very nice. Look at how well Ken and Ray's code looks. Lots of comments. Ray's jobs are the same way. They develop great day to day work habits. It is hard to make them take short cuts.

Not all outsourcing is done poorly or like I describe and not all big consulting firms drag out a contract and over manage. If I was paying the bills then I doubt if I would use either completely. Use under certain circumstances sure but a mix of employees and outside consultants is probably best for most companies.
Mamu Kim
mauherga
Participant
Posts: 31
Joined: Thu Mar 02, 2006 9:47 am
Location: Mexico

Post by mauherga »

Hi all,

I think when some one ask something like that, they wait for practice recommendations, comments of own experience and links, books references. May be I wrong....

In my few experience, I recomend that:
- Read and share documentation thats come with instalation CDs
- Be shure that the requeriments of software/hardware be correct with your DS version
- Try to configure your DS server the best than you can to match with your system (uvconfig, dsenv, etc.)
- Create user roles using a convination of UNIX-DS permissions
- Divide your process in projects that allows the same theme
- Configure each of your projects with properties you need
- Organaize in categories that facility your administration
- Implement backup politics
- Implement maintenance politics
- Be shure that your developers know the stages to explote them correctly

Good luck
maurik
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Maurik

This is great advice. This is practical wisdom to help things run smooth. This is a great starting point. I think the theory in Kimball and Inmon go over the heads of most people. I think start small and build up to the big ideas. I would rather be street smart and book dumb than the other way around. It means that I know how to get things done in a practical way.

As you get more and more experience then make your solutions to be more elogant and complex. EtlStats was built out of a need to do performance tuning. There was a business need for most of the tips and tricks I have posted. They practical solutions to building data warehouses. I doubt if someone can read Kimball and Inmon books and then go implement it. They need a mentor. The need someone to guide them and correct or fine tune their implementations.

I always thought companies would use this site as a way to audit their data warehouses or review their best practices. I think Ray and Ken should be able to look at your best practices remotely and tell you how to improve them. Why not automate QA of naming conventions and simple things like that? I have done this in the past at customer sites. All of us top posters know enough about the DS repository to automate a task like that.

Simple things like using a parameter for each database connection field. Similar things for sequetial and hashed file paths. Use parameters! Currently I am adding parameters to all jobs for these concepts. It is a lot more work after the fact than I imagined but still only a week or two of my time. I have written lots of jobs and routines to help me though. I now have an auto match for job activities in a sequence. Huge time saver. We have close to 100 sequences.

Is your metadata clean and consistent across all jobs? Very important. How many times do you source the same table? If you touch the same data multiple times then you might be able to combine these and reduce your time to load. How many times you land or touch the same data directly impacts your overall load times. If you land your data 6 times and someone reduces this to 3 times then maybe that table gets loaded in half the time. Makes sense, right? Guys like Ray and Ken can figure this out by looking at your DSX files offsite from their homes.

The same goes for documentation. What metadata do you have? Who is the owner or steward of this data? What source to target data do you have? Is it in a table so you can run reports against it? Who owns it? Who is responsible to keep it current? Very important stuff.

If you only have 2 developers then you may only have time to do the important stuff. Which of this is most important to you and your company? What can you live without? Would your company pay for an ETL audit from Ken or Ray if it was done offsite. Would Ray or Ken give you a special rate if they did this offsite in their spare time without a deadline? If you have a small budget then what is the best way to get mentoring with the least amount of money. Small budget requires being more clever.

I am sure the owner of this web site would love to provide this service. Ask for a quote. He has access to a lot of resources to do it. Might be able to create a template for companies which pay for a group membership. I think he should give awards to the best documentation, the best best practice, the best answer to a FAQ, the best solution posted, the best routine posted. IBM will always give awards to the customer who spends the most money with them. What a joke. IBM built your data warehouse so naturally they think it is great. Is it as good as the one Ray built or Ken or Hester or Craig. Unless we get an unbiased opinion how do we know. We want to see examples of why this solution is great. We want to learn and hopefully implement better solutions too. That is why we read and post to this forum. How can this forum improve too? I hope at some point we have downloadable examples of each stage type to show you how people used it.

I rant. :oops:

The topic was about Best Practices which means I want to do things better. I hope this helps promote doing things better and awareness of all our limitations. It is hard to go beyond where you are now without exposing your limitations and where you are currently. Don't be afraid to show your ignorance. Ignorant and stupid are two separate things.

I hope someone does not quote the 4 phases of learning. Don't want that conversation again.
Mamu Kim
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I have finally gotten around to documenting the standards that I follow should I be working at a site that does not have their own. They may be found on my web site and may be used and adapted freely. I would prefer that some attribution be made if they are.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply