Extract from DB, Transform and Load to Another DB

Post questions here related to DataStage Server Edition, covering such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

VCInDSX
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Extract from DB, Transform and Load to Another DB

Post by VCInDSX »

Hi Gurus,
Need your invaluable input and suggestions.
We have a requirement to extract data from a database (e.g. Oracle) and load it into another database (e.g. Sybase).
There are a lot of business rules that must be applied when selecting the data from the source, and transformation rules that will be applied before the data is loaded into the target system.
What is the best way to go about this? Is there an "SQL Business Rule" stage in Server Jobs? Should we implement the extraction rules in SQL, extract the data into files, and then load those into the target DB?
Is there another "pattern" that is recommended for such tasks?

Thanks in advance for your time and help,
-Vicki
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

This question demands an elaborate explanation which might not be possible here; in other words, "How do I design my ETL?" There are a lot of factors to look at: the complexity of the rules, the data size, the activity load on the database server, and so on. Usually a simple extract is done and all the transformations are done at the tool level (DataStage Server). Sometimes folks go for database-level rule implementation, such as joins, trims, etc.
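To make the contrast concrete, here is a minimal Python sketch of the two placements. This is purely illustrative: sqlite3 just stands in for the real source database driver, and the orders table, its columns and the status rule are all invented.

    import sqlite3  # stand-in for the real source database driver

    conn = sqlite3.connect("source.db")  # hypothetical source

    # Database-level rules: the trim and the filter run on the database server
    db_side = conn.execute(
        "SELECT TRIM(name), amount FROM orders WHERE status = 'OPEN'"
    ).fetchall()

    # Tool-level rules: a plain extract, with the same rules applied afterwards
    rows = conn.execute("SELECT name, amount, status FROM orders").fetchall()
    tool_side = [(name.strip(), amount)
                 for (name, amount, status) in rows
                 if status == "OPEN"]

Either route yields the same rows; the difference is where the work, and the load, ends up.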
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Start with a plan. Business analysts might refer to this as a "source to target mapping" or a "target from source mapping" document. The latter is easier for an ETL developer to use.

That document becomes your specification. You can then plan the stages and functions you will use to implement a set of ETL streams to effect the desired results.

Then all you have to do is design and test. Sounds simple, doesn't it? In general, if you have a good specification, it is.
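To make "target from source" concrete, here is a hypothetical sketch in Python; every column name is invented. Read the mapping as one rule per target column, which is exactly the shape an ETL developer wants to consume.

    # Hypothetical target-from-source mapping: one entry per target column
    mapping = {
        "CUST_NAME": lambda row: row["name"].strip().upper(),
        "ORDER_AMT": lambda row: float(row["amount"]),
    }

    source_row = {"name": "  acme ", "amount": "12.34"}
    target_row = {col: rule(source_row) for col, rule in mapping.items()}
    # {'CUST_NAME': 'ACME', 'ORDER_AMT': 12.34}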
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nick.bond
Charter Member
Posts: 230
Joined: Thu Jan 15, 2004 12:00 pm
Location: London

Post by nick.bond »

You probably need to read a good book on this subject if you don't already know it. There are a few schools of thought and, as 2B said, it depends on a lot of factors.

Personally, as a general rule, I like something along the following lines (a rough sketch of the pattern follows the list).
  • Extract from the source system tables, selecting only the required records. Do not perform data transformation here: the idea is to keep the extract process as short as possible so that the impact on the source system is minimal. Write the data to a sequential file.

  • Load the extracted data into a staging database. Once again, this should not involve (much) data transformation. Having a staging database helps greatly with analysis during design and with debugging in later testing. Separating this from the extract process means that if there is a problem loading into the staging DB for any reason, there is no need to hit the source system again.

  • Extract from the staging database and perform the data transformation. Either create files here or load directly into the target system. This is where all the business rules should be implemented.
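Here is a rough Python sketch of those three steps, purely to show the shape of the pattern; sqlite3 stands in for the source, staging and target databases, and every table and file name is made up.

    import csv
    import sqlite3  # stands in for the source, staging and target databases

    # Step 1: quick extract from the source to a sequential file -- no transformation
    def extract(source_conn, path):
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(
                source_conn.execute("SELECT id, name, amount FROM orders"))

    # Step 2: load the file into the staging database, still untransformed
    def load_staging(staging_conn, path):
        with open(path, newline="") as f:
            staging_conn.executemany(
                "INSERT INTO stg_orders VALUES (?, ?, ?)", csv.reader(f))
        staging_conn.commit()

    # Step 3: read from staging, apply the business rules, load the target
    def transform_and_load(staging_conn, target_conn):
        for id_, name, amount in staging_conn.execute("SELECT * FROM stg_orders"):
            target_conn.execute(
                "INSERT INTO tgt_orders VALUES (?, ?, ?)",
                (id_, name.strip().upper(), float(amount)))  # the "rules" live here
        target_conn.commit()

If step 2 fails, you simply re-run it from the file; the source system is never hit twice.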
Regards,

Nick.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Whereas another school of thought might suggest text files or Data Sets for the staging areas, as they don't require a database server to be available (and can often be faster).
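Continuing the hypothetical sketch above, the file-based variant simply collapses the staging database away: the extract file itself is the staging area, so nothing has to be up and running for a re-load.

    import csv

    # The extract file *is* the staging area: re-loads just re-read it,
    # and no staging database server needs to be available
    def transform_and_load_from_file(path, target_conn):
        with open(path, newline="") as f:
            for id_, name, amount in csv.reader(f):
                target_conn.execute(
                    "INSERT INTO tgt_orders VALUES (?, ?, ?)",
                    (id_, name.strip().upper(), float(amount)))
        target_conn.commit()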
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nick.bond
Charter Member
Posts: 230
Joined: Thu Jan 15, 2004 12:00 pm
Location: London

Post by nick.bond »

...and the first school may argue that if all you are after is ultimate speed of processing, then Data Sets would be the way to go; but then there is no persistent storage for those Data Sets, so what happens if you need to re-load?

...and without hitting the source system constantly, how do you easily test and investigate the source data?
Regards,

Nick.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Data Sets and flat files are copies of the source, so in case of a failure there is no need to go to the source again. These data replicas are as persistent as any table; a staging table, technically, is just another database file on another server.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
VCInDSX
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Hi DSguru2B, Ray, Nick,
Many, many thanks for your invaluable time and thoughtful suggestions. My apologies for the temporary hiatus; I got pulled :( into a couple of other critical tasks and also had a one-week DataStage training course (IBM) :D :D.
Back to DS now, and I hope to continue my visits, exploration, queries and responses (as much as possible).
I have started working on your practical ideas and will update the forum.

Thanks again,
Cheers!!!
~V
-V