Extract from DB, Transform and Load to Another DB

Post questions here related to DataStage Server Edition, covering such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

VCInDSX
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Extract from DB, Transform and Load to Another DB

Post by VCInDSX »

Hi Gurus,
Need your invaluable input and suggestions.
We have a requirement to extract data from a database (e.g. Oracle) and load it into another database (e.g. Sybase).
There are a lot of business rules that must be applied when selecting the data from the source, and transformation rules that will be applied before the data is loaded into the target system.
What is the best way to go about this? Is there an "SQL Business Rule" stage in Server Jobs? Should we implement the extraction rules in SQL, extract the data into files, and then load those into the target DB?
Is there another "pattern" that is recommended for such tasks?

Thanks in advance for your time and help,
-Vicki
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

This question demands an elaborate explanation which might not be possible here; in other words, "How do I design my ETL?" There are a lot of factors to look at: the complexity of the rules, the data size, the activity load on the database server, and so on. Usually a simple extract is done and all the transformations are done at the tool level (DataStage Server). Sometimes folks go for database-level rule implementation, such as joins, trims, etc.
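To make the contrast concrete, here is a minimal Python sketch of the two placements. This is purely illustrative: sqlite3 just stands in for the real source database driver, and the orders table, its columns and the status rule are all invented.

    import sqlite3  # stand-in for the real source database driver

    conn = sqlite3.connect("source.db")  # hypothetical source

    # Database-level rules: the trim and the filter run on the database server
    db_side = conn.execute(
        "SELECT TRIM(name), amount FROM orders WHERE status = 'OPEN'"
    ).fetchall()

    # Tool-level rules: a plain extract, with the same rules applied afterwards
    rows = conn.execute("SELECT name, amount, status FROM orders").fetchall()
    tool_side = [(name.strip(), amount)
                 for (name, amount, status) in rows
                 if status == "OPEN"]

Either route yields the same rows; the difference is where the work, and the load, ends up.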
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Start with a plan. Business analysts might refer to this as a "source to target mapping" or a "target from source mapping" document. The latter is easier for an ETL developer to use.

That document becomes your specification. You can then plan the stages and functions you will use to implement a set of ETL streams to effect the desired results.

Then all you have to do is design and test. Sounds simple, doesn't it? In general, if you have a good specification, it is.
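To make "target from source" concrete, here is a hypothetical sketch in Python; every column name is invented. Read the mapping as one rule per target column, which is exactly the shape an ETL developer wants to consume.

    # Hypothetical target-from-source mapping: one entry per target column
    mapping = {
        "CUST_NAME": lambda row: row["name"].strip().upper(),
        "ORDER_AMT": lambda row: float(row["amount"]),
    }

    source_row = {"name": "  acme ", "amount": "12.34"}
    target_row = {col: rule(source_row) for col, rule in mapping.items()}
    # {'CUST_NAME': 'ACME', 'ORDER_AMT': 12.34}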
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nick.bond
Charter Member
Posts: 230
Joined: Thu Jan 15, 2004 12:00 pm
Location: London

Post by nick.bond »

You probably need to read a good book on this subject if you don't already know it. There are a few schools of thought and, as 2B said, it depends on a lot of factors.

Personally, as a general rule, I like something along the following lines (a rough sketch of the pattern follows the list).
  • Extract from the source system tables, selecting only the required records. Do not perform data transformation here: the idea is to keep the extract process as short as possible so that the impact on the source system is minimal. Write the data to a sequential file.

  • Load the extracted data into a staging database. Once again, this should not involve (much) data transformation. Having a staging database helps greatly with analysis during design and with debugging in later testing. Separating this from the extract process means that if there is a problem loading into the staging DB for any reason, there is no need to hit the source system again.

  • Extract from the staging database and perform the data transformation. Either create files here or load directly into the target system. This is where all the business rules should be implemented.
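Here is a rough Python sketch of those three steps, purely to show the shape of the pattern; sqlite3 stands in for the source, staging and target databases, and every table and file name is made up.

    import csv
    import sqlite3  # stands in for the source, staging and target databases

    # Step 1: quick extract from the source to a sequential file -- no transformation
    def extract(source_conn, path):
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(
                source_conn.execute("SELECT id, name, amount FROM orders"))

    # Step 2: load the file into the staging database, still untransformed
    def load_staging(staging_conn, path):
        with open(path, newline="") as f:
            staging_conn.executemany(
                "INSERT INTO stg_orders VALUES (?, ?, ?)", csv.reader(f))
        staging_conn.commit()

    # Step 3: read from staging, apply the business rules, load the target
    def transform_and_load(staging_conn, target_conn):
        for id_, name, amount in staging_conn.execute("SELECT * FROM stg_orders"):
            target_conn.execute(
                "INSERT INTO tgt_orders VALUES (?, ?, ?)",
                (id_, name.strip().upper(), float(amount)))  # the "rules" live here
        target_conn.commit()

If step 2 fails, you simply re-run it from the file; the source system is never hit twice.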
Regards,

Nick.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Whereas another school of thought might suggest text files or Data Sets for the staging areas, as they don't require a database server to be available (and can often be faster).
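Continuing the hypothetical sketch above, the file-based variant simply collapses the staging database away: the extract file itself is the staging area, so nothing has to be up and running for a re-load.

    import csv

    # The extract file *is* the staging area: re-loads just re-read it,
    # and no staging database server needs to be available
    def transform_and_load_from_file(path, target_conn):
        with open(path, newline="") as f:
            for id_, name, amount in csv.reader(f):
                target_conn.execute(
                    "INSERT INTO tgt_orders VALUES (?, ?, ?)",
                    (id_, name.strip().upper(), float(amount)))
        target_conn.commit()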
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nick.bond
Charter Member
Posts: 230
Joined: Thu Jan 15, 2004 12:00 pm
Location: London

Post by nick.bond »

...and the first school may argue that if all you are after is ultimate speed of processing, then Data Sets would be the way to go; but then there is no persistent storage for those Data Sets, so what happens if you need to re-load?

...and without hitting the source system constantly, how do you easily test and investigate the source data?
Regards,

Nick.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Data Sets and flat files are copies of the source, so in case of a failure there is no need to go to the source again. These data replicas are as persistent as any table; a staging table, technically, is just another database file on another server.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
VCInDSX
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Hi DSguru2B, Ray, Nick,
Many, many thanks for your invaluable time and thoughtful suggestions. My apologies for the temporary hiatus; I got pulled :( into a couple of other critical tasks and also had a one-week DataStage training course (IBM) :D :D.
Back to DS now, and I hope to continue my visits, exploration, queries and responses (as much as possible).
I have started working on your practical ideas and will update the forum.

Thanks again,
Cheers!!!
~V
-V