Page 1 of 1

etl/data mapping

Posted: Thu Aug 11, 2005 10:30 am
by troywinston
Hey I am fairly new to datastage and I am specifically interested in ETL/data migration- data mapping, been doing some research but not getting specific information. I have broken down the transformer into 4 processes, specifically I would like to know as a data mapper using datastage -transformation stage, just in a nutshell what is involved in
1. data validation process
2. data cleaning process
3. data integration process
4. data aggregation process

troy :?

Posted: Thu Aug 11, 2005 4:56 pm
by kduke
Is this homework or a job interview question?

Posted: Thu Aug 11, 2005 11:32 pm
by ray.wurlod
Lots of grey areas and overlaps - one of the dangers of applying labels.
  • Validation is ensuring that values are valid; this might include data type and range checking, might include "exists on file" checking too.

    Cleaning is applying algorithms to data to ensure that they comply with business rules, and might include data type conversions, replacement of missing values, and so on.

    Integration is guaranteeing that data from disparate sources is rendered into a form that allows "apples to be compared to apples", and might include standard units of measurement, currency exchange, and so on.

    Aggregation is any form of summarising data; in general you aren't interested in the individual transactions but, rather, total daily sales.
There's a lot more to ETL than that, but I hope the question is adequately answered. May I refer you to The Data Warehouse Lifecycle Toolkit by Ralph Kimball and others, where you will find the answers you seek.

Posted: Mon Aug 29, 2005 6:22 am
by peternolan9
Troy,
you can get a lot more information from the Ralphs latest book on the ETL toolkit. Unfortunately and general ETL discussion is, by necessity, not tool specific.

Also, if you check out my company web site www.instantbi.com you will see we have released a 'mapping spreadsheet' that will help you understand some basics of mappings from staging area to target......look under downloads->beta software->IDW MG Beta...

There is also a presentation about it as well.....

We have also now released a beta for software that generates ETL directly from the mapping spreadsheet (though not in DataStage).

I believe this invention of being able to do all mappings in a spreadsheet will revolutionise ETL development because we all need to write the mappings somewhere and we usually do that in a spreadsheet...then we code it into a tool like DS.

We are currently doing a large and complex EDW and we are generating all ETL code directly from our mapping spreadsheet.....not bad!!