Profile Stage Vs Quality Stage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

chowdary
Participant
Posts: 38
Joined: Thu Jun 23, 2005 11:25 am

Profile Stage Vs Quality Stage

Post by chowdary »

Hi,


Can anyone explain the difference between ProfileStage and QualityStage, and the scenarios in which each is used?

I would be very thankful for the help.


Thanks
chowdary.
ketfos
Participant
Posts: 562
Joined: Mon May 03, 2004 8:58 pm
Location: san francisco
Contact:

Post by ketfos »

Hi,

See the link below for detailed information.
http://www.ascential.com/litlib/

ProfileStage is the tool acquired from MetaRecon.
It's a data mining and analysis tool.

ProfileStage is more oriented toward discovery of the data,
while AuditStage gives information on the patterns present in the data.

AuditStage is the former Quality Manager.

These tools give you the means to do data profiling.

Thanks
Ketfos
logic
Participant
Posts: 115
Joined: Thu Feb 24, 2005 10:48 am

Post by logic »

There is a lot of information available in this forum as well as on the Ascential site. Basically, QualityStage is a cleansing tool used to investigate and standardise data quality. Among many other widely used features, it can also be used for deduplication. And as ketfos has mentioned, ProfileStage is basically a data profiling tool.
Ash.
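As a rough illustration of the deduplication idea Ash mentions (an invented sketch, not QualityStage's actual method), records can be grouped on a normalized match key and only one record kept per group:

```python
# Hypothetical example: deduplicate customer records on a normalized key.
records = [
    {"name": "John Smith",  "city": "Sydney"},
    {"name": "JOHN  SMITH", "city": "sydney"},
    {"name": "Jane Doe",    "city": "Perth"},
]

def match_key(rec):
    # Collapse case and whitespace so near-identical records collide.
    return (" ".join(rec["name"].upper().split()), rec["city"].upper())

deduped = {}
for rec in records:
    deduped.setdefault(match_key(rec), rec)   # keep the first record per key

print(list(deduped.values()))   # two records survive
```

Real matching in QualityStage is probabilistic rather than this kind of exact-key grouping, as Ray explains below in more detail.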
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There is substantial overlap between what ProfileStage (formerly MetaRecon) and AuditStage (formerly Quality Manager) do, but enough differences to warrant separate products - though "they" may merge the functionality one day. Both look at the actual data (rather than the metadata) to determine what's really out there, and to look for typical patterns out there (nulls, cardinality, skewed distributions, and so on).
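The column-level checks described here (nulls, cardinality, skewed distributions) can be sketched in a few lines of Python. This is only an illustration of the idea, not how ProfileStage or AuditStage is implemented:

```python
from collections import Counter

def profile_column(values):
    """Compute simple profile statistics for one column of data."""
    total = len(values)
    nulls = sum(1 for v in values if v in (None, ""))
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    return {
        "rows": total,
        "nulls": nulls,
        "cardinality": len(counts),           # distinct non-null values
        "most_common": counts.most_common(3), # reveals skewed distributions
    }

states = ["CA", "CA", "NY", None, "CA", "", "TX"]
print(profile_column(states))
```

A heavily skewed `most_common` list, or a cardinality of 1, is exactly the kind of pattern a profiling tool flags for review.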

QualityStage (formerly Vality INTEGRITY) is totally different. It performs some or all (your choice) of four separate tasks:
  • investigation (at both character and word level, and free format, which means you can find data that overlap fields or are in the wrong fields)

  • standardization, essentially moving data into the correct fields and generating standard forms (for example AV, AVE and AVENUE are all output as AVE (your business rules), also Soundex, NYSIIS and reverse Soundex forms, which are better for fuzzy matching)

  • matching, which involves identifying potentially duplicate records using probabilistic (rather than deterministic) methods and "blocking" them into groups, assigning match weights and allowing statistical cutoffs to be used to identify true matches, true non-matches and the grey area in between

  • survivorship, in which "best of breed" data survive from each block of potential duplicates, for example the most frequently occurring value, the longest string, and so on
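The standardization and fuzzy-matching steps above can be sketched as follows. The standardization rules here are invented for the example (your business rules would differ), and the Soundex function is the classic American Soundex, not QualityStage's implementation:

```python
# Illustrative sketch: rule-based standardization plus American Soundex.
STANDARD_FORMS = {"AV": "AVE", "AVENUE": "AVE", "STREET": "ST", "STR": "ST"}

def standardize(token):
    """Map a street-type variant to its standard form (hypothetical rules)."""
    token = token.upper().strip(".")
    return STANDARD_FORMS.get(token, token)

def soundex(word):
    """Classic American Soundex: first letter plus three digits."""
    groups = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3",
              "L": "4", "MN": "5", "R": "6"}
    def code(c):
        return next((d for letters, d in groups.items() if c in letters), "")
    word = word.upper()
    result, prev = word[0], code(word[0])
    for c in word[1:]:
        if c in "HW":          # H and W do not separate equal codes
            continue
        d = code(c)
        if d and d != prev:    # skip vowels and repeated codes
            result += d
        prev = d
    return (result + "000")[:4]

print(standardize("Avenue"))               # AVE
print(soundex("SMITH"), soundex("SMYTH"))  # same code, so a fuzzy match
```

Matching on the Soundex of the standardized form is what lets SMITH and SMYTH fall into the same block for the probabilistic comparison and survivorship steps.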
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.