Normalizing options in Hashed Files

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
rleishman
Premium Member
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Normalizing options in Hashed Files

Post by rleishman »

In a recent thread viewtopic.php?t=96063, Nag teased us with a primer on using the Normalizer options in a Hashed File stage.

Intrigued, I took a look at the online help and the DS Server Job Developer manual. The information there amounted to "yes, you can normalize" and hinted at the existence of the Type attribute for output columns. The help for the Type attribute just lists its possible values, not their functionality.
Sadly, a search in this forum yielded not a single thread for "normalize", "normalizer", "normalized", or even the Australian "normalise".

Is there a manual in the doco suite that explains it better? Or a URL where it takes you though the full functionality?
Ross Leishman
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Here's a quite answer. The normalize functionality is for using the stage with hash files created using the multi-value capabilities residual in the product engine from its parent version, Universe.

In Universe, of which the DS Engine is a variant, there is the capability to store something called multi-valued data. That multivalued data is like a 3-D table, where you have rows with columns, but columns can have multiple values. In addition, columns can be "associated", where the multiple values in associated columns are correlated. In a normalized database, those associated columns would actually be a child table of normalized rows. In the case of the multi-value model, they are more efficiently stored as something akin to an embedded table at the column level. To make things even more dizzying, you can have subvalues, which is one more level deeper.

The beauty is that multiple tables in a 3NF model could be a single table in a multi-value model. Data storage is tighter, all data is retrieved in a swell foop, and there's no need for indexing and such to optimize relationships between tables because there are none.

Anyway, DataStage UV/ODBC and Hash stages have the ability to read and operate against Universe based files, and can somewhat unnest these associated columns into normalized datastreams. Think of it like cobol copybooks with occurs, you use the CFF stage to flatten out those files. The normalize feature is used to normalize those associated attributes (columns) in Universe files.

One last note, in order to use DS with Universe, you must thoroughly cleanse and configure the data dictionary for each hash file to properly document the associations. This is something that only a Universe and DataStage expert really has the ability to do well. There are some tricks you can do with DS Server by flipping data into multivalue form in order to expedite certain types of processing, I've documented that on this forum for building SCD lookups that are highly performant.

In summary, if you don't know what Universe is, forget everything about the Normalize feature. It's irrelevant to you.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

UniVerse actually supports eight levels of nesting. Can you think or even imagine in eight dimensions?

One simplistic explanation I like is that, in these non first normal form databases, you can have a cell with a "data type" of "table".

Unless you've got such nested data, don't worry about it.

If you're really curious, work your way through the UniVerse System Description manual, which can be downloaded from IBM's web site. Sorry, no URL, I'm on a horribly slow dial-up line.

(If you ever need to create multi-valued data to solve an arbitrary vertical pivot problem, post back.)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply