Normalizing options in Hashed Files

rleishman · Post by **rleishman** » Sun Oct 23, 2005 7:41 pm

In a recent thread viewtopic.php?t=96063, Nag teased us with a primer on using the Normalizer options in a Hashed File stage.

Intrigued, I took a look at the online help and the DS Server Job Developer manual. The information there amounted to "yes, you can normalize" and hinted at the existence of the Type attribute for output columns. The help for the Type attribute just lists its possible values, not their functionality.
Sadly, a search in this forum yielded not a single thread for "normalize", "normalizer", "normalized", or even the Australian "normalise".

Is there a manual in the doco suite that explains it better? Or a URL where it takes you though the full functionality?

kcbland · Post by **kcbland** » Sun Oct 23, 2005 8:02 pm

Here's a quite answer. The normalize functionality is for using the stage with hash files created using the multi-value capabilities residual in the product engine from its parent version, Universe.

In Universe, of which the DS Engine is a variant, there is the capability to store something called multi-valued data. That multivalued data is like a 3-D table, where you have rows with columns, but columns can have multiple values. In addition, columns can be "associated", where the multiple values in associated columns are correlated. In a normalized database, those associated columns would actually be a child table of normalized rows. In the case of the multi-value model, they are more efficiently stored as something akin to an embedded table at the column level. To make things even more dizzying, you can have subvalues, which is one more level deeper.

The beauty is that multiple tables in a 3NF model could be a single table in a multi-value model. Data storage is tighter, all data is retrieved in a swell foop, and there's no need for indexing and such to optimize relationships between tables because there are none.

Anyway, DataStage UV/ODBC and Hash stages have the ability to read and operate against Universe based files, and can somewhat unnest these associated columns into normalized datastreams. Think of it like cobol copybooks with occurs, you use the CFF stage to flatten out those files. The normalize feature is used to normalize those associated attributes (columns) in Universe files.

One last note, in order to use DS with Universe, you must thoroughly cleanse and configure the data dictionary for each hash file to properly document the associations. This is something that only a Universe and DataStage expert really has the ability to do well. There are some tricks you can do with DS Server by flipping data into multivalue form in order to expedite certain types of processing, I've documented that on this forum for building SCD lookups that are highly performant.

In summary, if you don't know what Universe is, forget everything about the Normalize feature. It's irrelevant to you.

ray.wurlod · Post by **ray.wurlod** » Mon Oct 24, 2005 12:46 am

UniVerse actually supports eight levels of nesting. Can you think or even imagine in eight dimensions?

One simplistic explanation I like is that, in these non first normal form databases, you can have a cell with a "data type" of "table".

Unless you've got such nested data, don't worry about it.

If you're really curious, work your way through the UniVerse System Description manual, which can be downloaded from IBM's web site. Sorry, no URL, I'm on a horribly slow dial-up line.

(If you ever need to create multi-valued data to solve an arbitrary vertical pivot problem, post back.)