HASH Function?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

bbobpop1
Participant
Posts: 29
Joined: Sun Jul 20, 2008 9:55 am

Post by bbobpop1 »

Hi,

Can you please let me know how to use a HASH function / HASH algorithm in DataStage
Parallel?

Requirement: Pass a 10-digit string to the function, and the function generates a
UNIQUE number based on the input string.

Example:
Input Data    Expected Output
ABCD          1512
PQRS          7894
ABCD          1512
QWER          4597

The output must ALWAYS be a four-digit number.

The number of input rows is less than 9999.

If not a hash, is there any other method that satisfies this requirement?

I appreciate your help.

thanks and regards
bob
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There is no HASH function available. You can write your own, either as a parallel routine or as a Build stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sampitke1
Participant
Posts: 24
Joined: Mon Apr 28, 2008 4:22 pm

Post by sampitke1 »

Is there any other function (like a checksum) or logic that satisfies this requirement?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

CRC32 will generate a 'unique' value based on the input string, but not ALWAYS as a four-digit number. MD5 (a hash, not an encryption) generates a 32-digit hex number. I'd guess you'd need to roll your own.
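
To illustrate the point about CRC32, here is a minimal Python sketch (not DataStage code) that forces a CRC32 value into four digits by taking it modulo 10000. The function name `four_digit_code` is hypothetical. The result is always four digits and repeatable for the same input, but two different inputs can land on the same code, so it is not unique.

```python
import zlib


def four_digit_code(text: str) -> str:
    """Reduce the CRC32 of `text` to a zero-padded four-digit string.

    Illustration only: the modulo step discards information, so
    different inputs may produce the same code (a collision).
    """
    checksum = zlib.crc32(text.encode("utf-8"))  # unsigned 32-bit integer
    return f"{checksum % 10000:04d}"


if __name__ == "__main__":
    for value in ["ABCD", "PQRS", "ABCD", "QWER"]:
        print(value, four_digit_code(value))
```

The same input always gives the same code, which covers the repeated ABCD row, but nothing stops PQRS and QWER from colliding with some other value.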
-craig

"You can never have too many knives" -- Logan Nine Fingers
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Your target field has 10,000 possible unique values (10^4), while your source field has 531,441 unique combinations (27^4). There is no way to turn four alphas with 27 possible values each into four digits with 10 possible values each without losing uniqueness. You could instead generate a surrogate key using the Surrogate Key stage, so that each new input value gets the next surrogate key value in the list. This will work until you hit surrogate key number 9999.
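
The surrogate key idea can be sketched in Python as a lookup table that hands out the next unused number to each new input value. This is only an illustration of the logic, not how the Surrogate Key stage is implemented; the class name `SurrogateKeyAssigner` and the zero-padding to four digits are assumptions made for the example.

```python
class SurrogateKeyAssigner:
    """Assign sequential keys to distinct input values (illustrative sketch)."""

    def __init__(self, max_key: int = 9999) -> None:
        self._keys: dict[str, int] = {}  # input value -> assigned key
        self._next_key = 1
        self._max_key = max_key

    def key_for(self, value: str) -> str:
        """Return the existing key for `value`, or assign the next one."""
        if value not in self._keys:
            if self._next_key > self._max_key:
                raise OverflowError("ran out of four-digit surrogate keys")
            self._keys[value] = self._next_key
            self._next_key += 1
        # Zero-pad so the output is always four characters wide.
        return f"{self._keys[value]:04d}"


if __name__ == "__main__":
    assigner = SurrogateKeyAssigner()
    for value in ["ABCD", "PQRS", "ABCD", "QWER"]:
        print(value, assigner.key_for(value))
```

Unlike a hash, the key depends on the order in which values first arrive, so the mapping has to be persisted between runs if the same input must keep the same number.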
bbobpop1
Participant
Posts: 29
Joined: Sun Jul 20, 2008 9:55 am

Post by bbobpop1 »

Hi VMC,

The option that you provided sounds good.

Thanks
bob