
Oddity in Hash Files

Posted: Mon Nov 24, 2003 12:11 pm
by JDionne
I am seeing an oddity in hash files that I don't understand. I have one job that creates the hash files, using a directory to receive them. I then go into another job and try to use the hash files. When I look at the files in the second job, the columns are the same but the data values in them have changed. The columns seem to have switched around. Makes joins really hard, heheh.
Has anyone ever seen anything like that?
Jim

Posted: Mon Nov 24, 2003 3:09 pm
by chulett
Are you sure that the same columns in both jobs are checked as 'keys'?

Posted: Mon Nov 24, 2003 3:15 pm
by Peytot
Another possibility: check the 'Position' field. If it has a value, that value may not be the right one, so you can clear it. Personally, I always leave this field blank; that way I remove the risk.

Pey

The sequence and key matter

Posted: Tue Nov 25, 2003 5:19 am
by sdevashis
Hi,
The sequence of columns in a hash file matters a lot. It's good practice to save the metadata when a hash file is created.

If you have a hash file defined as (CUST_KEY, CUST_ID, CUST_NAME) and you try to use it as (CUST_KEY, CUST_NAME, CUST_ID), then you have a problem.
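A rough sketch of why the order matters, in Python rather than DataStage. The sample values and the read_columns helper are made up for illustration; the point is only that names are matched to stored values by position, not by name.

```python
# Hypothetical illustration: column names are paired with stored values
# purely by their order, so a different column order relabels the data.

def read_columns(stored_fields, column_names):
    """Pair each column name with the value stored at the same position."""
    return dict(zip(column_names, stored_fields))

# Values as the first job wrote them (sample data, not from the thread).
stored = ["1001", "C-42", "Acme Ltd"]

writer_layout = ["CUST_KEY", "CUST_ID", "CUST_NAME"]
reader_layout = ["CUST_KEY", "CUST_NAME", "CUST_ID"]

as_written = read_columns(stored, writer_layout)
as_read = read_columns(stored, reader_layout)

print(as_written)  # columns carry the values the writer intended
print(as_read)     # CUST_ID and CUST_NAME values appear swapped
```

This is the same symptom as in the original question: same column names, but the values seem to have switched around.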

The keys in hash files matter a lot too. Say the file is (CUST_KEY [PK], CUST_ID, CUST_NAME) and you try to use it as (CUST_KEY, CUST_ID [PK], CUST_NAME); then again you are in trouble, as hash files know nothing about keys.
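A minimal sketch of the key mismatch, assuming the hash file can be modelled as a simple key-to-record mapping. The write and lookup helpers and the sample values are hypothetical, not DataStage functions.

```python
# Hypothetical model: the file stores each record under whatever value the
# writing job treated as the key. The file does not record which column
# that value came from.

hashed_file = {}

def write(key_value, fields):
    """Store the non-key fields under the key value."""
    hashed_file[key_value] = fields

def lookup(key_value):
    """Return the stored fields, or None when no record has that key."""
    return hashed_file.get(key_value)

# Writer job: CUST_KEY is checked as the key.
write("1001", ["C-42", "Acme Ltd"])

# Reader job with CUST_KEY as the key: finds the record.
hit = lookup("1001")

# Reader job with CUST_ID checked as the key instead: looks up a value
# that was never used as a key, so nothing is found.
miss = lookup("C-42")

print(hit, miss)
```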

I hope things will go fine now.

Posted: Tue Nov 25, 2003 10:06 am
by aaronej
I would go with Peytot's suggestion. That pesky position column always causes problems.

Good luck!

Aaron

Posted: Tue Nov 25, 2003 5:28 pm
by ray.wurlod
Well, it may "cause problems", but you have to learn to live with it. That's because "position" is how columns are identified in records in hashed files; by their ordinal position.
The hashed file storage mechanism is precisely a delimited string; all values are stored as text, and the location of a particular field is determined by counting those delimiters. For example, to get the column in position four, DataStage counts past three delimiters, then starts accreting bytes until the fourth delimiter (or end-of-record) is encountered.
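The counting described above can be sketched in Python. This only illustrates the idea and is not DataStage's actual code; the delimiter character chosen here (character 254) and the sample record are assumptions made for the example.

```python
# Illustrative sketch: locate a field by counting delimiters.
FIELD_MARK = "\xfe"  # assumed delimiter character for this example

def extract_field(record, position):
    """Return the field at 1-based `position`, or "" if it does not exist."""
    start = 0
    # Skip past (position - 1) delimiters.
    for _ in range(position - 1):
        idx = record.find(FIELD_MARK, start)
        if idx == -1:
            return ""  # record has fewer fields than requested
        start = idx + 1
    # Collect characters up to the next delimiter or end of record.
    end = record.find(FIELD_MARK, start)
    return record[start:] if end == -1 else record[start:end]

# Sample record with four fields (made-up values).
record = FIELD_MARK.join(["C-42", "Acme Ltd", "London", "GB"])

print(extract_field(record, 4))  # counts past three delimiters first
```

Because nothing in the record itself names the fields, a job that asks for position 2 gets whatever text sits after the first delimiter, regardless of what the column is called.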
Position 0 is the key value. If there are multiple key columns, they each have position 0, and DataStage has the means (within the hashed file's dictionary) to determine which is the first, which is the second, and so on.
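A small sketch of that lookup, again in Python and under stated assumptions: the dictionary layout, the ColumnDef type, the REGION column and the sample values are all hypothetical. The key parts are modelled as an already-split tuple, so nothing here shows how multiple key columns are actually packed into one key.

```python
from typing import NamedTuple

class ColumnDef(NamedTuple):
    position: int   # 0 for key columns, 1..n for data fields
    key_order: int  # 1-based order among key columns; 0 for non-key columns

# Hypothetical dictionary for a file with a two-part key.
dictionary = {
    "REGION": ColumnDef(position=0, key_order=1),
    "CUST_KEY": ColumnDef(position=0, key_order=2),
    "CUST_ID": ColumnDef(position=1, key_order=0),
    "CUST_NAME": ColumnDef(position=2, key_order=0),
}

def column_value(name, key_parts, fields):
    """Resolve a column by position: key parts for 0, data fields otherwise."""
    col = dictionary[name]
    if col.position == 0:
        return key_parts[col.key_order - 1]
    return fields[col.position - 1]

key_parts = ("EMEA", "1001")      # sample key values
fields = ["C-42", "Acme Ltd"]     # sample data fields at positions 1 and 2

print(column_value("CUST_KEY", key_parts, fields))
print(column_value("CUST_NAME", key_parts, fields))
```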