Hashed files size in 7.1 vs 7.5.1

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Tania
Participant
Posts: 22
Joined: Tue Jul 13, 2004 7:54 am
Location: Johannesburg

Hashed files size in 7.1 vs 7.5.1

Post by Tania »

Hi everyone

We recently upgraded from version 7.1 to 7.5.1 and have experienced some strange behaviour from hashed files that used to be fine.

We have jobs that read from a source and load the data into a 32-bit hashed file; they worked fine in version 7.1.
The same job, with the same amount of data, now fails in 7.5.1 with a message in the log saying the size exceeds 2 GB and the 32-bit hashed file therefore can't be loaded.

It seems that 7.5.1 is doing something to hashed files that causes them to use more space. These are static hashed files that have been tuned for best performance, so I don't think it is an issue with the file type, unless the two versions' static hashed file types differ.

Has anyone come across this sort of thing?

Thanks
Tania
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Tania,

The hashed file mechanism itself hasn't changed between versions, so any change in size is not due to internal structure modifications.

Have you looked at the hashed file contents to see if there are visible differences in the data? The one thing that springs to mind is that strings which were previously trimmed might, for some reason, now be blank-padded.
Could you also post the exact error message you are getting at runtime - that might help in narrowing down the cause.
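
In the meantime, a quick way to compare the two environments at the file level - assuming the hashed file lives in the project account rather than a directory path, and that you can reach the engine command line through the Administrator client - is something along these lines (the file name is just a placeholder):

   ANALYZE.FILE YourHashedFile
   ANALYZE.FILE YourHashedFile STATISTICS

Run that against the 7.1 and 7.5.1 copies and compare the reported type, modulo, separation and record counts; if the data really is unchanged, those figures should be very close.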
Tania
Participant
Posts: 22
Joined: Tue Jul 13, 2004 7:54 am
Location: Johannesburg

Post by Tania »

Hi Arnd

There have been no changes in the data between when this was working and now; visually it all looks the same. There may be slightly more data, but only a tiny amount.

This is the error message.
Card020CreateSnapshotLoadFile..HSH_CARD_CNTRN_FIELDS.LN_HSH_CARD_CNTRN_FIELDS: DSD.UVOpen Creating file "HSH_CARD_CNTRN_FIELDS" as Type 18, Modulo 985339, Separation 11.

mkdbfile: unable to create a 32-bit file greater than 2 gigabytes.

Unable to create operating system file "HSH_CARD_CNTRN_FIELDS".
Thanks for looking into this.

Tania
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

The error message occurs when creating the file, not when writing to it. You most likely have a Hashed File stage set to delete and create the file on each run. A modulo of 985,339 and a separation of 11 will generate a file that needs more than a 32-bit pointer to address; in fact, I went to a DataStage 7.1 installation, tried the mkdbfile command, and got the same error message - so it isn't related to the 7.1-to-7.5 difference. Most likely the file was created manually with a smaller modulo at 7.1 and this part of the code was never executed; when you went to 7.5 the file was probably not copied over, so the job hit this part of the code and issued the error.
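
For reference, the arithmetic behind that: a static hashed file pre-allocates roughly modulo x separation x 512 bytes (the separation being the group size in 512-byte blocks), so 985,339 x 11 x 512 comes to about 5.5 billion bytes - well past the 2,147,483,648-byte ceiling of 32-bit internal addressing.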

You will need to reduce the static file sizing for this file or manually create it as a 64-bit file.
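
If you do go the 64-bit route for just this file, something along these lines from the project's engine command line should work - the keywords are from memory, so please check them against your engine documentation; the type, modulo and separation are simply the values from your log:

   CREATE.FILE HSH_CARD_CNTRN_FIELDS 18 985339 11 64BIT

or, for a file that already exists,

   RESIZE HSH_CARD_CNTRN_FIELDS * * * 64BIT

Either way you would also have to switch off the delete/create option in the stage, otherwise the next run will simply drop the 64-bit file and try to re-create it as 32-bit again.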
Tania
Participant
Posts: 22
Joined: Tue Jul 13, 2004 7:54 am
Location: Johannesburg

Post by Tania »

Thanks Arnd,

We exported the jobs from 7.1 and imported them into 7.5.1. Yes, the hashed file is deleted and created on each run, as it was in 7.1.

I think what must have happened is that the modulo and separation were changed around the same time as the upgrade, because I'm certain the file wasn't created manually before.

I'll investigate (fine-tune) further and report back with my findings.

Thank you again
Tania
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Tania,

If you use that option, all hashed files that are created will be 64BIT. This is not necessarily wise: performance is somewhat slower due to the overhead, and the storage space required also increases. I wouldn't recommend it unless all of your hashed reference files are over 2 GB (remember, creating a new job also creates three hashed files that are extremely unlikely to ever get that big).
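
For clarity, the instance-wide switch in question is, if I recall correctly, the 64BIT_FILES parameter in the engine's uvconfig file - roughly:

   64BIT_FILES 1         (edit in $DSHOME/uvconfig)
   $DSHOME/bin/uvregen   (then restart the engine)

Once that is in place, every newly created hashed file in every project comes out 64-bit, which is why a targeted RESIZE ... 64BIT on the one oversized file is usually the better choice.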
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

And when Arnd says 'all hashed files' he means literally all hashed files - not just the ones used in your jobs or the ones that 'need' it. That means all of the internal hashed files created by the engine in every project as well, including the three per-job files Arnd mentions.

Not generally something you want to turn on.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Tania
Participant
Posts: 22
Joined: Tue Jul 13, 2004 7:54 am
Location: Johannesburg

Post by Tania »

Thanks for the info :D

In that case we'll find another way around these big files.

Thank you again for all of your help.

Tania