Hi everyone
We recently upgraded from version 7.1 to 7.5.1 and have experienced some strange behaviour from hashed files that used to be fine.
We have jobs that read in from a source and load data into a 32bit hashed file that work fine in version 7.1.
The same job, with the same amount of data, is now failing in 7.5.1 with a message in the log that the size exceeds 2GB and therefore the 32-bit hashed file can't be loaded.
It seems that 7.5.1 is doing something to hashed files that is causing them to use more space. These are static hashed files that have been tuned for best performance so I don't think that it is an issue with the type, unless there are differences between the two versions' static hashed file types.
Has anyone come across this sort of thing?
Thanks
Tania
Hashed file size in 7.1 vs 7.5.1
Tania,
the hashed file mechanism itself hasn't changed between versions, so any change in sizes is not due to internal structure modifications.
Have you looked at the hashed file contents to see if there are visible differences in data (the one thing that springs to mind is that for some reason strings might now be blank padded that were previously trimmed)?
Could you post the exact error message you are getting at runtime - that might help a bit in determining the possible cause.
Hi Arnd
There have been no changes in the data from the time that this was working and now. Visually it all looks the same. Possibly there is a little bit more data but it is a tiny amount more.
This is the error message.
Tania
Thanks for looking into this.

Card020CreateSnapshotLoadFile..HSH_CARD_CNTRN_FIELDS.LN_HSH_CARD_CNTRN_FIELDS: DSD.UVOpen Creating file "HSH_CARD_CNTRN_FIELDS" as Type 18, Modulo 985339, Separation 11.
mkdbfile: unable to create a 32-bit file greater than 2 gigabytes.
Unable to create operating system file "HSH_CARD_CNTRN_FIELDS".
Tania
The error message occurs when creating the file, not when writing to it. You most likely have a hashed file stage that deletes and creates the file on each run. A modulo of 985,339 and a separation of 11 will generate a file that needs more than a 32-bit pointer to address; in fact, I went to a DataStage 7.1 installation, tried the mkdbfile command, and got the same error message, so it isn't related to the 7.1 to 7.5 difference. Most likely the file was created manually with a smaller modulo at 7.1 and this part of the code was never executed; when you went to 7.5 the file was probably not copied over, so the job hit this part of the code and issued the error.
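A quick back-of-the-envelope check illustrates why this modulo and separation combination blows past the 32-bit limit. This sketch assumes the usual UniVerse convention that separation is measured in 512-byte disk blocks, so a static hashed file needs roughly modulo × separation × 512 bytes:

```python
# Rough static hashed file size estimate (assumption: separation
# is counted in 512-byte disk blocks, the usual UniVerse convention).
modulo = 985_339        # number of groups, from the error message
separation = 11         # blocks per group, from the error message
block_size = 512        # bytes per separation unit

file_size = modulo * separation * block_size
limit_32bit = 2**31     # 2 GB addressing limit of a 32-bit file

print(f"Estimated file size: {file_size / 2**30:.2f} GiB")   # ~5.17 GiB
print(f"Exceeds the 32-bit limit: {file_size > limit_32bit}")  # True
```

At roughly 5 GiB, the requested file is well over twice the 2 GB that a 32-bit file pointer can address, which is why mkdbfile refuses to create it on either version.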
You will need to reduce your static file sizings for this file or manually create it as a 64Bit file.
Thanks Arnd,
We exported the jobs from 7.1 and imported them into 7.5.1. Yes the hash file is deleted and created in each run as it was in 7.1.
I think what must have happened is that the modulo and separation were changed around the same time as the upgrade, because I'm certain that the file wasn't created manually before.
I'll investigate (fine-tune) further and report back my findings.
Thank you again
Tania
Tania,
if you use that option, all hashed files that are created will be 64-bit. This is not necessarily wise: performance is somewhat slower due to overhead, and the storage space required also increases. I wouldn't recommend doing this unless all of your hashed reference files are over 2GB (remember, creating a new job also creates 3 hashed files that are extremely unlikely to ever get that big).
And when Arnd says 'all hashed files' he means literally all hashed files - not just the ones that are used in your jobs or ones that 'need' it. That means all internal hashed files created by the engine in every project as well. These are the three that Arnd mentions.
Not generally something you want to turn on.
-craig
"You can never have too many knives" -- Logan Nine Fingers