problem in look up when it is long string

santoo_happy
Participant
Posts: 9
Joined: Sat Jun 03, 2006 7:06 am

problem in look up when it is long string

Post by santoo_happy »

Hi,

I have a source field "DATA_VALUE" that can hold any kind of data, i.e. character, number, or date.

I am doing a lookup against a hashed file to get the non-matching records from the source. But the lookup does NOT match (even when both records are identical) for a few records that contain a long string; the string length is approximately 2500 characters.

Please advise.

Thanks,
Santosh
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

DataStage hashed files have a configurable maximum length; I believe the default value is 768; you can check this value by looking into the uvconfig file or executing the command 'smat -t' from the command line. This would mean that the hashed file keys are truncated to this length, which explains why your match didn't work.

I'd have to check my docs to see what the impact of increasing the MAXKEYSIZE parameter in the uvconfig is - offhand I would guess that the overall impact shouldn't be too great.
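
A toy Python model of the explanation above may make the failure mode easier to see. It is only a sketch: it assumes, as the post suggests, that the key is cut to the maximum key size on write while the lookup probes with the full value. It does not call DataStage, and `MAX_KEY_SIZE`, `write_record` and `lookup` are hypothetical names.

```python
# Assumed limit, taken from the default value quoted in the post above.
MAX_KEY_SIZE = 768


def write_record(store, key, value):
    """Store a record, cutting the key to MAX_KEY_SIZE (the assumed behaviour)."""
    store[key[:MAX_KEY_SIZE]] = value


def lookup(store, key):
    """Probe with the full, untruncated key; return None when not found."""
    return store.get(key)


store = {}
short_key = "A" * 100    # well under the limit
long_key = "B" * 2500    # about the length reported in the question

write_record(store, short_key, "short row")
write_record(store, long_key, "long row")

short_hit = lookup(store, short_key)   # found: the key was stored unchanged
long_hit = lookup(store, long_key)     # not found: stored key is only 768 chars

print("short key found:", short_hit is not None)
print("long key found:", long_hit is not None)
```

Under these assumptions only the long key fails to match, which is the symptom described in the original question.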
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

For a start 2500 would imply GROUP.SIZE 2.

The impact can be huge - can even preclude use of dynamic hashed files!

Be very, very careful.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Abu@0403
Participant
Posts: 32
Joined: Wed Aug 08, 2007 11:21 pm

Re: problem in look up when it is long string

Post by Abu@0403 »

Since the field can contain any kind of data, such as char, number, or date, I assume that the datatype of that field is defined as char.

So one solution is to split this single field into multiple fields and define all of the split fields as keys.

When splitting, also bear in mind that some values will be short, so some of the split fields may be null. For those, assign a default value so that an exact match occurs when they are compared with the input field. Please check whether this solves your problem.
----------------
Abu
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Abu - that is a good thought, but unfortunately it doesn't work that way. Multiple key definitions in DataStage actually go into just one key. Hashed files have one and only one key and it must be unique.
Abu@0403
Participant
Posts: 32
Joined: Wed Aug 08, 2007 11:21 pm

Post by Abu@0403 »

So even if we have 10 keys in a hashed file, it internally combines them and holds them as a single field? So it will be the same whether the field is split or not? Is this the way a hashed file works?
----------------
Abu
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Yes, exactly. There is only one physical unique key; if you specify multiple keys in a job it combines them to one string internally.
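
A short Python sketch of why splitting the field into several key columns does not help, assuming the key columns are simply joined into one string with a separator. The separator character used here is an assumption (the thread does not say which one is used), and `split_value` and `composite_key` are hypothetical helpers.

```python
# Assumed separator between key columns; the thread does not name it.
SEPARATOR = chr(251)
MAX_KEY_SIZE = 768  # default value quoted earlier in the thread


def split_value(value, chunk_size):
    """Cut one long value into consecutive chunks of chunk_size characters."""
    return [value[i:i + chunk_size] for i in range(0, len(value), chunk_size)]


def composite_key(parts, separator=SEPARATOR):
    """Join several key columns into the single physical key."""
    return separator.join(parts)


value = "x" * 2500
parts = split_value(value, 500)      # five key columns of 500 chars each
key = composite_key(parts)

single_len = len(value)
combined_len = len(key)              # 2500 chars plus 4 separators

print("key columns:", len(parts))
print("length as one column:", single_len)
print("length as combined key:", combined_len)
print("fits in MAX_KEY_SIZE:", combined_len <= MAX_KEY_SIZE)
```

The combined key is slightly longer than the original value, so it exceeds the limit just as the unsplit field does.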
Abu@0403
Participant
Posts: 32
Joined: Wed Aug 08, 2007 11:21 pm

Post by Abu@0403 »

Thanks a lot, Arnd. Now it's clear to me.
----------------
Abu
Abu@0403
Participant
Posts: 32
Joined: Wed Aug 08, 2007 11:21 pm

Post by Abu@0403 »

In this case, just split that single field across multiple hashed files, say split it and store it in 5 hashed files (500 chars in each hashed file).

Have a condition such that if it matches in all the hashed files then the result is True; if even one of the hashed files returns NOTFOUND, it means the input column does not match the looked-up data. Please check whether this can be done. Arnd, can you advise whether this is feasible?
----------------
Abu
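
The proposal above can be modelled roughly in Python, with one set standing in for each of the 5 hashed files. This is a sketch under stated assumptions, not a DataStage job: `CHUNK_SIZE`, `NUM_FILES`, `DEFAULT` and the helper functions are hypothetical, and values longer than 2500 characters are ignored beyond the fifth chunk.

```python
CHUNK_SIZE = 500   # characters stored per hashed file, as proposed above
NUM_FILES = 5      # number of hashed files, as proposed above
DEFAULT = "~"      # hypothetical default for chunks that would otherwise be empty


def chunks(value):
    """Split a value into NUM_FILES chunks, filling empty chunks with DEFAULT."""
    parts = [value[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE] for i in range(NUM_FILES)]
    return [part if part else DEFAULT for part in parts]


def build_files(reference_values):
    """Load chunk i of every reference value into 'hashed file' i."""
    files = [set() for _ in range(NUM_FILES)]
    for value in reference_values:
        for hashed_file, part in zip(files, chunks(value)):
            hashed_file.add(part)
    return files


def matches(files, value):
    """True only if every chunk is found in its corresponding file."""
    return all(part in hashed_file for hashed_file, part in zip(files, chunks(value)))


ref_a = "A" * 500 + "B" * 500
ref_b = "C" * 500 + "D" * 500
files = build_files([ref_a, ref_b])

exact = matches(files, ref_a)                    # every chunk comes from ref_a
missing = matches(files, "Z" * 2500)             # first chunk is NOTFOUND
mixed = matches(files, "A" * 500 + "D" * 500)    # chunks from two different rows

print("exact match:", exact)
print("unknown value:", missing)
print("mixed chunks:", mixed)
```

One thing to watch in this model: each chunk is checked independently, so an input assembled from chunks of different reference rows (the `mixed` case) also passes the test even though no single reference row equals it.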