Bizzare hashed file lookup issue

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Bizzare hashed file lookup issue

Post by chulett »

I've never seen behaviour like this before, so I turn to the collective for help as the client no longer has a support agreement. :?

Many records written to a hashed file, all six fields are key fields. I don't think this matters, but for the record this is a pathed hash with a VOC record created for it, more on that later. Here are the four records we're interested in today, nothing changed to protect the innocent:

600000401502,1,DEP,600000587101,1,EMAILM
600000401502,1,DEP,600000587102,1,XLINE
600000401502,1,DEP,600000587099,1,WEBLINK
600000401502,1,DEP,600000587100,1,BDESC


When I do a 'View Data' of the hashed file and 'Find' for the value in the first field, what comes up in the viewer during the first pass are these four records:

600000401502,1,DEP,600000587101,1,EMAILM
600000401502,1,DEP,600000587102,1,XLINE
600000401502,1,DEP,600000587099,1,WEBLINK
600000401502,1,DEP,600000587102,1,XLINE


Note that record 2 is repeated in place of record 4. Click 'Find' one more time, it tells me it found nothing more and would I like to start over from the beginning, I say yes and it then shows me the correct four records:

600000401502,1,DEP,600000587101,1,EMAILM
600000401502,1,DEP,600000587102,1,XLINE
600000401502,1,DEP,600000587099,1,WEBLINK
600000401502,1,DEP,600000587100,1,BDESC


Why in the Blue Blazes does the first pass thru the hashed file return bad data? Every subsequent pass thru it shows correctly until I close and reopen the viewer, at which point the behaviour repeats itself. This problem is also manifesting itself in the job that leverages this hashed file (via a UV stage with a single key field marked so it returns multiple records) the BDESC record is not being returned, but rather the XLINE one is coming down twice instead. Also note this View Data behaviour is consistent regardless of if we check via the hashed file stage or via the UV stage, both exhibit the same troublesome behaviour.

I've run UVFIXFILE on the hashed file and it showed no issues. Any ideas? Things to try? I'm... baffled.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Forgot to mention something - there is an index created over the first column of the hashed file to support the multi-row lookup via the UV stage. I'm wondering if as a first step I should delete and recreate just the index, see if that has any effect on the issue. However... what would the syntax be? We use CREATE.INDEX to create it but I can't find any mention of a corresponding DELETE.INDEX command.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sheesh - just to further complicate this, I ran the job that creates the hashed file in the Dev project, a separate server from the production server. It pulled down the same number or records and exhibits the exact same behaviour when doing View Data. :?

Seems to be telling me it's not specific to the actual dynamic hashed file in production since this one is doing the exact same thing. I don't see how it could be data related as this is pretty vanilla stuff. But I do know it is happening on a daily basis to one degree or another, from checking my reject logs we setup for lookup failures.

Baffled.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There is a DELETE.INDEX command. Syntax is as you'd expect.

Code: Select all

DELETE.INDEX hashedfilename indexname
You can use ALL in place of indexname.

If it's a bug it's probably in the data browser. Try a query against the hashed file (use either LIST hashedfilename or SELECT * FROM hashedfilename;).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Thanks for the syntax. I don't think the problem is just with the browser as what started this whole witch-hunt was the fact that the job was exhibiting the behaviour as well. Or at least it was telling us a perfectly valid record was not being returned by the lookup and that led us to discover the issue when browsing.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Then try rebuilding the index.

Code: Select all

BUILD.INDEX hashedfilename
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Right, always do. :wink:

Guess this was a red herring. I tried the SELECT you suggested and the proper record values were selected. I also created a small job to send one value to the UV lookup and write the four resulting records to a flat file - they all were selected fine. So the behaviour we're seeing does seem to be a bug in the Data Viewer rather than an issue with the underlying structures.

I guess we're back to the drawing board as to the actual issue we were trying to track down, job-wise. [sigh] I'll mark this as Resolved for the moment.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

One says
"" not found
which suggests you're trying to look up a zero-length key. In the other one, it's failing to find a very long key value that ends with Ctrl-M - did it get read from source with UNIX line terminators specified when in fact it should have been read with DOS line terminators?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
crouse
Charter Member
Charter Member
Posts: 204
Joined: Sun Oct 05, 2003 12:59 pm
Contact:

Post by crouse »

Another thing to look for, similar to Ray's last response...
The data used to load the hashed file... Does it contain non-printable characters that are somehow being used as value marks or field marks or whatever Universe uses to denote multivalues in the row (probably not said correctly, but directionally correct)?
We've had this a couple times where the source data in Oracle contained date that Universe/DataStage confused with a value mark (?) and in essence really horked up the hashed file. Worth a look if you haven't gone "upstream" yet. Maybe a paper mill upstream is dumping toxins into your stream of data.
-Craig
Craig Rouse
Griffin Resouces, Inc
www.griffinresources.com
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

... and that looks to have fixed it. We were already stripping CHAR(252) and CHAR(253) from the field as they aborted the hashed loads. Just added 10 and 13 to the list, seemed like the LF was the actual culprit while the CR just became data.

Odd that this didn't turn up earlier as a problem, unless the appearance of those characters in the data is a fairly recent phenomena. Time to go back to a Resolved status, it seems.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply