Hash File not Available in Transformer

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
nkln@you
Premium Member
Posts: 271
Joined: Wed Nov 17, 2004 5:15 am
Location: US

Hash File not Available in Transformer

Post by nkln@you »

I am using a hashed file "Hash_Latest" which has 6 columns. This hashed file is populated in one job (Job A) and used as a lookup in another job (Job B).

When I view the data in the hashed file using "View Data" on "Hash_Latest", I can see two records, in both Job A and Job B.

I mapped the fields of the hashed file into a sequential file and ran Job B, but I could not see any records in the sequential file. I am wondering why this is happening. Any inputs on this?
Aim high
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Hash File not Available in Transformer

Post by chulett »

nkln@you wrote:I mapped the fields of the hashed file into a sequential file and ran job B
We'd need a better explanation of this to be able to help.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

We need more information about your job design and data.

What do you mean by "mapped"? Can you publish a picture of the Transformer stage editor?

What do you mean by "do not see any records in sequential file"? Are you using View Data or some viewer on the server? Are you indeed looking at the correct file?

And do start using correct terminology. It's "hashed" file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nkln@you
Premium Member
Posts: 271
Joined: Wed Nov 17, 2004 5:15 am
Location: US

Post by nkln@you »

Sorry for the wrong terminology. Let me explain the issue clearly.

Actually I am using the hashed file as a lookup in Job B. If the lookup is satisfied, records flow from the output link of the transformer. I am checking whether the lookup was satisfied using NOT(Read_HashLatest.NOTFOUND).

There are 4 key columns in the hashed file, and they are matched against columns of the source records inside the transformer. For a particular scenario, the values of the 4 key columns coming from the source and the values of the key columns in the hashed file are the same, so the lookup should match and records should flow down the output link of the transformer. But I did not get any records from the output link of the transformer. When I debugged to check what went wrong, I found 2 records in the hashed file using the "View Data" option. I then tried to populate the values from the hashed file into a sequential file using the same transformer. When I used "View Data" on the sequential file, I could not see any records in it.
So basically: why is the sequential file showing zero records when its source, the hashed file, has records?

Ray, I would not be able to upload the screenshot.

Note: While populating the hashed file, it is populated with 5 keys, but while reading it is read with 4 keys only. Also, the same design worked in a previous project; this issue appeared only after a new project was created.
Aim high
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Are any of the key columns of VarChar data type and, if so, are they identically trimmed? (Don't assert - check.)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

First off, there's really no need to explain to us how a hashed reference lookup works. Pretty sure we've got that down. :wink:

Doesn't matter how many records there are in the hashed file, if the keys don't all match exactly, it's not going to return anything on the reference link. Seems to me your values, source to lookup, are in fact not the same. Typically that means not 'equally trimmed' string fields - for example, "Fred" does not equal "Fred " even though they may look the same in the data viewer. One trick is to click inside the field in the viewer and see if there are any pesky trailing spaces (on either side) causing you grief. Or you may be thinking "007" and "07" are equal when in fact they're not. Hard for us to guess from this side of the glass.
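The trailing-space pitfall can be reproduced in a few lines of Python (a sketch only - a dict standing in for the hashed file, with made-up column names; this is not DataStage BASIC):

```python
# Simulate a hashed-file reference lookup with a plain dict keyed on the
# lookup columns (illustrative analogy only).
hashed_file = {("Fred ",): {"grade": "A"}}   # key written with a trailing space

source_key = ("Fred",)                        # source value arrives trimmed
assert source_key not in hashed_file          # "Fred" != "Fred " -- lookup fails

# Trimming both sides before comparing restores the match.
trimmed = {tuple(k.strip() for k in key): row for key, row in hashed_file.items()}
assert tuple(k.strip() for k in source_key) in trimmed
```

The two values look identical in a data viewer, yet the first membership test fails; only after both sides are trimmed does the key match.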

I'm assuming your issue with 'populating the values from hashed file to sequential file' is a victim of the same problem, the failure of the lookups. You can verify all this with the monitor stats - how many rows went into the transformer, how many rows does it show on the reference lookup link and how many rows on the output links? Recognize the fact that the reference link monitor count is the number of successful lookups, not total attempts.

:!: Something else to note - the idea that you can create the hashed file with five keys and do the lookup on only four is invalid. There are ways to do that but they involve other stages accessing the hashed file itself, like the UV stage for example rather than the Hashed File stage. Get this part right first - use the exact same metadata in the read as was used in the write for the hashed file. If you need to lookup with four keys, build it with four keys.
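The whole-key requirement can also be sketched with the same dict analogy (hypothetical key values, not the poster's actual data):

```python
# A hashed-file lookup matches on the *entire* key, so a file written with
# five key columns cannot be probed with only four of them.
written = {("A", "B", "C", "D", "E"): "row1"}   # five-part key at write time

assert ("A", "B", "C", "D") not in written      # four-part probe never matches
assert ("A", "B", "C", "D", "E") in written     # the full five-part key does
```

A four-column probe is simply a different key, so it returns nothing, no matter how many rows the file holds.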

The exception to this is the concept of a 'multi-row result set' but that's a topic for another post... and another stage.
-craig

"You can never have too many knives" -- Logan Nine Fingers
nkln@you
Premium Member
Posts: 271
Joined: Wed Nov 17, 2004 5:15 am
Location: US

Post by nkln@you »

I used the same hashed file as the source and the actual source as the lookup. In this case the lookup is satisfied, i.e. I can see records in the reference link.

Strange, I am not sure what's happening. :shock:
Aim high
nkln@you
Premium Member
Posts: 271
Joined: Wed Nov 17, 2004 5:15 am
Location: US

Post by nkln@you »

While populating the hashed file, it is populated with 5 keys, but while reading it is read with 4 keys only.
I got a resolution but am not convinced by the behaviour of the hashed file.

When I read from the hashed file using all 5 keys, the lookup worked and records came out of the transformer.


But when I used the same hashed file with 4 keys as the source, and the actual source as the lookup, I could see records in the reference link.

My question: why is it that when the hashed file with 4 keys is used as a lookup, the join does not work, but when the same hashed file with 4 keys is used as a source, the join works?
Aim high
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well... I did tell you it wouldn't work that way. And once again your explanations of what you are doing are hard to follow. Sorry. :(

What does it mean when you say you used the "actual source as lookup"? If you are trying to compare a hashed file lookup to any other kind, like say a direct database lookup, you cannot. There's no 'join' or 'key matching' going on when you source or read from a hashed file, it simply dumps out all the fields you mention (positionally) in the stage and lookups using other stage types work differently under the covers.
-craig

"You can never have too many knives" -- Logan Nine Fingers
nkln@you
Premium Member
Posts: 271
Joined: Wed Nov 17, 2004 5:15 am
Location: US

Post by nkln@you »

Let me explain another way.

I have the hashed file "Hash_Latest" which gets populated using 5 key columns and 1 non-key column.

There are 2 jobs

Job 1

The hashed file (with 4 key columns and 1 non-key column) is used as a lookup in the transformer.
The source (stream) is an Oracle database.
In this case the lookup did not work and zero rows were retrieved from the reference link.
Job 2

The hashed file (with 4 key columns and 1 non-key column) is used as the source (stream) to the transformer.

The Oracle database is used as the lookup for the transformer.
In this case the lookup worked and I could see records in the reference link.
The question is why the hashed file behaves differently when used as a source versus when used as a lookup.

Hope I have explained clearly this time :) I tried different ways to explain, but this seems ok.
Aim high
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

OK, got it... and it was pretty much what I imagined you meant, just wanted to be sure. And all of my replies were geared with that in mind.

Reading a hashed file as a source is different from reading one as a reference lookup. As noted, there's no joining or key matching going on in the former, think of it as a simple select with no 'where' clause. Your issue comes into play when the reference lookup adds the 'where' clause to the 'query'... and that's where this analogy starts to fall apart as that 'select' doesn't resolve under the covers anything like the one in your OCI lookup does. And how it all works under the covers is a complex subject worthy of its own class. Matter of fact, that was a great deal of the curriculum many moons ago when I took the Advanced DataStage course.
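The source-versus-lookup distinction can be sketched with the same dict analogy (hypothetical keys and values, purely illustrative):

```python
# A dict standing in for the hashed file (illustrative only).
hashed_file = {("K1", "K2"): {"amount": 10}, ("K3", "K4"): {"amount": 20}}

# As a *source* (stream), the stage just emits every row -- no keys involved,
# like a SELECT with no WHERE clause:
rows = list(hashed_file.items())
assert len(rows) == 2                              # all rows come out

# As a *reference lookup*, each incoming row probes by exact key -- like
# adding a WHERE clause per row:
assert hashed_file.get(("K1", "K2")) == {"amount": 10}
assert hashed_file.get(("K1", "WRONG")) is None    # any mismatch returns nothing
```

Reading as a source never tests the keys at all, which is why the 4-key metadata "worked" there but failed as a lookup.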

So, bottom line, it is not behaving 'differently' or 'incorrectly'. You need to understand that a) metadata is King of the hashed file world b) write metadata needs to match read metadata* and c) all key values must match exactly for a reference lookup to succeed.

HTH.

* yes, there are ways to use different metadata on the read than the write but it requires a good understanding of exactly how the positional metadata controls the behaviour of the hashed file to know what's safe and what isn't. And then there's the issue of it confusing whomever comes after you when they think that is the full metadata for the file and change things. Without that understanding, the advice is to always play it safe and use identical metadata, read versus write. Hard to go wrong that way.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Oh... and as mentioned, there are ways to build a hashed file with X keys and read it back with less than X. You do that to get multiple records back from the lookup and you need to use a UV (or ODBC) stage pointing to the hashed file to do that. And you need a VOC record. And you need to build explicit indexes over the smaller key set or it will be dog slow.
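The multi-row idea - probing a file written with X keys on fewer than X of them via a secondary index - can be sketched like this (a dict analogy with hypothetical key values, not the UV stage itself):

```python
from collections import defaultdict

# Full five-part keys, as written to the hashed file (illustrative only).
hashed_file = {
    ("A", "B", "C", "D", "E1"): "row1",
    ("A", "B", "C", "D", "E2"): "row2",
}

# A secondary index over the first four key parts lets a single probe
# return a multi-row result set -- roughly what the explicit index over
# the smaller key set buys you.
index = defaultdict(list)
for key, row in hashed_file.items():
    index[key[:4]].append(row)

assert sorted(index[("A", "B", "C", "D")]) == ["row1", "row2"]
```

Without such an index, every reduced-key probe would have to scan the whole file, which is why it would be "dog slow".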

FYI.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply