Ignore one or more keys of an hashed file in a lookup

Alethesnake · Post by **Alethesnake** » Wed Jan 23, 2008 9:55 am

Hi all,
I've an hashed file generated by another process. In my job I need to use this hashed file but not all its keys are useful to me.
i.e.
My stream has 2 fields:
- code1
- descriptionField

My reference (the hashed file) has:
- code1 (key)
- code2 (key)
- other column (not a key)

Is it possible to link my stream with the hashed file only by using the key "code1" (the value of code2 of the hashed file is not relevant in this lookup)?
I tried to put 1=1 in the Key Expression of a transformer stage for code2 but It doesn't run.
Thanks a lot for any help.

Ale

chulett · Post by **chulett** » Wed Jan 23, 2008 10:15 am

Not possible. You either need to generate another hashed file with a single key or slap a UV stage over the top of it. Not for the faint hearted, however.

Alethesnake · Post by **Alethesnake** » Wed Jan 23, 2008 10:47 am

Thanks for your answer Craig, I fear it.
I've already thought to the UV stage way ( :D ), but this hashed file has not been created UV compatible, and I can't modify it.
I'll generate a new one.

Bye,

Ale

ray.wurlod · Post by **ray.wurlod** » Wed Jan 23, 2008 7:18 pm

UV compatibility is only about date-type data. Every hashed file can be accessed via a UV stage, though you may need to create a pointer to a directory-based hashed file in the VOC (search the forum for SETFILE).

Alethesnake · Post by **Alethesnake** » Thu Jan 24, 2008 4:49 am

D'oh, I was in fault.
Thanks Ray, UV stage works fine!

Actually I create the VOC pointer in the before-stage subroutine of the transformer where I perform the lookup.
Let's suppose I need to create there a second pointer, is it possible to define both the two pointers into the same value field of the before stage subroutine statement of the transformer or I need a workaround?

Thanks again,

Ale.

ray.wurlod · Post by **ray.wurlod** » Thu Jan 24, 2008 7:07 am

Are you using a custom before-job subroutine or one of the out-of-the-box ones? If the latter you can easily queue as many commands as you like, with @FM between them. If you are using, say, ExecTCL, then you need a workaround. The easiest is to execute each of the commands in turn in Administrator client Command window, then multi-select those commands and save them under a meaningful name. That name is then the command you might execute from ExecTCL.

chulett · Post by **chulett** » Thu Jan 24, 2008 8:20 am

Usually there is more to it than simply putting a UV stage over the hashed file. As noted, you need a VOC entry and for many people that's unknown territory. It can also involve building / maintaining indexes over the remaining keys, otherwise lookup performance can be pitiful.

Glad you got it sorted out, but why do you feel the need to create a 'second pointer'? A little lost there. And in your routine, do you remove the VOC record each time, or OVERWRITE the old one... or?

Alethesnake · Post by **Alethesnake** » Thu Jan 24, 2008 10:46 am

Hi Craig,
you're right, I tried to simplify my question in my first post.
The situation is the following one.

My stream is a sequential file containing some economical data regarding branches or other kind of offices. The fields:
- a code identifying the single branch or office
- other values not relevant for the lookup

Branches and offices are two different entities and each branch (just as each office) is identified by an unique id. Even if these are two different entities, the identifiers have been defined in order to be unique (an intersection between the set of branches and the set of offices on their identifier gives a null set).
Other procedures generate two different hashed files (one for the branches an one for the offices), and both these hashed files have more key columns then those I need.

Branch hashed file:
- branch code (key)
- institute code (key, superfluous in my case)
- surrogate code of the branch in the dwh

Office hashed file:
- office code (key)
- institute code (key, superfluous in my case)
- surrogate code of the office in the dwh

In a transformer stage I need to perform a referential integrity check with both these two hashed files, in order to reject the rows of my sequential file that does not identify neither a branch nor an office.

This is the reason why I need two UV stages instead of one, and I have to define two pointers instead of one.
Actually I define a pointer with SETFILE in the before-job subroutine (an execTCL) and one in the before-stage subroutine of the transformer where I perform the lookup, but I know that it's not very nice to view

. I will try the solution suggested by Ray (using @FM) as soon as our server become available again (it's down for maintenance now

).

Being the format of the fields of the two hashed files the same, an alternative I tried was to perform a SELECT UNION into a single UV stage, but I had several problems with the definition of the where condition.

I tried:

Code: Select all

SELECT code, surrogate_code FROM
(
SELECT branch_code AS code, branch_surrogate_code AS surrogate_code FROM branch_pointer
UNION ALL
SELECT office_code AS code, office_surrogate_code AS surrogate_code FROM office_pointer
) AS T
WHERE code = ?;

and

Code: Select all

SELECT branch_code, branch_surrogate_code FROM branch_pointer WHERE branch_code = ?
UNION ALL
SELECT office_code, office_surrogate_code FROM office_pointer WHERE office_code = ?;

but both failed..

ps: Here I overwrite the pointers before using them each time.

ray.wurlod · Post by **ray.wurlod** » Thu Jan 24, 2008 5:30 pm

What error message do you get when the query "failed"? Just "it failed" does not help anyone to diagnose it. Have you tried

Code: Select all

SELECT branch_code, branch_surrogate_code FROM branch_pointer WHERE branch_code = '? '
UNION ALL 
SELECT office_code, office_surrogate_code FROM office_pointer WHERE office_code = '?';

There are two parameter markers in this query; have you provided two key values to it from your job?

Alethesnake · Post by **Alethesnake** » Fri Jan 25, 2008 4:02 am

The errors I get from the statements I posted are:

SQLSTATE=S0002, DBMS.CODE=950390
[DataStage][SQL Client][UNIVERSE]DataStage/SQL: Table "(" does not exist.

SQLSTATE=S1000, DBMS.CODE=950125
[DataStage][SQL Client][UNIVERSE]DataStage/SQL: QUESTIONMARK not supported in TCL.

This second one puzzles me because if I write only:

Code: Select all

SELECT office_code, office_surrogate_code FROM office_pointer WHERE office_code = ?;

It runs well.
The query in the primary input (with the union) runs well, I can view the data from the two hashed files correctly united. The problem is only in the query for reference, when I specify the where condition..

WoMaWil · Post by **WoMaWil** » Fri Jan 25, 2008 4:46 am

Try

Code: Select all

SELECT code, surrogate_code FROM 
( 
SELECT branch_code AS code, max(branch_surrogate_code) AS surrogate_code FROM branch_pointer 
UNION ALL 
SELECT office_code AS code, max(office_surrogate_code) AS surrogate_code FROM office_pointer 
) AS T 
WHERE code = ?;

Alethesnake · Post by **Alethesnake** » Fri Jan 25, 2008 6:24 am

WoMaWil wrote:Try

Code: Select all

SELECT code, surrogate_code FROM 
( 
SELECT branch_code AS code, max(branch_surrogate_code) AS surrogate_code FROM branch_pointer 
UNION ALL 
SELECT office_code AS code, max(office_surrogate_code) AS surrogate_code FROM office_pointer 
) AS T 
WHERE code = ?;

I get the same error (Table "(" does not exist.)

WoMaWil · Post by **WoMaWil** » Fri Jan 25, 2008 6:58 am

Then it seams that subqueries are not supported. So transfer your source-Data into one new HashFile and use this new file as Lookup.

chulett · Post by **chulett** » Fri Jan 25, 2008 7:51 am

Alethesnake wrote:This second one puzzles me because if I write only:
Code: Select all
SELECT office_code, office_surrogate_code FROM office_pointer WHERE office_code = ?; 
It runs well.

When you have two question marks, do you have two columns marked as Key fields as well?

Alethesnake · Post by **Alethesnake** » Fri Jan 25, 2008 8:33 am

Yes, I do.
I build my statement as follow.

Output Columns:
- OFFICE_CODE , key, not null, integer
- BRANCH_CODE , key, not null, integer
- OFFICE_BRANCH_SURR_CODE, not null, integer

And the reference sql statement is:

Code: Select all

SELECT office_code, NULL as tempField, office_surrogate_code FROM office_pointer WHERE (office_code = ?)
UNION ALL
SELECT branch_code, NULL as tempField, branch_surrogate_code FROM branch_pointer WHERE (branch_code = ?);

@WoMaWil: I can solve also by performing a double lookup in the same transformer without the creation of a new hashed file but I'd like to understand why the union as primary sql runs well and as reference no.