
Joining the Hash file

Posted: Wed Feb 20, 2008 12:13 am
by MOHAMMAD.ISSAQ
I have a text file as the source and a hash file as the reference.
The key fields in the hash file are k1 and k2.
But in the source file only one key field is present, i.e. k1.

How can I join the hash file with the source file?

Please help me out..

Posted: Wed Feb 20, 2008 12:54 am
by ray.wurlod
You can't, using a Hashed File stage.

And it's "hashed" file, not "hash" file.

Posted: Wed Feb 20, 2008 1:02 am
by MOHAMMAD.ISSAQ
Thanks, Ray, for the correction.

Can you please suggest any other possible way to do it?

Posted: Wed Feb 20, 2008 1:53 am
by ray.wurlod
You can use a UniVerse stage. If the hashed file was created in the account (project), that's fine. If the hashed file was created in a directory, you will need to create a pointer in the VOC. Search the forum for the SETFILE command and how to use it. The UniVerse stage also supports multi-row return from a lookup - search the on-line help or, again, DSXchange for more information.
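
For example, from the project's command (TCL) prompt, something along these lines - the path and the VOC name here are only placeholders for your own:

   SETFILE /data/hashed/MyHashedFile MYHASHEDFILE OVERWRITING

That writes a VOC pointer called MYHASHEDFILE for the pathed hashed file, after which the UniVerse stage can reference it by that name.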

Posted: Wed Feb 20, 2008 8:13 am
by chulett
Or build a version of the hashed file with only the one key that is appropriate for this task.

Posted: Wed Feb 20, 2008 8:34 am
by kcbland
Your described "join" will create a product, unless you resolve your lookup to 1:1. Is this your desire? If you can create a hashed file that gives a 1:1 result then there's no issue. You need to decide if every source row needs to produce as many output rows as there are matching k1 rows in the reference data set.

The hashed file stage is inappropriate for your needs if you need a product because it must be a 1:1 reference. The UV/ODBC stage is a possibility with its multi-row result capability on reference, but you may be better served doing this inside a database with a work table.
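
To make the distinction concrete with throwaway SQL (table and column names are invented): if the reference data has three rows for the same key, a join returns three output rows for each matching source row, while a reference lookup returns at most one.

   -- ref_rows holds three rows with k1 = 'A' (each with a different k2)
   SELECT s.k1, r.k2
   FROM   source_rows s, ref_rows r
   WHERE  r.k1 = s.k1;
   -- a single source row with k1 = 'A' comes out three times: that's the product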

Posted: Wed Feb 20, 2008 9:05 am
by MOHAMMAD.ISSAQ
You're right, kcbland, my hashed file is only a 1:1 reference.
I have two jobs:
1) In Job1 I'm creating a hashed file with a sequential file as the source.
There I have two columns, say k1 and k2.
To fetch the complete set of records from the input I've set both k1 and k2 as keys. If I set only k2 as the key I'm not able to get the complete set.

2) In Job2 I'm using the hashed file as a reference against another
source file (.txt). I'm reading the source file as a single output column.
The length I've given the column is around 50000.

In this source file the text from positions 1285 to 1295 (call it 'key') will match the k2 of the hashed file. But k1 is not present in my source file at all.

So how can I join these two files to extract the data?

Posted: Wed Feb 20, 2008 9:10 am
by chulett
Load your input file into a work table in your database of choice, then use SQL to create your output. Load both if the hashed file's source data isn't already present there.
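
A rough sketch of that approach, with invented names - the key being the 11 characters at positions 1285-1295 you described:

   -- work table loaded from the source .txt file
   -- (add any other columns you need to carry through to the output)
   CREATE TABLE src_work (rec_key VARCHAR2(11));

   -- work table holding the k1/k2 pairs that currently feed the hashed file
   CREATE TABLE ref_work (k1 VARCHAR2(50), k2 VARCHAR2(50));

   -- the multi-row join the Hashed File stage can't give you
   SELECT r.k1, r.k2, s.rec_key
   FROM   src_work s, ref_work r
   WHERE  r.k2 = s.rec_key;

Then a job can read that SELECT (or a view over it) back out to your target.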

Posted: Wed Feb 20, 2008 9:24 am
by kcbland
Re-read the posts from Ray, Craig and myself. You do not understand the term 1:1 or the difference between a reference and a join.

A reference returns 0 or 1 rows only. A join can have a multiplying effect. A primary input stream only does reference lookups, unless you're using a stage that can produce a multi-row return set, which gives the functionality of a join. A Hashed File stage is used for reference only. You need database functionality to get a multi-row return set. The UV/ODBC (UniVerse) stage gives you database functionality on top of a hashed file, but with a performance penalty.

Posted: Thu Feb 21, 2008 12:57 am
by MOHAMMAD.ISSAQ
Craig, as you said, I've loaded the data into the database for the input source of the hashed file. But for the source of Job2 I'm not able to create a table in Oracle 9i with a single column of length around 50000 and datatype CHAR.

How can I create a table with a single column of around 50000 and CHAR datatype?

I've tried another way as well:
As I said, I've loaded the data for the input source of the hashed file. Now my Job1 is:
Input - .txt file and Output - database lookup.
Job2:
Here, instead of using a hashed file lookup in Job2, I'm using a database lookup. The job runs fine but I'm still not getting all the required rows.
My output for Job2 is a sequential file.

Posted: Thu Feb 21, 2008 8:04 am
by chulett
MOHAMMAD.ISSAQ wrote: Craig, as you said, I've loaded the data into the database for the input source of the hashed file. But for the source of Job2 I'm not able to create a table in Oracle 9i with a single column of length around 50000 and datatype CHAR.
Of course not, there's no such thing. That's even too large for a VARCHAR2, which has a 4K limit. You'd need to use a CLOB, and all of the problems that come with that.

But do you really need to load the whole dang thing? Why not just "the text from 1285 to 1295"?

Posted: Thu Feb 21, 2008 9:32 am
by MOHAMMAD.ISSAQ
I've used LONG and created the table. Now I have the database table as the source and a database lookup as the reference. My reference lookup has 304 rows in total and I have selected only one key here, k2. But there are many duplicate keys. For example:
k1          k2
Text_Id     Key_1
Text_Name   Key_1
Term        Key_1
But in my output I'm getting only the first row, not the 2nd and 3rd.
I didn't put any primary key on the table, but in DataStage I've used k2 as the key.

Please help me out...

Posted: Thu Feb 21, 2008 10:01 am
by chulett
Oh Gawd, not a LONG. Use a CLOB instead, unless your version of Oracle is so old it doesn't support them?
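
If the whole record really does need to land in the work table, something like this (names invented) is the supported route in 9i rather than LONG:

   CREATE TABLE src_full_work
   (
     rec_key  VARCHAR2(11),
     rec_text CLOB
   );

LONG is deprecated and has awkward restrictions (only one per table, very limited SQL support); CLOB doesn't carry those.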

Posted: Thu Feb 21, 2008 10:08 am
by MOHAMMAD.ISSAQ
Thanks chulett

Posted: Thu Feb 21, 2008 10:25 am
by chulett
Again - do you really need to load the entire record into the work table for this join? If yes then carry on, just wanted to make sure.

As to your problem, I'm not understanding your issue so can't give much advice - other than to ask you to try to explain it again. Perhaps you could post the table definitions and your SQL? Or perhaps someone else could offer something up?