Page 1 of 1

Hash lookup in parallel

Posted: Wed Mar 07, 2012 12:15 pm
by nishantrk
Hi,
I have to extract data from a table based on the below sql

select * from xx where col1 = #var1#

The #var1# is a variable to be populated at run time (could be from a seq file or another table)

In server we can write 2 job first to have #var1# to be populated into a hashfile , then do a utilityhashlookup in the second job to use the variable to be put in the where condition.

How to achieve the same in parallel?? fileset Lookup stage will be inefficient
as as I need to select only few records out of millions.
Is there something similar to utility hash lookuo or pass data from one job to another??

Posted: Wed Mar 07, 2012 1:08 pm
by Jboyd
I believe using a sparse lookup would work. You just have to select the value you want from the initial table, then lookup to the table you mentioned. select the lookup option to sparse in the DB connection and in the where clause do where col1 = orchestrate.source_column

Posted: Wed Mar 07, 2012 1:51 pm
by nishantrk
Thanks for your reply...what if the var1 is coming from a seq file or .ds??

Posted: Wed Mar 07, 2012 2:06 pm
by Jboyd
I believe it has to be a table where you can implement a where clause in your SQL, that way you can refer to the variable value coming through

Posted: Wed Mar 07, 2012 2:07 pm
by Jboyd
You could always write that DS or sequential File to a staging table then go from there

Posted: Wed Mar 07, 2012 2:16 pm
by kwwilliams
Couple of ways to do it, you can use a sparse lookup as referred to in the previous postings. Or you can write two jobs, and put them both in a sequence. In the sequence you can cat the file and then pass the return value of the output as a parameter into the second job. If you are going to have more than one value that you need to have in the where clause you would need to go with the sparse lookup

Posted: Wed Mar 07, 2012 3:23 pm
by Kryt0n
Why move to a parallel job if you aren't really gaining anything by moving? Implementing the overhead of parallel processing for a few rows negates the performance improvement running in parallel can give.

Posted: Wed Mar 07, 2012 7:02 pm
by qt_ky
Sparse lookup should work if the reference stage supports sparse lookups, and it should not matter what stage var1 comes from in that case.