Hash lookup in parallel

nishantrk
Premium Member
Posts: 23
Joined: Fri May 27, 2011 11:43 am

Hash lookup in parallel

Post by nishantrk »

Hi,
I have to extract data from a table based on the SQL below:

select * from xx where col1 = #var1#

The #var1# is a variable to be populated at run time (it could come from a sequential file or another table).

In server edition we can write two jobs: the first populates #var1# into a hash file, and the second uses UtilityHashLookup to retrieve the value and put it in the WHERE condition.

How can I achieve the same in parallel? A Lookup stage against a fileset would be inefficient, as I need to select only a few records out of millions. Is there something similar to UtilityHashLookup, or a way to pass data from one job to another?
Jboyd
Participant
Posts: 15
Joined: Mon Mar 14, 2011 12:55 pm

Post by Jboyd »

I believe using a sparse lookup would work. Select the value you want from the initial table, then look up against the table you mentioned. Set the lookup type to Sparse in the database connector and, in the WHERE clause of the reference SQL, use where col1 = ORCHESTRATE.source_column.
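
As a rough sketch (var1 here just stands in for whatever the key column on your input link is called), the user-defined SQL on the sparse reference link could look like:

select *
from xx
where col1 = ORCHESTRATE.var1

Because the lookup type is Sparse, that query runs once per incoming row, so only the matching rows come back rather than the whole table.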
nishantrk
Premium Member
Premium Member
Posts: 23
Joined: Fri May 27, 2011 11:43 am

Post by nishantrk »

Thanks for your reply... what if var1 is coming from a sequential file or a .ds dataset?
Jboyd
Participant
Posts: 15
Joined: Mon Mar 14, 2011 12:55 pm

Post by Jboyd »

I believe the reference has to be a database table so you can implement a WHERE clause in your SQL; that way you can refer to the variable value coming through.
Jboyd
Participant
Posts: 15
Joined: Mon Mar 14, 2011 12:55 pm

Post by Jboyd »

You could always write that dataset or sequential file to a staging table and then go from there.
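
For example, once the values are in a staging table (stg_var1 and its var1 column are just illustrative names), the extraction SQL could simply join to it instead of using a run-time parameter:

select t.*
from xx t
inner join stg_var1 s
  on t.col1 = s.var1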
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

There are a couple of ways to do it. You can use a sparse lookup as described in the previous posts. Or you can write two jobs and put them both in a sequence: in the sequence, cat the file and pass the command output as a parameter into the second job. If you are going to have more than one value in the WHERE clause, you would need to go with the sparse lookup.
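
For example, assuming the sequence uses an Execute Command activity to cat the file and passes the resulting value into a job parameter named var1 (as in the original post), the second job's source SQL could stay pretty much as it was; just remember the parameter is substituted as plain text, so quote it if col1 is a character column:

select *
from xx
where col1 = '#var1#'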
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

Why move to a parallel job if you aren't really gaining anything by moving? Implementing the overhead of parallel processing for a few rows negates the performance improvement running in parallel can give.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Sparse lookup should work if the reference stage supports sparse lookups, and it should not matter what stage var1 comes from in that case.
Choose a job you love, and you will never have to work a day in your life. - Confucius