Server-to-Parallel question on aggregate-and-lookup

PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Server-to-Parallel question on aggregate-and-lookup

Post by PhilHibbs »

In a Server job, if I want to aggregate a data set and then compare the original data set against the aggregate, I could have a link from a Sequential File going through an Aggregator into a Hashed File, and then have another link from another Sequential File stage that actually points to the same source file, and pull in a reference link from the hashed file to do the look-up.

In the Land of Parallel, what would be the canonical solution to this requirement? Very similar, one job reading the same file twice but with a Lookup Data Set instead of a Hashed File? Two jobs, one loading a Lookup Data Set much like the Hashed File creation part of the Server Job, and then a second job doing the Lookup? Some other solution involving one job that does it all in parallel by some magic?
Phil Hibbs | Capgemini
Technical Consultant
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Re: Server-to-Parallel question on aggregate-and-lookup

Post by kris007 »

You can use a Copy stage after your Sequential File stage and define two output links from the Copy stage: one for the lookup and one for the aggregation.
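To illustrate the data flow only (this is not DataStage code), here is a minimal Python sketch of the same fork-aggregate-lookup pattern. The function name, the column names and the choice of a sum as the aggregate are all assumptions made for the example.

```python
from collections import defaultdict


def aggregate_then_lookup(rows, key, value):
    """Fork the input, aggregate one branch, then look the aggregate up
    for every row of the other branch.

    Hypothetical helper: `key` and `value` are column names chosen by the
    caller, and the aggregate is assumed to be a per-key sum.
    """
    # "Copy stage": materialise the input so both branches see the same rows.
    rows = list(rows)

    # "Aggregator" branch: one total per key.
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]

    # "Lookup" branch: attach the finished aggregate to each original row.
    return [{**row, "group_total": totals[row[key]]} for row in rows]
```

Each output row carries its original columns plus a `group_total` column, so the comparison of a row against its group aggregate can happen downstream in a single pass.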
Kris

Where's the "Any" key?-Homer Simpson
creatingfusion
Participant
Posts: 46
Joined: Tue Jul 20, 2010 1:26 pm
Location: USA

Re: Server-to-Parallel question on aggregate-and-lookup

Post by creatingfusion »

Adding a Copy stage with two output links is appropriate here, as kris007 mentioned. You also need to replace the Hashed File stage with a Data Set if you want to reuse the same data later as reference data; otherwise you can run the link from the Aggregator straight into the Lookup stage.

Thanks
Abhijit.
Abhijit
IBM Certified Solution Developer Infosphere DataStage
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Re: Server-to-Parallel question on aggregate-and-lookup

Post by PhilHibbs »

creatingfusion wrote: Adding a Copy stage with two output links is appropriate here, as kris007 mentioned. You also need to replace the Hashed File stage with a Data Set if you want to reuse the same data later as reference data; otherwise you can run the link from the Aggregator straight into the Lookup stage.
Interesting. How does that work? It has to process the entire data set (or at least, the entire subset for any given lookup key) before it can start doing the lookups. Is that just part of the magic of Enterprise Edition, that it knows how to cache the data until the aggregation is done, which Server Jobs can't do?
Phil Hibbs | Capgemini
Technical Consultant
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Re: Server-to-Parallel question on aggregate-and-lookup

Post by priyadarshikunal »

PhilHibbs wrote: Interesting. How does that work? It has to process the entire data set (or at least, the entire subset for any given lookup key) before it can start doing the lookups. Is that just part of the magic of Enterprise Edition, that it knows how to cache the data until the aggregation is done, which Server Jobs can't do?
Yes, you are on the right track. The Lookup stage won't start processing rows from its input stream until it has fetched all the records from the reference link.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Actually, server jobs can do it too, if you specify use of the read cache.

In parallel jobs it's very obvious what's going on if you look at the score. A Lookup stage generates a composite operator containing the two operators LUT_CreateOp (which loads the reference data set into memory and creates an index on the key), and LUT_ProcessOp (which actually performs the lookups).
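As a rough illustration of that two-phase behaviour, here is a small Python sketch. The function names only echo the operator names; they are hypothetical and not part of any DataStage API, and the rule that the last reference row wins on a duplicate key is an assumption of the example.

```python
def lut_create(reference_rows, key):
    """Phase 1: read the whole reference link into memory, indexed by key."""
    table = {}
    for row in reference_rows:
        # Assumption for this sketch: on duplicate keys the last row wins.
        table[row[key]] = row
    return table


def lut_process(stream_rows, table, key):
    """Phase 2: stream the input rows and probe the in-memory index."""
    for row in stream_rows:
        # The second element is None when the key has no reference row.
        yield row, table.get(row[key])


def lookup(stream_rows, reference_rows, key):
    # The reference input is consumed completely before any stream row is
    # touched, which is why an aggregate can safely feed the reference link.
    table = lut_create(reference_rows, key)
    return lut_process(stream_rows, table, key)
```

Because `lut_create` drains the reference input before `lut_process` yields anything, the stream side can only be matched once the aggregate is complete; in this sketch the waiting stream rows stay in whatever iterable the caller passes in.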
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.