In a Server job, if I want to aggregate a data set and then compare the original data set against the aggregate, I could have a link from a Sequential File going through an Aggregator into a Hashed File, and then have another link from another Sequential File stage that actually points to the same source file, and pull in a reference link from the hashed file to do the look-up.
In the Land of Parallel, what would be the canonical solution to this requirement? Very similar, one job reading the same file twice but with a Lookup Data Set instead of a Hashed File? Two jobs, one loading a Lookup Data Set much like the Hashed File creation part of the Server Job, and then a second job doing the Lookup? Some other solution involving one job that does it all in parallel by some magic?
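The Server-job pattern described above (aggregate the file, land the aggregates, then re-read the same file and look each row up against them) can be sketched roughly as follows. This is just an illustrative Python stand-in for the stage logic, not DataStage code; the rows and the "share of total" calculation are hypothetical examples.

```python
from collections import defaultdict

# Hypothetical input: (key, value) rows standing in for the source file.
rows = [("A", 10), ("A", 30), ("B", 5), ("B", 15), ("C", 7)]

# Pass 1 (the Aggregator -> Hashed File branch): build one total per key.
totals = defaultdict(int)
for key, value in rows:
    totals[key] += value

# Pass 2 (the second Sequential File stage reading the same file):
# look up each row's aggregate, e.g. to compare the detail to the total.
for key, value in rows:
    share = value / totals[key]
    print(key, value, totals[key], round(share, 2))
```

The point of the two passes is that every aggregate must exist before the first detail row can be compared against it, which is exactly the ordering question the rest of the thread addresses.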
Server-to-Parallel question on aggregate-and-lookup
Phil Hibbs | Capgemini
Technical Consultant
Re: Server-to-Parallel question on aggregate-and-lookup
You can use a Copy stage after your Sequential File stage and define two output links from the Copy stage: one for the lookup and one for the aggregation.
Kris
Where's the "Any" key?-Homer Simpson
Re: Server-to-Parallel question on aggregate-and-lookup
Adding a Copy stage and taking two links out of it is appropriate here, as kris007 mentioned. You would also need to replace the Hashed File stage with a Data Set if you want to reuse the same data later as reference data; otherwise you can link the Aggregator output directly into the Lookup stage.
Thanks
Abhijit.
Abhijit
IBM Certified Solution Developer Infosphere DataStage
Re: Server-to-Parallel question on aggregate-and-lookup
creatingfusion wrote: Adding copy stage and getting two links out of that being appropriate here as mentioned by kris007 and also you need to replace hash file stage by a data set if you want to use the same data again as data dictionary, else you can directly pull up the link from the aggregator to the lookup stage.

Interesting. How does that work? It has to process the entire data set (or at least, the entire subset for any given lookup key) before it can start doing the lookups. Is that just part of the magic of Enterprise Edition, that it knows how to cache the data until the aggregation is done, which Server Jobs can't do?
Phil Hibbs | Capgemini
Technical Consultant
Re: Server-to-Parallel question on aggregate-and-lookup
PhilHibbs wrote: Interesting. How does that work? It has to process the entire data set (or at least, the entire subset for any given lookup key) before it can start doing the lookups. Is that just part of the magic of Enterprise Edition, that it knows how to cache the data until the aggregation is done, which Server Jobs can't do?

Yes, you are on the right track. The Lookup won't process any data until it has fetched all of the records on the reference link.
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.
Actually server jobs can do it, if you specify use of the read cache.
In parallel jobs it's very obvious what's going on if you look at the score. A Lookup stage generates a composite operator containing the two operators LUT_CreateOp (which loads the reference data set into memory and creates an index on the key), and LUT_ProcessOp (which actually performs the lookups).
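The two operators described above can be pictured as a build phase followed by a probe phase. A minimal Python sketch of that split, where the reference and primary rows are hypothetical stand-ins (and the None-on-miss behaviour is just one possible lookup-failure rule):

```python
# Phase 1 (like LUT_CreateOp): load the reference data into memory
# and index it by the lookup key.
reference = [("A", 40), ("B", 20)]          # aggregated reference rows
index = {key: value for key, value in reference}

# Phase 2 (like LUT_ProcessOp): probe the index for each primary row;
# unmatched keys come back as None here.
primary = [("A", 10), ("B", 5), ("C", 7)]   # stream rows to look up
results = [(key, value, index.get(key)) for key, value in primary]
print(results)
```

The probe phase cannot begin until the index is fully built, which is why the Lookup blocks until the reference link is exhausted.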
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.