Partitions, Lookups, and Nodes

PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

I've got a Lookup Stage that references a Lookup File Set. I'm getting this error:

Error: Lookup_33,1: Failed to match node node2 (fastname myserver) for LUT Fileset /dstage/projects/HSI_DEV/TAT/datasets/DST.fs
I have re-compiled and re-run both the job that creates the Lookup File Set and the job that does the Lookup. I have another job that tests the Lookup File Set, and it works perfectly. The job that fails appears to process the first 500 rows and then falls over with this error, but the test job only performs the lookup 26 times - I just extended the input data to 560 rows and it still works though.

The Lookup File Set is created with Partitioning set to Entire.

I raised it with the support team, and they suggested that it might be related to these two warnings that appear earlier in the Job Log:

Warning: Lookup_33: Input dataset 0 has preserve-partitioning flag set; disabling memory sharing.
Warning: Sequential_File_41: When checking operator: A sequential operator cannot preserve the partitioning
 of the parallel data set on input port 0.
I can believe that the first one might be related, but the second is further downstream in the job.

Any ideas?
Phil Hibbs | Capgemini
Technical Consultant
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

What partitioning do you have on your Lookup Stage? Since your LFS was created using Entire partitioning, did you use Same partitioning within the Lookup Stage? You can try that if you have not already, or you can try setting the Preserve partitioning flag to Clear and running the job again.
Kris

Where's the "Any" key?-Homer Simpson
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I can't check right now, but I think it might be that lookup filesets, unlike datasets, cannot be dynamically repartitioned. Is this indeed a fileset, and can you check via orchadmin whether the fileset partitioning is identical to the runtime partitioning?
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

ArndW wrote: I can't check right now, but I think it might be that lookup filesets, unlike datasets, cannot be dynamically repartitioned. Is this indeed a fileset, and can you check via orchadmin whether the fileset partitioning is identical to the runtime partitioning?
I thought that the point of using "Entire" partitioning when creating a Lookup File Set was so that it doesn't matter what the partitioning is when you do the look-up against it, since all the partitions in the LFS have all the data.

*Update*: I have got around this now by setting the partitioning in the Lookup File Set creation job to hash on the lookup key, which is the same partitioning as the job that invokes the lookup. I'm still interested in what was going wrong, though. I didn't think that all jobs would have to force the same partitioning as the lookup set; that could cause a lot of re-partitioning in jobs that do several lookups.
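
To make the expectation above concrete, here is a minimal Python sketch of why Entire-partitioned reference data should be insensitive to how the stream is partitioned, while hash-partitioned reference data is not. It is a toy model, not DataStage internals: the two-node count, the `crc32`-based partitioner, and all function names are assumptions made for illustration, and it does not reproduce the "Failed to match node" error itself.

```python
import zlib

NODES = 2  # assumption: a two-node configuration


def hash_partition(key: str, nodes: int = NODES) -> int:
    """Toy stand-in for a hash partitioner: stable hash of the key modulo node count."""
    return zlib.crc32(key.encode("utf-8")) % nodes


def partition_entire(rows: dict, nodes: int = NODES) -> list:
    """Entire: every node receives a full copy of the reference data."""
    return [dict(rows) for _ in range(nodes)]


def partition_hash(rows: dict, nodes: int = NODES) -> list:
    """Hash: each reference row lives only on the node its key hashes to."""
    parts = [dict() for _ in range(nodes)]
    for key, value in rows.items():
        parts[hash_partition(key, nodes)][key] = value
    return parts


def lookup_hits(stream: list, reference_parts: list) -> int:
    """Count stream rows that find their key on the node they were routed to."""
    return sum(1 for node, key in stream if key in reference_parts[node])


# Hypothetical reference data: 100 keys, all of which appear in the stream.
reference = {f"K{i}": f"value{i}" for i in range(100)}
keys = list(reference)

# Stream routed round-robin (not by the lookup key) versus hashed on the key.
round_robin_stream = [(i % NODES, k) for i, k in enumerate(keys)]
hashed_stream = [(hash_partition(k), k) for k in keys]

entire_hits = lookup_hits(round_robin_stream, partition_entire(reference))
aligned_hits = lookup_hits(hashed_stream, partition_hash(reference))
misaligned_hits = lookup_hits(round_robin_stream, partition_hash(reference))

print("Entire reference, round-robin stream:", entire_hits)
print("Hash reference, hash stream:         ", aligned_hits)
print("Hash reference, round-robin stream:  ", misaligned_hits)
```

In this model the Entire case and the aligned hash case find every key, while the misaligned case silently misses some of them, which is the trade-off behind the workaround: hash-partitioned reference data only works if every consuming job partitions its stream the same way.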
Last edited by PhilHibbs on Wed Aug 04, 2010 6:29 am, edited 1 time in total.
Phil Hibbs | Capgemini
Technical Consultant
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Hmmm... you are most likely correct. But that would presuppose that your lookup file set was created with just one partition. Is that the case?
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

ArndW wrote: Hmmm... you are most likely correct. But that would presuppose that your lookup file set was created with just one partition. Is that the case?
Not sure - maybe. It was created by reading a Seq File into a Transformer with Auto, then writing to the LFS with Entire partitioning.
Phil Hibbs | Capgemini
Technical Consultant
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Try making the lookup file set write stage "sequential" - that forces a 1-node configuration. My guess for your problem is that a 2-node lookup fileset cannot be converted to a 4-node "entire" lookup fileset, but I'm just making an uneducated guess since I haven't seen that error message before.
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

I was in the middle of writing a post saying that my lookup is now not working - but I see that it's because the job invoking the lookup is hash-partitioning on a VarChar field containing the date, instead of on an actual Date field, resulting in different partitioning between the Lookup File Set and the Lookup Stage and therefore random intermittent lookup success. Just thought I'd post the reason in case anyone's having similar struggles with lookups and partitions.
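
The VarChar-versus-Date mismatch described above can be sketched in Python under some loudly stated assumptions: the partitioner is modelled as `crc32` of the field's bytes modulo the node count, and the "Date" encoding (day ordinal as a 4-byte integer) is hypothetical, chosen only to show that the same calendar date hashes differently once its representation changes. None of this reflects how DataStage actually stores or hashes these types.

```python
import zlib
from datetime import date, timedelta

NODES = 2  # assumption: a two-node configuration


def partition_of(raw: bytes, nodes: int = NODES) -> int:
    """Toy hash partitioner: stable hash of the field's bytes modulo node count."""
    return zlib.crc32(raw) % nodes


def as_varchar(d: date) -> bytes:
    """The date held in a VarChar field, e.g. b'2010-08-04'."""
    return d.isoformat().encode("ascii")


def as_date(d: date) -> bytes:
    """Hypothetical binary Date encoding: day ordinal as a 4-byte big-endian integer."""
    return d.toordinal().to_bytes(4, "big")


# One month of dates, each partitioned once per representation.
days = [date(2010, 8, 1) + timedelta(days=i) for i in range(31)]
mismatches = [
    d for d in days
    if partition_of(as_varchar(d)) != partition_of(as_date(d))
]

print(f"{len(mismatches)} of {len(days)} dates land on different nodes")
```

Because only some dates end up on a different node, only some lookups fail, which matches the intermittent behaviour reported in the post.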
Phil Hibbs | Capgemini
Technical Consultant
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

ArndW wrote: Try making the lookup file set write stage "sequential" - that forces a 1-node configuration. My guess for your problem is that a 2-node lookup fileset cannot be converted to a 4-node "entire" lookup fileset, but I'm just making an uneducated guess since I haven't seen that error message before.
I tried this, and I'm still getting the same error. I got it working by re-partitioning the stream link into the Lookup Stage, so it appears to me that the only way to make Lookup File Sets work is to re-partition before each look-up. Is that correct? I thought that the default 'Entire' partitioning method on creation of a Lookup File Set was there to avoid mismatched partitioning problems.
Phil Hibbs | Capgemini
Technical Consultant
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

A couple of years back I did some performance testing between lookup filesets and datasets and, as a result of that testing, have not used lookup filesets since, so I am not the one to answer these specific detailed questions (unless you opted to try a normal dataset).
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

ArndW wrote: A couple of years back I did some performance testing between lookup filesets and datasets and, as a result of that testing, have not used lookup filesets since, so I am not the one to answer these specific detailed questions (unless you opted to try a normal dataset).
I'm tempted to do that. Then, I can just re-partition the small set of look-up data if necessary, rather than re-partitioning potentially huge transactional data sets - it feels very much like the tail is wagging the dog this way.
Phil Hibbs | Capgemini
Technical Consultant