Partitions, Lookups, and Nodes

PhilHibbs · Post by **PhilHibbs** » Wed Aug 04, 2010 4:56 am

I've got a Lookup Stage that references a Lookup File Set. I'm getting this error:

Error: Lookup_33,1: Failed to match node node2 (fastname myserver) for LUT Fileset /dstage/projects/HSI_DEV/TAT/datasets/DST.fs

I have re-compiled and re-run both the job that creates the Lookup File Set and the job that does the Lookup. I have another job that tests the Lookup File Set, and it works perfectly. The job that fails appears to process the first 500 rows and then falls over with this error, but the test job only performs the lookup 26 times - I just extended the input data to 560 rows and it still works though.

The Lookup File Set is created with Partitioning set to Entire.

I raised it with the support team, and they suggested that it might be related to these two warnings that appear earlier in the Job Log:

Warning: Lookup_33: Input dataset 0 has preserve-partitioning flag set; disabling memory sharing.
Warning: Sequential_File_41: When checking operator: A sequential operator cannot preserve the partitioning
 of the parallel data set on input port 0.

I can believe that the first one might be related, but the second is further downstream in the Job.

Any ideas?

kris007 · Post by **kris007** » Wed Aug 04, 2010 5:52 am

What partitioning do you have on your Lookup Stage? Because your LFS was created using Entire partitioning, did you use Same partitioning within the Lookup Stage? You can try that if you have not already, or you can try setting the Preserve Partitioning flag to Clear and running the job.

ArndW · Post by **ArndW** » Wed Aug 04, 2010 6:08 am

I can't check right now, but I think it might be that lookup filesets, unlike datasets, cannot be dynamically repartitioned. Is this indeed a fileset, and can you check via orchadmin whether the fileset partitioning is identical to the runtime partitioning?

PhilHibbs · Post by **PhilHibbs** » Wed Aug 04, 2010 6:24 am

I'm trying to delete this duplicate post. I can't see the option anywhere. Can someone PM me with a howto please?

PhilHibbs · Post by **PhilHibbs** » Wed Aug 04, 2010 6:25 am

ArndW wrote: I can't check right now, but I think it might be that lookup filesets, unlike datasets, cannot be dynamically repartitioned. Is this indeed a fileset, and can you check via orchadmin whether the fileset partitioning is identical to the runtime partitioning?

I thought that the point of using "Entire" partitioning when creating a Lookup File Set was so that it doesn't matter what the partitioning is when you do the look-up against it, since all the partitions in the LFS have all the data.

*Update*: I have got around this now by setting the partitioning in the Lookup File Set creation job to hash on the lookup key, which is the same partitioning as the job that invokes the lookup. I'm still interested in what was going wrong, though. I didn't think that all jobs would have to force the same partitioning as the lookup set; that could cause a lot of re-partitioning in jobs that do several lookups.
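
Editor's sketch: the expectation described in this post (Entire puts all the reference data in every partition, whereas hash puts each key in exactly one partition) can be illustrated with a toy model in Python. This is not DataStage code, and it does not model the node-matching check behind the error message; the function names, the CRC32 hash, and the two-node layout are all assumptions made for illustration.

```python
import zlib

def hash_partition(rows, key, num_nodes):
    """Send each row to exactly one node, chosen by a hash of its key.
    (CRC32 is a stand-in; the engine's real hash function is not known here.)"""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        node = zlib.crc32(str(row[key]).encode()) % num_nodes
        parts[node].append(row)
    return parts

def entire_partition(rows, num_nodes):
    """Give every node a full copy of the reference data."""
    return [list(rows) for _ in range(num_nodes)]

def lookup(parts, node, key, value):
    """Look up a key using only the reference rows held on one node."""
    return [r for r in parts[node] if r[key] == value]

# Hypothetical reference data standing in for the Lookup File Set contents.
reference = [{"code": c, "desc": f"desc-{c}"} for c in ("A", "B", "C", "D")]
entire = entire_partition(reference, 2)
hashed = hash_partition(reference, "code", 2)

# With Entire, the lookup succeeds on whichever node the stream row lands.
found_entire = [len(lookup(entire, n, "code", "A")) for n in range(2)]
# With hash, only one node holds the key, so the stream must be
# partitioned the same way for the lookup to find it.
found_hashed = [len(lookup(hashed, n, "code", "A")) for n in range(2)]
print(found_entire, found_hashed)
```

In this model the Entire copy answers the lookup from either node, while the hashed copy answers it from one node only, which is why the workaround above depends on the stream and the reference data sharing the same hash key.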

ArndW · Post by **ArndW** » Wed Aug 04, 2010 6:28 am

Hmmm... you are most likely correct. But that would presuppose that your lookup file set was created with just one partition. Is that the case?

PhilHibbs · Post by **PhilHibbs** » Wed Aug 04, 2010 6:32 am

ArndW wrote: Hmmm... you are most likely correct. But that would presuppose that your lookup file set was created with just one partition. Is that the case?

Not sure - maybe. It was created by reading a Seq File into a Transformer with Auto, then writing to the LFS with Entire partitioning.

ArndW · Post by **ArndW** » Wed Aug 04, 2010 6:35 am

Try making the lookup file set write stage "sequential" - that forces a 1-node configuration. My guess for your problem is that a 2-node Lookup fileset cannot be converted to a 4-node "entire" lookup fileset, but I'm just making an uneducated guess since I haven't seen that error message before.

PhilHibbs · Post by **PhilHibbs** » Wed Aug 04, 2010 9:17 am

I was in the middle of writing a post saying that my lookup is now not working - but I see that it's because the job invoking the lookup is hash-partitioning on a VarChar field containing the date, instead of on an actual Date field, resulting in different partitioning between the Lookup File Set and the Lookup Stage and therefore random intermittent lookup success. Just thought I'd post the reason in case anyone's having similar struggles with lookups and partitions.
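
Editor's sketch: the mismatch described above can be mimicked in Python. The same calendar date is hashed once as text and once as a date value, and because the hashed bytes differ, the two sides need not agree on a node. The CRC32 hash, the four-node count, and the byte encodings are assumptions for illustration only, not DataStage's actual hashing.

```python
import zlib
from datetime import date

NUM_NODES = 4  # assumed node count for the illustration

def node_for_varchar(text):
    """Partition on the textual form of the key, e.g. '2010-08-04'."""
    return zlib.crc32(text.encode()) % NUM_NODES

def node_for_date(d):
    """Partition on a date-typed key; here, hypothetically, its day number."""
    return zlib.crc32(d.toordinal().to_bytes(4, "big")) % NUM_NODES

days = [date(2010, 8, d) for d in range(1, 29)]

# Days on which the stream side (VarChar) and the reference side (Date)
# would be routed to different nodes, so a hash-partitioned lookup would miss.
mismatches = [d for d in days
              if node_for_varchar(d.isoformat()) != node_for_date(d)]
# Days on which both sides happen to agree, so the lookup would succeed.
matches = [d for d in days if d not in mismatches]

print(len(matches), "days agree,", len(mismatches), "days disagree")
```

Whenever the two hashes happen to land on the same node the lookup finds its row, and otherwise it does not, which is consistent with the intermittent success reported in this post.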

PhilHibbs · Post by **PhilHibbs** » Fri Aug 06, 2010 4:52 am

ArndW wrote: Try making the lookup file set write stage "sequential" - that forces a 1-node configuration. My guess for your problem is that a 2-node Lookup fileset cannot be converted to a 4-node "entire" lookup fileset, but I'm just making an uneducated guess since I haven't seen that error message before.

I tried this, and I'm still getting the same error. I got it working by re-partitioning the stream link into the Lookup Stage, so it appears to me that the only way to make Lookup File Sets work is to re-partition before each look-up. Is that correct? I thought that the default 'Entire' partitioning method on creation of a Lookup File Set was there to avoid mismatched partitioning problems.

ArndW · Post by **ArndW** » Fri Aug 06, 2010 6:25 am

A couple of years back I did some performance testing between lookup filesets and datasets and, as a result of that testing, have not used lookup filesets since, so I am not the one to answer these specific detailed questions (unless you opted to try a normal dataset).

PhilHibbs · Post by **PhilHibbs** » Fri Aug 06, 2010 6:28 am

ArndW wrote: A couple of years back I did some performance testing between lookup filesets and datasets and, as a result of that testing, have not used lookup filesets since, so I am not the one to answer these specific detailed questions (unless you opted to try a normal dataset).

I'm tempted to do that. Then, I can just re-partition the small set of look-up data if necessary, rather than re-partitioning potentially huge transactional data sets - it feels very much like the tail is wagging the dog this way.