DS takes fields from incorrect lookup

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium


Post by jasper »

Hi,
we have a job that performs lookups against two lookup files in a single Lookup stage.
One of the lookups will always return values, the other only sometimes. Now: some of the fields in the lookup files have the same name.

input: field A,B,C
lookup 1 : A(key),D,E (available for all records)
lookup 2 : B(Key),D,F

The output should be A, B, C, D (from lookup 1), E, F.
But it seems that DS takes the D field from the second lookup, because D is NULL whenever that lookup misses. The mapping inside the stage is correct.
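The collision can be sketched outside DataStage in plain Python (data and field values are made up, this is not DataStage code): when two lookup results that share a field name are merged in link order, the last one merged wins, even if that lookup missed and only supplies a NULL.

```python
# Hypothetical illustration of the column-name collision, not DataStage code.
lookup1 = {"a1": {"D": "d-from-1", "E": "e1"}}   # keyed on A, always matches
lookup2 = {}                                      # keyed on B, often misses

row = {"A": "a1", "B": "b1", "C": "c1"}

out = dict(row)
out.update(lookup1.get(row["A"], {"D": None, "E": None}))
# The second lookup misses, so its (identically named) D comes back as None.
# Because it is merged last, it clobbers the D from the first lookup:
out.update(lookup2.get(row["B"], {"D": None, "F": None}))

print(out["D"])  # None, even though lookup 1 supplied a value
```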

Any ideas?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Jasper,

links are referred to using LinkName.ColumnName; does this mean that you are specifying Lookup1.D but getting the value of Lookup2.D when Lookup1.D is null?

Post by jasper »

Yes, indeed.
But I have to update my post: it seems this depends on the sequence of the links. When lookup 1 is defined as the first lookup it runs OK; when lookup 1 is defined second, it takes the wrong value.

In the constraints it says to 'continue' when the lookup is not found. Does this mean continue with the next link, or continue with the next record?

Post by ArndW »

The "continue" means to continue working on the current row, using nulls for the values. This does sound like a potential problem that should be reported to IBM via your support provider. I am still surprised that this could happen, since the variables used for these lookup columns in the generated code are different. Do you have 2 reference links going to the same hashed file stage? If so, does the error change if you change the canvas to have 2 distinct hashed file stages? Also, are one or both of the files being loaded to memory?

Post by jasper »

I cannot read your full text, but I can read the part about contacting support. A case was opened before I made this post; I'll update here when I hear from IBM.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It takes both D columns, but the one from the second lookup overrides the one from the first. Put in a Copy stage or a Modify stage on one of the lookup links to change the column names so that they're all distinct.
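Ray's renaming fix can be sketched in plain Python with made-up data (the `rename` helper plays the role of the Copy/Modify stage on the reference link):

```python
# Sketch of renaming colliding columns before the merge; not DataStage code.
lookup1 = {"a1": {"D": "d-from-1", "E": "e1"}}   # keyed on A
lookup2 = {"b1": {"D": "d-from-2", "F": "f1"}}   # keyed on B

def rename(rec, mapping):
    """Like a Copy/Modify stage: emit the record with some columns renamed."""
    return {mapping.get(k, k): v for k, v in rec.items()}

row = {"A": "a1", "B": "b1", "C": "c1"}
out = dict(row)
out.update(lookup1.get(row["A"], {}))
# Rename lookup 2's D to D2 so it can no longer override lookup 1's D:
out.update(rename(lookup2.get(row["B"], {}), {"D": "D2"}))
# Both D values now survive under distinct names.
```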
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by ArndW »

I just noticed (when Ray mentioned the Modify stage) that we are in the PX world and not in server anymore! Sorry for the confusing response.

Post by jasper »

When I add a Copy stage, will it create a new work file, or will it be able to work from the file as is? (Or does this depend on the flag that forces the Copy stage to run?)

Post by ray.wurlod »

Not a new work file, but a new virtual Data Set (in memory).

Post by jasper »

New virtual data set: I think this means all the data has to be processed into the lookup file before the real lookup can start. I think I'll go back a step, then. The lookup file is created from one Oracle table and then used in multiple jobs, but if it has to be processed again anyway, maybe it's better to go to the DB directly in every job.

Update about support: they responded with the question of how many rows are processed per link. I sincerely hope this is this guy's first day on the job, because if the logic of data processing depended on the number of records being processed, I'd be switching products.

Post by ray.wurlod »

A virtual Data Set is just the structure. Data are buffered through it; it is not necessary to fill it completely before draining rows from it.

A Copy stage can be optimized out by the compiler.

I suspect the support analyst was trying to get a feel for whether your partitioning algorithm was distributing rows (keys) in such a way that the stream input key might be on one partition while the lookup key might be on a different partition, thereby rendering successful lookup impossible. If that were the case they would then recommend Entire partitioning on the reference input link, or identical partitioning and sorting by the lookup keys on both inputs.
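The partitioning concern can be sketched in plain Python (a deterministic stand-in partitioner and made-up data, not DataStage internals): a per-partition lookup only succeeds when matching keys land on the same partition, and Entire partitioning side-steps this by giving every partition the whole reference table.

```python
# Toy model of partition-local lookups across two partitions; not DataStage code.
NPART = 2
part = lambda k: sum(k.encode()) % NPART  # deterministic stand-in for a hash partitioner

stream = [{"A": "x", "B": "q"}, {"A": "y", "B": "p"}]   # the lookup key is A
ref    = [{"A": "x", "D": 10}, {"A": "y", "D": 20}]

def parallel_lookup(stream_key, ref_tables):
    """Route each stream row by stream_key, then probe only its own partition."""
    return [ref_tables[part(s[stream_key])].get(s["A"]) for s in stream]

# Reference link hash-partitioned on its key A:
ref_parts = [dict() for _ in range(NPART)]
for r in ref:
    ref_parts[part(r["A"])][r["A"]] = r["D"]

good = parallel_lookup("A", ref_parts)  # stream also partitioned on A: all rows hit
bad  = parallel_lookup("B", ref_parts)  # stream partitioned on the wrong key: misses

# Entire partitioning: every partition holds the whole reference table,
# so the lookup works however the stream happens to be partitioned.
entire = [{r["A"]: r["D"] for r in ref}] * NPART
fixed = parallel_lookup("B", entire)
```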