DS takes fields from incorrect lookup

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium


Post by jasper »

Hi,
we have a job that performs lookups against two lookup files in a single Lookup stage.
One of the lookups will always return values, the other only sometimes. Now: some of the fields in the lookup files have the same name.

input: field A,B,C
lookup 1 : A(key),D,E (available for all records)
lookup 2 : B(Key),D,F

The output should be A, B, C, D (from lookup 1), E, F.
But it seems that DS takes the D field from the second lookup, because D is NULL whenever that lookup misses. The mapping inside the stage is correct.
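The collision can be sketched outside DataStage in plain Python (data and field values are made up, this is not DataStage code): when two lookup results that share a field name are merged in link order, the last one merged wins, even if that lookup missed and only supplies a NULL.

```python
# Hypothetical illustration of the column-name collision, not DataStage code.
lookup1 = {"a1": {"D": "d-from-1", "E": "e1"}}   # keyed on A, always matches
lookup2 = {}                                      # keyed on B, often misses

row = {"A": "a1", "B": "b1", "C": "c1"}

out = dict(row)
out.update(lookup1.get(row["A"], {"D": None, "E": None}))
# The second lookup misses, so its (identically named) D comes back as None.
# Because it is merged last, it clobbers the D from the first lookup:
out.update(lookup2.get(row["B"], {"D": None, "F": None}))

print(out["D"])  # None, even though lookup 1 supplied a value
```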

Any ideas?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Jasper,

links are referred to using LinkName.ColumnName; does this mean that you are specifying Lookup1.D but getting the value of Lookup2.D when Lookup1.D is null?

Post by jasper »

Yes, indeed.
But I have to update my post: it seems this depends on the sequence of the links. When lookup 1 is defined as the first lookup it runs OK; when lookup 1 is defined second, it takes the wrong value.

In the constraints it says to 'continue' when the lookup is not found. Does this mean continue with the next link, or continue with the next record?

Post by ArndW »

The "continue" means to continue working on the current row, using nulls for the values. This does sound like a potential problem that should be reported to IBM via your support provider. I am still surprised that this could happen, since the variables used for these lookup columns in the generated code are different. Do you have 2 reference links going to the same hashed file stage? If so, does the error change if you change the canvas to have 2 distinct hashed file stages? Also, are one or both of the files being loaded to memory?

Post by jasper »

I cannot read your full text, but I can read the part about contacting support. A case was opened before I made this post; I'll update here when I hear from IBM.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It takes both D columns, but the one from the second lookup overrides the one from the first. Put in a Copy stage or a Modify stage on one of the lookup links to change the column names so that they're all distinct.
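Ray's renaming fix can be sketched in plain Python with made-up data (the `rename` helper plays the role of the Copy/Modify stage on the reference link):

```python
# Sketch of renaming colliding columns before the merge; not DataStage code.
lookup1 = {"a1": {"D": "d-from-1", "E": "e1"}}   # keyed on A
lookup2 = {"b1": {"D": "d-from-2", "F": "f1"}}   # keyed on B

def rename(rec, mapping):
    """Like a Copy/Modify stage: emit the record with some columns renamed."""
    return {mapping.get(k, k): v for k, v in rec.items()}

row = {"A": "a1", "B": "b1", "C": "c1"}
out = dict(row)
out.update(lookup1.get(row["A"], {}))
# Rename lookup 2's D to D2 so it can no longer override lookup 1's D:
out.update(rename(lookup2.get(row["B"], {}), {"D": "D2"}))
# Both D values now survive under distinct names.
```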
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by ArndW »

I just noticed (when Ray mentioned the Modify stage) that we are in the PX world and not in server anymore! Sorry for the confusing response.

Post by jasper »

When I add a Copy stage, will it create a new work file, or will it be able to work from the file as is? (Or does this depend on the flag that forces the Copy stage to run?)

Post by ray.wurlod »

Not a new work file, but a new virtual Data Set (in memory).

Post by jasper »

New virtual data set: I think this means all the data has to be processed into the lookup file before the real lookup can start. I think I'll go back a step, then. The lookup file is created from one Oracle table and then used in multiple jobs, but if it has to be processed again anyway, maybe it's better to go to the DB directly in every job.

Update about support: they responded with the question of how many rows are processed per link. I sincerely hope this is this guy's first day on the job, because if the logic of data processing depended on the number of records being processed, I'd be switching products.

Post by ray.wurlod »

A virtual Data Set is just the structure. Data are buffered through it; it is not necessary to fill it completely before draining rows from it.

A Copy stage can be optimized out by the compiler.

I suspect the support analyst was trying to get a feel for whether your partitioning algorithm was distributing rows (keys) in such a way that the stream input key might be on one partition while the lookup key might be on a different partition, thereby rendering successful lookup impossible. If that were the case they would then recommend Entire partitioning on the reference input link, or identical partitioning and sorting by the lookup keys on both inputs.
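The partitioning concern can be sketched in plain Python (a deterministic stand-in partitioner and made-up data, not DataStage internals): a per-partition lookup only succeeds when matching keys land on the same partition, and Entire partitioning side-steps this by giving every partition the whole reference table.

```python
# Toy model of partition-local lookups across two partitions; not DataStage code.
NPART = 2
part = lambda k: sum(k.encode()) % NPART  # deterministic stand-in for a hash partitioner

stream = [{"A": "x", "B": "q"}, {"A": "y", "B": "p"}]   # the lookup key is A
ref    = [{"A": "x", "D": 10}, {"A": "y", "D": 20}]

def parallel_lookup(stream_key, ref_tables):
    """Route each stream row by stream_key, then probe only its own partition."""
    return [ref_tables[part(s[stream_key])].get(s["A"]) for s in stream]

# Reference link hash-partitioned on its key A:
ref_parts = [dict() for _ in range(NPART)]
for r in ref:
    ref_parts[part(r["A"])][r["A"]] = r["D"]

good = parallel_lookup("A", ref_parts)  # stream also partitioned on A: all rows hit
bad  = parallel_lookup("B", ref_parts)  # stream partitioned on the wrong key: misses

# Entire partitioning: every partition holds the whole reference table,
# so the lookup works however the stream happens to be partitioned.
entire = [{r["A"]: r["D"] for r in ref}] * NPART
fixed = parallel_lookup("B", entire)
```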