Ds takes fields from incorrect lookup
Moderators: chulett, rschirm, roy
Hi,
We have a job that does lookups against two lookup files in one stage. One of the lookups always matches; the other only sometimes. Some of the fields in the lookup files have the same name.
input: fields A, B, C
lookup 1: A (key), D, E (available for all records)
lookup 2: B (key), D, F
output should be A, B, C, D (from 1), E, F.
Now it seems that DS takes the D field from the second lookup, because D is NULL when that lookup finds no match. The mapping inside the stage is correct.
Any ideas?
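The symptom can be reproduced outside DataStage with a few lines of plain Python (dicts standing in for the lookup files; all names here are illustrative, not DataStage APIs): when both reference results are folded into one record without renaming, whichever lookup is applied last wins the shared column name.

```python
# Illustrative sketch only (plain Python, not DataStage): two lookup
# tables sharing a column name "D". When both results are merged into
# one record naively, the later lookup's D (None here, because lookup 2
# only sometimes matches) clobbers the D from lookup 1.

lookup1 = {"a1": {"D": "d-from-1", "E": "e1"}}   # keyed on A, always matches
lookup2 = {}                                      # keyed on B, no match here

def naive_merge(row):
    out = dict(row)
    out.update(lookup1.get(row["A"], {"D": None, "E": None}))
    # the second lookup overwrites D with None when B is not found:
    out.update(lookup2.get(row["B"], {"D": None, "F": None}))
    return out

row = {"A": "a1", "B": "b1", "C": "c1"}
print(naive_merge(row)["D"])  # None -- the D from lookup 1 is lost
```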
Jasper,
Links are referred to using LinkName.ColumnName; does this mean that you are specifying Lookup1.D but getting the value of Lookup2.D when Lookup1.D is null?
Yes, indeed.
But I have to update my post: it seems this depends on the sequence of the links. When lookup 1 is defined as the first lookup it runs OK; when lookup 1 is defined second, it takes the wrong value.
In the constraints it says to 'continue' when the lookup is not found. Does this mean continue with the next link, or continue with the next record?
The "continue" means to continue working on the current row, using nulls for the values. This does sound like a potential problem that should be reported to IBM via your support provider. I am still surprised that this could happen, since the variables used for these lookup columns in the generated code are different. Do you have 2 reference links going to the same hashed file stage? If so, does the error change if you change the canvas to have 2 distinct hashed file stages? Also, are one or both of the files being loaded to memory?
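The "continue" semantics described above can be sketched in plain Python (names are illustrative, not a DataStage API): the current row is neither dropped nor skipped; the reference columns are simply filled with nulls.

```python
# Sketch of "continue on lookup failure": the row survives with None
# (null) in the reference columns rather than being rejected.

def lookup_continue(row, ref, key, ref_cols):
    match = ref.get(row[key])
    if match is None:
        match = {c: None for c in ref_cols}  # continue with nulls
    return {**row, **match}

ref = {"k1": {"D": 4, "E": 5}}
print(lookup_continue({"A": "k2"}, ref, "A", ["D", "E"]))
# row survives: {'A': 'k2', 'D': None, 'E': None}
```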
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
It takes both D columns, but the one from the second lookup overrides the one from the first. Put in a Copy stage or a Modify stage on one of the lookup links to change the column names so that they're all distinct.
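The rename fix can be sketched in plain Python (in a real parallel job a Copy or Modify stage would do this declaratively; the function here is only a stand-in): once the colliding column on one reference link is renamed before the merge, both D values survive.

```python
# Sketch of the fix: rename the colliding column on one reference link
# before merging, so neither D overwrites the other. (Plain Python
# illustration; a Copy/Modify stage performs the rename in DataStage.)

def rename(record, mapping):
    return {mapping.get(k, k): v for k, v in record.items()}

r1 = {"D": "d-from-1", "E": "e1"}
r2 = {"D": None, "F": None}            # lookup 2 did not match

merged = {**r1, **rename(r2, {"D": "D2"})}
print(merged)  # {'D': 'd-from-1', 'E': 'e1', 'D2': None, 'F': None}
```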
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I just noticed (when Ray mentioned the Modify stage) that we are in the PX world and not in Server anymore! Sorry for the confusion in my response.
New virtual data set: I think this means all the data has to be processed into the lookup file before the real lookup can start. I think I'll go back a step, then. The lookup file is created from one Oracle table and then used in multiple jobs. But if it has to be processed again anyway, maybe it's better to go to the database directly in every job.
Update about support: they responded with the question of how many rows are processed per link. I sincerely hope this is the guy's first day on the job, because if the logic of data processing depended on the number of records being processed, I'd switch products.
A virtual Data Set is just the structure. Data are buffered through it; it is not necessary to fill it completely before draining any rows from it.
A Copy stage can be optimized out by the compiler.
I suspect the support analyst was trying to get a feel for whether your partitioning algorithm was distributing rows (keys) in such a way that the stream input key might be on one partition while the lookup key might be on a different partition, thereby rendering successful lookup impossible. If that were the case they would then recommend Entire partitioning on the reference input link, or identical partitioning and sorting by the lookup keys on both inputs.
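The partitioning pitfall described above can be sketched in plain Python (an illustrative stand-in, not DataStage code): if the stream is hash-partitioned on one key and the reference data on another, a row and its reference record can land on different partitions, so the local lookup finds nothing. Entire partitioning sidesteps this by copying the whole reference table to every partition.

```python
# Sketch of the partitioning pitfall. Hash-partitioning the stream on A
# and the reference data on B can separate a row from its reference
# record. With "Entire" partitioning, every partition holds the full
# reference table, so the lookup always succeeds locally.

NPART = 2
part = lambda key: hash(key) % NPART   # toy hash partitioner

stream_row = {"A": "a1", "B": "b1"}
ref_row = {"B": "b1", "D": 4}

# Partitioned on different keys -> possibly different partitions:
stream_part = part(stream_row["A"])
ref_part = part(ref_row["B"])

# Entire partitioning: replicate the reference table to each partition.
entire = [{"b1": ref_row} for _ in range(NPART)]
local_ref = entire[stream_part]
print(local_ref.get(stream_row["B"]))  # match is always found locally
```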
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.