Records flow one by one to Lookup table.
Hi,
Please find the question below.
Question: A lookup is made between the source file and a reference Oracle table on the key field (Field_Key). When a match is found, the Equip_ID values from the Oracle table should be passed out to the Transformer.
The requirement is that the records from the source file should be processed one by one, each doing a lookup: for the 1st record from the source file, if the lookup returns more than one Equip_ID value, those records should be rejected, and then the lookup should be checked for the 2nd record from the source file, and so on.
How can I make the records from the source file flow one by one?
I hope I have explained the question clearly.
If anyone has an idea on this, please answer.
Regards,
Cheru.
Hi,
Thanks for the reply.
What I mean is: for the 1st record from the source file, after the lookup on Field_Key, if more than one Equip_ID value is returned, those records should be rejected. Then the lookup should be done for the 2nd record from the source file; again, if the lookup table returns more than one record, those records should be rejected, and if the lookup returns only one record, that record should be processed further.
Thanks and regards,
Cheru
I'm afraid I cannot tell whether that is the same as or different from what I meant, and I don't know what you mean. Could you give a simple example showing what you expect DataStage to do? A normal lookup will return only the first value if there are several matches, but you can also make it return multiple values and then detect duplicates in a later stage. But all of this would just complicate matters if that is not what you are looking for.
If I understand correctly, there are a number of ways to achieve this.
If you want to reject records where it would return multiple rows, you could amend your Oracle lookup select statement to include a count of the key value attached to each record.
Something like: "select key_value, count(*) over (partition by key_value) as key_count from table" (i.e. select the records from your input table along with a count of how many times each key value appears).
In a downstream Transformer you can then look for the records that returned a key_count value <> 1, and do with them what you please.
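A minimal sketch of this approach, using Python's sqlite3 in place of Oracle (the table and column names are illustrative, not from the original post); SQLite supports the same count(*) over (partition by ...) analytic function from version 3.25:

```python
import sqlite3

# Hypothetical reference table standing in for the Oracle lookup table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ref (field_key TEXT, equip_id TEXT)")
con.executemany("INSERT INTO ref VALUES (?, ?)",
                [("K1", "E1"), ("K2", "E2"), ("K2", "E3")])

# Analytic count attached to every row, as in the suggested lookup SQL.
rows = con.execute(
    "SELECT field_key, equip_id, "
    "COUNT(*) OVER (PARTITION BY field_key) AS key_count "
    "FROM ref ORDER BY field_key, equip_id").fetchall()

# Downstream "Transformer": keep rows whose key is unique, reject the rest.
kept = [r for r in rows if r[2] == 1]
rejected = [r for r in rows if r[2] != 1]
```

Here K1 appears once, so it survives, while both K2 rows carry key_count = 2 and go down the reject path.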
Re: Records flow one by one to Lookup table.
Cherukuri wrote: "How to make the records from src file to flow one by one."
I think you are describing a "sparse lookup", as opposed to an in-memory lookup within DataStage.
By default, DataStage executes a query on the database to retrieve all the data it might need for the lookup, and then performs the lookup in memory. If you set the query mode to "Sparse" in the database connector, this changes: the SQL query is executed once per row. You specify the join key in the SQL query, for example "WHERE DataItem = ORCHESTRATE.DataItem", where ORCHESTRATE.DataItem refers to a column on the stream link into the Lookup stage.
Is that what you are after? I've done sparse lookups using DB2 but not with Oracle, so please someone point out if it's any different with the Oracle stage.
*Edit* I see now that you also want to detect duplicate rows. Hm. Not sure about that - I know that a regular lookup has a setting for whether a lookup can return multiple rows but I can't remember if that applies to sparse lookups. What I can remember is, it's really hard to find that option!
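To illustrate the once-per-row behaviour, here is a sketch in Python with sqlite3 standing in for the Oracle connector (table and column names are illustrative): each incoming stream row triggers its own parameterised query, the parameter playing the role of ORCHESTRATE.DataItem.

```python
import sqlite3

# Hypothetical reference table standing in for the Oracle table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ref (field_key TEXT, equip_id TEXT)")
con.executemany("INSERT INTO ref VALUES (?, ?)",
                [("K1", "E1"), ("K2", "E2"), ("K2", "E3")])

src = ["K1", "K2", "K3"]  # keys arriving one by one on the stream link

results = []
for key in src:
    # One query per incoming row, like a sparse lookup's
    # "WHERE field_key = ORCHESTRATE.field_key" clause.
    matches = con.execute(
        "SELECT equip_id FROM ref WHERE field_key = ? ORDER BY equip_id",
        (key,)).fetchall()
    results.append((key, matches))
```

Note that K2 comes back with two matches and K3 with none, so any "reject on duplicates" decision still has to be made downstream.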
Phil Hibbs | Capgemini
Technical Consultant
ShaneMuir wrote: "Regarding returning multiple rows, I believe that a sparse lookup just returns as many rows as there are matches automatically - you don't get the choice?"
I think that in either case - a sparse lookup or an in-memory lookup - you have to do the duplicate checking yourself. I don't think there is an automatic option to "reject if duplicates"; you just get a warning in the log about duplicate rows being ignored.
So, to detect duplicates, you could do a fork-aggregate-join: aggregate and join either on an existing primary key, or on a surrogate key if you don't have a unique key already, e.g. @PARTITIONNUM:':':@INROWNUM, or an automatically generated row number if you have a Sequential File source.
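The fork-aggregate-join can be sketched in Python (the surrogate keys in the @PARTITIONNUM:':':@INROWNUM style are illustrative): one branch counts rows per key, the count is joined back onto each row, and a constraint splits unique keys from duplicated ones.

```python
from collections import Counter

# Post-lookup rows as (surrogate_key, equip_id); the key "0:2" repeats
# because the lookup returned two matches for that input row.
rows = [("0:1", "E1"), ("0:2", "E2"), ("0:2", "E3"), ("0:3", "E4")]

# Fork + aggregate: count occurrences of each key.
counts = Counter(key for key, _ in rows)

# Join the count back onto each row, then split unique from duplicated.
unique = [r for r in rows if counts[r[0]] == 1]
duplicated = [r for r in rows if counts[r[0]] > 1]
```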
Phil Hibbs | Capgemini
Technical Consultant
PhilHibbs wrote: "I think that in either case - a sparse lookup or an in-memory lookup - you have to do the duplicate checking yourself. I don't think there is an automatic option to 'reject if duplicates', you just get a warning in the log about duplicate rows being ignored."
Agreed.
PhilHibbs wrote: "So, to detect duplicates, you could do a fork-aggregate-join aggregating and joining either on an existing primary key or on a surrogate key if you don't have a unique key already."
This is where an extra column from the sparse lookup query, containing a count of how many records share the key value, comes into play. Each row carries the count, and this can easily be checked in a downstream Transformer.
Last edited by ShaneMuir on Thu Oct 04, 2012 6:33 am, edited 1 time in total.
ShaneMuir wrote: "This is where an extra column containing a count of how many records contain the key value from the sparse lookup query comes into play. Each row has the count and this can be easily checked in a downstream transform."
Can you get the sparse lookup query to supply this count automatically?
Another way to do this is to put in a Sort stage, set the sort column(s) to the unique record key, and set "Do not sort (already sorted)" (since the multiple rows from the join will all be together), then in a Transformer use a set of key-change detection stage variables:
```
svOldKey = svKey
svKey = inLink.UniqueRecordKey
svKeyChanged = Compare(svKey, svOldKey) <> 0
```

and an output link constraint of:

```
svKeyChanged And LastRowInGroup(inLink.UniqueRecordKey)
```

(where UniqueRecordKey is the record key - if the key is multi-column then svKey needs to be a concatenation of them all, and the LastRowInGroup parameter has to be the last column of the multi-part record key), plus an "otherwise" link that will get all the rows that have been duplicated. The principle here is: if this row is the first for a particular key, and it is also the last for that key, then it is the only record for that key. Otherwise it is not the only record with that key, and duplicates have been created by the join.
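The stage-variable logic above can be sketched in Python (the function and variable names are illustrative, mirroring the svKey names): on a key-sorted stream, a row is kept only when it is both the first and the last row for its key.

```python
def only_rows(sorted_rows):
    """Keep rows whose key occurs exactly once in a key-sorted stream,
    mimicking the svKeyChanged / LastRowInGroup constraint."""
    out, sv_key = [], None
    for i, (key, value) in enumerate(sorted_rows):
        sv_old_key, sv_key = sv_key, key           # svOldKey = svKey; svKey = new key
        key_changed = (sv_key != sv_old_key)       # svKeyChanged
        last_in_group = (i + 1 == len(sorted_rows)
                         or sorted_rows[i + 1][0] != key)  # LastRowInGroup
        if key_changed and last_in_group:          # first AND last for this key
            out.append((key, value))
    return out

# Key "B" repeats, so both B rows fall through to the "otherwise" link.
rows = [("A", 1), ("B", 2), ("B", 3), ("C", 4)]
unique_rows = only_rows(rows)
```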
Phil Hibbs | Capgemini
Technical Consultant