Records flow one by one to Lookup table.

Cherukuri
Participant
Posts: 46
Joined: Wed Jul 25, 2007 2:43 am
Location: India

Records flow one by one to Lookup table.

Post by Cherukuri »

Hi,
Please find my question below.

Question: A lookup is made between the source file and the reference Oracle table on the key field (Field_key), and when a record matches, the Equip_ID values from the Oracle table should be passed out to the Transformer.
The records from the source file should be processed one by one, each doing a lookup: for the 1st record from the source file, if the lookup returns more than one Equip_ID value, those records should be rejected, and the lookup should then be checked for the 2nd record from the source file, and so on.

How can I make the records from the source file flow to the lookup one by one?

I hope I have explained the question clearly.
If anyone has an idea on this, please answer.

Regards,
Cheru.
bhasds
Participant
Posts: 79
Joined: Thu May 27, 2010 1:49 am

Post by bhasds »

Hi Cherukuri,

If I have understood your question correctly, I think the Lookup stage already does exactly what you want in your job.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Do you mean that if you get 2 or more records from the Oracle lookup for a given "Field_Key" then you want your job to fail?
Cherukuri
Participant
Posts: 46
Joined: Wed Jul 25, 2007 2:43 am
Location: India

Post by Cherukuri »

Hi,

Thanks for the reply.

I mean that for the 1st record from the source file, after the lookup on Field_Key, if there is more than one Equip_ID value then those records should be rejected. The job should then do the lookup for the 2nd record from the source file: if the lookup table again returns more than one record, those records should be rejected, and if it returns only one record, that record should be processed further.


Thanks and regards,
Cheru
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I'm afraid I cannot tell whether that is the same as or different from what I meant. Would it be possible to give a simple example showing what you expect DataStage to do? A normal lookup will return only the first value if there are several matches, but you can also make it return multiple values and then detect the duplicates in a later stage. All of this would just complicate matters, though, if it is not what you are looking for.
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

If I understand correctly, there are a number of ways to achieve this.

If you want to reject records where the lookup would return multiple rows, you could amend your Oracle lookup select statement to attach to each record a count of how many times its key value appears.
Something like "select key_value, count(*) over (partition by key_value) as key_count from table" (i.e. select the records from your lookup table along with a count of how many times each key value appears).

In a downstream transform you can then just look for the records which have returned a key_count value <> 1, and do with them what you please.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Maybe you could just return the row number and check whether that's 1 or higher (a simpler query)?
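
A sketch of that idea in Oracle (ROW_NUMBER needs an ORDER BY, so Equip_ID is used here arbitrarily, and the reference table name EQUIP_REF is an assumption):

Code: Select all

select Field_key,
       Equip_ID,
       row_number() over (partition by Field_key order by Equip_ID) as row_num
from   EQUIP_REF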
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

Hi Ray, I actually had that thought originally - but then upon re-reading (re-interpreting) the requirement, I took it that all the records which return multiple rows are to be rejected. Hence the count rather than the row number.

Either works, depending on the requirement, I guess.
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Re: Records flow one by one to Lookup table.

Post by PhilHibbs »

Cherukuri wrote:How can I make the records from the source file flow to the lookup one by one?
I think you are describing a "sparse lookup", as opposed to an in-memory lookup, within DataStage.

The default is for DataStage to execute a query on the database and retrieve all the data that it might need to do the lookup, and then to do the lookup in memory. If you set the query mode to "Sparse" in the database connector, then this changes and it executes the SQL query once per row. You specify the join key in the SQL query, for example "WHERE DataItem = ORCHESTRATE.DataItem", ORCHESTRATE.DataItem referring here to a column in the stream link into the Lookup stage.
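
As a rough sketch of the user-defined SQL for a sparse lookup, reusing the column names from the question (the table name EQUIP_REF is an assumption; ORCHESTRATE.Field_key refers to the key column on the stream link):

Code: Select all

select Equip_ID
from   EQUIP_REF
where  Field_key = ORCHESTRATE.Field_key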

Is that what you are after? I've done sparse lookups using DB2 but not with Oracle, so please someone point out if it's any different with the Oracle stage.

*Edit* I see now that you also want to detect duplicate rows. Hm. Not sure about that - I know that a regular lookup has a setting for whether a lookup can return multiple rows but I can't remember if that applies to sparse lookups. What I can remember is, it's really hard to find that option!
Phil Hibbs | Capgemini
Technical Consultant
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

Hi Phil,

As far as I have seen, Oracle doesn't act any differently.
Regarding returning multiple rows, I believe that a sparse lookup just returns as many rows as there are matches, automatically - you don't get the choice?
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

ShaneMuir wrote:Regarding returning multiple rows, I believe that a sparse lookup just returns as many rows as there are matches, automatically - you don't get the choice?
I think that in either case - a sparse lookup or an in-memory lookup - you have to do the duplicate checking yourself. I don't think there is an automatic option to "reject if duplicates", you just get a warning in the log about duplicate rows being ignored.

So, to detect duplicates, you could do a fork-aggregate-join, aggregating and joining either on an existing primary key or on a surrogate key if you don't have a unique key already, e.g. @PARTITIONNUM:':':@INROWNUM, or an automatically generated row number if you have a Sequential File source.
Phil Hibbs | Capgemini
Technical Consultant
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

PhilHibbs wrote:I think that in either case - a sparse lookup or an in-memory lookup - you have to do the duplicate checking yourself. I don't think there is an automatic option to "reject if duplicates", you just get a warning in the log about duplicate rows being ignored.
Agreed
PhilHibbs wrote:So, to detect duplicates, you could do a fork-aggregate-join, aggregating and joining either on an existing primary key or on a surrogate key if you don't have a unique key already, e.g. @PARTITIONNUM:':':@INROWNUM, or an automatically generated row number if you have a Sequential File source.
This is where an extra column from the sparse lookup query, containing a count of how many records share the key value, comes into play. Each row carries the count, and this can easily be checked in a downstream transform.
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

ShaneMuir wrote:This is where an extra column from the sparse lookup query, containing a count of how many records share the key value, comes into play. Each row carries the count, and this can easily be checked in a downstream transform.
Can you get the sparse lookup query to automatically supply this count?

Another way to do this is to put in a Sort stage, set the sort column(s) to the unique record key and set "Do not sort (already sorted)" - since the multiple rows from the join will all be together - and then, in a Transformer, do a set of key-change-detection stage variables:

Code: Select all

svOldKey=svKey
svKey=inLink.UniqueRecordKey
svKeyChanged=Compare(svKey,svOldKey)<>0
and the constraint on your output link for single records:

Code: Select all

svKeyChanged And LastRowInGroup(inLink.UniqueRecordKey)

(where UniqueRecordKey is the record key - if the key is multi-column then svKey needs to be a concatenation of them all, and the LastRowInGroup() argument has to be the last column of the multi-part key), plus an "otherwise" link that will get all the rows that have been duplicated. Because stage variables are evaluated top to bottom, svOldKey still holds the previous row's key when svKeyChanged is derived. The principle here is: if this row is the first for a particular key, and it is also the last for that key, then it is the only record for that key. Otherwise it is not the only record with that key, and duplicates have been created by the join.
Phil Hibbs | Capgemini
Technical Consultant
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

PhilHibbs wrote:Can you get the sparse lookup query to automatically supply this count?
You would need to write a custom SQL query and pass the appropriate fields. It is still a very basic query, using either a group by or an over (partition by ...) depending on the requirements.
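
As a sketch of the group by variant (the reference table name EQUIP_REF is again an assumption), joining the per-key counts back onto the detail rows:

Code: Select all

select t.Field_key,
       t.Equip_ID,
       c.key_count
from   EQUIP_REF t
join   (select Field_key, count(*) as key_count
        from   EQUIP_REF
        group  by Field_key) c
on     c.Field_key = t.Field_key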