Records flow one by one to Lookup table.

Cherukuri
Participant
Posts: 46
Joined: Wed Jul 25, 2007 2:43 am
Location: India

Records flow one by one to Lookup table.

Post by Cherukuri »

Hi,
Please find my question below.

Question: A lookup is made between the source file and the reference Oracle table on the key field (Field_key), and when a record matches, the Equip_ID values from the Oracle table should be passed out to the Transformer.
The records from the source file should be processed one by one, each doing a lookup: for the 1st record from the source file, if the lookup returns more than one Equip_ID value, those records should be rejected, and the lookup should then be checked for the 2nd record from the source file, and so on.

How can I make the records from the source file flow to the lookup one by one?

I hope I have explained the question clearly.
If anyone has an idea on this, please answer.

Regards,
Cheru.
bhasds
Participant
Posts: 79
Joined: Thu May 27, 2010 1:49 am

Post by bhasds »

Hi Cherukuri,

If I have understood your question correctly, I think the Lookup stage already does exactly what you want in your job.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Do you mean that if you get 2 or more records from the Oracle lookup for a given "Field_Key" then you want your job to fail?
Cherukuri
Participant
Posts: 46
Joined: Wed Jul 25, 2007 2:43 am
Location: India

Post by Cherukuri »

Hi,

Thanks for the reply.

I mean that for the 1st record from the source file, after the lookup on Field_Key, if there is more than one Equip_ID value then those records should be rejected. The job should then do the lookup for the 2nd record from the source file: if the lookup table again returns more than one record, those records should be rejected, and if it returns only one record, that record should be processed further.


Thanks and regards,
Cheru
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I'm afraid I cannot tell whether that is the same as or different from what I meant. Would it be possible to give a simple example showing what you expect DataStage to do? A normal lookup will return only the first value if there are several matches, but you can also make it return multiple values and then detect the duplicates in a later stage. All of this would just complicate matters, though, if it is not what you are looking for.
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

If I understand correctly, there are a number of ways to achieve this.

If you want to reject records where the lookup would return multiple rows, you could amend your Oracle lookup select statement to attach to each record a count of how many times its key value appears.
Something like "select key_value, count(*) over (partition by key_value) as key_count from table" (i.e. select the records from your lookup table along with a count of how many times each key value appears).

In a downstream transform you can then just look for the records which have returned a key_count value <> 1, and do with them what you please.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Maybe you could just return the row number and check whether that's 1 or higher (a simpler query)?
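
A sketch of that idea in Oracle (ROW_NUMBER needs an ORDER BY, so Equip_ID is used here arbitrarily, and the reference table name EQUIP_REF is an assumption):

Code: Select all

select Field_key,
       Equip_ID,
       row_number() over (partition by Field_key order by Equip_ID) as row_num
from   EQUIP_REF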
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

Hi Ray, I actually had that thought originally - but then upon re-reading (re-interpreting) the requirement, I took it that all the records which return multiple rows are to be rejected. Hence the count rather than the row number.

Either works, depending on the requirement, I guess.
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Re: Records flow one by one to Lookup table.

Post by PhilHibbs »

Cherukuri wrote:How can I make the records from the source file flow to the lookup one by one?
I think you are describing a "sparse lookup", as opposed to an in-memory lookup, within DataStage.

The default is for DataStage to execute a query on the database and retrieve all the data that it might need to do the lookup, and then to do the lookup in memory. If you set the query mode to "Sparse" in the database connector, then this changes and it executes the SQL query once per row. You specify the join key in the SQL query, for example "WHERE DataItem = ORCHESTRATE.DataItem", ORCHESTRATE.DataItem referring here to a column in the stream link into the Lookup stage.
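
As a rough sketch of the user-defined SQL for a sparse lookup, reusing the column names from the question (the table name EQUIP_REF is an assumption; ORCHESTRATE.Field_key refers to the key column on the stream link):

Code: Select all

select Equip_ID
from   EQUIP_REF
where  Field_key = ORCHESTRATE.Field_key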

Is that what you are after? I've done sparse lookups using DB2 but not with Oracle, so please someone point out if it's any different with the Oracle stage.

*Edit* I see now that you also want to detect duplicate rows. Hm. Not sure about that - I know that a regular lookup has a setting for whether a lookup can return multiple rows but I can't remember if that applies to sparse lookups. What I can remember is, it's really hard to find that option!
Phil Hibbs | Capgemini
Technical Consultant
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

Hi Phil,

As far as I have seen, Oracle doesn't act any differently.
Regarding returning multiple rows, I believe that a sparse lookup just returns as many rows as there are matches, automatically - you don't get the choice?
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

ShaneMuir wrote:Regarding returning multiple rows, I believe that a sparse lookup just returns as many rows as there are matches, automatically - you don't get the choice?
I think that in either case - a sparse lookup or an in-memory lookup - you have to do the duplicate checking yourself. I don't think there is an automatic option to "reject if duplicates", you just get a warning in the log about duplicate rows being ignored.

So, to detect duplicates, you could do a fork-aggregate-join, aggregating and joining either on an existing primary key or on a surrogate key if you don't have a unique key already, e.g. @PARTITIONNUM:':':@INROWNUM, or an automatically generated row number if you have a Sequential File source.
Phil Hibbs | Capgemini
Technical Consultant
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

PhilHibbs wrote:I think that in either case - a sparse lookup or an in-memory lookup - you have to do the duplicate checking yourself. I don't think there is an automatic option to "reject if duplicates", you just get a warning in the log about duplicate rows being ignored.
Agreed
PhilHibbs wrote:So, to detect duplicates, you could do a fork-aggregate-join, aggregating and joining either on an existing primary key or on a surrogate key if you don't have a unique key already, e.g. @PARTITIONNUM:':':@INROWNUM, or an automatically generated row number if you have a Sequential File source.
This is where an extra column from the sparse lookup query, containing a count of how many records share the key value, comes into play. Each row carries the count, and this can easily be checked in a downstream transform.
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

ShaneMuir wrote:This is where an extra column from the sparse lookup query, containing a count of how many records share the key value, comes into play. Each row carries the count, and this can easily be checked in a downstream transform.
Can you get the sparse lookup query to automatically supply this count?

Another way to do this is to put in a Sort stage, set the sort column(s) to the unique record key and set "Do not sort (already sorted)" - since the multiple rows from the join will all be together - and then, in a Transformer, do a set of key-change-detection stage variables:

Code: Select all

svOldKey=svKey
svKey=inLink.UniqueRecordKey
svKeyChanged=Compare(svKey,svOldKey)<>0
and the constraint on your output link for single records:

Code: Select all

svKeyChanged And LastRowInGroup(inLink.UniqueRecordKey)

(where UniqueRecordKey is the record key - if the key is multi-column then svKey needs to be a concatenation of them all, and the LastRowInGroup() argument has to be the last column of the multi-part key), plus an "otherwise" link that will get all the rows that have been duplicated. Because stage variables are evaluated top to bottom, svOldKey still holds the previous row's key when svKeyChanged is derived. The principle here is: if this row is the first for a particular key, and it is also the last for that key, then it is the only record for that key. Otherwise it is not the only record with that key, and duplicates have been created by the join.
Phil Hibbs | Capgemini
Technical Consultant
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

PhilHibbs wrote:Can you get the sparse lookup query to automatically supply this count?
You would need to write a custom SQL query and pass the appropriate fields. It is still a very basic query, using either a group by or an over (partition by ...) depending on the requirements.
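
As a sketch of the group by variant (the reference table name EQUIP_REF is again an assumption), joining the per-key counts back onto the detail rows:

Code: Select all

select t.Field_key,
       t.Equip_ID,
       c.key_count
from   EQUIP_REF t
join   (select Field_key, count(*) as key_count
        from   EQUIP_REF
        group  by Field_key) c
on     c.Field_key = t.Field_key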