Page 1 of 1

Way to implement queries btw two different source systems

Posted: Tue Jul 10, 2007 1:23 pm
by mystuff
Hi,

If I need to have a query between two different source systems (Oracle & say informix). Like

Code: Select all

         select      SourceSystemA.TableA.*   
         from        SourceSystemA.TableA 
         inner join SourceSystemB.TableB 
         on           SourceSystemA.TableA.FieldA = SourceSystemB.TableB.FieldB

Note : We are not selecting any fields from SourceSystemB. It is just for filtering records extracted from SourceSystemA


How can I do that ........

a) One way is to dump data (through datastage) from SourceSystem B to A, in a newly created table and use SQL accordingly (We are not allowed to create any table on the source, so this option is not there for us).
b) Extract all the data from SourceA to Sequential file and then filter out using hashed file (obtained through SourceB) -> We might end of pulling lot of data than required (I need to do calculations on that).
c) Extract values from SourceB and pass them as list of values as a parameter to SourceA, and change SQL as below

Code: Select all

         select      SourceSystemA.TableA.*   
         from        SourceSystemA.TableA 
         where SourceSystemA.TableA.FieldA in (#ParameterList#)
Could degrade the performance of extract

Are there any other ways you can suggest or which way should I use.

Re: Way to implement queries btw two different source system

Posted: Tue Jul 10, 2007 1:34 pm
by shrey3a
One of the method

Step 1--Job1 - Extract the values from source B in hash file (will remove the duplicate values)


Step 2 - Routine -- To read the hash file and return the value in ('value1','value2',.....) format

Step 3- Job 2 -- To extract the data from system A and use the return value of routine in Job Parameter.

Sequencer design

Job1 ---> Routine -----> Job2

Thanks

Posted: Wed Jul 11, 2007 6:51 am
by mystuff
Thats the third method, which I mentioned. Option (b) is out as well as it would result in pulling huge data.

Are there any other method anyone can think of. As of right now looks like I am stuck with just option (c)

Posted: Wed Jul 11, 2007 7:02 am
by chulett
I personally would go for 'a' and have done so many times. Push back on the restriction that you are "not allowed" to create tables on the source. You should be using a 'connection account' to access the source, a functional user created specifically for your ETL processes and not an existing owner account. With that, you should then be able to create 'work' or staging tables under your own user for situations like this.

When properly explained it really shouldn't be an issue getting the access you need.

Also, for an 'inner join' you could use a hashed file as a reference lookup for one of the sources, which is exactly what they were meant to be used for. Stream 'System B' into a hashed file and then a lookup while you process System A will get you your inner join. If there is a one-to-many relationship here, flip your sources around so that this approach works for you. Or worst case you could leverage a UV or direct database lookup and retrieve a 'multi-row result set'.

Posted: Wed Jul 11, 2007 8:13 am
by mystuff
I personally would go for 'a'
They have agreed to create a table for ETL :lol:

Posted: Wed Jul 11, 2007 8:22 am
by chulett
Well... there you go. 8)