Page 1 of 1

Can a user-defined query looks up another user-defined query

Posted: Fri Jun 25, 2004 2:38 pm
by Kwang
In my job, I have two oledb stages (for data from different server and databases). Both of them out put data using user-defined query. Can 1 stage looks up the other one?

Posted: Fri Jun 25, 2004 3:10 pm
by gh_amitava
Hi,

Better if you use a join stage between the outputs..

Regards
Amitava

Posted: Fri Jun 25, 2004 3:18 pm
by Kwang
The two tables are from different servers and there is no link between the two servers.

Posted: Fri Jun 25, 2004 3:26 pm
by chulett
Then simply bring in all rows from one and do the lookup against the other. You can also bring all of one into a hash, then stream the other in and do a lookup against the hash.

Posted: Fri Jun 25, 2004 3:42 pm
by Kwang
chulett wrote:Then simply bring in all rows from one and do the lookup against the other. You can also bring all of one into a hash, then stream the other in and do a lookup against the hash.
Do you mean a user-defined query can not directly look up another user-defined query?

BTW, I have already used a hashed file as alternative way. I'm just wodering or dreaming about one user-defined query can look up another user-defined query.

Posted: Fri Jun 25, 2004 4:31 pm
by chulett
Kwang wrote:Do you mean a user-defined query can not directly look up another user-defined query?
Hard to say. I'm not at all sure what "directly look up" means to you. :?

You can have one "user-defined query" in your source stage bringing data into the job. You can then, in a Transformer, use a reference link to another database stage (like an oledb stage) and that reference lookup can use a "user-defined query" in it. You have to know how to write a user-defined query for a reference lookup, it's not all that tough - just a little different.

The way you are doing it as an "alternate" way sounds like the best way, to me. Kinda hard to say for sure without knowing the data and seeing the job. You could also think about having a job bring the data in from one database and use it to populate a work table in the second database. If necessary, use your other user-defined query to populate a second work table. A follow-up job could join these two data sets together directly in the database and then process from there.

Posted: Sat Jun 26, 2004 2:47 am
by elavenil
Hi,

User defined query can be used as a lookup and no doubt DataStage supports it. But if you are looking at performance perspective, my suggestion is to bring all data and use Merge/ Join/Lookup Stage to get the expected data.

Regards
Saravanan

Posted: Sat Jun 26, 2004 6:58 am
by chulett
elavenil wrote:But if you are looking at performance perspective, my suggestion is to bring all data and use Merge/Join/Lookup Stage to get the expected data.
Keeping in mind, of course, that these are PX only stages. :wink:

The only one that is directly applicable to Server jobs is the "Lookup" stage but newbies need to remember that almost any passive stage can be used as a lookup and there is no dedicated Lookup stage. And the OP is already making use of a hashed lookup...

Ah - technicality alert. There is a Merge stage for Server jobs but it is for merging (or 'joining') two flat files, which is (as far as I know) completely different than the Merge stage in PX.

Posted: Mon Jun 28, 2004 4:36 pm
by Kwang
O.K.
My first query(this is in the mainstream oledb stage):
link name: oleRawTableIn

SELECT DISTINCT postalcode
, convert(datetime, (left(birth_date, 4) + '-' + substring(birth_date, 5, 2) + '-' + right(birth_date, 2))) as 'birth_date'
, case ret_date when '19000001' then convert(datetime, '2999-12-31')
else convert(datetime, (left(ret_date, 4) + '-' + substring(ret_date, 5, 2) + '-' + right(ret_date, 2)))
end as 'ret_date'
FROM Rawtable
--number of rows retrived from this query: 70000

second query (the reference oledb stage):
Note: in this table, postalcd_wid is the primary key
Link Name: oletgtPostalCdIn
SELECT postal_cd
, postal_cd_birth_dt
, postal_cd_retd_dt
, rec_eff_dt
, rec_end_dt
, cur_rec_fl
, batch_nr
, postalcd_wid
, postal_cd_actv_fl
FROM tgtPostalCd
--number of rows retrived from this query: 5999

The link key between stream and reference is:
oletgtPostalCdIn.postal_cd = oleRawTableIn.postalcode
and
oletgtPostalCdIn.postal_cd_birth_dt = oleRawTableIn.birth_date

The output links are:
1. RecordsToBeInsertedOut
Transformer Constraint: oletgtPostalCdIn.NOTFOUND

2. RecordsToBeUpdatedOut
Transfomer Constraint: oleRawTableIn.ret_date <> olePostalCdIn.postal_cd_retd_dt

Result:
All output (70000) goes to RecordsToBeUpdatedOut, when look the data from the RecordsToBeUpdatedOut, only 1 (and is the first record from oletgtPostalCdIn) is there.

What is the problem?

On this weekend, I was thinking about this issue: Is it because the query for reference does not have a primary key?

Waiting for comments.

Kathy

Posted: Mon Jun 28, 2004 11:15 pm
by ray.wurlod
With or without a primary key, a query on a reference input must have a WHERE clause.
For example

Code: Select all

WHERE postalcd_wid = ?
However, a reference input is expected by DataStage to have at least one column marked as the key column (note, too, that this does not necessarily mean "primary key", but usually does).

The reference key expression in the Transformer stage delivers the value to be sought, for example, if coming from the stream input link, the reference key expression would contain

Code: Select all

oleRawTableIn.postalcode
(a reference to the postalcode column from that input link).

Posted: Tue Jun 29, 2004 10:21 am
by Kwang
Hi Ray,

Use your method, the job now can detect records that to be updated (find from the reference link but with non-key column value changed). But it still could not find the records to be inserted (key values can not be found from the reference link).

Kathy

Posted: Tue Jun 29, 2004 4:14 pm
by ray.wurlod
That they can't be found on the reference link must be detected in your Transformer stage, probably in a constraint expression on an output link. You have two choices.
  • You can test the value of the link variable oletgtPostalCdIn.NOTFOUND

    You can test whether oletgtPostalCdIn.postalcd_wid is null (meaning that it was not found), using IsNull(oletgtPostalCdIn.postalcd_wid)

Posted: Tue Jun 29, 2004 4:39 pm
by Kwang
ray.wurlod wrote:That they can't be found on the reference link must be detected in your Transformer stage, probably in a constraint expression on an output link. You have two choices.
  • You can test the value of the link variable oletgtPostalCdIn.NOTFOUND

    You can test whether oletgtPostalCdIn.postalcd_wid is null (meaning that it was not found), using IsNull(oletgtPostalCdIn.postalcd_wid)
It works!
WITH the constraint for out put link RecordsToBeInsertedOut: olePostalCdIn.NOTFOUND Or isnull(olePostalCdIn.postalcd_wid)

Thank you very much, Ray!