Can a user-defined query looks up another user-defined query

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
Kwang
Participant
Posts: 20
Joined: Tue Nov 04, 2003 4:27 pm
Location: Canada

Can a user-defined query looks up another user-defined query

Post by Kwang »

In my job, I have two oledb stages (for data from different server and databases). Both of them out put data using user-defined query. Can 1 stage looks up the other one?
Kwang
gh_amitava
Participant
Posts: 75
Joined: Tue May 13, 2003 4:14 am
Location: California
Contact:

Post by gh_amitava »

Hi,

Better if you use a join stage between the outputs..

Regards
Amitava
Kwang
Participant
Posts: 20
Joined: Tue Nov 04, 2003 4:27 pm
Location: Canada

Post by Kwang »

The two tables are from different servers and there is no link between the two servers.
Kwang
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Then simply bring in all rows from one and do the lookup against the other. You can also bring all of one into a hash, then stream the other in and do a lookup against the hash.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Kwang
Participant
Posts: 20
Joined: Tue Nov 04, 2003 4:27 pm
Location: Canada

Post by Kwang »

chulett wrote:Then simply bring in all rows from one and do the lookup against the other. You can also bring all of one into a hash, then stream the other in and do a lookup against the hash.
Do you mean a user-defined query can not directly look up another user-defined query?

BTW, I have already used a hashed file as alternative way. I'm just wodering or dreaming about one user-defined query can look up another user-defined query.
Kwang
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Kwang wrote:Do you mean a user-defined query can not directly look up another user-defined query?
Hard to say. I'm not at all sure what "directly look up" means to you. :?

You can have one "user-defined query" in your source stage bringing data into the job. You can then, in a Transformer, use a reference link to another database stage (like an oledb stage) and that reference lookup can use a "user-defined query" in it. You have to know how to write a user-defined query for a reference lookup, it's not all that tough - just a little different.

The way you are doing it as an "alternate" way sounds like the best way, to me. Kinda hard to say for sure without knowing the data and seeing the job. You could also think about having a job bring the data in from one database and use it to populate a work table in the second database. If necessary, use your other user-defined query to populate a second work table. A follow-up job could join these two data sets together directly in the database and then process from there.
-craig

"You can never have too many knives" -- Logan Nine Fingers
elavenil
Premium Member
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

Hi,

User defined query can be used as a lookup and no doubt DataStage supports it. But if you are looking at performance perspective, my suggestion is to bring all data and use Merge/ Join/Lookup Stage to get the expected data.

Regards
Saravanan
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

elavenil wrote:But if you are looking at performance perspective, my suggestion is to bring all data and use Merge/Join/Lookup Stage to get the expected data.
Keeping in mind, of course, that these are PX only stages. :wink:

The only one that is directly applicable to Server jobs is the "Lookup" stage but newbies need to remember that almost any passive stage can be used as a lookup and there is no dedicated Lookup stage. And the OP is already making use of a hashed lookup...

Ah - technicality alert. There is a Merge stage for Server jobs but it is for merging (or 'joining') two flat files, which is (as far as I know) completely different than the Merge stage in PX.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Kwang
Participant
Posts: 20
Joined: Tue Nov 04, 2003 4:27 pm
Location: Canada

Post by Kwang »

O.K.
My first query(this is in the mainstream oledb stage):
link name: oleRawTableIn

SELECT DISTINCT postalcode
, convert(datetime, (left(birth_date, 4) + '-' + substring(birth_date, 5, 2) + '-' + right(birth_date, 2))) as 'birth_date'
, case ret_date when '19000001' then convert(datetime, '2999-12-31')
else convert(datetime, (left(ret_date, 4) + '-' + substring(ret_date, 5, 2) + '-' + right(ret_date, 2)))
end as 'ret_date'
FROM Rawtable
--number of rows retrived from this query: 70000

second query (the reference oledb stage):
Note: in this table, postalcd_wid is the primary key
Link Name: oletgtPostalCdIn
SELECT postal_cd
, postal_cd_birth_dt
, postal_cd_retd_dt
, rec_eff_dt
, rec_end_dt
, cur_rec_fl
, batch_nr
, postalcd_wid
, postal_cd_actv_fl
FROM tgtPostalCd
--number of rows retrived from this query: 5999

The link key between stream and reference is:
oletgtPostalCdIn.postal_cd = oleRawTableIn.postalcode
and
oletgtPostalCdIn.postal_cd_birth_dt = oleRawTableIn.birth_date

The output links are:
1. RecordsToBeInsertedOut
Transformer Constraint: oletgtPostalCdIn.NOTFOUND

2. RecordsToBeUpdatedOut
Transfomer Constraint: oleRawTableIn.ret_date <> olePostalCdIn.postal_cd_retd_dt

Result:
All output (70000) goes to RecordsToBeUpdatedOut, when look the data from the RecordsToBeUpdatedOut, only 1 (and is the first record from oletgtPostalCdIn) is there.

What is the problem?

On this weekend, I was thinking about this issue: Is it because the query for reference does not have a primary key?

Waiting for comments.

Kathy
Kwang
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

With or without a primary key, a query on a reference input must have a WHERE clause.
For example

Code: Select all

WHERE postalcd_wid = ?
However, a reference input is expected by DataStage to have at least one column marked as the key column (note, too, that this does not necessarily mean "primary key", but usually does).

The reference key expression in the Transformer stage delivers the value to be sought, for example, if coming from the stream input link, the reference key expression would contain

Code: Select all

oleRawTableIn.postalcode
(a reference to the postalcode column from that input link).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Kwang
Participant
Posts: 20
Joined: Tue Nov 04, 2003 4:27 pm
Location: Canada

Post by Kwang »

Hi Ray,

Use your method, the job now can detect records that to be updated (find from the reference link but with non-key column value changed). But it still could not find the records to be inserted (key values can not be found from the reference link).

Kathy
Kwang
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

That they can't be found on the reference link must be detected in your Transformer stage, probably in a constraint expression on an output link. You have two choices.
  • You can test the value of the link variable oletgtPostalCdIn.NOTFOUND

    You can test whether oletgtPostalCdIn.postalcd_wid is null (meaning that it was not found), using IsNull(oletgtPostalCdIn.postalcd_wid)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Kwang
Participant
Posts: 20
Joined: Tue Nov 04, 2003 4:27 pm
Location: Canada

Post by Kwang »

ray.wurlod wrote:That they can't be found on the reference link must be detected in your Transformer stage, probably in a constraint expression on an output link. You have two choices.
  • You can test the value of the link variable oletgtPostalCdIn.NOTFOUND

    You can test whether oletgtPostalCdIn.postalcd_wid is null (meaning that it was not found), using IsNull(oletgtPostalCdIn.postalcd_wid)
It works!
WITH the constraint for out put link RecordsToBeInsertedOut: olePostalCdIn.NOTFOUND Or isnull(olePostalCdIn.postalcd_wid)

Thank you very much, Ray!
Kwang
Post Reply