RCP with sort/join stages

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Pravenai
Participant
Posts: 49
Joined: Mon Apr 06, 2009 1:32 am
Location: USA

RCP with sort/join stages

Post by Pravenai »

Hi,

I have a job which reads data from two sources and then I use sort stage on both links and finally I do a full outer join between these two links.

I need to automate this job which will accept source queries as parameter so that this job can be used for all the source queries that I have.

For this I used RCP. But sort and join stages need key column to be mentioned on which we do the sorting and joining. Since this key column will change for every query that I use, is there any way to achieve this operation in datastage?

Regards,
PR
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Re: RCP with sort/join stages

Post by kwwilliams »

Your sql can alias a column or a concatenation of columns as 'keycolumn' and then you can sort and partition based upon that column. It will also be helpful to have the key column extracted under its own column name so it can be used to insert into any target using drop unused coumn (which would drop 'keycolumn'
BI-RMA
Premium Member
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

Talking about RCP requires a basic understanding of the way DataStage works under the covers.
The Sort-Stage is basically just a more comfortable GUI for the OSH-operator tsort with improved visibility on the canvas provided by its Icon.
But the GUI forces you to select a key-column from a list of columns available on the input-stream.
The solution for RCP is to replace the GUI-stage by a generic-stage using the same tsort-operator. Within the generic stage all the options of the tsort-operator can be set to a specific value, unspecified options are left at their default values. All options can be fully parameterized.
This is not only true for the tsort-operator, but also for all the other operators, including of course the four join-operators (innerjoin, leftouterjoin, rightouterjoin, fullouterjoin).
The operators and their options are documented in the Parallel Job Advanced Developers Guide.
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
Post Reply