Rule of Thumb on Runtime Column Propagation

vijayrc
Participant
Posts: 197
Joined: Sun Apr 02, 2006 10:31 am
Location: NJ

Post by vijayrc »

Hi
I am having a hard time grasping the RCP concept. Reading the manual and going through the forum only made matters worse. In a nutshell, can anyone point out, from the Best Practice manual, when and where RCP should be turned ON or OFF? Sorry for posting this question but, as I mentioned, I couldn't reach a conclusion from the manual or the forum threads.
Thanks in advance
Vijay
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I think RCP is best left turned OFF by default and enabled only when you explicitly need it. It is an incredibly useful and powerful feature, with all the benefits and potential drawbacks associated with such functionality.

If you explicitly work with each column in your schema, then having RCP enabled is no advantage at all.
splayer
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

Vijay, I am also a relative novice with RCP and have the same problem. From what I understand, RCP needs to be ON when you want a column carried across a stage without the stage doing anything with it. You might need this when you want to use the column at a downstream stage in the same job.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

That is not correct. If you include that column in your design, directly mapped to output, then RCP has no effect.

Since RCP can invalidate lineage analysis, its use should be discouraged.

RCP is, in my view, only for lazy developers.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I have to disagree with you on this one, Ray. There are some great things that PX allows you to do with RCP, particularly when developing "generic" jobs with logic & inputs/outputs which can be used with different file formats.

But for the majority of job development work RCP causes issues and should be disabled.
splayer
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

Ray, is my assessment of when RCP is needed correct? What is its main use? Can you please explain in more detail?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

No. I refuse to use RCP because I believe in strict management of metadata. If "they" could figure out a way for lineage analysis to identify the source of data propagated by RCP, I might be persuaded to reconsider.

You are wrong to assert that RCP is needed to map a column. This is just as easily designed in, particularly with helpers such as Auto-Match Columns and Propagate Columns available in the Designer tool.
talk2shaanc
Charter Member
Posts: 199
Joined: Tue Jan 18, 2005 2:50 am
Location: India

Post by talk2shaanc »

My understanding of the subject is this. Assume I have 50 columns and 10 stages in my DS job, with stage 1 as the input file stage, stage 10 as the output file stage, and stages 2-9 as intermediate stages. Of the 50 columns, I use only 10 in some logic; the remaining 40 are a straight move. In that situation I may choose not to map all 40 columns from stage 1 through stage 10, as doing so would copy the same data from one memory location to another at every stage.
Instead, I will enable RCP, define the column names in stage 1 and stage 10, and make no mention of those columns in stages 2-9.
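
To make the idea concrete, here is a minimal sketch in plain Python, not DataStage. It assumes a stage can be modelled as a function that sees only the columns it declares; `run_stage`, `declared` and `rcp` are hypothetical names made up for illustration, not anything from the product. With propagation on, undeclared columns ride along untouched; with it off, they are dropped at that stage.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def run_stage(rows: List[Row], declared: List[str],
              logic: Callable[[Row], Row], rcp: bool) -> List[Row]:
    """Hypothetical model of one intermediate stage (not a DataStage API).

    The stage logic only ever sees the columns listed in `declared`.
    """
    output = []
    for row in rows:
        # Apply the stage logic to the declared columns only.
        result = logic({col: row[col] for col in declared})
        if rcp:
            # Propagation on: carry undeclared columns through unchanged.
            passthrough = {c: v for c, v in row.items() if c not in declared}
            result = {**passthrough, **result}
        # Propagation off: undeclared columns are simply not in the output.
        output.append(result)
    return output


def double_amount(row: Row) -> Row:
    return {"id": row["id"], "amount": row["amount"] * 2}


source = [{"id": 1, "amount": 10, "name": "a", "city": "NJ"}]

with_rcp = run_stage(source, ["id", "amount"], double_amount, rcp=True)
without_rcp = run_stage(source, ["id", "amount"], double_amount, rcp=False)
```

In this model `with_rcp` still carries `name` and `city`, while `without_rcp` holds only `id` and `amount`, which is why a column that is not mapped and not propagated never reaches the last stage.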
Shantanu Choudhary
samsuf2002
Premium Member
Posts: 397
Joined: Wed Apr 12, 2006 2:28 pm
Location: Tennesse

Post by samsuf2002 »

I have never used RCP. I have read about it, but can anyone please let me know exactly where to find this option in DataStage?
Thanks
hi sam here
splayer
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

How would the data move from Stage 1 to Stage 10 then? That is, if you don't map the metadata in the intermediate stages?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

That's the "propagation" happening.
talk2shaanc
Charter Member
Posts: 199
Joined: Tue Jan 18, 2005 2:50 am
Location: India

Post by talk2shaanc »

Yep, that's the concept of RCP: propagation.

A word of caution: if your job uses the stages or logic below, I would suggest not using RCP, as you may not get the desired values in your output.

1. Join stage (depending on the type of join you are using)
2. Remove Duplicates
3. Aggregator, or aggregation logic in a Transformer stage
4. Merge
5. Pivot
These are just the most common stages you are likely to be using; RCP may not work as expected with them in a job.
Shantanu Choudhary
bharatagnihotri
Participant
Posts: 7
Joined: Thu Sep 30, 2004 5:22 am

Re: Rule of Thumb on Runtime Column Propagation

Post by bharatagnihotri »

I agree with Arnd on this one! I am a frequent user of RCP, and it works quite smoothly with the concept of generic jobs, especially when we want to build something with varying metadata. For example, say we have a job that combines or compares data between two inputs, but which metadata will be used is decided at run time. In this case RCP works really well and picks up the details from the descriptor file.

But yes, I agree that RCP can be unpredictable, and I recommend disabling it by default and enabling it only when required.
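
A rough analogy for such a generic job, sketched in plain Python rather than DataStage: the comparison logic below names only the key column, and every other column is whatever the inputs happen to contain at run time. `changed_keys` and the sample layouts are hypothetical and only illustrate the idea.

```python
from typing import Dict, List

Row = Dict[str, object]


def changed_keys(left: List[Row], right: List[Row], key: str) -> List[object]:
    """Return the key values from `left` whose rows differ from (or are
    missing in) `right`.

    Only `key` is named; all other columns are compared as found.
    """
    right_by_key = {row[key]: row for row in right}
    return [row[key] for row in left if right_by_key.get(row[key]) != row]


# The same logic reused with two unrelated layouts.
customers_old = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]
customers_new = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Rob"}]

orders_old = [{"order_no": 7, "qty": 3, "status": "open"}]
orders_new = [{"order_no": 7, "qty": 3, "status": "shipped"}]

customer_changes = changed_keys(customers_old, customers_new, "cust_id")
order_changes = changed_keys(orders_old, orders_new, "order_no")
```

The point of the analogy is that nothing in `changed_keys` had to be redesigned when the column layout changed, which is the benefit being claimed for RCP in generic jobs.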
Nageshsunkoji
Participant
Posts: 222
Joined: Tue Aug 30, 2005 2:07 am
Location: pune

Post by Nageshsunkoji »

Hi All,

RCP really comes into its own when you are using shared containers. I think shared containers help a lot to reduce coding effort. I used RCP extensively while developing shared containers, where it is very helpful. In other cases, I switch RCP off.

Regards
Nagesh
NageshSunkoji

If you know anything SHARE it.............
If you Don't know anything LEARN it...............
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi Nagesh,

May I know in what cases you use RCP with shared containers? Do you change the metadata dynamically?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'