Rule of Thumb on Runtime Column Propagation

Posted: Tue Oct 24, 2006 7:31 am
by vijayrc
Hi
I am having a hard time grasping the RCP concept. Reading the manual and going through the forum only made my case worse. In a nutshell, can anyone point out, from the Best Practice Manual, when and where RCP should be turned ON or OFF? Sorry for posting this question, but as I mentioned, I couldn't come to a conclusion from the manual and the forum questions.
Thanks in advance
Vijay

Posted: Tue Oct 24, 2006 7:44 am
by ArndW
I think RCP is best defaulted to be turned OFF and only enabled when you explicitly need it. It is an incredibly useful and powerful feature with all the benefits and potential drawbacks associated with such functionality.

If you explicitly work with each column in your schema, then having RCP enabled is no advantage at all.

Posted: Tue Oct 24, 2006 8:53 am
by splayer
Vijay, I am also a relative novice with RCP, and I have the same problem. From what I understand, RCP is needed (needs to be ON) when you want a column carried across a stage without the stage doing anything with it. You might need to do this when you want to use the column at a downstream stage in the same job.

Posted: Tue Oct 24, 2006 9:42 am
by ray.wurlod
That is not correct. If you include that column in your design, directly mapped to output, then RCP has no effect.

Since RCP can invalidate lineage analysis its use should be discouraged.

RCP is, in my view, only for lazy developers.

Posted: Tue Oct 24, 2006 9:46 am
by ArndW
I have to disagree with you on this one, Ray. There are some great things that PX allows you to do with RCP, particularly when developing "generic" jobs with logic & inputs/outputs which can be used with different file formats.

But for the majority of job development work RCP causes issues and should be disabled.

Posted: Tue Oct 24, 2006 10:04 am
by splayer
Ray, is my assessment of when RCP is needed correct? What is its main use? Can you please explain in more detail?

Posted: Tue Oct 24, 2006 10:59 am
by ray.wurlod
NO, because I refuse to use RCP, because I believe in strict management of metadata. If "they" could figure out a way that lineage analysis could identify the source of data generated by RCP, I might be persuaded to reconsider.

You are wrong to assert that RCP is needed to map a column; this is just as easily designed in, particularly with helpers such as Auto-Match Columns and Propagate Columns being available in the Designer tool.

Posted: Tue Oct 24, 2006 1:28 pm
by talk2shaanc
My understanding of the subject is this: assume I have 50 columns and 10 stages in my DS job (with stage 1 as the input file stage, stage 10 as the output file, and 2-8 as intermediate stages). Out of the 50, I use only 10 columns for some logic, and the remaining 40 columns are a straight move. In such a situation I may choose not to map all 40 columns from stage 1 through stage 10, as that would cause the same data to flow from one memory location to another (for every stage).
Instead, I will enable RCP, define the column names in stage 1 and stage 10, and make NO mention of the columns in stages 2-8.
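
To make the idea concrete outside DataStage, here is a minimal Python sketch of the behaviour described above. It is a toy model, not DataStage code: the function `run_stage`, the `rcp` flag, and the sample columns are all made up for illustration. A stage names only the columns it works on; with propagation on, every other column is carried through untouched, and with it off, unmapped columns are dropped.

```python
# Toy model of Runtime Column Propagation; not DataStage code.
# run_stage, the rcp flag and the column names are hypothetical.

def run_stage(row, mapped, rcp):
    """Apply a stage that explicitly maps only the columns in `mapped`.

    mapped: dict of column name -> function deriving the output value.
    rcp:    if True, columns the stage does not mention pass through as-is.
    """
    out = {name: derive(row) for name, derive in mapped.items()}
    if rcp:
        # Propagate every input column the stage did not define itself.
        for name, value in row.items():
            out.setdefault(name, value)
    return out


row = {"id": 7, "amount": 10.0, "comment": "untouched"}

# The stage only knows about two of the three columns.
stage = {
    "id": lambda r: r["id"],
    "amount": lambda r: r["amount"] * 2,
}

with_rcp = run_stage(row, stage, rcp=True)      # "comment" is carried along
without_rcp = run_stage(row, stage, rcp=False)  # "comment" is lost
```

In this sketch the 40 "straight move" columns would play the role of `comment`: they never appear in the intermediate stage definitions, yet they still reach the output when propagation is on.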

Posted: Tue Oct 24, 2006 3:33 pm
by samsuf2002
I have never used RCP. I have read about it, but can anyone please let me know where exactly we can find this option in DataStage?
Thanks

Posted: Tue Oct 24, 2006 3:37 pm
by splayer
How would the data move from Stage 1 to Stage 10 then? That is, if you don't map the metadata in the intermediate stages?

Posted: Tue Oct 24, 2006 3:52 pm
by ray.wurlod
That's the "propagation" happening.

Posted: Tue Oct 24, 2006 8:25 pm
by talk2shaanc
Yep, that's the concept of RCP: propagation.

Words of caution: if you are using the stages and logic below, I would suggest not using RCP, as you may not get the desired values in your output.

1. Join stage (depending on the type of join you are using)
2. Remove Duplicates
3. Aggregator, or aggregation logic in a Transformer stage
4. Merge
5. Pivot
These are just the most common stages you are likely to be using, and RCP may not work with these stages in the job.

Re: Rule of Thumb on Runtime Column Propagation

Posted: Wed Oct 25, 2006 12:41 am
by bharatagnihotri
I do agree with Arnd on this one! I am a frequent user of RCP, and it works quite smoothly with the concept of generic jobs, especially when we want to build something with varying metadata. For example, say we have a job that combines or compares data between two inputs, but which metadata will be used is decided at run time. In this case RCP works really well and picks up the details from the descriptor file.

But yes, I do agree that RCP can be unpredictable, and I recommend keeping it disabled by default and enabling it only when required rather than setting it as the default.
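
The generic-job idea can be sketched in plain Python. This is only an analogy, not DataStage code: the function `diff_rows`, the list-of-names schema format, and the sample data are hypothetical. The point is that the comparison logic never names a data column; the column list arrives at run time, much as a descriptor file would supply it.

```python
# Toy illustration of a "generic" compare job; not DataStage code.
# diff_rows and the schema format (a list of column names) are hypothetical.

def diff_rows(left, right, schema, key):
    """Compare two inputs on every non-key column named in `schema`.

    Returns a dict of key value -> list of column names that differ.
    Keys present in only one input are ignored in this sketch.
    """
    compare_cols = [c for c in schema if c != key]
    right_by_key = {r[key]: r for r in right}
    changes = {}
    for l_row in left:
        r_row = right_by_key.get(l_row[key])
        if r_row is None:
            continue
        differing = [c for c in compare_cols if l_row[c] != r_row[c]]
        if differing:
            changes[l_row[key]] = differing
    return changes


# The schema is supplied at run time; the logic above never hard-codes it.
schema = ["cust_id", "name", "city"]
left = [
    {"cust_id": 1, "name": "Ann", "city": "Oslo"},
    {"cust_id": 2, "name": "Bob", "city": "Rome"},
]
right = [
    {"cust_id": 1, "name": "Ann", "city": "Bergen"},
    {"cust_id": 2, "name": "Bob", "city": "Rome"},
]

changes = diff_rows(left, right, schema, key="cust_id")
```

Swapping in a different `schema` and matching inputs reuses the same logic unchanged, which is the benefit claimed for generic jobs; the cost is that nothing in the logic itself documents which columns flow through it.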

Posted: Wed Oct 25, 2006 4:28 am
by Nageshsunkoji
Hi All,

The greatest use of RCP comes into the picture when you are using shared containers. I think shared containers help a lot to reduce the coding effort. I used RCP extensively while developing shared containers, where it is very helpful. In other cases, I switch RCP off.

Regards
Nagesh

Posted: Wed Oct 25, 2006 6:12 am
by kumar_s
Hi Nagesh,

May I know in what cases you use RCP with shared containers? Do you change the metadata dynamically?