DSXchange

Posted: **Wed Mar 17, 2010 8:35 am**

Hi,

I want to learn about RCP (Runtime Column Propagation), so I am testing it.

I have 1000 columns in my source and I want to pass all 1000 of them to my target, but I only want to apply transformations to a few columns. How does RCP affect my job? How can I improve it?

What can I do? My design is:

FF--------->copy------------>Shared Container---------->copy-------->FF

Inside the shared container I am doing a simple transformation: filtering the records with a constraint (the filter is on one column).

Thanks in advance

Posted: **Wed Mar 17, 2010 8:43 am**

RCP allows you to create jobs that only work with the fields you want to work with. You only need to specify a column in the stage before the one where you want to use it, in order to "surface" that column. The column is then available for normal processing in any subsequent stage you push it through.

For example, if your input is a dataset, you need not define any columns, only the dataset name. In the next Copy stage, surface the fields your shared container needs as inputs. Any field you want to operate on must be surfaced before it reaches that stage.
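To illustrate the pass-through idea outside DataStage, here is a minimal Python analogy (not DataStage code; the function `rcp_stage` and the column names are hypothetical). A stage declares only the columns it needs, and every other column in the row is carried forward unchanged, much as RCP carries columns you never define.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def rcp_stage(rows: Iterable[Row], required: List[str],
              transform: Callable[[Row], Row]) -> Iterator[Row]:
    """Transform only the columns this stage declares ("surfaces");
    all other columns are propagated through unchanged."""
    for row in rows:
        missing = [c for c in required if c not in row]
        if missing:
            # A column must exist upstream before a stage can use it.
            raise KeyError(f"columns not surfaced upstream: {missing}")
        surfaced = {c: row[c] for c in required}
        out = dict(row)                  # carry every column forward
        out.update(transform(surfaced))  # overwrite only transformed columns
        yield out


# A 1000-column row, as in the question (values are made up).
wide_row = {f"col_{i}": f"v{i}" for i in range(1000)}
out = list(rcp_stage([wide_row], required=["col_5"],
                     transform=lambda r: {"col_5": r["col_5"].upper()}))
print(len(out[0]), out[0]["col_5"], out[0]["col_6"])  # 1000 V5 v6
```

Only `col_5` is named anywhere in the "stage", yet all 1000 columns reach the output.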

Be wary, though: lookups and column creations can add extra columns to your output that you cannot see in your job design, and these must be handled appropriately.
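A hedged Python analogy of that pitfall (hypothetical function and column names; it does not model every detail of a DataStage Lookup stage): the lookup copies every column of the matched reference row into the output, so columns nobody designed appear downstream. One way to handle them is an explicit projection to the target's columns before the final write.

```python
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def lookup(rows: Iterable[Row], reference: Dict[Any, Row], key: str) -> Iterator[Row]:
    """Enrich each row from a reference table keyed on `key`. Every column of
    the matched reference row is carried forward, including ones no later
    stage mentions. In this sketch, reference values overwrite same-named
    stream columns, and unmatched rows pass through unchanged."""
    for row in rows:
        out = dict(row)
        out.update(reference.get(row[key], {}))
        yield out


def keep_columns(rows: Iterable[Row], wanted: List[str]) -> Iterator[Row]:
    """Explicitly project to the target's columns, dropping any extras."""
    for row in rows:
        yield {c: row[c] for c in wanted}


reference = {1: {"customer_id": 1, "region": "EU", "load_ts": "2010-03-17"}}
stream = [{"customer_id": 1, "amount": 10}]

joined = list(lookup(stream, reference, "customer_id"))
print(sorted(joined[0]))  # ['amount', 'customer_id', 'load_ts', 'region']

cleaned = list(keep_columns(joined, ["customer_id", "amount", "region"]))
print(cleaned[0])  # {'customer_id': 1, 'amount': 10, 'region': 'EU'}
```

Here `load_ts` is the "invisible" column: nothing downstream asked for it, but it would reach the target unless it is dropped explicitly.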

Hope this gives you a good starting point.

Posted: **Wed Mar 17, 2010 8:56 am**

RCP will not work with flat files; you need a relational source.

Posted: **Wed Mar 17, 2010 8:59 am**

battaliou wrote: RCP will not work with flat files, need a relational source.

For the initial read of the file, yes; however, you can turn on RCP after you have specified the format.

Alternatively, read in each record as a single column, pass it through a schema, and output it with RCP.
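A minimal Python sketch of the "read the record as one column, then apply a schema" approach, assuming a comma-delimited record and a hypothetical schema expressed as ordered (name, converter) pairs. In a real job the schema would be a definition supplied at runtime rather than Python code; `apply_schema` is an illustrative name only.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical runtime schema: ordered (column name, converter) pairs.
Schema = List[Tuple[str, Callable[[str], Any]]]


def apply_schema(record: str, schema: Schema, delimiter: str = ",") -> Dict[str, Any]:
    """Split one raw record (read in as a single column) into typed fields."""
    fields = record.split(delimiter)
    if len(fields) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(fields)}")
    return {name: convert(value) for (name, convert), value in zip(schema, fields)}


schema: Schema = [("id", int), ("name", str), ("amount", float)]
print(apply_schema("7,Smith,12.5", schema))
# {'id': 7, 'name': 'Smith', 'amount': 12.5}
```

The raw file is read without any column metadata; the field layout lives entirely in the schema passed in at runtime.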

Posted: **Thu Mar 18, 2010 2:39 am**

Yes, but what's the point of RCP if you have to define your metadata?

Posted: **Thu Mar 18, 2010 5:22 am**

You can define it at runtime via schemas.

Posted: **Thu Apr 01, 2010 8:46 am**

You can define it at runtime via schemas

Yes, and this is a very powerful technique. If you have a series of input files that require similar handling (e.g. basic validation, loading to a table or dataset), you can write a generic job with RCP and pass the schema file name, the table (or dataset) and the Modify stage specifications as parameters. You then have a single job, and if the input file metadata changes you just amend the schema files, with no code change or redeployment needed.
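To make the generic-job idea concrete, here is a minimal Python sketch, not DataStage code: one routine whose behaviour is driven entirely by a schema file passed in as a parameter. The `name:type` schema format, the `TYPES` table and the function names are hypothetical simplifications (DataStage schema files use their own syntax), and the table/dataset target and Modify specifications are left out.

```python
import csv
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical type names allowed in the simplified schema file.
TYPES: Dict[str, Callable[[str], Any]] = {"int": int, "float": float, "string": str}


def load_schema(schema_path: str) -> List[Tuple[str, Callable[[str], Any]]]:
    """Read 'name:type' lines; blank lines and '#' comments are ignored."""
    schema = []
    with open(schema_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            name, type_name = (part.strip() for part in line.split(":", 1))
            schema.append((name, TYPES[type_name]))
    return schema


def run_generic_job(schema_path: str, input_path: str) -> Tuple[List[Dict[str, Any]], List[int]]:
    """One job for many feeds: the schema file (a parameter) drives parsing
    and basic validation. Returns (valid rows, line numbers of rejects)."""
    schema = load_schema(schema_path)
    good: List[Dict[str, Any]] = []
    rejects: List[int] = []
    with open(input_path, newline="", encoding="utf-8") as fh:
        for lineno, fields in enumerate(csv.reader(fh), start=1):
            try:
                if len(fields) != len(schema):
                    raise ValueError("wrong field count")
                good.append({n: conv(v) for (n, conv), v in zip(schema, fields)})
            except ValueError:
                # Type conversion or field-count failure: send to rejects.
                rejects.append(lineno)
    return good, rejects
```

Calling `run_generic_job("orders.schema", "orders.csv")` and then `run_generic_job("customers.schema", "customers.csv")` (hypothetical file names) reuses the same code for two feeds; if a feed's layout changes, only its schema file is edited.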

The other main use I've found for RCP is to make shared containers as reusable as possible by propagating columns through the container transparently without specifying them, which also works well.
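The reusable-container point can also be sketched in Python (hypothetical names; an analogy rather than DataStage code). The "container" below mirrors the design in the original question: it filters on a single column and knows nothing else, so the same code works for inputs of any width or shape because all other columns pass through untouched.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Row = Dict[str, Any]


def filter_container(rows: Iterable[Row], column: str,
                     keep: Callable[[Any], bool]) -> Iterator[Row]:
    """Reusable filter: references only `column`; every other column in the
    row is propagated unchanged, whatever shape the input has."""
    for row in rows:
        if keep(row[column]):
            yield row


# Two feeds with different layouts reuse the same container unchanged.
orders = [
    {"status": "OPEN", "order_id": 1, "amount": 5.0},
    {"status": "CLOSED", "order_id": 2, "amount": 9.0},
]
customers = [{"status": "OPEN", "name": "Lee", "region": "EU"}]


def is_open(status: Any) -> bool:
    return status == "OPEN"


print(list(filter_container(orders, "status", is_open)))
# [{'status': 'OPEN', 'order_id': 1, 'amount': 5.0}]
print(list(filter_container(customers, "status", is_open)))
# [{'status': 'OPEN', 'name': 'Lee', 'region': 'EU'}]
```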

Otherwise, we generally switch it off, because 'invisible' columns being propagated through your design can hurt job maintainability and readability.