
Shared Container and RCP Issue...

Posted: Tue Jul 31, 2012 4:55 pm
by kaps
I am having trouble using RCP with a shared container in a job. The error is:
Error when checking operator: Could not find input field
My job design is:

SeqFile---->Shared Container----->Copy Stage----->SeqFile
The shared container's output has, say, 10 columns, but its input has 20. I am creating the remaining 10 columns in the Copy stage and left their derivations blank. I thought RCP would propagate those columns from the source, but it does not. I have enabled RCP in all the stages in the job. What am I doing wrong here?

Please advise.

Posted: Wed Aug 01, 2012 12:28 am
by ArndW
You need to create the columns in a stage such as the Modify stage (preferred) or a Transformer stage; the Copy stage won't do this for you, as you've discovered.

Posted: Wed Aug 01, 2012 2:43 am
by BI-RMA
By the way, the subject of your post is quite confusing. Since you are trying to create the (empty) columns after the shared container, how would RCP help get those columns through it?

It is, however, possible to feed extra columns into a shared container and run them through it using RCP. For this to work, you need to enable RCP on all stages within the shared container and, in the job, on the shared container stage itself.
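To make the pass-through rule concrete, here is a toy Python model (not DataStage code; `Stage`, `run_chain` and the column names are hypothetical). Each stage declares the columns it knows about; with RCP on, any undeclared incoming column is passed through unchanged, and with RCP off it is dropped. It is only a sketch of the behaviour described above, under those simplifying assumptions.

```python
# Toy model of runtime column propagation (RCP) through a chain of stages.
# Hypothetical sketch: Stage, run_chain and the column names are illustrative,
# not DataStage APIs.
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    declared: list[str]  # columns defined in the stage's own metadata
    rcp: bool = True     # runtime column propagation enabled?

    def process(self, record: dict[str, str]) -> dict[str, str]:
        # Declared columns must be present on the input (KeyError otherwise)
        out = {c: record[c] for c in self.declared}
        if self.rcp:
            # RCP on: carry every undeclared column through untouched
            out.update({k: v for k, v in record.items() if k not in out})
        return out


def run_chain(stages: list[Stage], record: dict[str, str]) -> dict[str, str]:
    """Push one record through the stages in order."""
    for stage in stages:
        record = stage.process(record)
    return record


source_row = {"id": "1", "amount": "10", "extra": "x"}
# A shared container whose stages only know "id" and "amount".
container = [Stage("inner_xfm", ["id", "amount"]), Stage("inner_copy", ["id", "amount"])]
print(run_chain(container, source_row))
```

Setting `rcp=False` on any single stage in `container` drops `extra` from the result, which is why RCP has to be on everywhere along the path.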

Posted: Wed Aug 01, 2012 10:15 am
by kaps
I have replaced the copy stage with a transformer, but I still get the same error on the fields I expected to flow through via RCP. RCP is turned on in all stages, including the shared container. The derivations for those columns are empty in the transformer. Any input is appreciated.

Posted: Wed Aug 01, 2012 11:05 am
by ArndW
An empty derivation in the transformer tells DataStage that the columns are coming from the input. You need to give a column derivation in the transformer in order for the columns to be created.
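A small Python sketch of the rule ArndW describes, assuming a simplified model rather than the real Transformer (`resolve_output` and the column names are hypothetical): an output column with an empty derivation is looked up by name on the input, and fails if the input doesn't have it, while a column with a derivation gets its value created there.

```python
# Toy model of resolving Transformer output columns.
# Hypothetical sketch: resolve_output and the column names are illustrative.
from typing import Callable, Optional

# None stands for an empty derivation; otherwise a function computing the value.
Derivation = Optional[Callable[[dict], str]]


def resolve_output(record: dict[str, str], outputs: dict[str, Derivation]) -> dict[str, str]:
    result: dict[str, str] = {}
    for column, derivation in outputs.items():
        if derivation is None:
            # Empty derivation: the value must come from the input record
            if column not in record:
                raise LookupError(f"Could not find input field {column}")
            result[column] = record[column]
        else:
            # Non-empty derivation: the column is created here
            result[column] = derivation(record)
    return result


row = {"id": "1"}
print(resolve_output(row, {"id": None, "flag": lambda r: "Y"}))
```

Asking for `{"extra": None}` against `row` raises the lookup error, which resembles the "Could not find input field" message from the original post.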

Posted: Wed Aug 01, 2012 11:46 am
by kaps
I don't want to create these columns in the transformer by specifying a value in the derivation; rather, I want them to get their values via RCP from the shared container's output.

Basically, I want to use the shared container without requiring every job that uses it to have the same metadata. I understand that the metadata passed to the shared container should be a superset of the metadata in the shared container, and I expect the other columns to propagate via RCP to the final stage of the job.

Am I making sense? Is my job design wrong?

Thanks

Posted: Wed Aug 01, 2012 12:33 pm
by ArndW
I think that we are talking past each other. Even with RCP, your columns will need to be "created" somewhere before they can be part of the RCP schema. This can happen in several different ways, none of which seem to be happening here.

Add the following job parameters, then compile and run your job:

$APT_DISABLE_COMBINATION true
$OSH_PRINT_SCHEMAS true

The job log will then contain an entry showing the full schema for every link in your job. You need to see where your columns enter the schema and where they drop out of the stream.
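Once you have the per-link schemas from the log, spotting the drop-out point is a simple set comparison. Below is a hypothetical Python helper that assumes you have copied each link's column names out of the log yourself, in job order; it does not parse the log format, and the link and column names are made up.

```python
# Hypothetical helper: given per-link column lists (copied by hand from the
# $OSH_PRINT_SCHEMAS output) in job order, report the first link on which each
# column from the first link is missing. Names are illustrative only.


def first_missing_link(links: list[tuple[str, list[str]]]) -> dict[str, str]:
    """Map each column on the first link to the first later link lacking it."""
    _, source_columns = links[0]
    dropped: dict[str, str] = {}
    for link_name, columns in links[1:]:
        present = set(columns)
        for column in source_columns:
            if column not in present and column not in dropped:
                dropped[column] = link_name
    return dropped


schemas = [
    ("seq_to_container", ["id", "amount", "extra"]),
    ("container_to_xfm", ["id", "amount", "extra"]),
    ("xfm_to_target", ["id", "amount"]),
]
print(first_missing_link(schemas))  # {'extra': 'xfm_to_target'}
```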

Posted: Wed Aug 01, 2012 4:45 pm
by Kryt0n
What the OP is saying (at least as I first read it) is that the source file already has these columns, and he wants them to appear in the output file even though they are not used anywhere in the process (shared container or otherwise).

As such, you shouldn't need to define them in the Copy stage, Transformer, etc. With RCP on, they should exist in all stages (assuming RCP is on in all stages). If that isn't happening, then the OSH_PRINT_SCHEMAS advice sounds like a good starting point.

Posted: Wed Aug 01, 2012 11:43 pm
by BI-RMA
I am a bit puzzled about how RCP is used in your job. You write that you have turned RCP on for all stages. Your input, however, is a sequential file. To read all columns of a sequential file, you have to specify them either explicitly in the column list or, when using RCP, in a schema file, which can be parameterized so that different runs of the same job can use different schemas.
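For readers who haven't seen one, a schema file is plain text describing a record. The hypothetical Python helper below writes that text from (name, type) pairs, assuming the basic `record ( name: type; ... )` layout; record-level format properties such as delimiters and quoting are deliberately left out, so check the schema-file documentation for your DataStage version before relying on it.

```python
# Hypothetical helper that emits DataStage-style record schema text from
# (name, type) pairs. Column names and types are illustrative; record-level
# format properties (delimiters, quoting) are intentionally omitted.


def build_schema(columns: list[tuple[str, str]]) -> str:
    lines = ["record", "("]
    lines += [f"  {name}: {col_type};" for name, col_type in columns]
    lines.append(")")
    return "\n".join(lines)


print(build_schema([("id", "int32"), ("name", "string[max=30]"), ("extra", "string")]))
```

Because the schema file's path can be supplied through a job parameter, each run can point the same job at a different file.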

How do you specify the columns contained in your sequential file?

Posted: Thu Aug 02, 2012 12:24 pm
by kaps
To make it a little clearer what I am trying to do:
What Kryt0n explained is what I am trying to do. I have all the columns in the sequential file, but some of them are not in the shared container, and I want them in my output sequential file along with the columns coming out of the shared container.

The reason I am doing this:
The job that calls the shared container should not need to have the same metadata. I thought this was one of the main reasons for RCP.

I had missed turning on RCP in one stage; after turning it on, it works now (I created those columns in the transformer and left their derivations empty). If this is not the correct approach, please let me know.

I tried what Kryt0n suggested, just turning on RCP without creating the columns in the transformer, but my target does not get the values for the columns that are supposed to come through via RCP. Looking at the schemas in the log, those columns make it as far as the transformer before the target sequential file, but not into the sequential file itself. I suppose this is correct, since the columns are not defined in the sequential file. Maybe I need to use a schema file there rather than defining the metadata in the target sequential file.

Does it make sense now ?

Posted: Thu Aug 02, 2012 4:52 pm
by Kryt0n
Just gave it a test run, and it appears that if you provide an input column list to the sequential file, that is all you get in the sequential file. Put a Copy stage after the Transformer stage, set Force=True, turn RCP on, and have no output mappings.