RCP for unused columns

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

dougcl
Premium Member
Posts: 137
Joined: Thu Jun 24, 2010 4:28 pm

RCP for unused columns

Post by dougcl »

Hi folks, I like the idea of creating table defs from the source database, attaching them to the DB input connector, and then using the SQL builder to select only a few columns. I want to build the jobs and downstream stages against the full table def, not the reduced one, so that if we decide to add a column to the source SELECT statement, no rework has to happen downstream. It seems RCP has to be enabled to do this, but it is the opposite of most RCP situations: instead of propagating columns that aren't in the table def, I want all the columns in the table def but only a subset of them propagated.
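To make the idea concrete, here is a minimal sketch in plain Python (not DataStage code; the column names are made up): the table def lists every column, but only the columns named in the generated SELECT actually flow downstream.

```python
# Hypothetical illustration of the design, not DataStage code.
# The full table def carries every column as design-time metadata;
# only the columns named in the SELECT are actually read and propagated.
full_table_def = ["cust_id", "name", "address", "phone", "email"]
selected_columns = ["cust_id", "email"]  # columns in the generated SQL

def propagate(row, selected):
    """Carry downstream only the columns that were actually selected;
    the rest of the table def is unused metadata."""
    return {col: row[col] for col in selected if col in row}

source_row = {"cust_id": 42, "email": "doug@example.com"}
print(propagate(source_row, selected_columns))
```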

Does anyone do this?

Thanks,
Doug
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I would not think so. RCP is an all-or-nothing deal.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

Depends on your target stage. The DB2 Connector and ODBC Connector both have an option to drop unmatched fields. The Oracle Connector does not. Some of the earlier database stages have an option to silently drop unmatched fields.

You could very easily accomplish the same functionality you are seeking by placing a Copy stage in front of your target stage. In the Copy stage, fully define the metadata for your target table and turn RCP off for that stage.
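As a rough sketch of what that Copy stage does (hypothetical Python, not Connector internals): any incoming column not in the stage's defined metadata is silently dropped before the target stage sees it, much like the "drop unmatched fields" option.

```python
# Hypothetical sketch of a Copy stage with RCP off: only columns in the
# stage's defined metadata pass through; everything else is dropped,
# similar to a connector's "drop unmatched fields" option.
def drop_unmatched(row, defined_columns):
    kept = {col: row[col] for col in defined_columns if col in row}
    dropped = [col for col in row if col not in defined_columns]
    return kept, dropped

incoming = {"cust_id": 1, "email": "a@b.com", "audit_ts": "2010-06-24"}
kept, dropped = drop_unmatched(incoming, ["cust_id", "email"])
print(kept)     # the columns the target stage sees
print(dropped)  # the unmatched columns that never reach the target
```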

You do need to remember that although this makes development easier for some future modifications (not all; additional table relationships would not be any easier), it does carry a cost. Errors can occur that reference columns not visible on the canvas. There is also additional memory and CPU spent moving data that you don't need and may never need. Make sure you weigh all of the pros and cons before implementing something like this. I'm not saying you shouldn't -- just think it through.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Excellent. I had read this last night and was trying to think how to articulate the points Keith made in his last two paragraphs; now I don't need to. :wink:

Since I have yet to use them, I was also going to ask whether the connectors were more 'Informatica-like', in that you can include all column metadata but only the columns that are linked / used will be included in the SQL. Seems like that is (mostly) the case.
-craig

"You can never have too many knives" -- Logan Nine Fingers
dougcl
Premium Member
Posts: 137
Joined: Thu Jun 24, 2010 4:28 pm

Post by dougcl »

kwwilliams wrote: You could very easily accomplish the same functionality you are seeking by placing a copy stage in front of your target stage.
Hi, yes, I thought of this after posting. So I envision having a big, fat table def that is used to land a subset of the data (via the column list in the SELECT statement) into a dataset. Downstream stages using that dataset will then have Copy stages to select a subset of (and possibly alias) the columns from there.

I'm still trying things, so this may not be the best approach either. For one thing, I noticed that a table def that is bigger (has more columns) than the data saved in the dataset leads to many warnings at run time:

Code: Select all

The modify operator keeps field "<column name>" which is not present in the output interface. 
I don't like designs that lead to warnings, but these appear to be safe to ignore.

Thanks for your input.
Doug