RCP for unused columns

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

dougcl
Premium Member
Posts: 137
Joined: Thu Jun 24, 2010 4:28 pm

RCP for unused columns

Post by dougcl »

Hi folks, I like the idea of creating table defs from the source database, attaching them to the DB input connector, and then using the SQL builder to select only a few columns. I want to build the jobs and downstream stages against the full table def, not the reduced one, so that if we decide to add a column to the source SELECT statement, no rework has to happen downstream. It seems RCP has to be enabled to do this, but it is the opposite of most RCP situations: instead of propagating columns that aren't in the table def, I want all the columns in the table def but only a subset of them propagated.
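To make the idea concrete, here is a minimal sketch in plain Python (not DataStage code; the column names are made up): the table def lists every column, but only the columns named in the generated SELECT actually flow downstream.

```python
# Hypothetical illustration of the design, not DataStage code.
# The full table def carries every column as design-time metadata;
# only the columns named in the SELECT are actually read and propagated.
full_table_def = ["cust_id", "name", "address", "phone", "email"]
selected_columns = ["cust_id", "email"]  # columns in the generated SQL

def propagate(row, selected):
    """Carry downstream only the columns that were actually selected;
    the rest of the table def is unused metadata."""
    return {col: row[col] for col in selected if col in row}

source_row = {"cust_id": 42, "email": "doug@example.com"}
print(propagate(source_row, selected_columns))
```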

Does anyone do this?

Thanks,
Doug
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I would not think so. RCP is an all-or-nothing deal.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

Depends on your target stage. The DB2 Connector and ODBC Connector both have an option to drop unmatched fields. The Oracle Connector does not. Some of the earlier database stages have an option to silently drop unmatched fields.

You could very easily accomplish the same functionality you are seeking by placing a Copy stage in front of your target stage. In the Copy stage, fully define the metadata for your target table and turn RCP off for that stage.
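As a rough sketch of what that Copy stage does (hypothetical Python, not Connector internals): any incoming column not in the stage's defined metadata is silently dropped before the target stage sees it, much like the "drop unmatched fields" option.

```python
# Hypothetical sketch of a Copy stage with RCP off: only columns in the
# stage's defined metadata pass through; everything else is dropped,
# similar to a connector's "drop unmatched fields" option.
def drop_unmatched(row, defined_columns):
    kept = {col: row[col] for col in defined_columns if col in row}
    dropped = [col for col in row if col not in defined_columns]
    return kept, dropped

incoming = {"cust_id": 1, "email": "a@b.com", "audit_ts": "2010-06-24"}
kept, dropped = drop_unmatched(incoming, ["cust_id", "email"])
print(kept)     # the columns the target stage sees
print(dropped)  # the unmatched columns that never reach the target
```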

You do need to remember that although this makes development easier for some future modifications (not all; additional table relationships would not be any easier), it does carry a cost. Errors can occur that reference columns not visible on the canvas. There is also additional memory and CPU spent moving data that you don't need and may never need. Make sure you weigh all of the pros and cons before implementing something like this. I'm not saying you shouldn't -- just think it through.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Excellent. I had read this last night and was trying to think how to articulate the points Keith made in his last two paragraphs; now I don't need to. :wink:

Since I have yet to use them, I was also going to ask whether the connectors were more 'Informatica-like', in that you can include all column metadata but only the columns that are linked / used will be included in the SQL. Seems like that is (mostly) the case.
-craig

"You can never have too many knives" -- Logan Nine Fingers
dougcl
Premium Member
Posts: 137
Joined: Thu Jun 24, 2010 4:28 pm

Post by dougcl »

kwwilliams wrote: You could very easily accomplish the same functionality you are seeking by placing a copy stage in front of your target stage.
Hi, yes, I thought of this after posting. So I envision having a big, fat table def that is used to land a subset of the data (via the column list in the SELECT statement) into a dataset. Downstream stages using that dataset will then have Copy stages to select a subset of (and possibly alias) the columns from there.

I'm still trying things, so this may not be the best approach either. For one thing, I noticed that a table def that is bigger (has more columns) than the data saved in the dataset leads to many warnings at run time:

Code: Select all

The modify operator keeps field "<column name>" which is not present in the output interface. 
I don't like designs that lead to warnings, but these appear to be safe to ignore.

Thanks for your input.
Doug