Is this possible with Runtime column propagation?
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 42
- Joined: Tue Oct 20, 2009 8:36 am
Is this possible with Runtime column propagation?
I have to create lots of jobs which basically just put data from stage to core tables. Only some technical fields need to be added (creation time, jobname) but no fields changed. We would need to do a change capture to check if data has changed and do an update or insert based on this (if nothing has changed, nothing should be done).
Is this possible with runtime column propagation or is using RCP a bad idea?
A problem I got into when trying to build a test job: How can I define which columns are key values? I need this for doing the Update in SQL. Or should I use custom sql here?
Is this possible with runtime column propagation or is using RCP a bad idea?
A problem I got into when trying to build a test job: How can I define which columns are key values? I need this for doing the Update in SQL. Or should I use custom sql here?
-
- Participant
- Posts: 437
- Joined: Fri Oct 21, 2005 10:00 pm
Re: Is this possible with Runtime column propagation?
You can do it, but it could get complex.
For the update you could create a wrapper to one of the datbase stages, and expose the key through a construct like a schema file. You could pass the where clause into the job as a parameter (I have seen negative impact on performance through use of this option).
For your CDC you would also have to use schema files to produce expose the key fields to run through a CDC and produce an output.
It can be done, the question to you is how adept your organization would be at supporting a complex job that would have no design time metadata on the job when issues arise. If you feel your staff is up to the task then it would be worth investigating.
For the update you could create a wrapper to one of the datbase stages, and expose the key through a construct like a schema file. You could pass the where clause into the job as a parameter (I have seen negative impact on performance through use of this option).
For your CDC you would also have to use schema files to produce expose the key fields to run through a CDC and produce an output.
It can be done, the question to you is how adept your organization would be at supporting a complex job that would have no design time metadata on the job when issues arise. If you feel your staff is up to the task then it would be worth investigating.
Keith Williams
keith@peacefieldinc.com
keith@peacefieldinc.com
-
- Participant
- Posts: 437
- Joined: Fri Oct 21, 2005 10:00 pm
This would only be a reusable job if all of the tables had the same key that needed to be exposed. I believe he is asking for a generic job that can be used regardless of the key value.kumar_s wrote:So define only those columns that you wish to assing as key and Values in the Change capture stage. And let the other columns to propagate. ...
Keith Williams
keith@peacefieldinc.com
keith@peacefieldinc.com
-
- Participant
- Posts: 42
- Joined: Tue Oct 20, 2009 8:36 am
-
- Participant
- Posts: 437
- Joined: Fri Oct 21, 2005 10:00 pm
Expose the column with the column import stage with some generic name such as "keyfield". Then your CDC would use this column as the key and compare all values for change.
Keith Williams
keith@peacefieldinc.com
keith@peacefieldinc.com
As Keith has pointed out in at least 3 posts this is pretty straight forward. Simply use a column import for the key, use a column import to get the values to compare and you are done. Do not make these paramters to the job - not necessary.
The CDC stage does not accept a schema - the columns need to be present on the link before this stage.
Also, in the column import stage reset the property for "Keep Imported Column" to True this way you do not need to use a complimentary column export to put the column back on the link after the CDC, although you may need a modify later on to drop the generic columns or use the database stage (if that is your target) and set "Drop Unmatched" to True.
The CDC stage does not accept a schema - the columns need to be present on the link before this stage.
Also, in the column import stage reset the property for "Keep Imported Column" to True this way you do not need to use a complimentary column export to put the column back on the link after the CDC, although you may need a modify later on to drop the generic columns or use the database stage (if that is your target) and set "Drop Unmatched" to True.
Mike Hester
mhester@petra-ps.com
mhester@petra-ps.com
-
- Participant
- Posts: 437
- Joined: Fri Oct 21, 2005 10:00 pm
Offhand topic:
If you are building a data warehouse, your architecture is odd. Usually warehouses are designed to track changes overtime to gain understanding of a particular dynamic within your organization. If you are overriding all of your data with updates, then you don't really have a basis with which to perform this type of analysis. You can ignore this if not applicable. If you guys are just starting out and you haven't taken tracking of changes overtime into your architecture you may want to ask yourselves some questions about what the business may need in the future.
If you are building a data warehouse, your architecture is odd. Usually warehouses are designed to track changes overtime to gain understanding of a particular dynamic within your organization. If you are overriding all of your data with updates, then you don't really have a basis with which to perform this type of analysis. You can ignore this if not applicable. If you guys are just starting out and you haven't taken tracking of changes overtime into your architecture you may want to ask yourselves some questions about what the business may need in the future.
Keith Williams
keith@peacefieldinc.com
keith@peacefieldinc.com
Re: Is this possible with Runtime column propagation?
I was going to post a semi-sarcastic reply to this, but with Keith's "offhand" comment I believe it deserves serious consideration: In any business context where the application is expected to evolve over time, isn't RCP really just a bad idea?DSUser2000 wrote:Is this possible with runtime column propagation or is using RCP a bad idea?
I spent over two years maintaining and enhancing a data warehouse application. RCP was the bane of my existence every step of the way, particularly at 0200 when the support pod called and I needed to diagnose and fix the problem by 0600, with no metadata in production, my workstation (and DS software) 30 miles away and my remote connectivity dead or intermittent at best.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson
Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson
Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
Be very careful when you make such statements to such a large audience with no support for your position. Many of the developers/architects/managers that rely on this site for information might take away something that is absolutely 100% incorrect. RCP, when used correctly, is a wonderful tool and IBM is building more and more infrastructure around it with each release.I was going to post a semi-sarcastic reply to this, but with Keith's "offhand" comment I believe it deserves serious consideration: In any business context where the application is expected to evolve over time, isn't RCP really just a bad idea?
Please refrain from posting such diatribe about something that is core and fundamental in the product.
If you are not happy with the outcome of what you built and how to support it then I might suggest that you did not implement in the best possible manner.
Mike Hester
mhester@petra-ps.com
mhester@petra-ps.com
I take your point as constructive. I also respectfully point out that I did not nor do not represent myself as either speaking for IBM, as an expert whose every word must be carefully edited for an audience, or assertively criticizing DataStage in the general sense.mhester wrote:Be very careful when you make such statements to such a large audience with no support for your position. Many of the developers/architects/managers that rely on this site for information might take away something that is absolutely 100% incorrect. RCP, when used correctly, is a wonderful tool and IBM is building more and more infrastructure around it with each release.
Please refrain from posting such diatribe about something that is core and fundamental in the product.
If you are not happy with the outcome of what you built and how to support it then I might suggest that you did not implement in the best possible manner.
If the moderators of this forum have a different perspective, if my profile setting of "developer" is being contradicted by the content or tone of any of my posts including the one above, then they can edit or delete such posts as they see fit, and would need to explain the action to me privately or not at all as they prefer.
Be well.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson
Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson
Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
-
- Participant
- Posts: 437
- Joined: Fri Oct 21, 2005 10:00 pm
If you took the semi-sarcastic response out of your post, then your post would deserve some serious consideration. So in providing useful information I am going to ignore it and answer your question as though it were not present.I was going to post a semi-sarcastic reply to this, but with Keith's "offhand" comment I believe it deserves serious consideration: In any business context where the application is expected to evolve over time, isn't RCP really just a bad idea?
RCP used correctly is a very powerful tool that will actually allow your organization to evolve over time without many of the headaches associated with design time information. Adding columns to a table is less intensive from a development side because the entire mapping does not need to be altered. This will allow your organization to be more agile in what you develop. I have seen some wonderful development using RCP.
On the other hand, if your organization is new to DataStage. Then RCP would be a tough concept to digest. Moving from any other ETL tool or hand-coding to DataStage is a difficult adjustment for many people. However, as the organization evolves using RCP should be reconsidered in scenarios where logic is repeated several times (such as with the original poster).
This is no different than utilizing another language, don't move to the advance topics until you have the basics covered. Its obvious you're not comfortable yet, so you might just want to stick to the basics.
Keith Williams
keith@peacefieldinc.com
keith@peacefieldinc.com
-
- Participant
- Posts: 42
- Joined: Tue Oct 20, 2009 8:36 am
@kwwilliams:
Thanks a lot. I will try it with column export/import.
Regarding your concerns with overwriting these tables: Well, we have a pretty huge datawarehouse with historization (we've 4 dates for this: editing time and end, effectiveness date and end). However, the tables I'm speaking about in this thread are declarative tables which are manually maintained by the departments which deliver data to us. There's no need for historization in them. However, there are quite often changes (added or deleted columns) which would make lot of work for us if we always have to adjust the jobs.
We're also generally thinking about reducing the number of new jobs (and probably also change old ones) in order to improve maintenance (faster changes and more uniformity of jobs). We've already got more than 3000 jobs and 500 sequences now (most of them are quite complex and so RCP is not an option there) and we're thinking about consolidating some of the simple jobs using RCP because they stick to different development guidelines (different project teams over a longer time).
However, we don't have experience handling RCP and so we don't know if it's more a hazzle than an improvement for us...
Edit: I didn't want to create a dispute here but it's good to here some experience and get the different positions.
Thanks a lot. I will try it with column export/import.
Regarding your concerns with overwriting these tables: Well, we have a pretty huge datawarehouse with historization (we've 4 dates for this: editing time and end, effectiveness date and end). However, the tables I'm speaking about in this thread are declarative tables which are manually maintained by the departments which deliver data to us. There's no need for historization in them. However, there are quite often changes (added or deleted columns) which would make lot of work for us if we always have to adjust the jobs.
We're also generally thinking about reducing the number of new jobs (and probably also change old ones) in order to improve maintenance (faster changes and more uniformity of jobs). We've already got more than 3000 jobs and 500 sequences now (most of them are quite complex and so RCP is not an option there) and we're thinking about consolidating some of the simple jobs using RCP because they stick to different development guidelines (different project teams over a longer time).
However, we don't have experience handling RCP and so we don't know if it's more a hazzle than an improvement for us...
Edit: I didn't want to create a dispute here but it's good to here some experience and get the different positions.
Last edited by DSUser2000 on Thu Sep 16, 2010 1:35 pm, edited 2 times in total.
Franklin,
My response was, indeed, meant to be constructive and I am happy that you took it that way. I am a proponent of using the right tool for the job and if it means RCP then I am happy to do so and if not then so be it. I am comfortable with all of the stages on the palette including the restructure stages and have used them quite a bit with various customers and many of them are RCP only stages.
I agree with you that RCP jobs are more difficult to trouble shoot, but I am aware of plans within IBM to tighten this up and make it easier to debug. You can already see some of this with their latest 8.1 release and 8.5 will have even more integration.
There is no need for anyone to remove or edit a post - your original post is free speech and that is a wonderful thing and my response was simply to caution all of us collectively that RCP is not evil and can be a wonderful tool in our toolbox.
Thanks for the reply
My response was, indeed, meant to be constructive and I am happy that you took it that way. I am a proponent of using the right tool for the job and if it means RCP then I am happy to do so and if not then so be it. I am comfortable with all of the stages on the palette including the restructure stages and have used them quite a bit with various customers and many of them are RCP only stages.
I agree with you that RCP jobs are more difficult to trouble shoot, but I am aware of plans within IBM to tighten this up and make it easier to debug. You can already see some of this with their latest 8.1 release and 8.5 will have even more integration.
There is no need for anyone to remove or edit a post - your original post is free speech and that is a wonderful thing and my response was simply to caution all of us collectively that RCP is not evil and can be a wonderful tool in our toolbox.
Thanks for the reply
Mike Hester
mhester@petra-ps.com
mhester@petra-ps.com
Thanks, Mike. This is not the first time I've created misunderstanding by taking a conversational appoach by including references to my mood not necessarily clear in my writing, and not necessarily appropriate to the topic.mhester wrote:Franklin,
My response was, indeed, meant to be constructive and I am happy that you took it that way.
Keith, I apologize. My referring to your "offhand" post, in my thoughts if not in my writing, was out of admiration for your point and not to criticize it.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson
Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson
Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872