Replace Db2 stage with Teradata stage

raj_cipher · Post by **raj_cipher** » Wed Oct 15, 2008 1:53 pm

We have a need to replace a DB2 stage in a Datastage job with a Teradata stage. As we have lot of jobs to convert. We thought of exporting the jobs as XML and replacing the stages with the help of some scripts. Is it possible for me to get some insights on how the XML is structured & how to go about this activity?

chulett · Post by **chulett** » Wed Oct 15, 2008 2:34 pm

Build the simplest DB2 job you can, export it. Replace DB2 with Teradata, export it again. Compare. See if the difference between them is something you want to tackle in any kind of automated way.

jdmiceli · Post by **jdmiceli** » Wed Oct 15, 2008 3:11 pm

Craig is just going to love this...

If you are Perl cognizant, I wrote a utility a while ago that will change parameters and the like within a .dsx file. That script is not limited to just parameters the way it is named. With little or no modification, you could use it in Batch (B) mode to change anything you wanted to in the file. I posted a bunch of Perl utilities a while ago at viewtopic.php?t=113429&highlight=perl+scripts. The script I am thinking of is called 'FixIndividualParameters.pl' (the last one on the posting). Follow the instructions in the header and let me know if you have questions.

Craig's suggestion is exactly correct to identify what you need to change. Then just look at the code and see if this utility can be modified to fit your need. If you do get it working, the only thing I ask is that you either add your changed script (so others can benefit from the new functionality) or send it to me so I can make it generic (in case you have to do something that makes it particular to your jobs).

Assuming what you want to replace is the first Stage in your job, my guess is that if you search the two .dsx files for the string ' Identifier = "V0" ' (without the single quotes - and that is V Zero, not the letter 'O'), you will find the beginning of the info you need to compare. You will need to look for any differences between the "BEGIN DSRECORD" and the "END DSRECORD" for the V0 record in both files and allow for those in your replacement script. I *think* that is where you will find things...

raj_cipher · Post by **raj_cipher** » Wed Oct 15, 2008 3:19 pm

Thanks John, will do the necessary changes and try the scripts out.

jdmiceli · Post by **jdmiceli** » Wed Oct 15, 2008 3:24 pm

Please note: I glossed over your initial question of how to do it in XML. Craig's answer appears to work for both. My answer might work for both, but was primarily intended for the .dsx file comparison and changing.

Your question actually just kind of spurs me to try and finish some smaller Perl projects and this is on the list (as part of a larger project). Maybe I can work on it soon - but no promises!

Feel free to move forward and maybe we'll have some interesting code comparisons to work with.

avenki77 · Post by **avenki77** » Wed Oct 15, 2008 4:05 pm

John,

Thanks for your reply.

I did take a sample DB2 and Teradata jobs and compare. But the xml is not that simple and straight forward.

What if the DB2 stage is not the first one in the Job. Then we can look for V0.

Also, There are many tags for which we need to know the meaning even before we can see if we can do a perl script to modify it.

For instance,

<Property Name="StageList">V0S11|V0S50</Property>

When I checked the code, the xml file contains the db2 stage not just at once. Its name is being referred in more than once in there.

So, I think we definitely need to understand the way the XML is structured. Is there any links available for this?

Thanks
Venky

chulett · Post by **chulett** » Wed Oct 15, 2008 4:32 pm

Using a "dsx" export rather than xml might be a little more... straight forward.

ray.wurlod · Post by **ray.wurlod** » Wed Oct 15, 2008 7:41 pm

You don't need Perl - any sensible programming language (including DataStage BASIC) can be used. Be aware that V0 is not the only possibility - for example if containers form part of the job design you will get V1, V2, and so on. And if there are shared containers an additional level of complexity is introduced. It's all do-able, but make sure all bases are covered if you develop an allegedly "one size fits all" utility.

jdmiceli · Post by **jdmiceli** » Thu Oct 16, 2008 10:05 am

Oh, ow! Now Ray is beating up my widdle Perl ego...

Remember guys, I use Perl so much because I am not bright enough to handle 'real' programming languages.

Anyhow, Ray's point is well taken. There can certainly be more than one V-number combination depending on the complexity of your jobs. As is his point that this may very well be done in any number of ways. I think at the level of the DSX file (NOT the XML file) level because that is how I taught myself to do it. I will give all my answers from the .DSX perspective.

To answer your question about the StageList property: within the V0 identifier definition block, the DSX file is using the StageList property to list the names of all the stages that use V0 as a root. If there were containers involved, then you might see a V1 with stages listed in this property under the V1 definition block using V1 as a root, or some other number if the job is very complex. If you look at another property (three lines later in a DSX file) called StageTypes, you will see a list of the types of stages used in this job, correlating with the identifiers listed in StageList. They will not be the names you gave to the stages in your job; they will be the internal stage type reference name (CTransformerStage, CSeqFileStage, CCustomStage, etc.). You should have a one to one relationship between the Identifiers listed in the StageList property and the StageTypes property. If you jot down the Identifiers in the StageList, then you can search for each of them to find the definition code block for each stage within the DSX file.

As Ray and Craig have pointed out, complexity adds... well, complexity. However, if you remember that all code blocks/stages are related to each other in some way and keep that in mind then you can actually find your way around pretty easily.

There is another property called Partner that usually is used to link stages that take data and have constraints applied to them, then output them to other stages (usually file creation, though not limited to that). Pay attention to how they show the relationships between the stages that are partners to the originating stage.

Hope this helps!

ray.wurlod · Post by **ray.wurlod** » Thu Oct 16, 2008 2:19 pm

John, that was never intended as a put-down, just an alternative. I like Perl but - purely because of my history - am way more proficient with DataStage BASIC. And some variant of C could as easily be used.

Good analysis of the partner property, too.

jdmiceli · Post by **jdmiceli** » Thu Oct 16, 2008 2:58 pm

Don't worry, Ray! No offense was taken.

My thing is that I have been employed full time in the database industry for the last 14 years and I've never even had a computer class. I learn by doing and Perl is easy enough to make sense. I tried learning C once - yeah that worked...

Anywho, thanks for the kind words on the analysis... Hopefully it helped Venky.

avenki77 · Post by **avenki77** » Thu Oct 16, 2008 3:56 pm

Thanks John, Ray and Craig.

Yes, the dsx file does look a little simple than the xml file. But the only difference is the representation the almost both of them contain the same amout of code and same number of lines. But that is fine, I took dsx now.

I think we need some more understanding on the structure of the dsx file, before we can decide how to modify it to change all the db2 stage to teradata stage.

For instance I took a simple job which has just a Teradata Connector stage and a link writing it on to a dataset.

For this, I saw :

BEGIN DSRECORD
Identifier "ROOT"
BEGIN DSRECORD
Identifier "V0"
BEGIN DSRECORD
Identifier "V0S50"
Name "target_dataset"
BEGIN DSRECORD
Identifier "V0S50P2"
Name "source_dataset"
BEGIN DSRECORD
Identifier "V4S0"
Name "source_dataset"
BEGIN DSRECORD
Identifier "V4S0P1"
Name "source_dataset"

these many records. I have to find a generic pattern to find all the DB2 stages and replace with teradata stages. So, I get many questions like: What do these records mean and how are they related to the job (that has just two stages and a link).

I do see some properties like StageType which has values of TeradataConnectorPX, PxDB2Ent, etc. But they dont just appear in one of these records, but many.

So, is there any guide to understand the structure better?

Thanks
Venky

ray.wurlod · Post by **ray.wurlod** » Thu Oct 16, 2008 4:10 pm

This post explains the DSRID which, although you aren't aware, is what you are asking about.

hamzaqk · Post by **hamzaqk** » Mon Oct 20, 2008 9:12 pm

i have done thing manually and its not that complicated. just make sample jobs and the only thing that will differ would be the TD stage from the DB2 one. what ever opotions you want in that target TD stage should be in your sample job. then replace that bit with the DB2 bit keeping in view the VS0 etc

worked for me.