Migration & Version Control

Post questions here relating to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

bmsq
Premium Member
Posts: 32
Joined: Mon Oct 30, 2006 9:19 pm

Migration & Version Control

Post by bmsq »

Hi guys,

I've started to look into our Dev, Test & Prod migration processes for DataStage jobs. This also requires us to put our DS projects under version control.

What I would like to do is use the Manager's export functionality to package an entire project into a single .dsx file. This file will then be placed under version control in our CVS repository.

Our project standards mandate that all build artifacts in CVS must be in a deployable state (we are mostly a Java shop, so by this I mean they compile and are executable). Because of this, I'm thinking we will have both dev and stable projects in our development environment. The idea is that developers build jobs in dev and migrate them to stable once they compile and run successfully. When developers are sure their updated jobs don't break existing jobs, the entire stable project can be checked into CVS (either manually or via a nightly batch process).

There are still a few things about this process which I am not sure about:
  • Is there an easy way for developers to move individual jobs from one project to another on the same server? I know this could be done using the Manager, but that seems tedious since the jobs are not moving between servers.

  • Our test & prod environments won't have a compiler installed. Do the .dsx files contain executables? Can we import just the executables from the .dsx? Can we automate this import process using the command line on the server?

Any thoughts or explanations about your own processes would be greatly appreciated.

Thanks in advance
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

Use Version Control that comes with the DataStage installation CD.
Using DS Version Control, you can move one or more jobs between environments. The installation is easy, and the tool is easy to learn from the provided documentation itself.
And of course, you always have this forum to answer your queries. :wink:
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
bmsq
Premium Member
Posts: 32
Joined: Mon Oct 30, 2006 9:19 pm

Post by bmsq »

Thanks for the response,

This may be our first piece of work using DataStage, but our project is already well established and we need to adhere to its standards.

I would gladly use the Version Control tool that comes with DataStage, but one of the project's standards is the use of CVS (shudders) for version control. So I'm afraid I don't have any choice in the matter.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You might be surprised at the level of choice you have. :wink:

My current client also has a 'standard' of CVS for versioning. When we explained that there was no direct support for it in DataStage, that it would be a PITA to force its use with the tool, and that DataStage had its own Version Control software built in, we were 'allowed' to do our own thing.

Sure, an occasional full export may get shoved into CVS to keep them happy(er), but the day to day operations are handled through VC.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Please note, however, that DataStage Version Control ceased to exist with version 8.0. So maybe taking exports of components and checking these into and out of CVS is a better long-term approach.

There's nothing in 8.0 or 8.1. There "may" be a mechanism in 8.2 that permits DataStage to interact with "proper" source code control systems such as ClearCase (which IBM owns). Will any other SCCSs be included? Who knows?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

We use CVS now and it works fine. We fill out migration forms which list all the jobs we want migrated from DEV to TEST. I scripted a solution which I think I posted recently; if you cannot find it, let me know. It basically uses most of DataStageBackup.bat with DsExport.exe to export one job at a time. I have a file called JobList.txt which holds the list of jobs to export. DataStageExport.bat exports each job one at a time using:

Code: Select all

%DSExportCmd% /H=%Host% /U=%User% /P=%Password% /JOB=%%i %Project% %%i.dsx >> %LogFileName%
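For illustration, here is a minimal POSIX-shell equivalent of the loop around that command (the original is a Windows batch file; the host, user, password, and project values are placeholders, and the export tool is stubbed with echo so the loop can be dry-run):

```shell
#!/bin/sh
# Sketch: export each job listed in JobList.txt to its own .dsx file.
# DSEXPORT_CMD would point at DsExport.exe on a real system; it defaults
# to 'echo DsExport.exe' here so the loop runs safely without DataStage.
DSEXPORT_CMD="${DSEXPORT_CMD:-echo DsExport.exe}"
HOST=myhost USER=dsadm PASSWORD=secret PROJECT=MyProject
LOGFILE=export.log

# Demo job list so the sketch is self-contained; a real run supplies its own.
[ -f JobList.txt ] || printf 'TestJob1\nTestJob2\n' > JobList.txt

: > "$LOGFILE"
while IFS= read -r job; do
    [ -z "$job" ] && continue        # skip blank lines
    $DSEXPORT_CMD "/H=$HOST" "/U=$USER" "/P=$PASSWORD" \
        "/JOB=$job" "$PROJECT" "$job.dsx" >> "$LOGFILE"
done < JobList.txt
```

Each job ends up in its own `JobName.dsx`, which is what makes the per-job CVS check-in below workable.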
Then I use a slightly different version to import, called DataStageImport.bat. It looks for all the DSX files in a directory and imports all of them into TEST.

The directories look like:
c:\DataStage
c:\DataStage\KimD
c:\DataStage\KimD\Scripts
c:\DataStage\KimD\Backups
c:\DataStage\KimD\Backups\MyProject

So your DSX files are separated by project. Next I copy the DSX files from MyProject to c:\CVSROOT\MyProject and check them in with CVS Add Contents and CVS Commit.
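That copy-and-commit step might look roughly like this in shell (the cvs calls are stubbed with echo so the sketch can be dry-run without a CVS server; the paths and the sample .dsx file are illustrative, not Kim's actual layout):

```shell
#!/bin/sh
# Sketch: copy per-job .dsx exports into a CVS working copy and commit.
# CVS_CMD defaults to 'echo cvs' so nothing is actually committed here.
CVS_CMD="${CVS_CMD:-echo cvs}"
SRC=Backups/MyProject          # where the export batch file left the .dsx files
DEST=cvswork/MyProject         # checked-out CVS module (illustrative path)

mkdir -p "$SRC" "$DEST"
[ -f "$SRC/SampleJob.dsx" ] || : > "$SRC/SampleJob.dsx"   # demo export file

cp "$SRC"/*.dsx "$DEST"/
( cd "$DEST" \
    && $CVS_CMD add ./*.dsx \
    && $CVS_CMD commit -m "Nightly DataStage export" )
```

Because each job is its own file, CVS diffs and history stay per-job instead of one giant project export.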

Works great. Saved lots of time. All I need to do is compile the jobs. You may need to change some parameter values but I can automate that if we need it. Right now we don't need it because we have a DataStage wrapper script.
Mamu Kim
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Kim, can your script also export routines?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

Good question! There was a recent post on DSExport.exe where the OP wanted to export only routines, but it turned out that routines cannot be exported using DSExport.exe.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Not just routines. And not with DSExport.exe, but you can export the project as a whole using dscmdexport.exe.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Routines have to be done manually. Shared containers, for some reason, get exported along with the job.
Mamu Kim
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

kduke wrote:Routines have to be done manually. Shared container for some reason get exported along with the job.
That's what I wanted to confirm. I was really hoping for a different answer, but oh well, limitations exist.
Thanks Kim.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Version Control has been dropped from DataStage 8; however, the import/export functions have been improved and are in some ways better than the old VC functions. It is easier to build, add to, search, filter, or view an export file under DataStage 8. So you could build release files under version 8 and append to them over time.

Kim's import/export approach should also work under version 8; the only import/export interface function dropped is dsjob "-import". The command-line export should still be there.
abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

Kim, can you explain a little bit about what you have in DataStageBackup.bat? Also, what do you have in the DataStage wrapper script? Thanks.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

First of all, you should post a new thread instead of hijacking this one.

The backup batch file is on my tips page and you can download it for free. I think most people have a version of this now. Talk to my boss, Sheila Wright, and maybe she will let me post these versions. Only one was invented on my current job: DataStageMigrate.bat. It does the above. You edit JobList.txt in a folder named after the project. You feed it the project, user, and password to export from, and the same for the import. It clears the directory, exports the jobs, imports the jobs, compiles the jobs, and logs all of this so you can check what fails to compile. The DSX files are named for each job, one per job. We copy all of these into a CVS folder, Add Contents (or Update), and then Commit. We are done. Works great.
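The flow described above can be sketched as a sequence of steps. This is not Kim's actual DataStageMigrate.bat; the tool invocations are stubbed with echo (and an empty file stands in for each export) so the pipeline shape can be seen and dry-run, and the project and job names are illustrative:

```shell
#!/bin/sh
# Sketch of a DEV-to-TEST migration: clear, export, import, compile, log.
# A real run would replace the echo stubs with DsExport.exe, the import
# tool, and the compile step.
PROJECT=MyProject
WORKDIR="work/$PROJECT"
LOG=migrate.log

rm -rf "$WORKDIR" && mkdir -p "$WORKDIR"        # start from a clean directory
: > "$LOG"
[ -f MigrateList.txt ] || printf 'JobA\nJobB\n' > MigrateList.txt   # demo list

while IFS= read -r job; do
    [ -z "$job" ] && continue
    echo "export $job from DEV"  >> "$LOG"      # DsExport.exe ... /JOB=$job
    : > "$WORKDIR/$job.dsx"                     # stand-in for the real export
    echo "import $job into TEST" >> "$LOG"      # import $job.dsx into TEST
    echo "compile $job"          >> "$LOG"      # compile in TEST, log failures
done < MigrateList.txt
```

The single log file is what lets you scan afterwards for jobs that failed to compile.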

Other versions:
DataStageBackup.bat
DataStageCompile.bat
DataStageExport1PerJob.bat
DataStageExport.bat
DataStageImport.bat
DataStageMigrate.bat

Similar versions:
DSaveAsBmp.bat
DSaveAsBmpCategory.bat
DsJobReport.bat
Mamu Kim
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

The DataStage wrapper is a shell script. It reads all parameters from text files in a directory below the project called JobParams. This script is a combination of 3 scripts developed by me and a couple of other people, with the best features of all three combined into one huge script. We wanted parameters to default to the job parameters unless otherwise set. We wanted the parameters to have a global list but be able to override them for a specific job. We wanted to override these by instance ID for multiple-instance jobs. It can solve all these problems and more.

Code: Select all

Order of processing 
1. JobName.InstanceId_#ParamName#.prm
2. JobName.InstanceId.prm
3. JobName_#ParamName#.prm
4. JobName.prm
5. GlobalParams_ParamName.prm
6. GlobalParams.prm
We had to add # to the file names because neither a job nor a parameter can have a # in its name, so _ alone was not a good enough separator.
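That lookup order can be sketched as a small shell function. The file names follow the precedence list above; the JobParams directory, job name, and parameter name in the demo are assumptions for illustration:

```shell
#!/bin/sh
# Sketch: resolve which .prm file supplies a parameter's value, walking
# the precedence list from most specific to most general.
find_param_file() {
    job=$1; inst=$2; param=$3; dir=${4:-JobParams}
    for f in \
        "$dir/$job.${inst}_#$param#.prm" \
        "$dir/$job.$inst.prm" \
        "$dir/${job}_#$param#.prm" \
        "$dir/$job.prm" \
        "$dir/GlobalParams_$param.prm" \
        "$dir/GlobalParams.prm"
    do
        [ -f "$f" ] && { echo "$f"; return 0; }
    done
    return 1                        # no parameter file found
}

# Demo: only a global file and one job-level file exist, so the job-level
# file wins for that job and the global file is the fallback for others.
mkdir -p JobParams
: > JobParams/GlobalParams.prm
: > JobParams/LoadCust.prm
find_param_file LoadCust 01 TgtDB    # prints JobParams/LoadCust.prm
```

The first match wins, which is exactly why the most specific names have to be tried first.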

Next, you have to call dsjob to get a list of valid parameters. You cannot set a parameter that does not exist in the job, or dsjob will not run.

We wanted the ability to kill this job, so the script traps the kill and issues a dsjob -stop command.

We use Zena as a scheduler; it is similar to Control-M and other schedulers. When Zena tries to cancel the script, the script traps the kill signal and stops the job with dsjob -stop. Very cool.
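The trap-and-stop idea looks roughly like this in shell. The dsjob call is stubbed with echo, and the project and job names are illustrative; a real wrapper would invoke dsjob -stop for the running job:

```shell
#!/bin/sh
# Sketch: trap the scheduler's kill signal and stop the DataStage job
# cleanly via dsjob -stop instead of dying mid-run.
PROJECT=MyProject
JOB=LoadCust
STOPPED=0

stop_job() {
    # A real wrapper would run: dsjob -stop "$PROJECT" "$JOB"
    echo "dsjob -stop $PROJECT $JOB"
    STOPPED=1
}
trap stop_job TERM INT

# Simulate the scheduler cancelling the wrapper (Zena would send this):
kill -TERM $$
```

After the handler runs, the wrapper can exit with a status the scheduler understands instead of leaving the job running server-side.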

I wrote all of this but many of these ideas came from other developers. Bill wanted to cancel the script. Peter had many ideas I used in setting parameters. Peter also wrote a script to automatically get all parameters and add them to the global parameter file. Setup is fast and easy now.

We wanted to display the log at the end of the job run; this is captured and displayed in Zena. We had jobs with 2GB log files which choked Zena, so we limited the display to several hundred lines.
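Capping the displayed log can be as simple as piping through tail. In this sketch the job log is simulated with generated lines; a real wrapper would feed in dsjob -logsum / -logdetail output instead:

```shell
#!/bin/sh
# Sketch: show only the last few hundred lines of a potentially huge log
# so the scheduler's capture window is not choked.
MAX_LINES=300

# Simulate a 5,000-line job log (stand-in for real dsjob log output).
seq 1 5000 | sed 's/^/log line /' > job.log

tail -n "$MAX_LINES" job.log > display.log
wc -l < display.log        # 300
```

Keeping the tail rather than the head preserves the lines that matter most after a failure: the final warnings and the abort message.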

The script got so big that I split the log display into a separate script. It calls a combination of logsum and logdetail to get the best information. This script also had to trap the kill signals, which added a level of complexity I did not want, but it works now. The display log script is 500 lines long. Peter says half of that is comments, because I am getting so old I have to comment the heck out of things to remember what each section does.

The wrapper script is now over 1,000 lines. Mostly comments, according to Peter.

One thing we wanted was row counts after each job, so I modified DSJobReportDb and renamed it EtlGetRowCounts. It now handles PX jobs the way they should be handled. We now have STAGE_NAME as a part of ETL_ROW_HIST, which allows us to find the associated reject link in reports. We add _rej_ to reject link names, and _src_ and _tgt_ to link names for source and target counts. This makes reporting easier. A nice naming convention was in place when I got to this customer. Lee and a few other developers contributed to this script. I really appreciated all their ideas.
Mamu Kim
Post Reply