I wrote two Perl scripts a while ago to do just this. They are great for source code control, since you typically want to do change control at the job/routine level.
ParseDSX.pl will split a DSX file into separate DSX files, one per job/routine in the same folder structure that they appear in DataStage.
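As a rough illustration of how a job or routine name ends up as a file path, here is a minimal sketch modeled on what WriteDSXObjectFile does in the script below (colons and spaces become underscores, category backslashes become slashes). The helper name `dsx_path` and the sample names are made up for the example and are not part of ParseDSX.pl.

```perl
use strict;
use warnings;

# Hypothetical helper (not in ParseDSX.pl): build the output path for one object.
sub dsx_path {
    my ($out_dir, $type, $category, $name) = @_;
    my $subdir = $type eq 'JOB' ? 'jobs' : 'routines';

    # Colons and spaces in the object name become underscores.
    (my $file = $name) =~ tr/: /_/;

    # Spaces in the category become underscores, backslashes become slashes.
    (my $cat = $category) =~ tr/ /_/;
    $cat =~ tr{\\}{/}s;

    # Objects in the root folder have an empty category, so skip empty parts.
    my @parts = grep { length } ($out_dir, $subdir, $cat);
    return join('/', @parts) . "/$file.dsx";
}

# Sample values are illustrative only.
print dsx_path('export', 'JOB', 'Load\\Daily Feeds', 'Batch::Load Customers'), "\n";
print dsx_path('export', 'ROUTINE', '', 'MyRoutine'), "\n";
```

Under these assumptions, the first call yields a path under export/jobs/Load/Daily_Feeds and the second a file directly under export/routines.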
CatDSX.pl goes the other way. It combines multiple DSX files into one or more DSX files, spread evenly by size, suitable for importing during a migration.
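The even spreading can be pictured roughly like this: sort the individual files by size, largest first, then deal them out round-robin across the target files. This is only a sketch of that idea under the assumption that sizes are already known. `assign_targets` and the sample file names are hypothetical, and the real script works on a global array rather than a hash.

```perl
use strict;
use warnings;

# Hypothetical helper: map each file name to a target part number (1..$parts).
sub assign_targets {
    my ($sizes, $parts) = @_;    # $sizes: { file name => size in bytes }

    # Largest first; ties broken by name so the result is repeatable.
    my @by_size = sort { $sizes->{$b} <=> $sizes->{$a} || $a cmp $b } keys %$sizes;

    my %target;
    my $n = 0;
    for my $file (@by_size) {
        $target{$file} = ($n++ % $parts) + 1;    # round-robin over the parts
    }
    return \%target;
}

# Illustrative sizes only.
my $plan = assign_targets(
    { 'a.dsx' => 900, 'b.dsx' => 500, 'c.dsx' => 400, 'd.dsx' => 100 }, 2);
print "$_ -> Part$plan->{$_}\n" for sort keys %$plan;
```

Round-robin after a size sort is simple rather than optimal; it just keeps the biggest files from piling up in the same part.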
I was going to spend some time documenting these before posting them because the ParseDSX script does a few more things that you may or may not like.
A few things you need to know about ParseDSX:
I wanted to be able to compare checked-in versions of DSX files (jobs or routines) with freshly exported versions from production projects. This ability is crucial for auditing your source code repository and confirming that your migration procedures are working and that "freelance" edits are not happening in production. Some data elements in the DSX are generated at export time, such as the export date and time and the job's last-edit date and time. ParseDSX masks these out with generic values. This allows you to do file compares without getting a lot of "false positive" changes.
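To make the masking idea concrete, here is a small sketch, not the script's own code: any DateModified or TimeModified value is replaced with a fixed placeholder so that two exports of the same job compare as equal. The placeholder values match the constants used in the script below. `mask_line` and the sample lines are assumptions for illustration.

```perl
use strict;
use warnings;

# Same placeholder values the script uses for its "standard" date and time.
my $STANDARD_DATE = '2001-01-01';
my $STANDARD_TIME = '01.00.00';

# Hypothetical helper: return the line with export-time values masked.
sub mask_line {
    my ($line) = @_;
    $line =~ s/^(\s*DateModified\s+)"[^"]*"/$1"$STANDARD_DATE"/;
    $line =~ s/^(\s*TimeModified\s+)"[^"]*"/$1"$STANDARD_TIME"/;
    return $line;
}

# Illustrative DSX-style lines; real exports contain many other properties.
print mask_line(qq{      DateModified "2003-04-29"\n});
print mask_line(qq{      TimeModified "14.32.07"\n});
print mask_line(qq{      Name "LoadCustomers"\n});    # left untouched
```

After masking, a plain diff between the checked-in file and a fresh export only shows lines that really changed.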
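Also, default parameter values are stripped out of all jobs that do not have "PROTOTYPE" in the name. (The code also leaves jobs with "Batch::UTIL" in the name alone, and always keeps the defaults for the JobName, PartitionNumber and PartitionCount parameters.) This is because these scripts were written to dovetail with Ken Bland's job control.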
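The stripping rule can be sketched as follows, mirroring the checks in OKToStripDefaultValue further down. `strip_defaults` and the sample parameter lines are hypothetical; the real script does this while walking the DSSUBRECORD blocks of a job.

```perl
use strict;
use warnings;

# Hypothetical helper: drop Default lines unless the job or parameter is exempt.
sub strip_defaults {
    my ($job_name, @lines) = @_;

    # Exempt jobs keep everything.
    return @lines if $job_name =~ /PROTOTYPE|Batch::UTIL/;

    # These parameters always keep their defaults.
    my %keep = map { $_ => 1 } qw(JobName PartitionNumber PartitionCount);

    my $param = '';
    my @out;
    for my $line (@lines) {
        $param = $1 if $line =~ /^\s*Name\s+"([^"]*)"/;    # remember current parameter
        next if $line =~ /^\s*Default\s/ && !$keep{$param};
        push @out, $line;
    }
    return @out;
}

# Illustrative parameter lines only.
my @record = (
    qq{Name "SourceDir"\n},
    qq{Default "/data/in"\n},
    qq{Name "PartitionCount"\n},
    qq{Default "4"\n},
);
print strip_defaults('LoadCustomers', @record);
```

For an ordinary job name, the SourceDir default is dropped and the PartitionCount default is kept. A job with PROTOTYPE in its name comes back unchanged.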
I really need to do a more thorough job of documenting these and will at some point. But for now, I would suggest reviewing the code and making modifications where you like.
These were developed against DataStage 5.1 DSX export files.
Here they are:
ParseDSX.pl
#!/usr/bin/perl
##############################################################################
#
# Program: ParseDSX.pl
#
# Description: See ShowBlurb function below for details
#
# === Modification History ===================================================
# Date Author Comments
# ---------- --------------- -------------------------------------------------
# 07-18-2002 Steve Boyce Created.
# 08-21-2002 Steve Boyce Exporting routines now includes binary info.
# 08-28-2002 Steve Boyce Corrected bug relating to jobs and routines
# located in the root folder. They now get
# created in the correct location.
# 08-28-2002 Steve Boyce Changed default output directory to be the name
# of the dsx file being parsed without the
# extension.
# -s option now works.
# 10-04-2002 Steve Boyce Eliminated -c option. That is now the default
# and only behavior.
# Default Parameter metadata is now stripped out
# of all jobs except any jobs that have PROTOTYPE
# or Batch::UTIL in the name.
# Routines are unaffected.
# 11-12-2002 Steve Boyce Added Version display option.
# Corrected source code generation bug.
# 02-27-2003 Steve Boyce Added -x option to strip out ValidationStatus
# 03-21-2003 Steve Boyce Stripping out ValidationStatus is now default
# behavior.
# 04-29-2003 Steve Boyce Bumped version number
#
##############################################################################
use Getopt::Std;
use File::Basename;
my $version="2.1.00";
##############################################################################
sub ShowBlurb
{
print <<ENDOFBLURB;
Syntax: ParseDSX.pl -h -l<ListFile> -o<OutputDir> -s -v -y <DSXFile>
Version: $version
Description: Extracts individual jobs and routines from a DataStage export
file.
Parameters: <DSXFile> Name of DataStage DSX file to parse. This file is
assumed to be generated from the DataStage export
process.
Options: -l job/routine list file. (future enhancement)
This file contains a list of jobs and routines to extract
from the <DSXFile>.
-o Explicitly specify <OutputDir> directory.
Default is the name of the parsed dsx file without the
extension in the current directory.
-s Extract job "Job Control Code" and routine "source"
code into "source files".
<job>.src and <routine>.src
These will appear in the same directory as the generated
dsx files in the <OutputDir> directory.
-v Display version information.
-y Force a "Yes" answer to overwrite existing <OutputDir>
directory prompt.
-h This help.
Notes: Job and routine names are case sensitive in DataStage. Extracted
jobs and routines are placed in file names constructed based on
job or routine names. Running this utility on the Windows
platform will ignore case and possibly consider some jobs and
routines duplicates when the UNIX platform will not.
It is a good practice to not rely on case as a differentiator
for file names.
ENDOFBLURB
}
##############################################################################
sub ShowVersion
{
print <<ENDOFBLURB;
ParseDSX.pl Version $version
ENDOFBLURB
}
##############################################################################
sub DieWith
{
my ($MessageLine) = @_;
print "$MessageLine\nType ParseDSX.pl -h for help.\n";
exit 1;
}
##############################################################################
sub OKToOverWriteOutputDir
{
my ($OutPutDirectory, $opt_y) = @_;
my $RetVal = 0;
if ( -e $OutPutDirectory ) {
if ( $opt_y ) {
print "*** Warning: <OutputDir> directory ($OutPutDirectory) already exists. Using anyway.\n";
$RetVal = 1;
}
else {
print "*** Warning: <OutputDir> directory ($OutPutDirectory) already exists.\n";
print "Proceed anyway? [y|n] ";
$Ans = <STDIN>;
chomp($Ans) if ($Ans);
if ( "$Ans" eq "Y" || "$Ans" eq "y" ) {
$RetVal = 1;
}
else {
DieWith("Aborting.");
}
}
}
else {
if ( MakeDir($OutPutDirectory, 0777) ) {
$RetVal = 1;
}
else {
DieWith("Error: Could not create ($OutPutDirectory) directory");
}
}
return $RetVal;
}
##############################################################################
sub LoadObjectList
{
my ($DSXListFile) = @_;
my %DSXObjectList = ();
if ( $DSXListFile ) {
if (open fhDSXListFile, "<".$DSXListFile) {
while (<fhDSXListFile>) {
chomp;
#-- Record the object name as a hash key
$DSXObjectList{$_} = 1;
}
close fhDSXListFile;
}
else {
DieWith("Error: Can't open $DSXListFile");
}
while ( ($key,$value) = each %DSXObjectList ) {
print "$key=$value\n";
}
}
return %DSXObjectList;
}
##############################################################################
sub MakeDir
{
my ($FullDirPath, $Mode) = @_;
my @DirList = ();
my $PartialDirPath = "";
my $RetVal = 1;
$FullDirPath =~ tr/\\/\//;
@DirList = split(/\//,$FullDirPath);
foreach $Directory ( @DirList ) {
$PartialDirPath = $PartialDirPath . $Directory. "/" ;
if ( ! (length($PartialDirPath) == 3 && substr($PartialDirPath, 1, 2) eq ":/") ) {
if ( ! -e $PartialDirPath ) {
if ( ! mkdir($PartialDirPath, $Mode) ) {
$RetVal = 0;
}
}
}
}
return $RetVal;
}
##############################################################################
sub ParseQuotedString
{
my ($InputLine) = @_;
my $FirstQuotePos = 0;
my $SecondQuotePos = 0;
my $Length = 0;
$FirstQuotePos = index($InputLine, '"');
$SecondQuotePos = index($InputLine, '"', $FirstQuotePos+1);
$Length = $SecondQuotePos - $FirstQuotePos;
return substr($InputLine, $FirstQuotePos + 1, $Length - 1);
}
##############################################################################
sub MakeDuplicateName
{
my ($OriginalName) = @_;
my $NewName = "";
my $DupSuffix = 1;
$NewName = $OriginalName . "_dup" . "$DupSuffix";
while ( -e $NewName ) {
if ( $DupSuffix > 99 ) {
DieWith("Error: There seem to be more than 99 duplicate jobs or routines.");
}
$DupSuffix += 1;
$NewName = $OriginalName . "_dup" . "$DupSuffix";
}
return $NewName;
}
##############################################################################
sub WriteDSXHeader
{
my ($fhOutputFile) = @_;
print $fhOutputFile "BEGIN HEADER\n";
print $fhOutputFile " CharacterSet \"ENGLISH\"\n";
print $fhOutputFile " ExportingTool \"Ardent DataStage Export\"\n";
print $fhOutputFile " ToolVersion \"3\"\n";
print $fhOutputFile " ServerName \"$cStandardServerName\"\n";
print $fhOutputFile " ToolInstanceID \"$cStandardToolInstanceID\"\n";
print $fhOutputFile " MDISVersion \"1.0\"\n";
print $fhOutputFile " Date \"$cStandardDate\"\n";
print $fhOutputFile " Time \"$cStandardTime\"\n";
print $fhOutputFile "END HEADER\n";
}
##############################################################################
sub WriteDSXObjectFile
{
my ($ObjectType, $tmpDSXObjectHolder, $OutPutDirectory, $DSXObjectName, $DSXCategoryName) = @_;
my $x = 0;
my $TranslatedDSXObjectName = "";
my $TranslatedCategoryName = "";
my $OutputFileName = "";
my $OutputLine = "";
my $WriteLine = 1;
$TranslatedDSXObjectName = $DSXObjectName;
$TranslatedDSXObjectName =~ tr/:/_/;
$TranslatedDSXObjectName =~ tr/ /_/;
if ($ObjectType eq "JOB") {
$OutPutDirectory = $OutPutDirectory . "/jobs";
if ( ! -e $OutPutDirectory ) {
if ( ! MakeDir($OutPutDirectory, 0777) ) {
DieWith("Error: Could not create directory: $OutPutDirectory");
}
}
}
else {
$OutPutDirectory = $OutPutDirectory . "/routines";
if ( ! -e $OutPutDirectory ) {
if ( ! MakeDir($OutPutDirectory, 0777) ) {
DieWith("Error: Could not create directory: $OutPutDirectory");
}
}
}
if ($DSXCategoryName) {
$TranslatedCategoryName = $DSXCategoryName;
$TranslatedCategoryName =~ tr/ /_/;
$TranslatedCategoryName =~ tr/\\/\//s;
$OutPutDirectory = $OutPutDirectory . "/" . $TranslatedCategoryName;
if ( ! -e $OutPutDirectory ) {
if ( ! MakeDir($OutPutDirectory, 0777) ) {
DieWith("Error: Could not create directory: $OutPutDirectory");
}
}
}
$OutputFileName = $OutPutDirectory . "/" . $TranslatedDSXObjectName . ".dsx";
print "Writing File: $OutputFileName...\n";
if ( -e $OutputFileName ) {
print "*** WARNING: Job/Routine output DSX file ($OutputFileName) already exists. Creating duplicate.\n";
$OutputFileName = MakeDuplicateName($OutputFileName);
}
if (open (fhOutputFile, ">$OutputFileName")) {
WriteDSXHeader(\*fhOutputFile);
if ($ObjectType eq "ROUTINE") {
print fhOutputFile "BEGIN DSROUTINES\n";
}
while ( $$tmpDSXObjectHolder[$x] ) {
$OutputLine = $$tmpDSXObjectHolder[$x];
$WriteLine = 1;
#-- Filter ValidationStatus metadata out
#-- This metadata seems to be intermittent with no value added.
if ($OutputLine =~ /^.*ValidationStatus /) {
$WriteLine = 0;
}
if ($WriteLine) {
#-- Normalize dates and times
if ($OutputLine =~ /^ {3,6}DateModified /) {
$OutputLine =~ s/\".{10}\"/\"$cStandardDate\"/;
}
else {
if ($OutputLine =~ /^ {3,6}TimeModified /) {
$OutputLine =~ s/\".{8}\"/\"$cStandardTime\"/;
}
}
#-- Send the line to the output file
print fhOutputFile "$OutputLine";
}
$x = $x + 1;
}
if ($ObjectType eq "ROUTINE") {
print fhOutputFile "END DSROUTINES\n";
}
close fhOutputFile;
}
}
##############################################################################
sub WriteDSXSourceFile
{
my ($ObjectType, $tmpDSXObjectSourceHolder, $OutPutDirectory, $DSXObjectName, $DSXCategoryName) = @_;
my $x = 0;
my $TranslatedDSXSourceName = "";
my $TranslatedCategoryName = "";
my $OutputFileName = "";
my $OutputLine = "";
$TranslatedDSXSourceName = $DSXObjectName;
$TranslatedDSXSourceName =~ tr/:/_/;
$TranslatedDSXSourceName =~ tr/ /_/;
if ($ObjectType eq "JOB") {
$OutPutDirectory = $OutPutDirectory . "/jobs";
$SourceKeyword = "JobControlCode";
}
else {
$OutPutDirectory = $OutPutDirectory . "/routines";
$SourceKeyword = "Source";
}
if ($DSXCategoryName) {
$TranslatedCategoryName = $DSXCategoryName;
$TranslatedCategoryName =~ tr/ /_/;
$TranslatedCategoryName =~ tr/\\/\//s;
$OutPutDirectory = $OutPutDirectory . "/" . $TranslatedCategoryName;
}
$OutputFileName = $OutPutDirectory . "/" . $TranslatedDSXSourceName . ".src";
if ( -e $OutputFileName ) {
$OutputFileName = MakeDuplicateName($OutputFileName);
}
#-- Convert single line encoded source code to properly formatted code
#-- Chop off trailing 6 spaces after every CR-LF (really leading 6 spaces)
#-- Convert "symbolic" CR-LF to real CR-LF
$tmpDSXObjectSourceHolder =~ s/\\\(D\)\\\(A\)/\n/g;
#-- Chop off leading keyword - either Source " or JobControlCode "
$tmpDSXObjectSourceHolder =~ s/^ *$SourceKeyword "//;
#-- Chop off trailing quote
$tmpDSXObjectSourceHolder =~ s/\" *$//;
#-- Replace all \" with "
$tmpDSXObjectSourceHolder =~ s/\\\"/\"/g;
#-- replace all \\ with \
$tmpDSXObjectSourceHolder =~ s/\\\\/\\/g;
if (open (fhOutputFile, ">$OutputFileName")) {
print fhOutputFile $tmpDSXObjectSourceHolder;
close fhOutputFile;
}
}
##############################################################################
sub OKToStripDefaultValue
{
my ($DSXObjectName, $DSParameterName) = @_;
my $RetVal = 0;
if (! ($DSXObjectName =~ /Batch::UTIL/) ) {
if (! ($DSXObjectName =~ /PROTOTYPE/) ) {
if (! ($DSParameterName eq "JobName" or $DSParameterName eq "PartitionNumber" or $DSParameterName eq "PartitionCount" ) ) {
$RetVal = 1;
}
}
}
return $RetVal;
}
##############################################################################
sub ParseDSXObjects
{
my ($DSXFileName, $Greppize, $OutPutDirectory, $DSXObjectList) = @_;
my @tmpDSXObjectHolder = ();
my $tmpDSXObjectSourceHolder = "";
my $DSXObjectName = "";
my $DSXCategoryName = "";
my $InDSJobBlock = 0;
my $InDSRoutineBlock = 0;
my $InDSRecordBlock = 0;
my $InDSSubRecordBlock = 0;
my $InDSUBinaryBlock = 0;
my $DSParameterName = "";
if (open fhDSXFileName, "<".$DSXFileName) {
while (<fhDSXFileName>) {
if ($InDSJobBlock) {
push(@tmpDSXObjectHolder, $_);
if ($_ =~ /^END DSJOB/) {
$InDSJobBlock = 0;
WriteDSXObjectFile("JOB", \@tmpDSXObjectHolder, $OutPutDirectory,
$DSXObjectName, $DSXCategoryName);
if ( $tmpDSXObjectSourceHolder ) {
WriteDSXSourceFile("JOB", $tmpDSXObjectSourceHolder, $OutPutDirectory,
$DSXObjectName, $DSXCategoryName);
}
}
else {
if ($InDSRecordBlock) {
if ($_ =~ /^ END DSRECORD/) {
$InDSRecordBlock = 0;
}
else {
if ($InDSSubRecordBlock) {
if ($_ =~ /^ END DSSUBRECORD/) {
$InDSSubRecordBlock = 0;
}
else {
if ($_ =~ /^ Name/) {
$DSParameterName = ParseQuotedString($_);
}
if ($_ =~ /^ Default/) {
if (OKToStripDefaultValue($DSXObjectName, $DSParameterName)) {
pop(@tmpDSXObjectHolder);
}
}
}
}
else {
if ($_ =~ /^ BEGIN DSSUBRECORD/) {
$InDSSubRecordBlock = 1;
}
else {
if ($_ =~ /^ Category /) {
$DSXCategoryName = ParseQuotedString($_);
}
if ($Greppize) {
if ($_ =~ /^ JobControlCode /) {
$tmpDSXObjectSourceHolder = $_;
}
}
}
}
}
}
else {
if ($_ =~ /^ BEGIN DSRECORD/) {
$InDSRecordBlock = 1;
}
else {
if ($_ =~ /^ Identifier /) {
$DSXObjectName = ParseQuotedString($_);
}
}
}
}
}
else {
if ($InDSRoutineBlock) {
if ($_ =~ /^END DSROUTINES/) {
$InDSRoutineBlock = 0;
}
else {
if ($InDSRecordBlock) {
push(@tmpDSXObjectHolder, $_);
if ($_ =~ /^ END DSRECORD/) {
$InDSRecordBlock = 0;
}
else {
if ($_ =~ /^ Identifier /) {
$DSXObjectName = ParseQuotedString($_);
}
else {
if ($_ =~ /^ Category /) {
$DSXCategoryName = ParseQuotedString($_);
}
if ($Greppize) {
if ($_ =~ /^ Source /) {
$tmpDSXObjectSourceHolder = $_;
}
}
}
}
}
else {
if ($InDSUBinaryBlock) {
push(@tmpDSXObjectHolder, $_);
if ($_ =~ /^ END DSUBINARY/) {
$InDSUBinaryBlock = 0;
WriteDSXObjectFile("ROUTINE", \@tmpDSXObjectHolder, $OutPutDirectory,
$DSXObjectName, $DSXCategoryName);
if ( $tmpDSXObjectSourceHolder ) {
WriteDSXSourceFile("ROUTINE", $tmpDSXObjectSourceHolder, $OutPutDirectory,
$DSXObjectName, $DSXCategoryName);
}
}
else {
if ($_ =~ /^ COMMENT Record is empty/) {
print "*** WARNING: Routine ($DSXObjectName) is missing compiled executable.\n";
}
}
}
else {
if ($_ =~ /^ BEGIN DSRECORD/) {
$InDSRecordBlock = 1;
@tmpDSXObjectHolder = ();
push(@tmpDSXObjectHolder, $_);
$tmpDSXObjectSourceHolder = "";
$DSXCategoryName = "";
}
else {
if ($_ =~ /^ BEGIN DSUBINARY/) {
$InDSUBinaryBlock = 1;
push(@tmpDSXObjectHolder, $_);
}
}
}
}
}
}
else {
if ($_ =~ /^BEGIN DSJOB/) {
$InDSJobBlock = 1;
@tmpDSXObjectHolder = ();
push(@tmpDSXObjectHolder, $_);
$tmpDSXObjectSourceHolder = "";
$DSXCategoryName = "";
}
else {
if ($_ =~ /^BEGIN DSROUTINES/) {
$InDSRoutineBlock = 1;
}
}
}
}
}
close (fhDSXFileName);
}
}
##############################################################################
# Main
#-- Global variables (constants)
$cStandardDate = "2001-01-01";
$cStandardTime = "01.00.00";
$cStandardServerName = "ServerName";
$cStandardToolInstanceID = "ToolInstanceID";
#-- Local variables
my %DSXObjectList = ();
my $NumArgs = 0;
my $DSXFileName = "";
my $OutPutDirectory = "";
my $Ans = "";
if (getopts('hl:o:svy')) {
if ( $opt_h ) {
ShowBlurb();
exit 2;
}
if ( $opt_v ) {
ShowVersion();
exit 2;
}
$NumArgs = scalar(@ARGV);
if ( $NumArgs == 1 ) {
$DSXFileName = $ARGV[0];
if ( -r $DSXFileName ) {
if ( $opt_o ) {
$OutPutDirectory = $opt_o;
}
else {
$OutPutDirectory = basename($DSXFileName, ".dsx");
}
if ( OKToOverWriteOutputDir($OutPutDirectory, $opt_y) ) {
%DSXObjectList = LoadObjectList($opt_l);
ParseDSXObjects($DSXFileName, $opt_s, $OutPutDirectory, \%DSXObjectList);
}
}
else {
DieWith("Error: Unable to read file ($DSXFileName).");
}
}
else {
DieWith("Error: Invalid filespec.");
}
}
else {
DieWith("Error: Invalid options.");
}
CatDSX.pl
#!/usr/bin/perl
##############################################################################
#
# Program: CatDSX.pl
#
# Description: See ShowBlurb function below for details
#
# Notes: @gDSXFileList() array format
# Column0 - Fully Qualified source DSX file name
# Column1 - Directory Name portion of Column0
# Column2 - File Name portion of Column0
# Column3 - Size in bytes of file pointed to by Column0
# Column4 - Target CombinedDSXFile Number
# Column5 - Job type (J-Job|R-Routine)
#
# This script will create one or more target DSX files that
# are constructed in the same manner that DataStage would have
# created them.
#
# === Modification History ===================================================
# Date Author Comments
# ---------- --------------- -------------------------------------------------
# 03-27-2003 Steve Boyce Created.
#
##############################################################################
use Cwd;
use Getopt::Std;
use File::Basename;
use File::Find;
##############################################################################
sub ShowBlurb
{
print <<ENDOFBLURB;
Syntax: CatDSX.pl -r -s<n> -y -h <CombinedDSXFile> [SourceDSXDir]
Description: Combines individual DSX files in SourceDSXDir into one (or more)
DSX file(s) suitable for importing into DataStage.
Parameters: <CombinedDSXFile> Name of DataStage DSX file(s) to create.
[SourceDSXDir] Path where individual DSXFiles reside.
Optional. Defaults to current directory.
Options: -r Recurse subdirectories
-s<n> Number of CombinedDSXFiles to create (evenly spreads
individual DSXFiles across all CombinedDSXFiles by size).
Must be greater than 0.
Must be less than total number of jobs and routines found
in SourceDSXDir and less than 10.
-y Force a "Yes" answer to overwrite existing <CombinedDSXFile>
file prompt.
-h This help.
Notes: It is assumed that each DSXFile only has one Job or Routine.
ENDOFBLURB
}
##############################################################################
sub Now
{
my ($InFormat) = @_;
my $RetVal = "";
my ($Seconds, $Minutes, $Hours, $Day, $MonthNumber, $YearNumber, $WeekDayNumber, $DayOfYear, $IsDayLightSavings) = localtime(time);
my $Year = $YearNumber + 1900;
my $Month = sprintf("%02d", $MonthNumber + 1);
$Day = sprintf("%02d", $Day);
$Hours = sprintf("%02d", $Hours);
$Minutes = sprintf("%02d", $Minutes);
$Seconds = sprintf("%02d", $Seconds);
if ($InFormat eq "YYYYMMDD") { $RetVal = "$Year$Month$Day"; }
elsif ($InFormat eq "YYYY-MM-DD") { $RetVal = "$Year-$Month-$Day"; }
elsif ($InFormat eq "DDMMYYYY") { $RetVal = "$Day$Month$Year"; }
elsif ($InFormat eq "DD-MM-YYYY") { $RetVal = "$Day-$Month-$Year"; }
elsif ($InFormat eq "YYYYMMDD.HH24MISS") { $RetVal = "$Year$Month$Day.$Hours$Minutes$Seconds"; }
else { $RetVal = "$Year-$Month-$Day $Hours:$Minutes:$Seconds"; }
return $RetVal;
}
##############################################################################
sub ErrorMessage
{
my ($MessageLine) = @_;
print Now()." $MessageLine\n";
print Now()." Type CatDSX.pl -h for help.\n";
}
##############################################################################
sub ValidSplitOption
{
my ($opt_s) = @_;
my $RetVal = $cFalse;
if ( $opt_s ) {
#-- Option specified
if ( $opt_s < 10 ) {
$RetVal = $cTrue;
$gNumberOfCombinedDSXFiles = $opt_s;
}
else {
print Now()." Error: NumberOfCombinedDSXFiles option (-s) must be less than 10.\n";
}
}
else {
#-- Option not specified, assume 1
$RetVal = $cTrue;
$gNumberOfCombinedDSXFiles = 1;
}
return $RetVal;
}
##############################################################################
sub BuildControlList
{
my $RetVal = $cTrue;
sub wanted
{
my $DirectoryName;
my $FileName;
my $FileSize;
$File::Find::prune = !$gRecursive;
if ( -f $File::Find::name ) {
$DirectoryName = dirname($File::Find::name);
$FileName = basename($File::Find::name);
$FileSize = -s $File::Find::name;
push(@gDSXFileList, [$File::Find::name, $DirectoryName, $FileName, $FileSize, 1, "X"]);
}
}
#-- Can't determine if find returns anything useful
find(\&wanted, $gSourceDSXDir);
$RetVal = $#gDSXFileList + 1;
#-- Return number of files found
return $RetVal;
}
##############################################################################
sub AssignJobType
{
my ($NumberOfFiles) = @_;
my $RetVal = $cTrue;
my $x = 0;
my $JobCounter = 0;
my $RoutineCounter = 0;
#-- Spin through list of DSX files
for ($x = 0; $x < $NumberOfFiles; $x++) {
$JobCounter = 0;
$RoutineCounter = 0;
if ( open fhDSXFile, "<".$gDSXFileList[$x][0] ) {
while (<fhDSXFile>) {
chomp;
if ($_ =~ /^BEGIN DSJOB/) {
$JobCounter++;
}
if ($_ =~ /^BEGIN DSROUTINES/) {
$RoutineCounter++;
}
}
close fhDSXFile;
#-- Update DSX file array
if ( ($JobCounter + $RoutineCounter) == 1 ) {
#-- This DSX file has only one job or routine
if ( $JobCounter == 1 ) {
$gDSXFileList[$x][5] = "J";
}
else {
$gDSXFileList[$x][5] = "R";
}
}
else {
print Now()." Error: $gDSXFileList[$x][0] has $JobCounter jobs and $RoutineCounter routines.\n";
$RetVal = $cFalse;
}
}
else {
print Now()." Error: Can't open $gDSXFileList[$x][0]\n";
$RetVal = $cFalse;
last;
}
}
return $RetVal;
}
##############################################################################
sub SortBySize
{
#-- Bubble sort by size
my ($NumberOfFiles) = @_;
my $x = 0;
my $y = 0;
my $FQName = "";
my $DirName = "";
my $FileName = "";
my $FileSize = "";
my $FileNumber = "";
my $JobType = "";
for ($x = 0; $x < $NumberOfFiles - 1; $x++) {
for ($y = $x+1; $y <= $NumberOfFiles - 1 ; $y++) {
if ( $gDSXFileList[$y][3] > $gDSXFileList[$x][3] ) {
#-- Swap rows
$FQName = $gDSXFileList[$x][0];
$DirName = $gDSXFileList[$x][1];
$FileName = $gDSXFileList[$x][2];
$FileSize = $gDSXFileList[$x][3];
$FileNumber = $gDSXFileList[$x][4];
$JobType = $gDSXFileList[$x][5];
$gDSXFileList[$x][0] = $gDSXFileList[$y][0];
$gDSXFileList[$x][1] = $gDSXFileList[$y][1];
$gDSXFileList[$x][2] = $gDSXFileList[$y][2];
$gDSXFileList[$x][3] = $gDSXFileList[$y][3];
$gDSXFileList[$x][4] = $gDSXFileList[$y][4];
$gDSXFileList[$x][5] = $gDSXFileList[$y][5];
$gDSXFileList[$y][0] = $FQName;
$gDSXFileList[$y][1] = $DirName;
$gDSXFileList[$y][2] = $FileName;
$gDSXFileList[$y][3] = $FileSize;
$gDSXFileList[$y][4] = $FileNumber;
$gDSXFileList[$y][5] = $JobType;
}
}
}
}
##############################################################################
sub AssignTargetDSXFiles
{
my $RetVal = $cFalse;
my ($NumberOfFiles) = @_;
my $TargetFileNumber = 1;
my $x = 0;
if ( $NumberOfFiles >= $gNumberOfCombinedDSXFiles ) {
$RetVal = $cTrue;
if ( $gNumberOfCombinedDSXFiles > 1 ) {
for ($x = 0; $x < $NumberOfFiles; $x++) {
$gDSXFileList[$x][4] = $TargetFileNumber;
$TargetFileNumber++;
if ( $TargetFileNumber > $gNumberOfCombinedDSXFiles ) {
$TargetFileNumber = 1;
}
}
}
}
else {
print Now()." Error: There are fewer DSX Files to process than the NumberOfCombinedDSXFiles option (-s).\n";
}
return $RetVal;
}
##############################################################################
sub SortByTargetFile
{
#-- Bubble sort by TargetFile, JobType, Name
my ($NumberOfFiles) = @_;
my $x = 0;
my $y = 0;
my $FQName = "";
my $DirName = "";
my $FileName = "";
my $FileSize = "";
my $FileNumber = "";
my $JobType = "";
for ($x = 0; $x < $NumberOfFiles - 1; $x++) {
for ($y = $x+1; $y <= $NumberOfFiles - 1 ; $y++) {
if ( ($gDSXFileList[$y][4].$gDSXFileList[$y][5].$gDSXFileList[$y][2]) lt ($gDSXFileList[$x][4].$gDSXFileList[$x][5].$gDSXFileList[$x][2]) ) {
#-- Swap rows
$FQName = $gDSXFileList[$x][0];
$DirName = $gDSXFileList[$x][1];
$FileName = $gDSXFileList[$x][2];
$FileSize = $gDSXFileList[$x][3];
$FileNumber = $gDSXFileList[$x][4];
$JobType = $gDSXFileList[$x][5];
$gDSXFileList[$x][0] = $gDSXFileList[$y][0];
$gDSXFileList[$x][1] = $gDSXFileList[$y][1];
$gDSXFileList[$x][2] = $gDSXFileList[$y][2];
$gDSXFileList[$x][3] = $gDSXFileList[$y][3];
$gDSXFileList[$x][4] = $gDSXFileList[$y][4];
$gDSXFileList[$x][5] = $gDSXFileList[$y][5];
$gDSXFileList[$y][0] = $FQName;
$gDSXFileList[$y][1] = $DirName;
$gDSXFileList[$y][2] = $FileName;
$gDSXFileList[$y][3] = $FileSize;
$gDSXFileList[$y][4] = $FileNumber;
$gDSXFileList[$y][5] = $JobType;
}
}
}
}
##############################################################################
sub OKToOverWriteOutputFile
{
my ($FileName, $DirectoryName, $SuffixName, $opt_y) = @_;
my $RetVal = $cFalse;
my $FirstOutputFile = "";
if ( $gNumberOfCombinedDSXFiles > 1 ) {
$FirstOutputFile = $DirectoryName.$FileName."-Part1".$SuffixName;
}
else {
$FirstOutputFile = $DirectoryName.$FileName.$SuffixName;
}
if ( -e $FirstOutputFile ) {
if ( $opt_y ) {
print Now()." *** Warning: $FirstOutputFile file already exists. Overwriting anyway.\n";
$RetVal = $cTrue;
}
else {
print Now()." *** Warning: $FirstOutputFile file already exists.\n";
print "Proceed anyway? [y|n] ";
$Ans = <STDIN>;
chomp($Ans) if ($Ans);
if ( "$Ans" eq "Y" || "$Ans" eq "y" ) {
$RetVal = $cTrue;
}
else {
print Now()." Aborting.\n";
}
}
}
else {
$RetVal = $cTrue;
}
return $RetVal;
}
##############################################################################
sub WriteDSXHeader
{
my ($fhOutputFile) = @_;
print $fhOutputFile "BEGIN HEADER\n";
print $fhOutputFile " CharacterSet \"ENGLISH\"\n";
print $fhOutputFile " ExportingTool \"Ardent DataStage Export\"\n";
print $fhOutputFile " ToolVersion \"3\"\n";
print $fhOutputFile " ServerName \"$cStandardServerName\"\n";
print $fhOutputFile " ToolInstanceID \"$cStandardToolInstanceID\"\n";
print $fhOutputFile " MDISVersion \"1.0\"\n";
print $fhOutputFile " Date \"$cStandardDate\"\n";
print $fhOutputFile " Time \"$cStandardTime\"\n";
print $fhOutputFile "END HEADER\n";
}
##############################################################################
sub CreateOutputFile
{
my ($OutputFile, $TargetFileNumber, $NumberOfFiles) = @_;
my $RetVal = $cTrue;
my $x = 0;
my $IsPastHeader = $cFalse;
my $IsPastRoutineHeader = $cFalse;
my $LastJobType = "X";
#-- Open output file
if ( open(fhOutputFile, ">$OutputFile" ) ) {
WriteDSXHeader(\*fhOutputFile);
#-- Spin through DSX File array
for ($x = 0; $x < $NumberOfFiles; $x++) {
#-- See if this DSX File in array is targeted for this output file
if ( $gDSXFileList[$x][4] == $TargetFileNumber ) {
#-- This file is targeted to this output file
#-- See if we are processing the first routine in this output file set
if ( $gDSXFileList[$x][5] eq "R" && $LastJobType ne "R" ) {
#-- Must be the first routine
#-- Write out Routine header
print fhOutputFile "BEGIN DSROUTINES\n";
}
#-- Open DSX input file
$IsPastHeader = $cFalse;
$IsPastRoutineHeader = $cFalse;
if ( open fhDSXFile, "<".$gDSXFileList[$x][0] ) {
#-- Spin through source DSX file
while (<fhDSXFile>) {
if ( $IsPastHeader ) {
#-- Filter out routine headers and footers from source DSX file
if ( !(($_ =~ /^BEGIN DSROUTINES/) || ($_ =~ /^END DSROUTINES/)) ) {
#-- Not a routine header or footer
print fhOutputFile $_;
}
}
else {
#-- Spin past header
if ( $_ =~ /^END HEADER/ ) {
$IsPastHeader = $cTrue;
}
}
}
close fhDSXFile;
}
else {
print Now()." Error: Cannot open $gDSXFileList[$x][0].\n";
$RetVal = $cFalse;
}
$LastJobType = $gDSXFileList[$x][5];
}
}
#-- See if the last source DSX file was a routine
if ( $LastJobType eq "R" ) {
#-- Must be a routine
#-- Write out Routine footer
print fhOutputFile "END DSROUTINES\n";
}
close fhOutputFile;
}
else {
print Now()." Error: Cannot create $OutputFile.\n";
$RetVal = $cFalse;
}
return $RetVal;
}
##############################################################################
sub OutputFileProcess
{
my ($NumberOfFiles, $opt_y) = @_;
my $RetVal = $cFalse;
my ($FileName, $DirectoryName, $SuffixName) = fileparse($gCombinedDSXFile, '\.dsx');
my $x = 1;
my $OutputFile = "";
#-- See if output directory exists
if ( -d $DirectoryName ) {
#-- Output directory exists
if ( OKToOverWriteOutputFile($FileName, $DirectoryName, $SuffixName, $opt_y) ) {
#-- Write output file(s)
#-- Spin through output files
for ($x = 1; $x <= $gNumberOfCombinedDSXFiles; $x++) {
if ( $gNumberOfCombinedDSXFiles == 1 ) {
#-- Create one big file
$OutputFile = $DirectoryName.$FileName.$SuffixName;
}
else {
#-- Split output across multiple files
$OutputFile = $DirectoryName.$FileName."-Part".$x.$SuffixName;
}
print Now()." Creating: $OutputFile...\n";
if ( CreateOutputFile($OutputFile, $x, $NumberOfFiles) ) {
$RetVal = $cTrue;
}
else {
last;
}
}
}
}
else {
print Now()." Error: $DirectoryName does not exist.\n";
}
return $RetVal;
}
##############################################################################
sub MainProcess
{
my ($opt_y) = @_;
my $RetVal = $cFalse;
my $NumberOfFiles = 0;
#-- Create Control Array
print Now()." Gathering list of DSX files...\n";
$NumberOfFiles = BuildControlList();
if ( $NumberOfFiles > 0 ) {
#-- Found some files to process
#-- Spin through list and determine JobType
print Now()." Determining job types...\n";
if ( AssignJobType($NumberOfFiles) ) {
#-- Sort Array by size only
print Now()." Sorting DSX file list by Size...\n";
SortBySize($NumberOfFiles);
#-- Assign target DSXFiles
print Now()." Assigning jobs to target output files...\n";
if ( AssignTargetDSXFiles($NumberOfFiles) ) {
#-- Sort Array by TargetFile, JobType, Name
print Now()." Sorting DSX file list by TargetFile, JobType, Name...\n";
SortByTargetFile($NumberOfFiles);
#-- Control the process of creating the output files
if ( OutputFileProcess($NumberOfFiles, $opt_y) ) {
$RetVal = $cTrue;
}
else {
ErrorMessage("Aborting: Cannot create Output files.");
}
}
else {
ErrorMessage("Aborting: Cannot properly assign target output files.");
}
}
else {
ErrorMessage("Aborting: One or more DSX files is invalid.");
}
}
else {
ErrorMessage("Aborting: No files to process in $gSourceDSXDir.");
}
return $RetVal;
}
##############################################################################
#-- Main
#-- Global Constants
$cTrue = 1;
$cFalse = 0;
$cOSSuccess = 0;
$cOSFailure = 1;
$cStandardDate = "2001-01-01";
$cStandardTime = "01.00.00";
$cStandardServerName = "ServerName";
$cStandardToolInstanceID = "ToolInstanceID";
#-- Global variables
$gCombinedDSXFile = "";
$gSourceDSXDir = "";
$gNumberOfCombinedDSXFiles = 1;
$gRecursive = $cFalse;
@gDSXFileList = ();
#-- Local variables
my $NumArgs = 0;
my $OSRetVal = $cOSSuccess;
print Now()." Initialization...\n";
if ( getopts('rs:yh') ) {
if ( ! $opt_h ) {
$NumArgs = scalar(@ARGV);
if ( $NumArgs == 1 || $NumArgs == 2 ) {
$gCombinedDSXFile = $ARGV[0];
if ( $NumArgs == 2 ) {
$gSourceDSXDir = $ARGV[1];
}
else {
$gSourceDSXDir = cwd();
}
if ( $opt_r ) {
$gRecursive = $cTrue;
print Now()." Recursively combining all DSX files found in $gSourceDSXDir...\n";
}
else {
print Now()." Combining all DSX files found in $gSourceDSXDir...\n";
}
if ( ValidSplitOption($opt_s) ) {
if ( $gNumberOfCombinedDSXFiles > 1 ) {
print Now()." Splitting SourceDSXFiles across $gNumberOfCombinedDSXFiles CombinedDSXFiles...\n";
}
#-- All input gathered
#-- Do main processing
if ( MainProcess($opt_y) ) {
print Now()." Complete.\n";
}
else {
$OSRetVal = $cOSFailure;
}
}
else {
$OSRetVal = $cOSFailure;
ErrorMessage("Aborting: Invalid split (-s) option.");
}
}
else {
$OSRetVal = $cOSFailure;
ErrorMessage("Aborting: Missing CombinedDSXFile or too many parameters.");
}
}
else {
$OSRetVal = $cOSFailure;
ShowBlurb();
}
}
else {
$OSRetVal = $cOSFailure;
ErrorMessage("Aborting: Invalid options.");
}
exit $OSRetVal;
Enjoy,
-Steve