Job Looping

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Christina Lim
Participant
Posts: 74
Joined: Tue Sep 30, 2003 4:25 am
Location: Malaysia

Job Looping

Post by Christina Lim »

Hallo all,

I need your suggestions on my situation here.

I have a hash file which contains 2 columns, code and description.

I have another job, Job A, which reads from a sequential file and writes to a different file depending on the code value in the hash file.
This means that if I have 5 records in the hash file, I would like to loop Job A 5 times and write to 5 different files. I can use a job parameter to set the name of each target sequential file.

But I am not sure how to loop Job A.

I would appreciate your suggestions.

Thank you
bchau
Charter Member
Charter Member
Posts: 46
Joined: Tue Jan 18, 2005 7:39 am
Location: Japan

Post by bchau »

You could create a new job and use the Job Control tab of its job properties to write code that runs Job A in a loop. Refer to the BASIC guide for looping commands.
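
For example, a minimal job control sketch (assuming the hashed file is named CODES, the job is JobA, and it takes a parameter named TargetCode; all three names are placeholders to adapt) could attach, parameterise and run Job A once per record:

Code: Select all

* Job control sketch: run Job A once per record in the hashed file.
* The file name CODES, job name JobA and parameter name TargetCode
* are assumptions; substitute your own. (In the Job Control tab the
* DSJ./DSJS. constants are already available; in a routine you would
* $INCLUDE DSINCLUDE JOBCONTROL.H.)
Open "CODES" To CodesFile
Else Call DSLogFatal("Cannot open hashed file CODES", "JobControl")

Select CodesFile
Loop
   ReadNext Code Else Exit

   hJob = DSAttachJob("JobA", DSJ.ERRFATAL)
   ErrCode = DSSetParam(hJob, "TargetCode", Code)
   ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
   ErrCode = DSWaitForJob(hJob)
   JobStatus = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
   If JobStatus <> DSJS.RUNOK
   Then
      Call DSLogWarn("JobA finished with status " : JobStatus, "JobControl")
   End
   ErrCode = DSDetachJob(hJob)
Repeat

Because the hashed file is keyed on the code, ReadNext from the select list yields each code value directly.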
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Don't use a loop at all.

Use one job in which the Transformer stage has five outputs, writing to the five different files. Use constraints based on the return value from the lookup to determine which rows are sent along which links.

This way you only need to process your source data once. This will yield the shortest completion time for your processing.
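
As a sketch, assuming the reference link from the hashed file is named lkpCodes and returns the code in a column named CODE (hypothetical names), the constraints would be simple equality tests:

Code: Select all

* Constraint expression on the output link for the first file:
lkpCodes.CODE = "A"
* Constraint expression on the output link for the second file:
lkpCodes.CODE = "B"
* ...one such constraint per output link.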
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Christina Lim
Participant
Posts: 74
Joined: Tue Sep 30, 2003 4:25 am
Location: Malaysia

Post by Christina Lim »

Hallo Ray,

But I would only know the number of outputs at runtime, based on the number of records in the hash file.

Could you please explain further?

Thank you very much
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If the maximum number of paths you will have is finite and manageably small, create that many output links. It doesn't matter if no rows are passed along any particular output link.

In your plan, you want to process the same data N times, where N is the number of values in the hashed file. This is necessarily inefficient. You should only process data once.

The main problem with an arbitrary number of outputs is that you have to have some method for generating that number of output file names. If the number is not manageably small, then you probably are better off creating a routine that can keep track of which of these files are open, opening more if required, and actually writing to the files from within the routine. An after-job routine should be used to close all the open files.

Code: Select all

FUNCTION WriteNFiles(Line, HashResult)
* Writes Line to the sequential file named "File" : HashResult,
* opening that file on first use and keeping it open between calls.

$INCLUDE UNIVERSE.INCLUDE FILEINFO.H

* Change this constant to be able to handle more files, do not exceed NFILE.
Equate MAXFILES To 100

* File variables are in COMMON so the files are not closed on return.
COMMON /WriteNFiles/FileVar(MAXFILES), FileInUse, FileNames, LineCount(MAXFILES)

* First call only: initialise the lists and the line counters.
If UnAssigned(FileInUse)
Then
   FileInUse = ""   ; * dynamic array of "file is open" flags
   FileNames = ""   ; * dynamic array of file names
   Mat LineCount = 0
End

* Determine whether the file is open already, open it if not. New names
* are appended rather than inserted in sorted order, so that a position
* in FileNames always matches the same index into FileVar.
FileName = "File" : HashResult
Locate FileName In FileNames Setting WhereFound
Else
   FileNames<WhereFound> = FileName
   OpenSeq FileName To FileVar(WhereFound)
   Then
      WeofSeq FileVar(WhereFound)  ; * truncate existing file
      FileInUse<WhereFound> = 1
   End Else
      Create FileVar(WhereFound)   ; * file did not yet exist, create it
      Then
         FileInUse<WhereFound> = 1
      End
   End
End

* If the file variable is a valid open file, write the line to it.
If FileInfo(FileVar(WhereFound), FINFO$IS.FILEVAR)
Then
   WriteSeq Line To FileVar(WhereFound)
   Then
      LineCount(WhereFound) += 1
   End
End

Ans = 0

RETURN(Ans)
Some necessary error handling has been omitted for the sake of clarity.

Code: Select all

SUBROUTINE CloseNFiles(InputArg, ErrorCode)
* Close the files opened by WriteNFiles

ErrorCode = 0 ; * set to a non-zero value to stop job execution

* The Equate and the COMMON block must match those in WriteNFiles.
Equate MAXFILES To 100
COMMON /WriteNFiles/FileVar(MAXFILES), FileInUse, FileNames, LineCount(MAXFILES)

* Nothing to do if WriteNFiles was never called.
If UnAssigned(FileInUse) Then FileInUse = ""

* Walk the list of "file is open" flags, closing each file in turn.
SetRem 0 On FileInUse
WhereFound = 0
Loop
   Remove Flag From FileInUse Setting MoreFlags
While Flag
   WhereFound += 1
   CloseSeq FileVar(WhereFound)
While MoreFlags
Repeat

* Reset the lists so any later use starts cleanly.
FileInUse = ""
FileNames = ""

RETURN
(Yes, it is permissible to have more than one WHILE statement in a loop.)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Christina Lim
Participant
Posts: 74
Joined: Tue Sep 30, 2003 4:25 am
Location: Malaysia

Post by Christina Lim »

Hallo Ray,

There are about 30 output files that I would need to create each time. Therefore, it is not feasible to create multiple output links.

I appreciate your solution to this problem. However, as I am not familiar with BASIC, I don't really understand it.

Firstly, how would I call this function, WriteNFiles, in the job?
Secondly, where do the inputs 'Line' and 'HashResult' to the 'WriteNFiles' function come from? Should HashResult be a hash file ... an array of hash results? How do I get it?

Would it be easier if I appended all the results to one file instead of creating multiple files?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

First off, 30 does fall into the "small and manageable" category. I'd be happy to have a Transformer stage with 30 outputs, given that only one of them is triggered for any input row.

WriteNFiles would be called in an output column derivation expression. Argument Line is the line to be written to the file (which you can construct in the same expression) and HashResult is the result of the hashed file lookup, which is an Input Column in the derivation.
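
For example, the derivation of an output column (the link and column names InLink, lkpCodes, CODE and DESCRIPTION are hypothetical) might read:

Code: Select all

WriteNFiles(InLink.CODE : "," : InLink.DESCRIPTION, lkpCodes.CODE)

The column itself just receives the function's return value; the real work is the side effect of writing to the files.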

If you're appending everything into one file then you're not fulfilling your original design requirement. That's not to say it's an invalid solution, but how then do you split the single output file into 30 files? Are you familiar with awk or sed (either could do it)?
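
For instance, if the combined file carried the code as its first comma-delimited field, a one-line awk command along these lines (the file names are hypothetical) could do the split:

Code: Select all

awk -F',' '{ print > ("File" $1) }' combined.txt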
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Christina Lim
Participant
Posts: 74
Joined: Tue Sep 30, 2003 4:25 am
Location: Malaysia

Post by Christina Lim »

Hallo Ray,

Thank you for your suggestion. I decided to use unix scripting (sed) to append 30 sets of records into a common file and update the category for each set of records at the same time.

Code: Select all

#!/bin/ksh
# Append one pass of the original file per code in the template file,
# replacing the EXMT placeholder with the code value on each pass.

szPath=$1
szOriginalFile=$2
szClientVarTemplate=$3
szTargetFile=$4

echo "szPath = $szPath"
echo "szOriginalFile = $szOriginalFile"
echo "szClientVarTemplate = $szClientVarTemplate"
echo "szTargetFile = $szTargetFile"

# Remove any previous target file so the appends below start clean.
if test ! -s "$szPath$szTargetFile"
then
   echo "$szPath$szTargetFile is empty or does not exist"
   echo "No deletion of file is done"
else
   echo "$szPath$szTargetFile already exists. Deletion of file is done"
   rm "$szPath$szTargetFile"
fi

# One sed pass per code value listed in the template file.
for fieldfile1 in `cat "$szPath$szClientVarTemplate"`
do
   echo "$fieldfile1"
   sed "s/EXMT/$fieldfile1/g" "$szPath$szOriginalFile" >> "$szPath$szTargetFile"
done