Processing multiple files

arun_im4u
Premium Member
Posts: 104
Joined: Mon Nov 08, 2004 8:42 am

Post by arun_im4u »

Hello,

What would be the best approach to process multiple files matching the same pattern in a folder, one after the other, so that the run-time logs for each file can be captured separately?

I tried using the Execute Command activity to capture the first file and pass it as a parameter to the Job activity, but it did not work. The other option would be to write scripts outside DS.

Any suggestions would be helpful.

Thanks.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Did you try looking at Read Method "File Pattern" in the sequential file stage properties?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
velagapudi_k
Premium Member
Posts: 142
Joined: Mon Jun 27, 2005 5:31 pm
Location: Atlanta GA

Post by velagapudi_k »

Select 'File Pattern' for the Read Method in the Sequential File stage properties. That functionality is pretty good. Logs will be separate for each file. It allows wildcard characters, e.g. *filename*.
Venkat Velagapudi
arun_im4u
Premium Member
Posts: 104
Joined: Mon Nov 08, 2004 8:42 am

Post by arun_im4u »

Yeah, I did look at the File Pattern option, but it concatenates all the files matching the pattern into one and tries to process that. I would like to process one file after the other.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Use a StartLoop and an EndLoop activity in a sequence job. Pass the file names as a list to the StartLoop activity, and specify #StartLoopName.$Counter# as the derivation for the job parameter. The loop steps through the list and passes one file name to your job per iteration, so the job processes each file individually.
If the file names in the folder are dynamic, you can have a BASIC routine that uses DSExecute() to get all the files present in that folder and pass them as a comma-delimited list to your StartLoop activity.
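
To make the loop behaviour concrete, here is a rough shell sketch of the same iteration, not a DataStage sequence. It walks a comma-delimited list and hands one name at a time to a handler. Both for_each_file and process_one are hypothetical names invented for this illustration; process_one stands in for the Job Activity and only echoes what it receives.

```shell
#!/bin/sh
# Hypothetical stand-in for the Job Activity: it only reports which file
# one loop iteration would receive as its job parameter.
process_one() {
    echo "processing $1"
}

# Walk a comma-delimited list one item at a time, much as the loop hands
# one list item per iteration to the job.
for_each_file() {
    rest=$1
    while [ -n "$rest" ]; do
        name=${rest%%,*}              # text before the first comma
        process_one "$name"
        case $rest in
            *,*) rest=${rest#*,} ;;   # drop the item just processed
            *)   rest= ;;             # that was the last item
        esac
    done
}

for_each_file "a.csv,b.csv,c.csv"
```

In a script run outside DataStage, process_one could be replaced by whatever launches the job for a single file, for example a dsjob -run call with the file name as a parameter, which would also give one job run (and one log) per file.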
umamahes
Premium Member
Posts: 110
Joined: Tue Jul 04, 2006 9:08 pm

Post by umamahes »

Make a file list containing the set of files you want to process. In the job sequence, use a StartLoop activity and an EndLoop activity to process all the files in the file list. To do this, first count the number of files in the file list and set this value as the upper limit of the StartLoop activity, then write a routine to get the file name from the file list.
HI
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Does not the items list in a StartLoop activity support limited regular expressions?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
velagapudi_k
Premium Member
Posts: 142
Joined: Mon Jun 27, 2005 5:31 pm
Location: Atlanta GA

Post by velagapudi_k »

No, Ray. I had a similar problem where I had to process one file after another, so I wrote a routine that executes an operating system command and returns the files as a comma-delimited list. I pass this as the input to the loop activity and it works fine. So far I have had a maximum of 15 files and my sequence iterated through 15 times, so I have had no problem.
Venkat Velagapudi
arun_im4u
Premium Member
Posts: 104
Joined: Mon Nov 08, 2004 8:42 am

Post by arun_im4u »

I wrote a routine to build a comma-delimited list of the files and pass it as a parameter to the StartLoop activity. It worked fine, but if there are many files, the command output wraps onto a new line and the job fails.

Code:

* pFilePath (Arg1) is the directory where the files exist and pFilePattern (Arg2) is the pattern to look for

InputArg = 'cd ' : pFilePath : ' ; ls -m ' : trim(pFilePattern)

Call DSExecute("UNIX", InputArg, Output, SystemReturnCode)

If SystemReturnCode<>0 Then
   Call DSLogFatal('GetFilesList routine failed to execute command ' : InputArg : ' with return code ' : SystemReturnCode : ' and with msg ' : Output, 'GetFilesList')
End
Else
   * Strip the spaces that ls -m puts after each comma, then drop the last character
   out1 = convert(" ", "",Output)
   out2 = Left(out1,len(out1)-1)
End

Call DSLogInfo("Command Output is " : out2,"GetFilesList")

Ans=out2
Any help would be great,
Thanks.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Your difficulty is with your operating system's output line length limit. You may be able to modify that. You could certainly adapt your Convert() function to remove the line terminators as well (on UNIX - on Windows you'd need Ereplace()).

Or you could use ls -1 to get a single COLUMN of output, and convert the field mark characters in that to commas. Something like:

Code:

InputArg = 'ls -1 ' : pFilePath : "/" : trim(pFilePattern)

Call DSExecute("UNIX", InputArg, Output, SystemReturnCode)

If SystemReturnCode<>0 Then
   Call DSLogFatal('GetFilesList routine failed to execute command ' : InputArg : ' with return code ' : SystemReturnCode : ' and with msg ' : Output, 'GetFilesList')
End
Else
   * Build list of non-empty lines
   out1 = Output
   out2 = ""
   Loop
      Remove Element From out1 Setting MoreElements
      * <-1> appends a new field; <1> would overwrite the first field on every pass
      If Len(Element) Then out2<-1> = Element
   While MoreElements
   Repeat
End

* Turn the field marks between the collected names into commas
Ans=Convert(@FM, ",", out2 )
Call DSLogInfo("Command Output is (sort of) " : Ans, "GetFilesList")
arun_im4u
Premium Member
Posts: 104
Joined: Mon Nov 08, 2004 8:42 am

Post by arun_im4u »

Thanks, Ray. It worked fine. I modified it a little to serve my purpose. I didn't know what "Element" in the code means.

Code:

InputArg = "cd "  : pFilePath :"; ls -1 " : trim(pFilePattern)
Call DSExecute("UNIX", InputArg, Output, SystemReturnCode)
If SystemReturnCode<>0 Then
   Call DSLogFatal('GetFilesList routine failed to execute command ' : InputArg : ' with return code ' : SystemReturnCode : ' and with msg ' : Output, 'GetFilesList')
End
Else
   Print Output
End

* Field marks become commas; Left() then drops the single trailing comma
Out1=Convert(@FM, ",", Output )
Ans =Left(Out1,len(Out1)-1)
Call DSLogInfo("Command Output is " : Ans, "GetFilesList")
I also wrote one that worked, but your suggestion is better.

Code:

InputArg = 'cd ' : pFilePath : ' ; ls -m ' : trim(pFilePattern)

Call DSExecute("UNIX", InputArg, Output, SystemReturnCode)

If SystemReturnCode<>0 Then
   Call DSLogFatal('GetFilesList routine failed to execute command ' : InputArg : ' with return code ' : SystemReturnCode : ' and with msg ' : Output, 'GetFilesList')
End
Else

   *out1 = convert(char(10), ",", Output)
   *out2 = Left(out1,len(out1)-1)

   * "MCP" turns unprintable characters (here, the marks at line breaks) into periods
   Out1 = OConv(Output, "MCP")
   * Remove the period left where ls -m wrapped onto a new line
   Out2 = EReplace(Out1, ".csv,.", ".csv,")
   Out3 = Left(Out2,len(Out2)-1)
   Out4 = Convert(" ","",Out3)

   Ans=Out4

End
Call DSLogInfo("Command Output is " : Output,"GetFilesList")
Thanks.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

"Element" is an element in a dynamic array. The output is returned to DataStage as a dynamic array (a field-mark-delimited string). The loop served to remove any empty lines (at the beginning and end, typically, from an ls output). Your solution removes only the one at the end.
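
As an alternative to filtering inside the routine, the same clean-up could be done in the shell command itself, before DSExecute() ever sees the output. The following is only a sketch: list_files_csv is a hypothetical helper name, and it assumes file names contain no commas or newlines.

```shell
#!/bin/sh
# Hypothetical helper: print the files in directory $1 that match the
# wildcard pattern $2 as one comma-separated line, e.g. "a.csv,b.csv".
list_files_csv() {
    dir=$1
    pattern=$2
    # Run in a subshell so the caller's working directory is left alone.
    (
        cd "$dir" || exit 1
        # $pattern is left unquoted on purpose so the shell expands the wildcard.
        # ls -1 prints one name per line, grep drops any empty lines,
        # and paste -s joins the remaining lines with commas.
        ls -1 $pattern | grep -v '^$' | paste -sd, -
    )
}
```

The equivalent one-line command (cd to the directory, then ls -1 piped through grep and paste) could be passed to DSExecute() in place of the plain ls. The routine would presumably still need to strip the trailing field mark from the single line that comes back.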