Using sequence job to read multiple files of directory

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

BIuser
Premium Member
Premium Member
Posts: 238
Joined: Thu Feb 02, 2006 4:03 am
Location: South Africa

Using sequence job to read multiple files of directory

Post by BIuser »

Hi there

I have a job that uses an input file parameter - the problem is that the input file parameter will have a different value as the job loops through each of the input files. There are approximately 20 files in a windows directory.

My sequence job has a startloop activity stage and then an Execute Command stage, the execute command stage would normally pass through the value of the input file it is ready from the drive. Unfortunately what is happening now is that when I put the *.csv wild card in for the input file, the job just gives a list of ALL the csv files in the directory - but it does not process the job one after another calling each file in the directory as a separate input file. It just reads ALL the CSV files in the folder (which I want - but not altogether - rather one at a time).

Does this make sense ?

Please assist ... I'm stuck. Thanks a mil ! :-)
Barbara
-------------------------
https://www.ssa.co.za
Rubu
Premium Member
Premium Member
Posts: 82
Joined: Sun Feb 27, 2005 9:09 pm
Location: Bangalore

Re: Using sequence job to read multiple files of directory

Post by Rubu »

Hi Barbara,

If you can use a a "USer Variable" stage followed by the Start Loop activity and execute the command ls -lt|head -1... you should get only one file name from the list of all files. For getting the file name you may have to write a small routine which calls DSExecute command and return the output.

Now the variable can be fed to a job through job activity using job parameter.

Next to "job activity", there can be a Execute_Command stage which copies the file from the source directory to a Temp directory (cp /SourceDir/Variable /Tmp)

End Loop Activity.

If you want to copy back the files to source directory you can add another Execute_Command to execute (cp /Tmp/File* /SourceDir)
Regards
Palas
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Right, that UserVariable stage is the key here. It can build a delimited list of 'things' and pass elements from the list, one at a time, to the Start Loop stage. In your case, that list can be a list of all filenames that match your wildcard pattern. Then inside the loop you run your job to process each one, one at a time.
-craig

"You can never have too many knives" -- Logan Nine Fingers
dh_Madhu
Premium Member
Premium Member
Posts: 65
Joined: Sat Apr 23, 2005 3:19 am
Location: Stirling, Scotland

Post by dh_Madhu »

Also try executing ls -1 which forces the out put into one-entry-per-line and feed it to the start loop activity marking the "List loop" as its loop type.
Regards,
Madhu Dharmapuri
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

better would be "ls -m", this gives a comma separated list of files in the directory, you can then use the Field command within a User Variable activity to get each file in turn (count number of commas, total number of files is count+1, this can then control your loop, getting filename for your current loop value in turn)

If you call back to the UNIX directory each time to get the next file, you will need to move the processed file out to ensure you don't pick it up again
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yes, I too would prefer using "ls -m" here. And you wouldn't need to do anything special in the UserVariables stage other than fetch the list of all files, the StartLoop will (when set up properly) automatically iterate through the list and can pass in both one item from the list (in your case, filename) at a time and the current $Counter as well if needed. The loop will continue until the list is exhausted.
-craig

"You can never have too many knives" -- Logan Nine Fingers
BIuser
Premium Member
Premium Member
Posts: 238
Joined: Thu Feb 02, 2006 4:03 am
Location: South Africa

Post by BIuser »

Hi there guys,

Thanks so much for trying to help me ... but clearly I've lost the plot here. Can someone tell me where does the User Variable stage get the values for the user variable ? When I add it to my sequence job, there are no options for me to add user variables.

I'm looking through my helpfiles ... but can't find anything on user variable stage to start off with !

Thanks
Barbara
-------------------------
https://www.ssa.co.za
BIuser
Premium Member
Premium Member
Posts: 238
Joined: Thu Feb 02, 2006 4:03 am
Location: South Africa

Post by BIuser »

Hi there

Ok, firstly, I'm passing a user variable for the file name to the job each time the loop iterates. So every time we go through the loop, the name of the file is processed by the job by the user variable. That is done.

HOWEVER, this is still the problem I'm experiencing. My job looks like this:

EXECUTE_COMMAND (Uses wildcard to get files and uses the ls -m parameter in the command syntax ----> This produces a list of files but without a comma ----> StartLoop (this is now looping through a list but looking for a comma which does not exist ! This is the Delimited value I have: #Cmd_TransFile_To_Load.$CommandOutput# (is this correct) ----> then it goes to the job.

Somewhere I'm missing something ... Please don't give up on me now !

Barbara
-------------------------
https://www.ssa.co.za
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

BIuser wrote:When I add it to my sequence job, there are no options for me to add user variables.
It is a little odd that it starts out so... empty, but it is a standard grid widget so simply right-click on it and then select 'New' or 'Insert'. Don't remember which but it should be obvious once you've done it.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

BIuser wrote:EXECUTE_COMMAND (Uses wildcard to get files and uses the ls -m parameter in the command syntax ----> This produces a list of files but without a comma
Then it sounds like you're not executing the command properly. If you are using parameters in the Execute Command stage, then they will need to be in the parameter box, like so:

Code: Select all

Command: ls -m
Parameter: #DirectoryToCheck#
If you're not using parameters, then just put the entire command in the command prompt. And check it from your O/S command line first to make sure you've got the syntax correct and are getting the correct results.
BIuser also wrote:#Cmd_TransFile_To_Load.$CommandOutput# (is this correct)
Looks correct, but if you've told it you've got a comma-delimited list and there's no commas, you're not going to get very far.
-craig

"You can never have too many knives" -- Logan Nine Fingers
BIuser
Premium Member
Premium Member
Posts: 238
Joined: Thu Feb 02, 2006 4:03 am
Location: South Africa

Post by BIuser »

This is the syntax of the command I am executing;

DIR /B/O-D "\\mcbcptfs01\groups\MHD\Data to load\Payroll files\TRANS files\*.csv" ls -m

Is this correct ?
-------------------------
https://www.ssa.co.za
girija
Participant
Posts: 89
Joined: Fri Mar 24, 2006 1:51 pm
Location: Hartford

Post by girija »

No,
This should be ls -m *.csv.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Wait... sorry, should have noticed you are on a Windows server so won't have access to UNIX commands like "ls" unless you also have EE installed and thus the MKS Toolkit, or something equivalent. Do you? :?

So for pure DOS solution, the /B option means a "bare" listing, i.e. just the file names, the other puts it in descending order by date. Get rid of the "ls -m" at the end. You won't end up with a comma-delimited list but rather a "return" delimited one unless you specifically convert the returns to commas in the command output. I don't have any DS access now and don't recall what other delimiter options you have in the Start Loop stage.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

In your user variable convert the line terminators (which have by then been translated into field marks) into your preferred delimiter, and perhaps strip the final one.

Code: Select all

Left(Convert(@FM, ",", EC.$CommandOutput), Len(EC.$CommandOutput) - 1)
Refer to the user variable in the "list of things" list in the Start Loop activity.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
BIuser
Premium Member
Premium Member
Posts: 238
Joined: Thu Feb 02, 2006 4:03 am
Location: South Africa

Post by BIuser »

Hi there

I putting in the code you just sent me ... it has put the commas in between the file names now - BUT when the job runs, it's still processing ALL the files output by the UserVariable Activity Stage as ONE file name instead of splitting them up in the Start Loop Activity. This is what the INPUTFILE parameter looks like when the job runs ...

INPUTFILE=M050 Trans Dec2008.csv,M038 Trans Dec2008.csv,M034 Trans Dec2008.csv,M033 Trans Dec2008.csv,M019 Trans Dec2008.csv,

as you can see ... the commas are there now - YIPPEE - but it still tries to process all of them as ONE file name instead of individual file names.

Thanks
BArbara
-------------------------
https://www.ssa.co.za
Post Reply