Page 1 of 2

Using sequence job to read multiple files of directory

Posted: Wed Mar 18, 2009 4:16 am
by BIuser
Hi there

I have a job that uses an input file parameter - the problem is that the input file parameter will have a different value as the job loops through each of the input files. There are approximately 20 files in a windows directory.

My sequence job has a startloop activity stage and then an Execute Command stage, the execute command stage would normally pass through the value of the input file it is ready from the drive. Unfortunately what is happening now is that when I put the *.csv wild card in for the input file, the job just gives a list of ALL the csv files in the directory - but it does not process the job one after another calling each file in the directory as a separate input file. It just reads ALL the CSV files in the folder (which I want - but not altogether - rather one at a time).

Does this make sense ?

Please assist ... I'm stuck. Thanks a mil ! :-)
Barbara

Re: Using sequence job to read multiple files of directory

Posted: Wed Mar 18, 2009 5:58 am
by Rubu
Hi Barbara,

If you can use a a "USer Variable" stage followed by the Start Loop activity and execute the command ls -lt|head -1... you should get only one file name from the list of all files. For getting the file name you may have to write a small routine which calls DSExecute command and return the output.

Now the variable can be fed to a job through job activity using job parameter.

Next to "job activity", there can be a Execute_Command stage which copies the file from the source directory to a Temp directory (cp /SourceDir/Variable /Tmp)

End Loop Activity.

If you want to copy back the files to source directory you can add another Execute_Command to execute (cp /Tmp/File* /SourceDir)

Posted: Wed Mar 18, 2009 8:36 am
by chulett
Right, that UserVariable stage is the key here. It can build a delimited list of 'things' and pass elements from the list, one at a time, to the Start Loop stage. In your case, that list can be a list of all filenames that match your wildcard pattern. Then inside the loop you run your job to process each one, one at a time.

Posted: Wed Mar 18, 2009 9:47 pm
by dh_Madhu
Also try executing ls -1 which forces the out put into one-entry-per-line and feed it to the start loop activity marking the "List loop" as its loop type.

Posted: Wed Mar 18, 2009 9:58 pm
by Kryt0n
better would be "ls -m", this gives a comma separated list of files in the directory, you can then use the Field command within a User Variable activity to get each file in turn (count number of commas, total number of files is count+1, this can then control your loop, getting filename for your current loop value in turn)

If you call back to the UNIX directory each time to get the next file, you will need to move the processed file out to ensure you don't pick it up again

Posted: Thu Mar 19, 2009 8:10 am
by chulett
Yes, I too would prefer using "ls -m" here. And you wouldn't need to do anything special in the UserVariables stage other than fetch the list of all files, the StartLoop will (when set up properly) automatically iterate through the list and can pass in both one item from the list (in your case, filename) at a time and the current $Counter as well if needed. The loop will continue until the list is exhausted.

Posted: Fri Mar 20, 2009 5:01 am
by BIuser
Hi there guys,

Thanks so much for trying to help me ... but clearly I've lost the plot here. Can someone tell me where does the User Variable stage get the values for the user variable ? When I add it to my sequence job, there are no options for me to add user variables.

I'm looking through my helpfiles ... but can't find anything on user variable stage to start off with !

Thanks
Barbara

Posted: Fri Mar 20, 2009 6:37 am
by BIuser
Hi there

Ok, firstly, I'm passing a user variable for the file name to the job each time the loop iterates. So every time we go through the loop, the name of the file is processed by the job by the user variable. That is done.

HOWEVER, this is still the problem I'm experiencing. My job looks like this:

EXECUTE_COMMAND (Uses wildcard to get files and uses the ls -m parameter in the command syntax ----> This produces a list of files but without a comma ----> StartLoop (this is now looping through a list but looking for a comma which does not exist ! This is the Delimited value I have: #Cmd_TransFile_To_Load.$CommandOutput# (is this correct) ----> then it goes to the job.

Somewhere I'm missing something ... Please don't give up on me now !

Barbara

Posted: Fri Mar 20, 2009 9:07 am
by chulett
BIuser wrote:When I add it to my sequence job, there are no options for me to add user variables.
It is a little odd that it starts out so... empty, but it is a standard grid widget so simply right-click on it and then select 'New' or 'Insert'. Don't remember which but it should be obvious once you've done it.

Posted: Fri Mar 20, 2009 9:17 am
by chulett
BIuser wrote:EXECUTE_COMMAND (Uses wildcard to get files and uses the ls -m parameter in the command syntax ----> This produces a list of files but without a comma
Then it sounds like you're not executing the command properly. If you are using parameters in the Execute Command stage, then they will need to be in the parameter box, like so:

Code: Select all

Command: ls -m
Parameter: #DirectoryToCheck#
If you're not using parameters, then just put the entire command in the command prompt. And check it from your O/S command line first to make sure you've got the syntax correct and are getting the correct results.
BIuser also wrote:#Cmd_TransFile_To_Load.$CommandOutput# (is this correct)
Looks correct, but if you've told it you've got a comma-delimited list and there's no commas, you're not going to get very far.

Posted: Mon Mar 23, 2009 6:21 am
by BIuser
This is the syntax of the command I am executing;

DIR /B/O-D "\\mcbcptfs01\groups\MHD\Data to load\Payroll files\TRANS files\*.csv" ls -m

Is this correct ?

Posted: Mon Mar 23, 2009 9:02 am
by girija
No,
This should be ls -m *.csv.

Posted: Mon Mar 23, 2009 9:50 am
by chulett
Wait... sorry, should have noticed you are on a Windows server so won't have access to UNIX commands like "ls" unless you also have EE installed and thus the MKS Toolkit, or something equivalent. Do you? :?

So for pure DOS solution, the /B option means a "bare" listing, i.e. just the file names, the other puts it in descending order by date. Get rid of the "ls -m" at the end. You won't end up with a comma-delimited list but rather a "return" delimited one unless you specifically convert the returns to commas in the command output. I don't have any DS access now and don't recall what other delimiter options you have in the Start Loop stage.

Posted: Mon Mar 23, 2009 2:42 pm
by ray.wurlod
In your user variable convert the line terminators (which have by then been translated into field marks) into your preferred delimiter, and perhaps strip the final one.

Code: Select all

Left(Convert(@FM, ",", EC.$CommandOutput), Len(EC.$CommandOutput) - 1)
Refer to the user variable in the "list of things" list in the Start Loop activity.

Posted: Tue Mar 24, 2009 3:21 am
by BIuser
Hi there

I putting in the code you just sent me ... it has put the commas in between the file names now - BUT when the job runs, it's still processing ALL the files output by the UserVariable Activity Stage as ONE file name instead of splitting them up in the Start Loop Activity. This is what the INPUTFILE parameter looks like when the job runs ...

INPUTFILE=M050 Trans Dec2008.csv,M038 Trans Dec2008.csv,M034 Trans Dec2008.csv,M033 Trans Dec2008.csv,M019 Trans Dec2008.csv,

as you can see ... the commas are there now - YIPPEE - but it still tries to process all of them as ONE file name instead of individual file names.

Thanks
BArbara