
How can DS listen/loop on a directory for incoming files?

Posted: Mon Dec 06, 2004 11:30 pm
by alraaayeq
Hi all,

Is it possible for DS to loop or listen for new files arriving through FTP in a given directory, and pick up only the files that were transferred successfully (i.e. it should not pick up files that are not yet fully transferred)?

Personally, I used to use C++ code that listens and triggers the DS job for every file.

The major issue for me is how to use a timeout (or something similar) to determine whether a file has landed successfully; if it has not, I want to ignore it and continue looking for other files.


Any comments, please!

Posted: Tue Dec 07, 2004 1:26 am
by davidnemirovsky
What you need is a control file for each file you are looking for. Once a data file (file1.csv) has been FTPed, a control file (file1.ctl) is FTPed to indicate that the transfer has completed. The control file could even contain a row count, so you can check that the correct number of records has been FTPed across.
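
A minimal sketch of the control-file check described above, written in Python purely for illustration rather than in DataStage BASIC. It assumes the control file shares the data file's base name with a .ctl extension (as in the file1.csv / file1.ctl example) and contains nothing but the expected row count as a single integer. The function name transfer_complete is hypothetical.

```python
import os


def transfer_complete(data_path):
    """Return True if the control file exists and its row count matches the data file."""
    base, _ = os.path.splitext(data_path)
    ctl_path = base + ".ctl"  # assumed naming convention: file1.csv -> file1.ctl

    if not os.path.exists(ctl_path):
        # No control file yet, so the data file may still be in transit.
        return False

    # Assumption: the control file holds only the expected row count.
    with open(ctl_path) as ctl:
        expected_rows = int(ctl.read().strip())

    # Count the lines actually present in the data file.
    with open(data_path) as data:
        actual_rows = sum(1 for _ in data)

    return actual_rows == expected_rows
```

A file that fails this check is simply skipped and looked at again on the next pass.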

Posted: Tue Dec 07, 2004 2:14 am
by adamski
We have used both the control file method and a delay that analyses the timestamp.

Read the file's timestamp, wait a predetermined amount of time, then wake up and read it again. If it has not changed, assume the file has landed, and then compare the row count with the control file.
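
The timestamp check above can be sketched roughly as follows, again in Python for illustration only. The function name has_landed and the 30-second default are hypothetical, and comparing the file size as well as the timestamp is an addition not mentioned in the post.

```python
import os
import time


def has_landed(path, wait_seconds=30):
    """Assume the file has landed if its timestamp and size are unchanged after a wait."""
    before = os.stat(path)
    time.sleep(wait_seconds)  # the predetermined delay
    after = os.stat(path)
    # Size is compared too, as an extra safeguard beyond the timestamp.
    return (before.st_mtime, before.st_size) == (after.st_mtime, after.st_size)
```

A transfer that has stalled looks the same as one that has finished, which is why the row-count comparison against the control file still follows this check.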

Posted: Tue Dec 07, 2004 4:01 am
by alraaayeq
adamski wrote:We have used both the control file method and a delay that analyses the timestamp.

Read the file's timestamp, wait a predetermined amount of time, then wake up and read it again. If it has not changed, assume the file has landed, and then compare the row count with the control file.
Yes, I did "almost" what you said, using C++ code as I mentioned, but can I use DS to do it? Would it be done in the BASIC language?


Many thanks.

Posted: Tue Dec 07, 2004 4:35 am
by PhilHibbs
alraaayeq wrote: Yes, I did "almost" what you said, using C++ code as I mentioned, but can I use DS to do it? Would it be done in the BASIC language?
I asked this on the course, and was rather astonished that the tutor didn't think it was possible to process unknown file names. It isn't unusual to have to process files that are named with some kind of incrementing sequence, maybe with a date and time included in the name, but I can't see how DataStage could do this. Any suggestions?

In general, I would recommend the control-file approach over polling for timestamp changes.

Posted: Tue Dec 07, 2004 8:43 am
by chulett
Sure, it's possible. It's a simple matter to issue a 'dir' or 'ls', whether you match a regular expression as part of it or look for all files, and capture the output. A call to DSExecute will do that for you. Then you can loop thru the output and do what is needed - check the size, run a job with that filename, whatever.
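
To illustrate the list-then-loop idea, here is a rough sketch in Python. It is not the DSExecute call itself: it swaps the 'dir'/'ls' command for Python's os.listdir and uses a shell-style wildcard instead of a regular expression. The function name matching_files is hypothetical.

```python
import fnmatch
import os


def matching_files(directory, pattern="*"):
    """Return sorted (name, size_in_bytes) pairs for regular files matching the pattern."""
    results = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip subdirectories and names that do not match the wildcard.
        if os.path.isfile(path) and fnmatch.fnmatch(name, pattern):
            results.append((name, os.path.getsize(path)))
    return results
```

Each returned pair can then be used to check the size or to launch a job with that file name as a parameter.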

Agreed on the control file. We call it a 'semaphore' file and I've used it for years at multiple sites. Just make sure the people sending the files understand they need to send the control file last. :roll: :lol:

The 'polling for changes' approach can be... problematical.

Posted: Tue Dec 07, 2004 3:08 pm
by ray.wurlod
Here's one (it only takes one pass, but is executed by an external job running the loop - you could wrap the loop around this code if you preferred).

Code:

FUNCTION FilesInDirectory(Directory, WildCard, OutputFile)

* History (most recent first)
*    Date     Programmer       Version  Details of Modification
* ----------  ---------------  -------  -------------------------------------
* 27/09/2003  Ray Wurlod        2.0.0   Initial coding
*


$INCLUDE UNIVERSE.INCLUDE FILEINFO.H
      DEFFUN OpenTextFile(FileName, OpenMode, AppendMode, Logging) Calling "DSU.OpenTextFile"


* The following token can be defined to restrict the code to handling directories only.
* See comments on General tab.

$UNDEFINE CheckingForDirectoryOnly


      * Take copy of argument so as to avoid side effects if changing value.
      argWildcard = WildCard


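      * Open output file for writing, overwriting if it exists.
      * Reporting defaults to false so that it is always assigned, even with no output file.
      Reporting = @FALSE
      If Len(OutputFile)
      Then
         Output.fvar = OpenTextFile((OutputFile), "W", "O", "Y")
         Reporting = FileInfo(Output.fvar, FINFO$IS.FILEVAR)
      End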


      * Substitute generic wildcard if none provided.  Handle multiple and asterisk wildcards.

      If argWildcard = "" Then argWildcard = "..."
      Convert "~" To @VM In argWildcard
      argWildcard = Ereplace(argWildcard, "*", "...", -1, 0)


      * Open the directory as if it were a table.

      OpenPath Directory To Directory.fvar
      On Error

         Ans = -Abs(Status())

      End
      Then

$IFDEF CheckingForDirectoryOnly

         FileType = Status()
         If FileType = 19 Or FileType = 1
         Then

$ENDIF

            * Establish Select List #9 as a sorted list of file names in the directory.

            ClearSelect 9
            SSelect Directory.fvar To 9  ; * SSelect generates sorted list


            * Initialize count of file names in directory.

            Ans = 0


            * For each file name increment answer if file name matches desired pattern.

            Loop
            While ReadNext FileName From 9

               If FileName Matches argWildcard
               Then

                  Ans += 1

                  If Reporting
                  Then
                     WriteSeq FileName To Output.fvar Else NULL
                  End

               End

            Repeat

            If Reporting
            Then
               CloseSeq Output.fvar
            End

$IFDEF CheckingForDirectoryOnly

         End
         Else

            Ans = -99                    ; * pathname is not that of a directory

         End
$ENDIF


         * Close the directory to free resources, as file unit no longer needed.

         Close Directory.fvar

      End
      Else

         Ans = -Abs(Status())

      End  

RETURN(Ans)
From the General tab:
Returns -1 if the directory does not exist or could not be opened.
Note that no check is made to verify that the directory pathname is, indeed, that of a directory, so this routine could work just as effectively with hashed files or B-tree files.
To check whether the file is a directory, check the value returned by STATUS() within the THEN clause of the OPENPATH statement. This will be 19 or 1 if the pathname is that of a directory.
Code to handle this check has been included, but is disabled. It may be enabled by defining the token called CheckingForDirectoryOnly.

Posted: Tue Dec 07, 2004 5:05 pm
by davidnemirovsky
In reply to Phil Hibbs' post:
I asked this on the course, and was rather astonished that the tutor didn't think it was possible to process unknown file names. It isn't unusual to have to process files that are named with some kind of incrementing sequence, maybe with a date and time included in the name, but I can't see how DataStage could do this. Any suggestions?
I'm not sure where you did your course or who took the course but obviously it wasn't Ray!

Posted: Tue Dec 07, 2004 11:26 pm
by alraaayeq
chulett wrote:...
Agreed on the control file. We call it a 'semaphore' file and I've used it for years at multiple sites. Just make sure the people sending the files understand they need to send the control file last. :roll: :lol:
Yeah, it works when the other people (where the files come from) are ready to help or cooperate with you :roll:

chulett wrote:...
The 'polling for changes' approach can be... problematical.
Aah, I hate using it, but sometimes you do not have any other choice, especially when many other departments and legacy systems are involved. :x

Posted: Tue Dec 07, 2004 11:36 pm
by alraaayeq
ray.wurlod wrote:Here's one (it only takes one pass, but is executed by an external job running the loop - you could wrap the loop around this code if you preferred).

Many thanks for your code, I will try to test it ASAP.

Posted: Wed Jan 05, 2005 12:50 am
by alraaayeq
I found it better to use two loops:

1- an outer loop that runs indefinitely and contains
2- an inner loop that, on each pass, gets the list of incoming files

Please see how to get the list of files in a directory here:
viewtopic.php?t=90528&highlight=
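
The two-loop structure can be sketched as below, in Python for illustration only. The names watch_directory, handle_file, poll_seconds and max_cycles are hypothetical. The seen set (to avoid handling the same file twice) and the max_cycles stop condition are assumptions added so the sketch can terminate; the post itself describes an infinite outer loop.

```python
import os
import time


def watch_directory(directory, handle_file, poll_seconds=60, max_cycles=None):
    """Poll a directory repeatedly and pass each newly seen file to handle_file."""
    seen = set()  # assumption: remember names so a file is handled only once
    cycles = 0
    # Outer loop: runs forever unless max_cycles is given.
    while max_cycles is None or cycles < max_cycles:
        # Inner loop: walk the current directory listing.
        for name in sorted(os.listdir(directory)):
            if name not in seen:
                handle_file(os.path.join(directory, name))
                seen.add(name)
        cycles += 1
        time.sleep(poll_seconds)
```

In practice handle_file would first confirm that the file has fully landed (control file or timestamp check) before triggering the job.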