How can DS listen/loop on a directory for coming files?

alraaayeq
Participant
Posts: 35
Joined: Sun Apr 04, 2004 5:57 am
Location: Riyadh,Saudi Arabia


Post by alraaayeq »

Hi all,

Is it possible for DS to loop or listen for new files arriving through FTP in a given directory, and pick up only the files that were transferred successfully (i.e. it should not pick up files that are not fully transferred)?

Personally, I used to use C++ code that listens and triggers the DS job for every file.

The major issue for me is how I can use a timeout, or something similar, to determine whether a file has landed successfully; if it has not, I'll ignore it and continue looking for other files.


Any comments, please!
davidnemirovsky
Participant
Posts: 85
Joined: Fri Jun 04, 2004 2:30 am
Location: Melbourne, Australia

Post by davidnemirovsky »

What you need is a control file for each file you are looking for. Once a data file (file1.csv) has been FTPed, a control file (file1.ctl) is FTPed to indicate that the transfer has completed. The control file could even hold a row count, so you can check that the correct number of records came across.
Cheers,
Dave Nemirovsky
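
For illustration only, here is a minimal Python sketch of the control-file check described in the post above. It is not DataStage code; the function name `ready_files`, the `.csv`/`.ctl` extensions as parameters, and the assumption that the control file contains nothing but the expected row count are all hypothetical choices made for this example.

```python
import os


def ready_files(directory, data_ext=".csv", ctl_ext=".ctl"):
    """Return data files whose control file exists and whose row count matches.

    Assumption: each control file holds only the expected row count as text.
    """
    ready = []
    for name in sorted(os.listdir(directory)):
        base, ext = os.path.splitext(name)
        if ext != data_ext:
            continue
        ctl_path = os.path.join(directory, base + ctl_ext)
        if not os.path.isfile(ctl_path):
            continue  # no control file yet, so the transfer is not confirmed
        with open(ctl_path) as ctl:
            text = ctl.read().strip()
        if not text.isdigit():
            continue  # empty or malformed control file: skip for now
        with open(os.path.join(directory, name)) as data:
            actual_rows = sum(1 for _ in data)
        if actual_rows == int(text):
            ready.append(name)
    return ready
```

A file whose control file is missing, empty, or disagrees with the actual row count is simply skipped, so it can be picked up on a later pass.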
adamski
Charter Member
Posts: 54
Joined: Thu Mar 20, 2003 5:02 pm

Post by adamski »

We have used both the control file method and a delay that analyses the timestamp.

Read the file's timestamp, wait a predetermined amount of time, then wake up and read it again. If it has not changed, assume the file has landed, and then compare the row count with the control file.
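
As a rough sketch of the "read, wait, read again" check (in Python rather than DataStage BASIC, and with a hypothetical function name and default wait), the idea could look like this. Comparing the size as well as the modification time is an extra assumption of this example, not something stated in the post.

```python
import os
import time


def has_landed(path, wait_seconds=30.0):
    """Return True if the file's size and mtime are unchanged across the wait."""
    try:
        before = os.stat(path)
        time.sleep(wait_seconds)  # give an in-flight transfer time to write more
        after = os.stat(path)
    except FileNotFoundError:
        return False  # file missing, or removed while we were waiting
    return (before.st_size, before.st_mtime_ns) == (after.st_size, after.st_mtime_ns)
```

This is only a heuristic: a transfer that stalls for longer than the wait will look finished, which is one reason to combine it with the row-count comparison against the control file.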
alraaayeq
Participant
Posts: 35
Joined: Sun Apr 04, 2004 5:57 am
Location: Riyadh,Saudi Arabia

Post by alraaayeq »

adamski wrote:We have used both the control file method and a delay that analyses the timestamp.

Read the file's timestamp, wait a predetermined amount of time, then wake up and read it again. If it has not changed, assume the file has landed, and then compare the row count with the control file.
Yes, I did almost exactly what you describe using C++ code, as I mentioned, but can I use DS to do it? Would it be done in the BASIC language?


Many thanks
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

alraaayeq wrote: Yes, I did almost exactly what you describe using C++ code, as I mentioned, but can I use DS to do it? Would it be done in the BASIC language?
I asked this on the course, and was rather astonished that the tutor didn't think it was possible to process unknown file names. It isn't unusual to have to process files that are named with some kind of incrementing sequence, maybe with a date and time included in the name, but I can't see how DataStage could do this. Any suggestions?

In general, I would recommend the control-file approach over polling for timestamp changes.
Phil Hibbs | Capgemini
Technical Consultant
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sure, it's possible. It's a simple matter to issue a 'dir' or 'ls', whether you match a regular expression as part of it or look for all files, and capture the output. A call to DSExecute will do that for you. Then you can loop through the output and do what is needed - check the size, run a job with that filename, whatever.
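
The list-then-loop idea in the paragraph above can be sketched in Python as follows. This is a stand-in for capturing 'ls' output, not a DSExecute call; the function name `matching_files` and the shell-style pattern argument are assumptions made for this example.

```python
import fnmatch
import os


def matching_files(directory, pattern="*"):
    """Return (name, size_in_bytes) for each regular file matching the pattern."""
    result = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip subdirectories and anything that does not match the wildcard.
        if os.path.isfile(path) and fnmatch.fnmatch(name, pattern):
            result.append((name, os.path.getsize(path)))
    return result
```

Each returned name could then be handed to whatever runs the job for that file, and the size is available for a sanity check first.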

Agreed on the control file. We call it a 'semaphore' file and I've used it for years at multiple sites. Just make sure the people sending the files understand they need to send the control file last. :roll: :lol:

The 'polling for changes' approach can be... problematical.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Here's one (it only takes one pass, but is executed by an external job running the loop - you could wrap the loop around this code if you preferred).

Code:

FUNCTION FilesInDirectory(DirectoryPath, Wildcard, OutputFile)

* History (most recent first)
*    Date     Programmer       Version  Details of Modification
* ----------  ---------------  -------  -------------------------------------
* 27/09/2003  Ray Wurlod        2.0.0   Initial coding
*


$INCLUDE UNIVERSE.INCLUDE FILEINFO.H
      DEFFUN OpenTextFile(FileName, OpenMode, AppendMode, Logging) Calling "DSU.OpenTextFile"


* The following token can be defined to restrict the code to handling directories only.
* See comments on General tab.

$UNDEFINE CheckingForDirectoryOnly


      * Take copy of argument so as to avoid side effects if changing value.
      argWildcard = Wildcard


      * Open output file for writing, overwriting if it exists.
      * Reporting defaults to false so it is always assigned before it is tested.
      Reporting = @FALSE
      If Len(OutputFile)
      Then
         Output.fvar = OpenTextFile((OutputFile), "W", "O", "Y")
         Reporting = FileInfo(Output.fvar, FINFO$IS.FILEVAR)
      End


      * Substitute generic wildcard if none provided.  Handle multiple and asterisk wildcards.

      If argWildcard = "" Then argWildcard = "..."
      Convert "~" To @VM In argWildcard
      argWildcard = Ereplace(argWildcard, "*", "...", -1, 0)


      * Open the directory as if it were a table.

      OpenPath DirectoryPath To Directory.fvar
      On Error

         Ans = -Abs(Status())

      End
      Then

$IFDEF CheckingForDirectoryOnly

         FileType = Status()
         If FileType = 19 Or FileType = 1
         Then

$ENDIF

            * Establish Select List #9 as a sorted list of file names in the directory.

            ClearSelect 9
            SSelect Directory.fvar To 9  ; * SSelect generates sorted list


            * Initialize count of file names in directory.

            Ans = 0


            * For each file name increment answer if file name matches desired pattern.

            Loop
            While ReadNext FileName From 9

               If FileName Matches argWildcard
               Then

                  Ans += 1

                  If Reporting
                  Then
                     WriteSeq FileName To Output.fvar Else NULL
                  End

               End

            Repeat

            If Reporting
            Then
               CloseSeq Output.fvar
            End

$IFDEF CheckingForDirectoryOnly

         End
         Else

            Ans = -99                    ; * pathname is not that of a directory

         End
$ENDIF


         * Close the directory to free resources, as file unit no longer needed.

         Close Directory.fvar

      End
      Else

         Ans = -Abs(Status())

      End  

RETURN(Ans)
From the General tab:
Returns -1 if the directory does not exist or could not be opened.
Note that no check is made to verify that the directory pathname is, indeed, that of a directory, so this routine could work just as effectively with hashed files or B-tree files.
To check whether the file is a directory, check the value returned by STATUS() within the THEN clause of the OPENPATH statement. This will be 19 or 1 if the pathname is that of a directory.
Code to handle this check has been included, but is disabled. It may be enabled by defining the token called CheckingForDirectoryOnly.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
davidnemirovsky
Participant
Posts: 85
Joined: Fri Jun 04, 2004 2:30 am
Location: Melbourne, Australia

Post by davidnemirovsky »

In reply to Phil Hibbs's post:
I asked this on the course, and was rather astonished that the tutor didn't think it was possible to process unknown file names. It isn't unusual to have to process files that are named with some kind of incrementing sequence, maybe with a date and time included in the name, but I can't see how DataStage could do this. Any suggestions?
I'm not sure where you did your course or who took the course but obviously it wasn't Ray!
Cheers,
Dave Nemirovsky
alraaayeq
Participant
Posts: 35
Joined: Sun Apr 04, 2004 5:57 am
Location: Riyadh,Saudi Arabia

Post by alraaayeq »

chulett wrote:...
Agreed on the control file. We call it a 'semaphore' file and I've used it for years at multiple sites. Just make sure the people sending the files understand they need to send the control file last. :roll: :lol:
Yeah, it works when the other people (where the files come from) are ready to help or cooperate with you :roll:

chulett wrote:...
The 'polling for changes' approach can be... problematical.
Aah, I hate using it, but sometimes you do not have any other choice, especially when many other departments and legacy systems are involved. :x
alraaayeq
Participant
Posts: 35
Joined: Sun Apr 04, 2004 5:57 am
Location: Riyadh,Saudi Arabia

Post by alraaayeq »

ray.wurlod wrote:Here's one (it only takes one pass, but is executed by an external job running the loop - you could wrap the loop around this code if you preferred).

Many thanks for your code, I will try to test it ASAP.
alraaayeq
Participant
Posts: 35
Joined: Sun Apr 04, 2004 5:57 am
Location: Riyadh,Saudi Arabia

Post by alraaayeq »

I found it better to use two loops:

1- an outer loop that is infinite and contains
2- an inner loop that gets the list of incoming files on every pass

Please see how to get the list of files in a directory here:
viewtopic.php?t=90528&highlight=
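
A minimal Python sketch of the two-loop structure, for illustration only: the names `watch` and `handle` and the `max_polls` cut-off (added so the sketch can stop, whereas the post describes an infinite outer loop) are hypothetical.

```python
import os
import time


def watch(directory, handle, poll_seconds=10.0, max_polls=None):
    """Poll a directory and call handle(path) once for each newly seen file.

    max_polls=None runs forever; a number stops after that many passes.
    """
    seen = set()
    polls = 0
    while True:  # outer loop: keep polling
        for name in sorted(os.listdir(directory)):  # inner loop: current listing
            if name not in seen:
                seen.add(name)
                handle(os.path.join(directory, name))
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return seen
        time.sleep(poll_seconds)
```

In practice `handle` would first confirm that the transfer is complete, for example with the control-file check discussed earlier in the thread, before triggering the job.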