
Sequential File stage needs to select the latest file to extract

Posted: Tue Oct 18, 2005 8:50 pm
by vbeeram
Hi,

My job has to run every day to extract data from a flat file (the files will be on the UNIX box).

There will be multiple files, but my Sequential File stage has to pick the latest one.
Files are named with some text + timestamp.
All the files will have the same name and will be differentiated by the timestamp.

EX:
1) WalmartProduct10182005...... .txt
2) WalmartProduct10172005...... .txt
Here the Sequential File stage has to pick the first file, based on the timestamp (latest).


But how can the Sequential File stage identify the latest file?

Any ideas?

Thanks in advance
Beeram

Posted: Tue Oct 18, 2005 9:14 pm
by ray.wurlod
Run a UNIX command (in an Execute Command activity) to determine the latest file. For example:

Code:

ls -t1 | head -1
(your ls may be a little different).

Use the output from this to supply a job parameter with the file name.
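
As a rough sketch of the suggestion above, the lookup can be wrapped in a small function whose output is captured for the job parameter. The function name latest_by_mtime and its directory and prefix arguments are hypothetical, chosen only for illustration, and it assumes the files all end in .txt as in the original question.

```shell
#!/bin/sh
# Print the most recently modified .txt file in a directory that starts
# with a given prefix. latest_by_mtime is a hypothetical helper name.
latest_by_mtime() {
    dir=$1
    prefix=$2
    # -t sorts newest first by modification time; -1 prints one name per line.
    # Caveat: parsing ls output misbehaves on names that contain newlines.
    (cd "$dir" && ls -t1 "$prefix"*.txt 2>/dev/null | head -1)
}
```

Something like latest=$(latest_by_mtime /some/dir WalmartProduct) would then hold the name to pass on; the directory here is a placeholder. An empty result means no file matched, which is worth checking before the job runs.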

Posted: Tue Oct 18, 2005 9:17 pm
by rleishman
Beeram,

I think you will have to do it with a Unix command. You can call the Unix command from a before-job-subroutine, or from a Shell Exec activity that precedes the Job Activity in a Job Sequence.

Two options for the Unix command are:
1. Find the latest file and move it to a static file name.
2. Find the latest file and link it to a static file name.

One possible implementation of option 2 is:

Code:

ln -fs `/bin/ls -t WalmartProduct[0-9]*.txt | head -1` WalmartProduct.txt
Edited: ... or you could do what Ray said... :)
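
A slightly more defensive sketch of option 2 follows. Two points matter when linking: ls -t lists newest first (ls -tr would list oldest first), and the wildcard should not match the static link itself, otherwise a later run could pick the link as the "latest file". The function name link_latest is hypothetical, and the [0-9] in the pattern assumes the timestamp starts right after the text part of the name.

```shell
#!/bin/sh
# Point a fixed name at the newest dated file so the job can always read
# the same path. link_latest is a hypothetical helper name.
link_latest() {
    dir=$1
    (
        cd "$dir" || exit 1
        # [0-9]* keeps the static link WalmartProduct.txt out of the match.
        newest=$(ls -t WalmartProduct[0-9]*.txt 2>/dev/null | head -1)
        # Fail loudly rather than leave a stale or dangling link behind.
        [ -n "$newest" ] || { echo "no dated file found in $dir" >&2; exit 1; }
        ln -fs "$newest" WalmartProduct.txt
    )
}
```

The function returns a non-zero status when nothing matches, so a before-job subroutine or sequence can stop instead of reprocessing yesterday's link.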

Posted: Tue Oct 18, 2005 10:38 pm
by kumar_s
Hi,
But I guess the real point is to know how to get the latest file name.
It could be something like starting from the last date of the specified month and subtracting one day at a time until a file is found for that date.

Regards
kumar

Posted: Tue Oct 18, 2005 10:45 pm
by ray.wurlod
Either of the above ls commands will give you the latest, whatever it is. Learn some UNIX. The -t option for ls sorts by date/time (by default date/time modified, but you can change the default with -c or -u options).

If you want to filter further on date/time, you could use the find command.
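
For instance, find could restrict the candidates to files modified recently before anything picks among them. This is only a sketch: the function name recent_files and the "last N days" rule are assumptions made for illustration, and find as written also descends into subdirectories.

```shell
#!/bin/sh
# List WalmartProduct files modified within the last N days.
# recent_files is a hypothetical helper name.
recent_files() {
    dir=$1
    days=$2
    # -mtime -N matches files modified less than N*24 hours ago.
    find "$dir" -type f -name 'WalmartProduct*.txt' -mtime "-$days"
}
```

Calling recent_files with 1 as the second argument prints only files that changed in the last 24 hours, so an old file left in the directory is never a candidate.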

Posted: Tue Oct 18, 2005 11:42 pm
by apraman
I think his requirement is to find the latest file based on the timestamp included within the file name. :)

If the file name carried the timestamp in the format YYYYMMDDHHMMSS, it would have been easy, provided the first (alphabetic) part stays the same, because the names would then sort chronologically:

Code:

ls -r WalmartProduct*.txt | head -1
Need a Unix guru to help get the latest file name when the timestamp is in the format MMDDYYYYHHMMSS.

Any help?

Posted: Tue Oct 18, 2005 11:44 pm
by ray.wurlod
Just add the timestamp into the wildcard.

Code:

ls -t1 *20051018134500*.txt

Posted: Wed Oct 19, 2005 12:44 am
by clshore
You can use the ls -1t method, but some folks have the unfortunate habit, when viewing files, of writing a new version (shift ZZ, you know who you are), which resets the UNIX file timestamp.

You will be lucky to find out about one of these; more often the wrong file is silently processed, the details buried in a log somewhere.

I have found it safer to use the date/time in the filename created by the source process. It takes a more deliberate action to alter the filename.

If your filenames are consistently created with names like this:
WalmartProduct10182005...... .txt

you could use something like this:

ls -1 WalmartProduct*.txt |
sort -rn -k 1.19,1.22 -k 1.15,1.18 |
head -1

to get the most recent one by the date embedded in the name.
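
To spell out the key positions: "WalmartProduct" is 14 characters, so in an MMDDYYYY name the month and day occupy characters 15-18 and the year characters 19-22. The sort therefore orders by year first, then by month and day, both numerically and in reverse. Below is the same pipeline wrapped as a function; the name latest_by_name is hypothetical, and the [0-9] in the pattern assumes the date starts right after the prefix.

```shell
#!/bin/sh
# Pick the newest file by the MMDDYYYY date embedded in its name,
# ignoring file modification times. latest_by_name is a hypothetical name.
latest_by_name() {
    dir=$1
    # Key 1: characters 19-22 (YYYY). Key 2: characters 15-18 (MMDD).
    # -rn applies reverse numeric ordering to both keys.
    (cd "$dir" && ls -1 WalmartProduct[0-9]*.txt 2>/dev/null |
        sort -rn -k 1.19,1.22 -k 1.15,1.18 |
        head -1)
}
```

The difference from a plain reverse listing shows up across a year boundary: given files dated 12312004 and 01022005, ls -r puts 12312004 first, while the keyed sort puts 01022005 first. If an HHMMSS time follows the date (an assumption, since the thread elides that part of the name), a third key such as -k 1.23,1.28 would extend the ordering.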

Carter

Posted: Wed Oct 19, 2005 8:39 am
by vbeeram
My DataStage server is on NT and the files are on a UNIX box,
so these files are external sources to the DataStage server.

In this situation, can I pass UNIX command output to a job parameter?




Thanks
Beeram

Posted: Wed Oct 19, 2005 5:10 pm
by ray.wurlod
vbeeram wrote:
My DataStage server is on NT and Files are on UNIX Box,
So these files are external sources to Datastage Server.
Ah, new information not in original post. :roll:

(Notes that original post specified UNIX as the platform.)

Yes, you can pass values to job parameters from job control code: for example, a job sequence that includes a loop (StartLoop and EndLoop activities), or roll-your-own job control code for versions earlier than 7.5.

It is necessary to execute the ls command on the UNIX machine, using some form of remote shell.
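
A minimal sketch of the remote call, assuming an ssh client (or an rsh-style equivalent taking the same host and command arguments) is available where the script runs. The REMOTE_SHELL variable, the function name remote_latest, and the host and directory arguments are all hypothetical.

```shell
#!/bin/sh
# Ask the UNIX host for its newest file and print the name locally.
# REMOTE_SHELL and remote_latest are hypothetical names; ssh is an
# assumed default and can be swapped for another remote shell client.
REMOTE_SHELL=${REMOTE_SHELL:-ssh}

remote_latest() {
    host=$1
    dir=$2
    # The quoted command runs on the remote side, so ls sees the UNIX files.
    "$REMOTE_SHELL" "$host" "cd '$dir' && ls -t1 WalmartProduct*.txt | head -1"
}
```

The printed name is relative to the remote directory, so the job still needs a path under which it can see the same file.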

It is also necessary that the DataStage jobs can "see" the UNIX files - presumably you have samba or something similar in place to facilitate this.