Monitor network file location for new files

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Post Reply
alex999
Participant
Posts: 9
Joined: Wed Aug 12, 2015 3:25 am

Monitor network file location for new files

Post by alex999 »

Hi all

We are developing application managing some excel files. Case is that application will produce some excel files and save them in specific location.
We need now to import these files into database with Datastage just after someone save a new file in this specific location. So we need to make Datastage monitor somehow this location all the time and when there is any new file, it should start a job and import the file. Is it possible somehow?
How to configure such type of job?

Thank you for any responses
rkashyap
Premium Member
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Post by rkashyap »

(Assuming that the application creating the excel file cannot drop it on the DataStage/Unix box) Mount/share the source directory on DataStage box. Subsequently use a filewatcher to trigger other activities of your job. Filewatcher can be implemented using your scheduling tool or Wait For File Activity stage.
alex999
Participant
Posts: 9
Joined: Wed Aug 12, 2015 3:25 am

Post by alex999 »

That is a great tip! I haven't use this Wait-For-File activity yet.

Is it possible to configure it to give there path to find ANY new file or it has to be specific file name? In the source directory there will be files like: report_hr_20160501, report_hr_20160601 and one day appear for example report_hr_20160701 and report_hr_20160601_v2, datastage should take only these 2 new reports. Probably we won't know names of these files.
rkashyap
Premium Member
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Post by rkashyap »

Wait For File Activity stage looks for a specific filename and does not support wildcards. See this discussion.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

And failing the ability to mount them on your ETL server (can't remember the last place I worked where that was allowed to happen) you'll need to look into a script of some kind to do the remote wildcard file existence check. Or if your version supports it, use an FTP source in a job to do basically an "mget" on files that you schedule to run X times a day.

Next question in my mind - how are you planning on processing the native xls / xlsx files? ODBC? Or will they be exporting them as .csv files for you?

You'll also need to mange the whole "only new files" part yourself, which can be fun. Are you allowed to rename the source files? Move them elsewhere on the source system? Delete them when you are done with them? Failing that post-processing ability, you might need to fall back on keeping a list of filenames you've already seen and only allow "new" names thru. Can be handy if they accidentally push the same file to you more than once and you only want to process it the first time.
-craig

"You can never have too many knives" -- Logan Nine Fingers
alex999
Participant
Posts: 9
Joined: Wed Aug 12, 2015 3:25 am

Post by alex999 »

Application can convert files into *.csv. I think it would be even better for DS.

I should be allowed to rename source files. I don't want to rename source xls files but then I convert it into csv they could be renamed.
Files shouldnt be removed from this destination, when they are out of date they should be renamed as old version or something like that - this is web application part. so that is also impossible to push the same file - that would be the same as new version. The application shoud get done the case of moving, converting and renaming.

As there is kind of versioning, i will need some additional columns in database for sure with version, date from, date to and status if it is an current version.

So.. taking this versioning into consideration... DS could find somehow file with new version in name but I dont thing it would be possible..

Or maybe sth like that: every new file wouldn't have any version in name, for example it would always be Report1.csv, when new one comes, Report1 become Report1_v1 and the new one is then Report1, next is coming and we have Report1_v1, Report1_v2 and current Report1...
Datastage should check then always only for Report1...

There should be only one current Report source file.
alex999
Participant
Posts: 9
Joined: Wed Aug 12, 2015 3:25 am

Post by alex999 »

we have developed new alternative idea
if there will be problem with processing xls/csv, we make one more stage where application save raw data,
datastage will extract data from this application database and then import it to the destination source
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There's no "problem" with processing those files, per se. CSV files should be trivial, native Excel files on a UNIX machine much less so. And the file management could be quite extensive but all of that is perfectly doable.

Your alternative idea sounds much... easier. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
johnboy3
Premium Member
Premium Member
Posts: 52
Joined: Fri Jun 19, 2015 2:48 pm
Location: Jackson, MS, USA

Post by johnboy3 »

At the risk of looking bad, can someone explain his ending, alternative solution? I'm fairly new to DS.

Thanks,
john3
john3
----------------------------------------------------
InfoSphere 8.5.0.2; DataStage 8.5.0.0; OS-RHEL 6.6; DB-Oracle Enterprise Edition 11g (11.2.0.4)
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

He just means have the app changed to push the data to staging / interface tables specific to this processing and then have DS source from that database directly, turning it from an Excel Files Goat Rodeo to a much more standard "ETL" process.

Most times you won't have the option of getting the source system changed to be more ETL-friendly, usually you are stuck with whatever they do (or do not) provide and have to suck it up and ride the bull. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply