Parsing XML and loading Database

danddmrs
Premium Member
Posts: 86
Joined: Fri Apr 20, 2007 12:55 pm

Post by danddmrs »

I have a requirement to load information from XML files into a database. The XML files will be generated throughout the day, and about 500 files will need to be processed daily. This process needs to be "real time": as the XML files are created, users expect the data to be available in the database.

I plan to use a Folder Stage -> XML Input Stage -> Transformer -> ODBC to load the data but I'm not sure how to start the job each time a file is ready.

Some particulars:
The process that creates the XML files runs on a different server on the network, which does not have the DataStage client installed.
Each file will be uniquely named.
The amount of data in each XML file is relatively small.

All of the DataStage projects I've worked on have run in batch, so jobs are started either from the DataStage scheduler or from our Zeke scheduler. I was thinking of starting the job from the command line and passing the file name as a parameter, but I'm not sure I can start it from a machine that doesn't have the DataStage client. I would also need to handle starting multiple sequences at the same time (or maybe skip the sequence and compile the job with the Multiple Instance option?).

Anyway, looking for some guru guidance before starting.

Thanks in advance for suggestions.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In a sequence, check for the existence of any XML files in the receiving directory and, if there are any, process them (the Folder stage is good for this), then move them to a different directory. Repeat.
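
A minimal Python sketch of one pass of that check, process, move cycle, for illustration only. The `process` callback is a hypothetical stand-in for the DataStage job (Folder stage -> XML Input -> Transformer -> ODBC), and the directory names are assumptions. Each file is moved only after it has been processed, so a file whose processing fails stays in the receiving directory for the next pass.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def process_pending(incoming: Path, done: Path,
                    process: Callable[[Path], None]) -> List[str]:
    """One pass: process every .xml file in `incoming`, then move it to `done`.

    `process` is a hypothetical stand-in for the DataStage job.
    Returns the names of the files handled in this pass.
    """
    done.mkdir(parents=True, exist_ok=True)
    handled = []
    # Sort so files are handled in a stable (name) order.
    for xml_file in sorted(incoming.glob("*.xml")):
        # If this raises, the pass stops and the file is left in `incoming`.
        process(xml_file)
        shutil.move(str(xml_file), str(done / xml_file.name))
        handled.append(xml_file.name)
    return handled
```

The sequence would simply call this again on each loop iteration ("Repeat").
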
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
daignault
Premium Member
Posts: 165
Joined: Tue Mar 30, 2004 2:44 pm

Post by daignault »

I worked with a customer in Denver who scheduled a DataStage job in cron to run every 5 minutes. It's not real time, but it's close.

You could also write a Unix script that checks whether any files exist in the directory in question and, if there are, submits the DataStage job to process them. This can be a bit tricky, as you don't want multiple instances of the job running at the same time.

To guard against that, use an empty lock file. Before submitting the job, check whether the lock file exists. If it does, don't submit the DataStage job. If it does not, touch it and run the job with dsjob. Remember to remove the lock file in your after-job routine with ExecSH.
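
A small Python sketch of the lock-file guard, under stated assumptions. It swaps the separate "check, then touch" steps for a single atomic create (`os.open` with `O_CREAT | O_EXCL`), which closes the window where two scripts could both see "no lock" at the same moment. `submit_job` is a hypothetical callback standing in for the dsjob call and is assumed to block until the job finishes; because of that, the wrapper removes the lock itself, playing the role of the after-job ExecSH.

```python
import os
from pathlib import Path
from typing import Callable


def run_exclusively(lock_path: Path, submit_job: Callable[[], None]) -> bool:
    """Run `submit_job` only if no other run holds the lock file.

    Returns True if the job was submitted, False if a run was already active.
    """
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so the
        # existence check and the "touch" happen as one atomic step.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another run is in progress: don't submit
    os.close(fd)
    try:
        submit_job()  # hypothetical stand-in for running dsjob
    finally:
        # Equivalent of removing the lock file in the after-job routine.
        lock_path.unlink()
    return True
```
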

Ray D
danddmrs
Premium Member
Posts: 86
Joined: Fri Apr 20, 2007 12:55 pm

Post by danddmrs »

Thank you for the replies.

If I go the Wait For File route, my sequence would look something like:

startloop -> waitforfile -> movefiles -> processfiles -> deletefiles - Repeat

I could set the number of loop iterations higher than the expected number of files per day, start the sequence before users begin processing, and run a job that uses dsjob -stop to stop the process at the appropriate time.
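
To make the loop-plus-wait idea concrete, here is a rough Python analogue, not DataStage code. `wait_for_file` imitates a Wait For File step with a timeout, and `run_loop` caps the number of iterations in the way described above. The `handle` callback is hypothetical (it stands for move, process, delete) and must remove the file, otherwise the same file would be picked up again on the next iteration.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def wait_for_file(directory: Path, pattern: str = "*.xml",
                  timeout: float = 60.0,
                  poll_interval: float = 1.0) -> Optional[Path]:
    """Poll `directory` until a file matching `pattern` appears.

    Returns the first match (by name), or None once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        matches = sorted(directory.glob(pattern))
        if matches:
            return matches[0]
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)


def run_loop(directory: Path, handle: Callable[[Path], None],
             max_iterations: int, **wait_kwargs) -> int:
    """Run at most `max_iterations` wait/handle cycles; return files handled."""
    handled = 0
    for _ in range(max_iterations):
        found = wait_for_file(directory, **wait_kwargs)
        if found is None:
            continue  # timed out: go round the loop again
        handle(found)  # hypothetical: move, process, delete
        handled += 1
    return handled
```
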

Alternatively, could a .bat file be executed on the server that creates the XML files to start the DataStage process? The specific file name could be passed as a parameter, something along the lines of:
SET outfile=XMLFileName
call \\LMDSTAGE1T.AAMHC.LOCAL\E$\Ascential\DataStage\Engine\bin\DSJOB -server lmdstage1t -run -jobstatus -param "XMLfile=%outfile%" projectname jobname
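
For comparison, a Python sketch that builds the same dsjob call as an argument list, using only the options that appear in the .bat line above (`-server`, `-run`, `-jobstatus`, `-param`). The server, project and job names are the placeholders from that line. This still assumes dsjob and its supporting DLLs are available on the calling machine. Depending on the setup, a remote call may need further options (for example credentials), which this sketch does not add.

```python
import subprocess
from typing import List


def dsjob_command(dsjob: str, server: str, project: str, job: str,
                  xml_file: str) -> List[str]:
    """Build the dsjob invocation from the .bat example as an argument list."""
    return [
        dsjob,
        "-server", server,
        "-run",
        "-jobstatus",                     # wait for the job; exit code reflects its status
        "-param", f"XMLfile={xml_file}",  # one job parameter as name=value
        project,
        job,
    ]


def start_job(cmd: List[str]) -> int:
    # Passing a list (no shell) avoids quoting problems with file names.
    return subprocess.run(cmd).returncode
```

Usage would be `start_job(dsjob_command("dsjob", "lmdstage1t", "projectname", "jobname", path))`, with the full path to dsjob in place of `"dsjob"` if it is not on the PATH.
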

I'm not sure about starting a DataStage job from a server without the DataStage client. From what I've read, that server would need dsjob.exe and some supporting .dll files to start the job.

Any reasons to prefer one over the other?
danddmrs
Premium Member
Posts: 86
Joined: Fri Apr 20, 2007 12:55 pm

Post by danddmrs »

Solution

Start Loop -> Exec Cmd -> UserVar -> Process -> Nested Cond -> Exec Cmd -> End Loop
Exec Cmd creates a list of the .xml files in the directory.
UserVar gets the first file from the list; if no file is found, it uses a dummy file name.
Process either processes the XML file or sleeps if the dummy file is passed.
Nested Cond bypasses the next Exec Cmd when the dummy file is used.
Exec Cmd moves the processed file out of the directory.
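
A rough Python model of one pass through that loop, mirroring the activity order only; it is not DataStage code. The `process` and `sleep` callbacks and the `__dummy__` sentinel name are hypothetical stand-ins for the job, the dummy-file sleep and the dummy file.

```python
import shutil
from pathlib import Path
from typing import Callable, List

DUMMY = "__dummy__"  # hypothetical sentinel used when no file is waiting


def list_xml_files(directory: Path) -> List[str]:
    # First "Exec Cmd": list the .xml files currently in the directory.
    return sorted(p.name for p in directory.glob("*.xml"))


def pick_file(files: List[str]) -> str:
    # "UserVar": take the first file, or the dummy if the list is empty.
    return files[0] if files else DUMMY


def one_iteration(incoming: Path, done: Path,
                  process: Callable[[Path], None],
                  sleep: Callable[[], None]) -> str:
    name = pick_file(list_xml_files(incoming))
    if name == DUMMY:
        sleep()      # "Process" sleeps when handed the dummy file
        return name  # "Nested Cond": skip the move step
    process(incoming / name)
    # Second "Exec Cmd": move the processed file out of the directory.
    done.mkdir(parents=True, exist_ok=True)
    shutil.move(str(incoming / name), str(done / name))
    return name
```

Handling one file per iteration keeps each pass short, and the dummy-file branch gives the loop something to do (sleep) when the directory is empty.
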

In the end, instead of using DataStage, we opted to use a web service to process the XML files, but I thought I would post the solution anyway.