Multiple file handling

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Perwezakh
Premium Member
Posts: 38
Joined: Mon Jun 06, 2005 9:13 am
Location: Chicago, IL

Multiple file handling

Post by Perwezakh »

I have a directory structure like:

Vendor1/
    in/
    out/
Vendor2/
    in/
    out/
Vendor...n/

My shell script reads many vendor files, all identical in format and number of columns, and places each one into the "in" folder of its vendor. A DataStage job then consumes these vendor files. The logic I need is: as soon as a vendor file lands in Vendor1/in, my DataStage job should kick in; while that instance is running, the script may drop another vendor file into Vendor2/in, at which point I have to start another instance of the same DataStage job.
Please help me design this, and also give me a hint on how each instance can be differentiated in the Director job log.
Thanks in advance
Last edited by Perwezakh on Mon Jul 20, 2009 9:06 am, edited 1 time in total.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I would write a sequence that gets the directory contents using an Execute Command stage inside an outer loop. It would then loop through each file found and check whether there is already an instance of the multi-instance job running for that filename (I'd use the filename as the instance name). If no such instance exists, start one.
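That outer loop can also be driven from the shell with the dsjob client. This is only a sketch: the project name DSPROJ, the multi-instance job LoadVendorFile, and the parameter SourceFile are all hypothetical, and dsjob is stubbed out (with -jobinfo pretending no instance exists yet) so the control flow can run without a DataStage engine.

```shell
# Stub for the dsjob client so the loop runs without a DataStage engine;
# the stub's -jobinfo pretends no instance exists yet. Remove the stub to
# call the real dsjob client.
dsjob() {
  echo "dsjob $*"
  [ "$1" != "-jobinfo" ]
}

# Hypothetical vendor tree with one newly arrived file.
base=$(mktemp -d)
mkdir -p "$base/Vendor1/in" "$base/Vendor2/in"
touch "$base/Vendor1/in/v1_20090720.dat"

started=""
for f in "$base"/Vendor*/in/*.dat; do
  [ -e "$f" ] || continue
  inv=$(basename "$f" .dat)                  # file name as the instance name
  # No such instance yet? Then start one with the file as a parameter.
  if ! dsjob -jobinfo DSPROJ "LoadVendorFile.$inv" >/dev/null 2>&1; then
    dsjob -run -param SourceFile="$f" DSPROJ "LoadVendorFile.$inv"
    started="$started $inv"
  fi
done
```

Because the invocation id is the file name, the Director log shows one entry per file, e.g. LoadVendorFile.v1_20090720.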
Perwezakh
Premium Member
Posts: 38
Joined: Mon Jun 06, 2005 9:13 am
Location: Chicago, IL

Post by Perwezakh »

ArndW wrote:I would write a sequence that gets the directory contents using an Execute Command stage inside an outer loop. It would then loop through each file found and check whether there is already an instance of the multi-instance job running for that filename (I'd use the filename as the instance name). If no such instance exists, start one.
Thanks ArndW. How will you check that an instance has completed, so that you don't need to run that file's instance again?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I am assuming that each file name is distinct, so if an instance exists then the job is already running or has completed.
Perwezakh
Premium Member
Posts: 38
Joined: Mon Jun 06, 2005 9:13 am
Location: Chicago, IL

Post by Perwezakh »

ArndW wrote:I am assuming that each file name is distinct, so if an instance exists then the job is already running or has completed.
Thanks ArndW. Yes, you are right, each file name is distinct. But my question is: how do I put this check into the sequencer loop, to determine whether a given file's instance run has completed or not?
I appreciate your answers.
Thanks
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Can't you create a new instance for each file?

If you want to run a fixed maximum number of instances, you can take n files at a time and branch through links in the sequencer, each link calling the job with the appropriate invocation id, and have the links join back into a Sequencer stage set to ALL.
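The fixed-maximum idea has a shell analogue: cap concurrency with xargs -P, one file per invocation. The job and project names are hypothetical, the real dsjob call is left as a comment (echo stands in for it so the logic is testable), and file names are assumed to contain no whitespace.

```shell
# Run at most max_jobs invocations at a time; each one handles a single file.
max_jobs=2
dir=$(mktemp -d)
touch "$dir/a.dat" "$dir/b.dat" "$dir/c.dat"

out=$(ls "$dir"/*.dat | xargs -n 1 -P "$max_jobs" sh -c '
  f=$1
  inv=$(basename "$f" .dat)
  # Real call: dsjob -run -param SourceFile="$f" DSPROJ "LoadVendorFile.$inv"
  echo "run LoadVendorFile.$inv"' _)
```

xargs blocks until a slot frees up, so no more than max_jobs instances ever run at once, which is what the ALL-joined Sequencer stage enforces inside a Sequence job.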
Perwezakh
Premium Member
Posts: 38
Joined: Mon Jun 06, 2005 9:13 am
Location: Chicago, IL

Post by Perwezakh »

Sainath.Srinivasan wrote:Can't you create a new instance for each file?

If you want to run a fixed maximum number of instances, you can take n files at a time and branch through links in the sequencer, each link calling the job with the appropriate invocation id, and have the links join back into a Sequencer stage set to ALL.
Sai, per my requirement I can't run my DataStage job with a fixed number of instances. So again, please help me understand how to check each Vendor/in folder and, if there is a file, check whether I have already consumed it; if not, start DataStage. I have to keep initiating this process as soon as a file arrives in any Vendor/in folder, which means many instances. If I define the invocation id as a job parameter, how do I pass a different file name for each DataStage run?
Thanks
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Execute "ls -f /my/path/to/directory". Use the output, which is comma separated, as a list to parse within a loop, extracting one file name per iteration. This filename is passed to a DataStage job as a parameter, but is also used as the instance name for that job call.
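A shell sketch of that loop. One caveat: on a POSIX ls it is "ls -m" (stream output format) that yields a comma-separated list; "-f" merely disables sorting. The job/project names and the .dat suffix are illustrative, and the real dsjob call is left as a comment.

```shell
# Hypothetical setup: a directory holding the vendor files for this pass.
dir=$(mktemp -d)
touch "$dir/v1.dat" "$dir/v2.dat"

list=$(ls -m "$dir")                          # e.g. "v1.dat, v2.dat"
cmds=""
oldIFS=$IFS
IFS=','                                       # split the list on the commas
for name in $list; do
  name=$(printf '%s' "$name" | tr -d ' \n')   # trim padding around each entry
  inv=${name%.dat}                            # file name (sans suffix) becomes
                                              # both parameter and instance name
  # Real call: dsjob -run -param SourceFile="$dir/$name" DSPROJ "LoadVendorFile.$inv"
  cmds="$cmds LoadVendorFile.$inv"
done
IFS=$oldIFS
```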
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Before processing a file, move it to an intermediate folder and work from there. This will avoid another instance / process reusing it.
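That claim-by-moving idea can be sketched in shell. Within one filesystem, mv is a rename, so only one scanner can win the race for a given file; the paths below are illustrative.

```shell
# Claim each new file by moving it into a work folder before processing.
base=$(mktemp -d)
mkdir -p "$base/in" "$base/work"
touch "$base/in/v1.dat"

claimed=""
for f in "$base"/in/*.dat; do
  [ -e "$f" ] || continue
  if mv "$f" "$base/work/" 2>/dev/null; then
    claimed="$base/work/$(basename "$f")"
    # Process "$claimed" here; a second scanner no longer sees it under in/.
  fi
done
```

A concurrent scanner that loses the race gets a failed mv and simply skips the file.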
Perwezakh
Premium Member
Posts: 38
Joined: Mon Jun 06, 2005 9:13 am
Location: Chicago, IL

Post by Perwezakh »

ArndW wrote:Execute "ls -f /my/path/to/directory". Use the output, which is comma separated, as a list to parse within a loop, extracting one file name per iteration. This filename is passed to a DataStage job as a parameter, but is also used as the instance name for that job call.
You are right ArndW, that's how I should do it. Please let me know how I should pass the different file names to the DataStage parameter for each run, because in the parameter I can only define one environment variable with one value.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

With "ls -f" you get a long string returned, each file name separated by commas. The BASIC function DCOUNT(string, ",") gives you the total number of fields in the string; then you iterate through the numbers and use FIELD(string, ",", iteration) to extract the appropriate filename, which you use for both the parameter and the instance name.
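The same idiom translated to shell, in case the split is easier to picture outside BASIC: awk plays the role of DCOUNT (count the delimited fields) and cut plays FIELD (extract the i-th one). The sample string is made up.

```shell
# A shell mirror of the DCOUNT/FIELD idiom on a sample comma-separated list.
string="v1.dat,v2.dat,v3.dat"

count=$(echo "$string" | awk -F',' '{print NF}')   # DCOUNT(string, ",")
names=""
i=1
while [ "$i" -le "$count" ]; do
  name=$(echo "$string" | cut -d',' -f"$i")        # FIELD(string, ",", i)
  names="$names $name"
  i=$((i + 1))
done
```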
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

A Sequence job would automate that for you. Use the UserVariables Activity stage to capture your delimited list of filenames; then the Start Loop stage can iterate through that list, passing one filename per loop iteration to the processing job inside the loop.
-craig

"You can never have too many knives" -- Logan Nine Fingers