dont create file when no data

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

har
Participant
Posts: 118
Joined: Tue Feb 17, 2004 6:23 pm
Location: cincinnati
Contact:

dont create file when no data

Post by har »

Hi,
I have two sequential outputs from a Transformer: one is the output file and the other is an error file.
When I run the job, at times there is no error data, but the Sequential stage nonetheless creates an empty error file. I want to write an error file only if there is data coming in, and not create one when zero rows come in. How do I do that?
Thanks in advance.
kris
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

You can't. That's the way DS works. One option you may consider is to write a file named "fred" and then use an after-job routine to rename it "barney" if it's not empty.
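
A minimal sketch of the rename idea, written in Python only for illustration (the function name is hypothetical, and deleting the empty temporary file is an added assumption that the post does not state). The job writes to a temporary name, and an after-job step publishes it under the real name only when it contains data:

```python
import os


def publish_if_not_empty(temp_path, final_path):
    """Rename temp_path to final_path if it has data; otherwise delete it.

    Returns True if the file was published, False if it was empty and removed.
    """
    if os.path.getsize(temp_path) > 0:
        # os.replace overwrites an existing final_path, if there is one.
        os.replace(temp_path, final_path)
        return True
    # Assumption: an empty temporary file is simply discarded.
    os.remove(temp_path)
    return False
```

Called as publish_if_not_empty("fred", "barney"), this leaves "barney" in place only when the job actually wrote rows to "fred".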
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The simple fact of the Sequential stage starting and stopping creates the file as it is opened and then closed, so as Ken said it can't be avoided.

People that are bothered by the empty files usually write an After Job script to test and remove them if they are zero bytes. That's a pretty simple thing on UNIX but I'm not really sure how, in a batch file, you can test for an empty file. :? Or you can take Ken's Flintstone Approach.

Either that or don't worry about them - we've got the same situation and we don't. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
1stpoint
Participant
Posts: 165
Joined: Thu Nov 13, 2003 2:10 pm
Contact:

A Platform Independent Solution

Post by 1stpoint »

This is an important problem that we solved using Python. You will need to install the python interpreter from www.python.org. This script is callable from within a DataStage Batch.

Code: Select all

"""purgezero.py
purge zero-byte files
usage:  purgezero.py Directory
"""
import sys, os
from os import listdir
from os.path import isdir, isfile, getsize

for char in sys.argv[1:2]: dirs=char
d=dirs
filesremoved=0
if isdir(d):
    contents = listdir(d)
    for path in contents:
        path = join(d, path)
        if isfile(path) and getsizde(path) == 0:
            os.remove(path)
            print "purged file:",path
            filesremoved+=1
print str(filesremoved)," files purged."
1stpoint
Participant
Posts: 165
Joined: Thu Nov 13, 2003 2:10 pm
Contact:

Revised: purgezero.py

Post by 1stpoint »

The above code has a spelling error and a missing import; the code below is fixed and tested:

Code: Select all

"""purgezero.py
purge zero-byte files
usage:  purgezero.py Directory
"""
import sys, os
from os import listdir
from os.path import isdir, isfile, getsize, join

# Take the directory from the first command-line argument.
if len(sys.argv) < 2:
    sys.exit(__doc__)
d = sys.argv[1]
filesremoved = 0
if isdir(d):
    for name in listdir(d):
        path = join(d, name)
        # Remove only regular files that are exactly zero bytes long.
        if isfile(path) and getsize(path) == 0:
            os.remove(path)
            print("purged file:", path)
            filesremoved += 1
print(filesremoved, "files purged.")
rgattu
Participant
Posts: 4
Joined: Wed Jun 02, 2004 8:38 am

Re: dont create file when no data

Post by rgattu »

Actually you can easily solve this problem with following steps:

1. Write a UNIX script that checks the file's size in bytes.
2. Delete the file if its size is zero.
3. Execute this script from an after-stage subroutine, passing the filename as an argument.
4. Use the ExecSH command to execute the script and supply the filename as its argument.
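
The check-and-delete part of these steps can be sketched as follows. This is an illustration under assumptions, not the poster's script: it is written in Python rather than as a UNIX shell script so that it also runs on Windows, and the names remove_if_empty and main are hypothetical.

```python
import os
import sys


def remove_if_empty(path):
    """Delete path if it is a regular zero-byte file; return True if deleted."""
    if os.path.isfile(path) and os.path.getsize(path) == 0:
        os.remove(path)
        return True
    return False


def main(filenames):
    """Check each filename given on the command line and report the outcome."""
    for name in filenames:
        print("removed" if remove_if_empty(name) else "kept", name)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Saved under a name of your choosing, the script takes the error file's path as its argument, which matches steps 3 and 4: the subroutine only has to pass the filename along.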
1stpoint
Participant
Posts: 165
Joined: Thu Nov 13, 2003 2:10 pm
Contact:

what if they are on windows??

Post by 1stpoint »

A unix shell script is certainly viable but not platform independent.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: dont create file when no data

Post by chulett »

rgattu wrote:Actually you can easily solve this problem with following steps:

1. Write a UNIX script that checks the file's size in bytes.
2. Delete the file if its size is zero.
3. Execute this script from an after-stage subroutine, passing the filename as an argument.
4. Use the ExecSH command to execute the script and supply the filename as its argument.
Except for the fact that the original poster is running on a Windows server, hence my comments about doing it in a 'batch' file.
-craig

"You can never have too many knives" -- Logan Nine Fingers
1stpoint
Participant
Posts: 165
Joined: Thu Nov 13, 2003 2:10 pm
Contact:

been there done that

Post by 1stpoint »

chulett wrote:Except for the fact that the original poster is running on a Windows server, hence my comments about doing it in a 'batch' file.
MS-DOS doesn't have this capability by default. There are some "freeware" utilities that will do it (I remember using test -s).

The best and most robust solution is a platform independent script (perl/python/tcl). I prefer python because it's the most powerful and easy to read and maintain, plus I can migrate it to *NIX and it runs the same way.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Do it as a routine. The status command will tell you filesize even of a hash file. Ray posted a real nice routine on ADN which tells you if a sequential file is readable or writable. Find this routine and add filesize to it.
Mamu Kim
s_boyapati
Premium Member
Posts: 70
Joined: Thu Aug 14, 2003 6:24 am
Contact:

Post by s_boyapati »

Rather than an after-job routine, it is better to run it as an after-stage routine. That avoids further problems, such as hitting the maximum number of files that can be open simultaneously (platform dependent), on busy servers during nightly processing or multi-user job submissions in business hours.
Sree Boyapati
Sr. ETL Architect
Certified Developer in DataStage, QualityStage, Information Analyzer.
har
Participant
Posts: 118
Joined: Tue Feb 17, 2004 6:23 pm
Location: cincinnati
Contact:

Post by har »

hi guys,

Thanks for your replies; they were helpful.
Har
richdhan
Premium Member
Posts: 364
Joined: Thu Feb 12, 2004 12:24 am

Post by richdhan »

Hi,

This has already been discussed. Please follow this link:

viewtopic.php?t=82827

In that thread you will find that, if the OS is Windows, either of the following works well:

1. Use the DSJ.LINKROWCOUNT infotype for the DSGetLinkInfo function in an after-job routine or the overall controlling batch. (Courtesy-None other than Ray)

2. We have overcome this by selecting only "full" files for subsequent processing..(Courtesy-Kasia)

I had this in my favourites, but what does Kasia mean by selecting only "full" files for subsequent processing?

Thanks in advance
Rich


A little bit of ink is more powerful than the strongest memory
--Confucius
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Wow - something from the Oliver archives. :wink: I wasn't sure people actually searched there.

By "full", I'm assuming they mean they only want to process (non-empty) files with records "in subsequent jobs" and have built a methodology to detect / avoid / not process the empty ones they have created.
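
One way to picture that kind of selection, as a rough sketch only (the function name is hypothetical and this is not Kasia's actual method, which the thread does not describe): a downstream step lists the directory and keeps just the files that contain data.

```python
import os


def non_empty_files(directory):
    """Return sorted paths of regular files in directory that contain data."""
    paths = (os.path.join(directory, name) for name in os.listdir(directory))
    return sorted(p for p in paths if os.path.isfile(p) and os.path.getsize(p) > 0)
```

The empty files stay on disk; they are just never handed to the subsequent jobs.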
-craig

"You can never have too many knives" -- Logan Nine Fingers