DS Basic to handle file

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Kirtikumar
Participant
Posts: 437
Joined: Fri Oct 15, 2004 6:13 am
Location: Pune, India

Post by Kirtikumar »

Hi,

I am working on a DS Basic routine that should work as follows:

It should read one line at a time from an input file and compare it with a string. If the line matches, it should be deleted.
This matching process should continue until the last line.
I have used READSEQ and WRITESEQ to implement this. Whenever a match is found, I move the pointer back to the start of the matching line and overwrite the whole line with spaces.
This works, but the file still contains these blank lines after the routine completes.

Is there any way to physically delete the line so that no blank lines are left behind, as is currently the case?
Regards,
S. Kirtikumar.
loveojha2
Participant
Posts: 362
Joined: Thu May 26, 2005 12:59 am

Post by loveojha2 »

Code:

$IFNDEF JOBCONTROL.H 
	$INCLUDE DSINCLUDE JOBCONTROL.H 
$ENDIF
OpenSeq "c:\filename.txt" To FileVar Else Call DSLogWarn("Message","This Program")
AbcLen=0
Addit=0
FileLine1=''
Loop 
	* Leave the loop at end of file instead of reprocessing the last line read
	ReadSeq FileLine From FileVar Else Exit
	if FileLine#"abc"
	then 	
		AbcLen=-1*Addit+(-1*Len(FileLine))-2
		SEEK FileVar,AbcLen,1 THEN 
			WriteSeq FileLine ON FileVar Else Ans='0'
			FileLine1=FileLine
			AbcLen=0
		END
	END
	Else	
		Addit=Addit+Len("abc")+2
	END
Repeat
CloseSeq FileVar
Ans='1'
Hope this helps. I have not added proper exception handling.
Enjoy
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Kirtikumar,

when changing a sequential file, it is better to create a copy than to edit the original. After the new file is written you can delete the original and rename the new file if you wish.

Here is some code that will do what you want. Add error handling to taste; since I just typed it in, it might not compile on the first attempt:

Code:

StringToDelete = 'This is a bad string'
OPENSEQ '/tmp/SourceFile.txt' TO InFilePtr THEN NULL ELSE CALL DSLogFatal('No input file','')
OPENSEQ '/tmp/TargetFile.txt' TO OutFilePtr THEN WEOFSEQ OutFilePtr ELSE NULL
Finished = 0
READSEQ InRecord FROM InFilePtr ELSE Finished = 1
LOOP UNTIL Finished
   IF InRecord <> StringToDelete THEN
      WRITESEQ InRecord ON OutFilePtr ELSE CALL DSLogFatal('Unable to write','')
   END
   READSEQ InRecord FROM InFilePtr ELSE Finished = 1
REPEAT
CLOSESEQ InFilePtr
CLOSESEQ OutFilePtr
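
The delete-and-rename step mentioned above is not part of the DS BASIC listing. As a rough illustration of the whole copy-then-swap idea, here is a minimal sketch in Python (not DS BASIC, used here only to show the logic). The function name `remove_matching_lines` is hypothetical, and the sketch assumes the temporary file can be created in the same directory as the original so that the rename stays on one filesystem.

```python
import os
import tempfile


def remove_matching_lines(path, string_to_delete):
    """Copy every line except exact matches to a temp file, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file lives next to the original so the final rename is a same-filesystem move.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", newline="") as out, open(path, "r", newline="") as src:
            for line in src:
                # Compare without the line terminator; write kept lines unchanged.
                if line.rstrip("\r\n") != string_to_delete:
                    out.write(line)
        # Replace the original with the filtered copy in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, leave the original untouched and drop the temp file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The original file is only replaced once the filtered copy has been written completely. This does not coordinate with any other process that might be writing to the same file.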
Kirtikumar
Participant
Posts: 437
Joined: Fri Oct 15, 2004 6:13 am
Location: Pune, India

Post by Kirtikumar »

ArndW,

I agree with what you are saying. The problem with this approach is as follows:
A batch process appends processed data to this file. At the same time, another DS job needs to access it and delete the matching lines.
If I follow the approach you describe, the batch process may try to append to (or create) the file in the window before the temporary file has been renamed to the original file name.

Sorry for not providing enough information in my original post.
Thanks for your inputs. I am thinking of using the method used to delete an element from an array in C: to delete the 3rd item, say, shift every item from the 4th to the last up one position.
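
To make the idea concrete, here is a minimal sketch of that shift-up technique applied to lines in a file, written in Python rather than DS BASIC purely to show the logic. The function name `delete_lines_in_place` is hypothetical. It keeps a separate read position and write position, copies each kept line down over the gap left by deleted lines, and truncates the leftover tail at the end. It assumes no other process touches the file while it runs.

```python
def delete_lines_in_place(path, string_to_delete):
    """Shift kept lines toward the start of the file, then truncate the leftover tail."""
    target = string_to_delete.encode()
    with open(path, "r+b") as f:
        read_pos = 0   # where the next unread line starts
        write_pos = 0  # where the next kept line should end up
        while True:
            f.seek(read_pos)
            line = f.readline()
            if not line:
                break  # end of file
            read_pos = f.tell()
            if line.rstrip(b"\r\n") == target:
                continue  # matching line: skip it, leaving a gap to be overwritten
            if write_pos != read_pos - len(line):
                # A gap exists, so copy this line down to the write position.
                f.seek(write_pos)
                f.write(line)
            write_pos += len(line)
        # Cut off whatever remains after the last kept line.
        f.truncate(write_pos)
```

The final truncate is what physically removes the space; without it the tail of the old content would remain in the file.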

Let's see whether it works.
Regards,
S. Kirtikumar.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Kirtikumar wrote:A batch process is accessing this file to append data processed. At the same time, another DS job should access it and delete the matching line.
This will never work. Files use buffered I/O when writing. There's no guarantee that a complete line will be in the file, because there's no concept of lines, only blocks. If a process is flushing after every write, then you've got something to work with here.

However, this is not your case. Completely forget about the idea of two independent processes reading, writing, and deleting from the same file simultaneously. If you need to work with a database, then either put this file into a database or put it into a hash file, where you have row-level locking and transactional capabilities.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Kirtikumar wrote:Lets see if it works or not.
Not. I'll bet a box of donuts on it. I like Dunkin Donuts chocolate. Mmmm.. chocolate.
Kenneth Bland

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

WRITESEQF guarantees an immediate flush. There's also a FLUSH statement, and you can use the NOBUF statement to make I/O to this particular file non-buffered.

You can certainly treat the file as a dynamic array but, to do this, you must open its parent directory with OPENPATH then read the entire file with a single READU statement.
The first line in the file is the first element in the array, the second line in the file is the second element in the array, and so on. DataStage is one-based, unlike C which is zero-based.
Pull each line from the file using the REMOVE statement. This keeps track of where it left off.
Build a second dynamic array containing only the lines (elements) that you want to keep, then WRITE this new record back with the same file name with which the original file was read.
Close the directory with a CLOSE statement.


For files that are not too large (compared to available memory) this is quite an efficient technique.
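
As an illustration of the read-everything, filter, write-back pattern described above, here is a minimal sketch in Python. It is not DS BASIC and does not use OPENPATH, READU, REMOVE or WRITE; it only mirrors the logic. The function name `filter_file_in_memory` is hypothetical, and unlike READU the sketch takes no lock of any kind.

```python
def filter_file_in_memory(path, string_to_delete):
    """Load all lines, keep the non-matching ones, and rewrite the file."""
    # newline="" keeps the original line terminators untouched.
    with open(path, "r", newline="") as f:
        lines = f.readlines()
    # Build a second list containing only the lines to keep.
    kept = [ln for ln in lines if ln.rstrip("\r\n") != string_to_delete]
    # Write the kept lines back under the same file name.
    with open(path, "w", newline="") as f:
        f.writelines(kept)
    return len(lines) - len(kept)  # number of lines removed
```

The whole file is held in memory twice at the peak (original list plus kept list), which is why the approach suits files that are small relative to available memory.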
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Ray, Kim, and Arnd are probably the only 3 people on this forum who would know how to write a purely DS BASIC solution that would work. I doubt anyone else (and I include myself) could successfully do what is being requested.

We don't know what the "batch process" is that is writing to this sequential file. I was assuming it was a DS job, which means you have no ability to alter its method for writing to sequential files. Furthermore, if it's a different application, you're going to have to make it conform to the rules Ray has laid out.

You know, switching to a hash file would solve a significant number of your issues. Row level locking, DELETE statements, etc. Why engineer a monstrosity when there's something so easy?
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

kcbland wrote:Ray, Kim, and Arnd are probably the only 3 people on this forum who would know how to write a purely DS BASIC solution that would work.
Ok. :cry:

(pretty sure I could give it a run for its money)
-craig

"You can never have too many knives" -- Logan Nine Fingers
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Sorry Craig, but even I wouldn't try to do this. I know the three amigos go waaaay back to Prime/ARev/MickeyD box days, and would know how to do this. Everyone else would probably be trial and error. I'm sure you'd give it a good run.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

My thoughts are: why bother attempting to code a database-type solution for a flat file when you can have (a) UniVerse or (b) your ETL database do it for you?

If you really need to have several processes use (and modify) a sequential file, then you should do your own file-level locking. Use a second (empty) file whose existence shows whether the real file is locked. You'll still have a couple of milliseconds of potential error doing that. And if you use the DataStage semaphores to avoid that, you might as well just use hashed files :)
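
For illustration, here is a minimal sketch of the lock-file idea in Python (not DS BASIC). The name `lock_file` and its `timeout` and `poll` parameters are hypothetical. The sketch assumes a local filesystem, where creating the file with `O_CREAT | O_EXCL` checks for existence and creates the file in a single operation, which avoids the small window between "test" and "create". It only helps if every process that touches the data file takes the same lock first.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def lock_file(lock_path, timeout=10.0, poll=0.05):
    """Hold an empty lock file for the duration of the with-block."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Fails with FileExistsError if another process already holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        yield
    finally:
        # Release the lock by removing the file.
        os.close(fd)
        os.unlink(lock_path)
```

A caller would wrap each read-modify-write of the data file in `with lock_file("/tmp/SourceFile.txt.lock"):` (path chosen only as an example). A process that dies while holding the lock leaves the lock file behind, so some cleanup policy would be needed in practice.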

I don't know if the solution will work with a pure UNIX sequential file. I don't have the C documentation handy, but I think that when a file is opened the program gets a pointer to the start of the file and a pointer to the EOF. If someone else changes this file after it has been opened and moves the EOF back (i.e. adds data), the first program has no idea that this has happened. And the last process to close a file unit truncates the file after its EOF position, so one would have to code for this as well.

DataStage has no problem with huge strings, so do as was suggested already - read the whole file in as one long string and manipulate it in memory, then write it back.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The DataStage job design (no coding required) has the form

Code:

SeqFile ----->  Transformer  -----> SeqFile
followed by after-stage subroutine to rename/remove text files.

A constraint expression in the Transformer stage guarantees that only required rows are written into the target file.