DS Basic to handle file

Kirtikumar · Post by **Kirtikumar** » Thu Oct 20, 2005 10:51 pm

Hi,

I am working on a DS basic routine whose functionality should be as follows:

It should read one line at a time from the input file and compare it with a string. If it matches, delete this line.
This matching process should continue until the last line.
I have used READSEQ and WRITESEQ to implement this. Whenever a match is found, I move the pointer back to the start of the line that matched and overwrite the whole line with spaces.
This works fine, but the file still contains these blank line(s) after the routine completes.

Is there any way to physically delete the line so that no blank lines are left behind, as is currently the case?

loveojha2 · Post by **loveojha2** » Fri Oct 21, 2005 12:52 am

Code:

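$IFNDEF JOBCONTROL.H
	$INCLUDE DSINCLUDE JOBCONTROL.H
$ENDIF
* Compact the file in place: each line that is kept is copied over the gap
* left by deleted lines, then the leftover tail is truncated.
* EolLen=2 assumes CR/LF line terminators (Windows), as the original offsets did.
OpenSeq "c:\filename.txt" To FileVar Else Call DSLogFatal("Cannot open file","This Program")
EolLen=2
ReadPos=0
WritePos=0
Done=0
Loop
	* Reposition to the next unread line, since writing moves the file pointer
	Seek FileVar,ReadPos,0 Else Done=1
	If Not(Done) Then
		ReadSeq FileLine From FileVar Else Done=1
	End
Until Done Do
	ReadPos=ReadPos+Len(FileLine)+EolLen
	If FileLine#"abc" Then
		* Keep this line: write it at the current write position
		Seek FileVar,WritePos,0 Else Call DSLogFatal("Cannot seek","This Program")
		WriteSeq FileLine On FileVar Else Call DSLogFatal("Cannot write","This Program")
		WritePos=WritePos+Len(FileLine)+EolLen
	End
Repeat
* Cut off the bytes left over after the last kept line
Seek FileVar,WritePos,0 Then WeofSeq FileVar
CloseSeq FileVar
Ans='1'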

Hope this helps. I have not added proper exception handling.

Enjoy

ArndW · Post by **ArndW** » Fri Oct 21, 2005 1:55 am

Kirtikumar,

when changing a sequential file, it is better to create a copy than to edit the original. After the new file is written you can delete the original and rename the new file if you wish.

Here is some code that will do what you want. Add error handling to taste; since I just typed it in, it might not compile on the first attempt:

Code:

StringToDelete = 'This is a bad string'
OPENSEQ '/tmp/SourceFile.txt' TO InFilePtr THEN NULL ELSE CALL DSLogFatal('No input file','')
OPENSEQ '/tmp/TargetFile.txt' TO OutFilePtr THEN WEOFSEQ OutFilePtr ELSE NULL
Finished = 0
READSEQ InRecord FROM InFilePtr ELSE Finished = 1
LOOP UNTIL Finished
   IF InRecord <> StringToDelete THEN
      WRITESEQ InRecord ON OutFilePtr ELSE CALL DSLogFatal('Unable to write','')
   END
   READSEQ InRecord FROM InFilePtr ELSE Finished = 1
REPEAT
CLOSESEQ InFilePtr
CLOSESEQ OutFilePtr

Kirtikumar · Post by **Kirtikumar** » Sun Oct 23, 2005 10:40 pm

ArndW,

I agree with what you are saying. The problem with this approach is as follows:
A batch process is accessing this file to append data processed. At the same time, another DS job should access it and delete the matching line.
If I follow the suggested approach, the first batch process may try to append to (or create) the file in the window before the temp file is renamed to the original file name.

Sorry for not providing sufficient info before posting the need.
Thanks for your input. I am thinking of using the method used for deleting an element from an array in C: to delete the 3rd item, say, shift every item from the 4th through the last up one position.

Let's see if it works or not.

kcbland · Post by **kcbland** » Sun Oct 23, 2005 10:44 pm

Kirtikumar wrote: A batch process is accessing this file to append data processed. At the same time, another DS job should access it and delete the matching line.

This will never work. Files use buffered I/O when writing. There's no guarantee that a complete line will be in the file, because there's no concept of lines, there are only blocks. If a process is flushing after every write, then you've got something to work with here.

However, this is not your case. Completely forget about the idea of two independent processes reading, writing, and deleting from the same file simultaneously. If you need it to work like a database, then either put this file into a database or put it into a hash file where you have row level locking and transactional capabilities.

kcbland · Post by **kcbland** » Sun Oct 23, 2005 10:45 pm

Kirtikumar wrote: Lets see if it works or not.

Not. I'll bet a box of donuts on it. I like Dunkin Donuts chocolate. Mmmm.. chocolate.

ray.wurlod · Post by **ray.wurlod** » Mon Oct 24, 2005 12:58 am

WRITESEQF guarantees an immediate flush. There's also a FLUSH statement, and you can use the NOBUF statement to make I/O to this particular file non-buffered.
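As a rough sketch of those statements in use (assuming the appending process is itself a DS BASIC routine, which the thread has not established; the path and line text are placeholders):

```basic
* Sketch only: append lines so that each one reaches the file immediately.
OPENSEQ '/tmp/SourceFile.txt' TO OutFilePtr ELSE CALL DSLogFatal('Cannot open file','')
* NOBUF is issued straight after OPENSEQ, before any other I/O on the file
NOBUF OutFilePtr ELSE CALL DSLogWarn('NOBUF failed, I/O stays buffered','')
* Position at end of file (relto 2) so the writes append
SEEK OutFilePtr, 0, 2 ELSE CALL DSLogFatal('Cannot seek to end of file','')
* WRITESEQF writes the line and flushes it before continuing
WRITESEQF 'A complete line' ON OutFilePtr ELSE CALL DSLogFatal('Unable to write','')
* With plain WRITESEQ, an explicit FLUSH does the same job
WRITESEQ 'Another complete line' ON OutFilePtr ELSE CALL DSLogFatal('Unable to write','')
FLUSH OutFilePtr ELSE CALL DSLogWarn('FLUSH failed','')
CLOSESEQ OutFilePtr
```

In practice one of the three mechanisms would normally be enough; they are shown together only for comparison.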

You can certainly treat the file as a dynamic array but, to do this, you must open its parent directory with OPENPATH then read the entire file with a single READU statement.
The first line in the file is the first element in the array, the second line in the file is the second element in the array, and so on. DataStage is one-based, unlike C which is zero-based.
Pull each line from the file using the REMOVE statement. This keeps track of where it left off.
Build a second dynamic array containing only the lines (elements) that you want to keep, then WRITE this new record back with the same file name with which the original file was read.
Close the directory with a CLOSE statement.

For files that are not too large (compared to available memory) this is quite an efficient technique.
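The steps above might be sketched roughly as follows. This is untested, and the directory, file name and `StringToDelete` value are placeholders borrowed from the earlier example in this thread rather than anything from the real job:

```basic
* Sketch only: treat the text file as one record in its parent directory.
StringToDelete = 'This is a bad string'
OPENPATH '/tmp' TO DirVar ELSE CALL DSLogFatal('Cannot open directory','')
* READU reads the whole file and takes a record lock on it
READU Record FROM DirVar, 'SourceFile.txt' THEN
   Kept = ''
   KeptCount = 0
   LOOP
      * REMOVE returns the next line and remembers where it left off;
      * MoreLines is 0 once the end of the record is reached
      REMOVE Line FROM Record SETTING MoreLines
      IF Line <> StringToDelete THEN
         * Append the line, separating kept lines with field marks
         IF KeptCount > 0 THEN Kept := @FM
         Kept := Line
         KeptCount = KeptCount + 1
      END
   WHILE MoreLines DO
   REPEAT
   * WRITE replaces the file contents and releases the lock
   WRITE Kept ON DirVar, 'SourceFile.txt'
END ELSE
   * File not found: give the lock back and report it
   RELEASE DirVar, 'SourceFile.txt'
   CALL DSLogWarn('File not found','')
END
CLOSE DirVar
```

Note that the lock taken by READU is only visible to other DataStage/UniVerse processes; an external batch process appending to the file would not see it.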

kcbland · Post by **kcbland** » Mon Oct 24, 2005 7:06 am

Ray, Kim, and Arnd are probably the only 3 people on this forum who would know how to write a purely DS BASIC solution that would work. I doubt anyone else (and I include myself) could successfully do what is being requested.

We don't know what the "batch process" is that is writing to this sequential file. I was assuming it was a DS job, which means you have no ability to alter its method for writing to sequential files. Furthermore, if it's a different application, you're going to have to make it conform to the rules Ray has laid out.

You know, switching to a hash file would solve a significant number of your issues. Row level locking, DELETE statements, etc. Why engineer a monstrosity when there's something so easy?

chulett · Post by **chulett** » Mon Oct 24, 2005 7:16 am

kcbland wrote: Ray, Kim, and Arnd are probably the only 3 people on this forum who would know how to write a purely DS BASIC solution that would work.

Ok.

(pretty sure I could give it a run for its money)

kcbland · Post by **kcbland** » Mon Oct 24, 2005 8:11 am

Sorry Craig, but even I wouldn't try to do this. I know the three amigos go waaaay back to Prime/ARev/MickeyD box days, and would know how to do this. Everyone else would probably be trial and error. I'm sure you'd give it a good run.

ArndW · Post by **ArndW** » Mon Oct 24, 2005 8:21 am

My thoughts are: why bother attempting to code a database-type solution for a flat file when you can have (a) UniVerse or (b) your ETL database do it for you?

If you really need to have several processes use (and modify) a sequential file then you should do your own file-level locking. Use a second (empty) file whose existence or absence shows whether the real file is locked. You'll still have a couple of milliseconds of potential error doing that. And if you use the DataStage semaphores to avoid that, you might as well just use hashed files.

I don't know if the solution will work with a pure UNIX sequential file. I don't have the C documentation handy, but I think that when a file is OPENed it gets a pointer to the start of the file and a pointer to the EOF. If someone else changes this file after it has been opened and moves the EOF back (i.e. adds data), the first program has no idea that this has happened. And the last process to close a file unit truncates the file after its EOF position, so one would have to code for this as well.

DataStage has no problem with huge strings, so do as was suggested already - read the whole file in as one long string and manipulate it in memory, then write it back.

ray.wurlod · Post by **ray.wurlod** » Mon Oct 24, 2005 3:43 pm

The DataStage job design (no coding required) has the form

Code:

SeqFile ----->  Transformer  -----> SeqFile

followed by after-stage subroutine to rename/remove text files.

A constraint expression in the Transformer stage guarantees that only required rows are written into the target file.