Find Last Record in File

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

pnchowdary
Participant
Posts: 232
Joined: Sat May 07, 2005 2:49 pm
Location: USA

Post by pnchowdary »

Hi Guys,

I have a sequential file that can contain a variable number of rows. While processing the input data in the Transformer, is there any way I can tell whether the row I am currently processing is the last row in the input source file? Any help would be appreciated.

Thanks
Naveen
ds_developer
Premium Member
Posts: 224
Joined: Tue Sep 24, 2002 7:32 am
Location: Denver, CO USA

Post by ds_developer »

If it is a fixed-width file, you could use a stage variable to call a routine (once) that goes to the OS and parses out the size of the file. Divide that by the length of each record, including its line terminator, then use @INROWNUM to determine when you are at the last record.

Sorry, I don't have an example to give you, just an idea. Otherwise, there is no built-in way of knowing you are on the last row.
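To make John's arithmetic concrete, here is a minimal Python sketch (not DS BASIC) of the size-divided-by-record-length idea. It assumes every record, including its line terminator, is exactly the same number of bytes; `fixed_width_row_count` and `is_last_row` are hypothetical helper names, and `in_row_num` stands in for @INROWNUM.

```python
import os


def fixed_width_row_count(path: str, record_length: int) -> int:
    """Derive the row count of a fixed-width file from its size alone.

    record_length is the byte length of one record including its line
    terminator (e.g. 80 data bytes + 1 newline byte = 81).
    """
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    size = os.path.getsize(path)  # asks the OS for the size; no data is read
    if size % record_length != 0:
        # A leftover fragment means the records are not really fixed-width.
        raise ValueError(f"{size} bytes is not a multiple of {record_length}")
    return size // record_length


def is_last_row(in_row_num: int, total_rows: int) -> bool:
    # in_row_num plays the role of @INROWNUM, which starts counting at 1.
    return in_row_num == total_rows
```

The size check is the weak point of this approach: one short or long record anywhere in the file makes the division meaningless, so the sketch raises an error rather than guessing.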

John
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Naveen,

since a sequential file has no forward pointer, you can never know whether your next READ is going to hit end-of-file. Depending on what you want to do at EOF, there are several approaches. As John mentioned, only a fixed-length record file lets you work out the number of lines without reading them, and even then you have to position to the end of the file to get that number; otherwise you will need to make at least one pass over all the data.

One approach would be to run the UNIX wc -l command, which counts the number of lines in a file, in a job sequence and pass the result to your job as a parameter. You could then perform a check like "IF @INROWNUM=#MyInputNumberOfLines# THEN ..."
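A rough Python model of this approach (not DataStage code) follows. `wc_l` mimics what wc -l reports and `tag_last_row` mimics the per-row comparison against the count passed in as a parameter; both names are hypothetical. One detail worth knowing: wc -l counts newline characters, so a file whose last line has no trailing newline reports one line fewer than it has rows.

```python
from typing import Iterable, Iterator, Tuple


def wc_l(data: bytes) -> int:
    # Like `wc -l`: count newline characters, not "lines" as a person would.
    return data.count(b"\n")


def tag_last_row(lines: Iterable[bytes],
                 line_count: int) -> Iterator[Tuple[int, bytes, bool]]:
    """Pair each row with its 1-based number (the @INROWNUM analogue) and a
    flag equivalent to IF @INROWNUM = #MyInputNumberOfLines# THEN ..."""
    for in_row_num, line in enumerate(lines, start=1):
        yield in_row_num, line, in_row_num == line_count
```

For b"a\nb\nc\n" the flags come out as False, False, True. For b"a\nb\nc" (no final newline) wc -l reports 2, so the flag fires on the second row instead of the third; check that the file ends with a newline before relying on this test.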
talk2shaanc
Charter Member
Posts: 199
Joined: Tue Jan 18, 2005 2:50 am
Location: India

Post by talk2shaanc »

Alternatively:

Step 1: Write a routine that uses DSExecute to run the wc -l command on the UNIX box and get the count of lines. Pass the file name and path to the routine as an argument.

Step 2: Call that routine in a stage variable in one of your Transformers and compare the count with @INROWNUM to identify the last record. :D
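In DataStage the routine body would call DSExecute; the following is only a Python analogue using `subprocess` to show the shape of step 1: take the file path, run wc -l, parse the number. `count_file_lines` is a hypothetical name, and the sketch assumes a UNIX host with wc on the PATH.

```python
import subprocess


def count_file_lines(file_path: str) -> int:
    """Run `wc -l` on a file and return the count as an integer."""
    # Feeding the file on stdin makes wc print only the number, not the name.
    with open(file_path, "rb") as fh:
        result = subprocess.run(["wc", "-l"], stdin=fh, capture_output=True,
                                text=True, check=True)
    # Some wc implementations pad the number with spaces, so strip first.
    return int(result.stdout.strip())
```

`check=True` makes a failing command raise instead of silently returning a bad count; a DS BASIC routine would similarly want to test the return code from DSExecute before trusting the output.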
Shantanu Choudhary
ArndW

Post by ArndW »

Shantanu,

the basic idea is good, but a stage variable derivation gets executed for each row passed through the Transformer, which will probably slow your job down considerably. Better to combine the two: use the routine, but call it from a parent Sequencer.
ds_developer

Post by ds_developer »

A technique I use to call a routine in a stage variable only once is:

1. Set the initial value to @NULL when defining the stage variable.
2. Use 'IF IsNull(stage_variable_name) THEN call routine ELSE stage_variable_name' as its derivation.
3. Of course, you need to make sure the routine doesn't return NULL, or it will be called again...
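A small Python model of these three steps (not DataStage code): `None` plays the role of @NULL, `routine` stands for whatever routine you call, and `run_transformer` is a hypothetical name for the per-row loop the Transformer performs.

```python
from typing import Callable, List, Optional, Sequence


def run_transformer(rows: Sequence[str],
                    routine: Callable[[], Optional[int]]) -> List[bool]:
    stage_var: Optional[int] = None  # step 1: initial value is @NULL
    flags: List[bool] = []
    for in_row_num, _row in enumerate(rows, start=1):  # in_row_num ~ @INROWNUM
        # step 2: IF IsNull(stage_var) THEN <call routine> ELSE stage_var
        stage_var = routine() if stage_var is None else stage_var
        flags.append(in_row_num == stage_var)  # True only on the last row
    return flags
```

If `routine` ever returns `None`, the null test succeeds again on the next row and the routine is called again, which is exactly the pitfall in step 3.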

John
Last edited by ds_developer on Fri Jul 01, 2005 11:32 am, edited 1 time in total.
ArndW

Post by ArndW »

John,

that's a good approach. I usually use a COMMON in the routine to skip the work on subsequent calls, but the overhead of calling a function or subroutine is much greater than that of an IF-THEN construct. In either case we are adding unneeded code to every row, so for efficiency it does make sense to move this part outside the loop. One still needs an IF-THEN on each row to test whether we are at the last record, but that is unavoidable under the circumstances.
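As a rough analogue of the COMMON technique, the Python sketch below uses a module-level dictionary, which, like a COMMON block, persists between calls within one process. `_common` and `get_line_count_cached` are hypothetical names. Note that the function is still called once per row; only the expensive work inside it is skipped, which is Arnd's point about call overhead.

```python
from typing import Dict

# A module-level dict stands in for a named COMMON block: it persists for the
# life of the process, so values survive from one call of the routine to the next.
_common: Dict[str, int] = {}


def get_line_count_cached(file_path: str) -> int:
    """Count the lines on the first call only; later calls return the value
    remembered in the 'COMMON' area. The call itself still happens every time."""
    if file_path not in _common:
        with open(file_path, "rb") as fh:
            _common[file_path] = sum(1 for _ in fh)  # one full pass over the data
    return _common[file_path]
```

Because the value is remembered for the life of the process, a file that changes between calls is not re-counted.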
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Call the routine only in the Initial Value of the stage variable and leave its derivation field in the Transformer empty. The routine is called just once, and the stage variable can then be referenced as often as you like. 8)
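In Python terms (a model, not DataStage code), the Initial Value trick amounts to evaluating the routine once before the row loop and only reading the result inside it. `run_with_initial_value` and `routine` are hypothetical names.

```python
from typing import Callable, List, Sequence


def run_with_initial_value(rows: Sequence[str],
                           routine: Callable[[], int]) -> List[bool]:
    # Initial Value: evaluated once, before the first row is processed.
    line_count = routine()
    # Empty derivation: nothing reassigns line_count per row, so it keeps its
    # initial value and can be referenced on every row at no extra cost.
    return [in_row_num == line_count
            for in_row_num, _row in enumerate(rows, start=1)]
```

The only per-row work left is the comparison that decides whether the current row is the last one.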
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW

Post by ArndW »

D**m, he's good!!! And I always wondered why engineering bothered to put the initial value in!!!

Thanks,
pnchowdary

Post by pnchowdary »

Hi Guys,

Thanks for all your inputs and ideas.

Naveen
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It's always useful to be able to initialize counters! :D
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
talk2shaanc

Post by talk2shaanc »

ArndW wrote:
Shantanu,

the basic idea is good, but a stage variable will get executed for each row passed through the transformer... This will probably slow down your job considerably. Better a combination - use the routine, but call it from a parent Sequencer.

Thanks for highlighting that; sorry, I missed writing that part. For that I would have used COMMON, ds_developer's technique, or a derivation like StageVar = If @INROWNUM = 1 Then call routine Else StageVar. The third approach is almost the same as ds_developer's.

Chulett, your approach is great, man. I never thought of calling a routine in the Initial Value.
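Modelled in Python (not DataStage code), the `If @INROWNUM = 1` derivation calls the routine once but still evaluates the test and assignment on every row. `run_inrownum_pattern` is a hypothetical name, and `checks` simply counts how often the derivation runs.

```python
from typing import Callable, List, Sequence, Tuple


def run_inrownum_pattern(rows: Sequence[str],
                         routine: Callable[[], int]) -> Tuple[List[bool], int]:
    stage_var = 0
    checks = 0  # how many times the derivation's If is evaluated
    flags: List[bool] = []
    for in_row_num, _row in enumerate(rows, start=1):  # in_row_num ~ @INROWNUM
        # StageVar = If @INROWNUM = 1 Then <call routine> Else StageVar
        checks += 1
        stage_var = routine() if in_row_num == 1 else stage_var
        flags.append(in_row_num == stage_var)
    return flags, checks
```

Unlike the @NULL sentinel, this never re-calls the routine when it returns NULL, but the check and assignment still cost one evaluation per row.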
Shantanu Choudhary
chulett

Post by chulett »

Once it clicks that you can do it, you'll never go back. It becomes useful in all kinds of situations. It's equivalent to the "If @INROWNUM = 1" approach, without the extra overhead of the check and assignment on each row.
-craig

ray.wurlod

Post by ray.wurlod »

Even better, the technique generalizes to parallel jobs. Using variables in COMMON does not.