Reading a sequential file in Basic.

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

sandy_thomas
Participant
Posts: 9
Joined: Tue Dec 09, 2003 2:20 pm

Reading a sequential file in Basic.

Post by sandy_thomas »

Hi Everyone,
Happy Holidays. You all have a good one.
On to the question:
I'm reading a sequential file in BASIC. I just want to know whether it is possible to use string functions such as COMPARE, STR or SUBSTRINGS on the record we read.

like

Code: Select all

OPENSEQ 'INPUTFILE' TO READFILE Else ABORT
READSEQ FIRSTRECORD FROM READFILE Then
   X = SUBSTRINGS(FIRSTRECORD, 10, 2)
   Y = COMPARE(X, INPUTARG)
   If Y Then
      * do something
   End
End
Thanks in advance,
San
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Here is proper code for reading a sequential file into an array. FileName is the fully qualified path to the sequential file:

Code: Select all

      RoutineName = "ReadSomething"
      Array = ""
      ReturnCode = 0
      Call DSLogInfo("Opening file: ":FileName, RoutineName)
      OPENSEQ FileName TO S.FileName Then
         Done = @FALSE
         Loop
            Readseq line from S.FileName Else Done = @TRUE
         Until Done Do
            Array<-1> = line
         Repeat
         CLOSESEQ S.FileName
      End Else
         Call DSLogWarn("Unable to open ":FileName, RoutineName)
         ReturnCode = 1
      End


Your file contents are in an array called Array and the pass/fail value is in ReturnCode. You should NOT use ABORT, as it will crash your job, which is undesirable.

The answer to your string functions question is YES. Since a row is read from the file as a string, all of your methods for string work are available. You can use LEN, the variable[x,y] substring shorthand notation, CHANGE, EREPLACE, CONVERT, etc.
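
For instance, here's a quick sketch of that kind of string work on a line that has just been read (the positions, names and values are invented for illustration):

Code: Select all

      READSEQ line From S.FileName Then
         LineLength = LEN(line)               ;* length of the whole record
         Chunk = line[10,2]                   ;* 2 characters starting at position 10
         Cleaned = CHANGE(line, "~", "")      ;* remove every tilde
         Mapped = CONVERT("ABC", "xyz", line) ;* map A->x, B->y, C->z
         If Chunk = InputArg Then
            * do something
         End
      End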

Remember, IF-THEN-ELSE constructs are:

Code: Select all

IF expression Then DoThis Else DoThat

or

If expression Then
   Do These Lines
End Else
   Do Those lines
End

or

If expression Then
   Do These Lines
End Else Do That line

A programming tip: first read the contents of a file into memory and make sure the contents are valid, then process those contents. I'm assuming you're not using this for data crunching, but as a means to read content from something like a parameter or configuration file.
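
To make that tip concrete, here's a minimal sketch continuing from the read-into-array code above; the "every row must have 3 pipe-delimited fields" rule is just an assumed example of a validity check:

Code: Select all

      * Validate everything first; only process if the whole file is good
      RecordCount = DCOUNT(Array, @AM)
      AllValid = @TRUE
      For i = 1 To RecordCount
         If DCOUNT(Array<i>, "|") <> 3 Then AllValid = @FALSE  ;* assumed rule
      Next i
      If AllValid Then
         For i = 1 To RecordCount
            * process Array<i> here
         Next i
      End Else
         Call DSLogWarn("Invalid contents in ":FileName, RoutineName)
      End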
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If you just want the first line, you may find it more convenient to use the UNIX head command. If you invoke this using DSExecute(), the result is in the third argument (as a string) and the exit status in the fourth.

Code: Select all

Shell = "UNIX"
Command = "head -1 " : FileName
Output = ""
ExitStatus = 0
Call DSExecute(Shell, Command, Output, ExitStatus)
If ExitStatus = 0
Then
    * statements to process Output
End
Else
    Msg = 'Cannot read from "' : FileName : '".  ExitStatus = ' : ExitStatus
    Call DSLogWarn(Msg, "MyRoutine")
End
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
In addition to the above:
I once had a need to get some data from an input data file that would determine the destination partition to be populated (in a partitioned table).
So I built a general BASIC job that runs a parameter-driven job, setting it to fetch only 1 row.
Then I built a simple server job that reads from the data file and, via user status, returns the field I wanted from the row (the one that determines the partition).
Then I could use that in my sequence job as a parameter to the actual job, indicating the partition to be loaded to.
I really think BASIC is OK, I just wanted to do it another way :)
and to see whether it could be done with little DS training and no real BASIC/OS know-how.
This might be clumsy, but it's as simple as could be and requires as little DS know-how as I can think of :)
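
For anyone who wants to try the same trick, here's a rough sketch of the controlling side (the job name is invented; the server job itself would Call DSSetUserStatus(value) somewhere to publish the field):

Code: Select all

      * Run the one-row fetch job, then read back its user status
      hJob = DSAttachJob("FetchPartitionKey", DSJ.ERRFATAL)
      ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
      ErrCode = DSWaitForJob(hJob)
      PartitionKey = DSGetJobInfo(hJob, DSJ.USERSTATUS)
      ErrCode = DSDetachJob(hJob)
      * PartitionKey can now be passed as a parameter to the load job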

Just an Idea !!!
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

I had a discussion with Sandy about her need to read a sequential file in BASIC. She needs to split a source file into an indeterminate number of output files, based on a column of data within each row. I'm going to share an after-job routine I wrote that splits a load file into multiple output files based on the YearMonth of a business date within each row. Since target tables are usually partitioned along a business date, in month ranges, this routine has application to partitioned loading.

I'll summarize Oracle DBA design work for data warehouses here briefly. Large fact tables are partitioned into month ranges. ALL indexes are locally managed, meaning that no index spans data outside of its own partition. There is not a global primary key. The primary key is locally managed, and must include the partitioning column as part of the primary key definition.

That being said, when you DIRECT path load a table such as this from a single file, the database must determine the destination of each row so that it is placed into the appropriate partition. In addition, after all rows have been loaded, the database will then incrementally update the indexes within each partition in a rather lengthy manner, because it has to do it all as a single exercise. If you were to have your data pre-split into a file per partition, you could run N background sqlldr DIRECT path runs, and since each load manages the work for just its own partition, you gain a high degree of parallelism and performance.
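
Just to illustrate that last point, here's a sketch of kicking off the per-partition background loads from BASIC; the paths, control file and connect string are all invented:

Code: Select all

      * Hypothetical: one background DIRECT path sqlldr per pre-split
      * file, then wait for all of them to finish
      Shell = "UNIX"
      Command  = "cd /data/load ; "
      Command := "for f in PART_*.ins ; do "
      Command := "sqlldr userid=scott/tiger control=fact.ctl data=$f direct=true & "
      Command := "done ; wait"
      Call DSExecute(Shell, Command, ScreenOutput, SystemReturnCode)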

The routine I'm supplying has some tricks you can learn from simple observation. It pretty much runs the gamut, from opening a sequential file, to reading a hash file into an array in memory, to dynamically creating output sequential files. My last client used this on a large table that required partitioned loading. I set up the ETL control table for managing partitions and their loading mechanisms. This particular table is used to map months to partitions; the partition could exist in a main or history table, as well as have other features.

Without further ado...

Code: Select all

      FunctionName = "FileSplitterByMonth"

*
* Dimension to hold the number of output file handles (1 per month)

      DIM OutputFileHandles(100)
*
* InputArg looks like:
*   WorkHashFileDirectory,/pathtofile/filename,{PARTITION/blank}, date column ordinal location,{DEBUG}
* 
*
      CommonHashFileDirectory = TRIM(FIELD(InputArg,",",1))
      InputFileName = TRIM(FIELD(InputArg,",",2))
      OutputFileBaseName = FIELD(InputFileName, "/", 1, DCOUNT(InputFileName,"/")-1)
      UsePartitionName = (TRIM(FIELD(InputArg,",",3)) = "PARTITION")
      DateColumnLocation = TRIM(FIELD(InputArg,",",4))
      DEBUGGING = (TRIM(FIELD(InputArg,",",5)) = "DEBUG")
      WorkFileName = InputFileName:".bak"
*
* Pull the table name from the filename
*
      TableToProcess = UPCASE(TRIM(FIELD(FIELD(InputFileName, ".", 1), "/", DCOUNT(InputFileName,"/"))))
*
* Look at the extension, it's either ".ins" or ".upd"
*
      LoadType = TRIM(FIELD(InputFileName, ".", 2))

*
* Make a backup copy of original file
*
      UNIXcmd = "cp ":InputFileName:" ":WorkFileName
      If DEBUGGING Then Call DSLogInfo("UNIX command: ":UNIXcmd, FunctionName)
      Call DSExecute("UNIX", UNIXcmd, ScreenOutput, SystemReturnCode)
      If DEBUGGING Then Call DSLogInfo("UNIX output: ":UNIXcmd:@AM:ScreenOutput, FunctionName)
      If SystemReturnCode > 0 Then
         Call DSLogFatal("The script [":UNIXcmd:"] had a SystemReturnCode of [":SystemReturnCode:"]", FunctionName)
      End

      Txt = ""
      Txt<-1> = "CommonHashFileDirectory: ":CommonHashFileDirectory
      Txt<-1> = "InputFileName: ":InputFileName
      Txt<-1> = "UsePartitionName: ":UsePartitionName
      Txt<-1> = "TableToProcess: ":TableToProcess
      Txt<-1> = "LoadType: ":LoadType
      Txt<-1> = "WorkFileName: ":WorkFileName
      If DEBUGGING Then Call DSLogInfo(Txt, FunctionName)

      OutputFileDates = ""
      OutputFileNames = ""
      OutputFileHandlePointers = ""
      OutputFileNamesUniq = ""
      OutputFileHandleCount = 0

*
* Read the staging hash file that contains the contents of warehouse table ETL_CTRL_PARTITIONS.
* This table's primary key is tablename and YYYYMM.  Every fact table is partitioned by the business date month
* of the fact.  All inserts will be DIRECT path loaded into the named partition after separation.
* Every partition has the ability to be marked by DBA's as read-only in this table.  We will create
* a load file for each month, but add "-READONLY" to the filename so that it will not be grabbed by the
* auto-loader that sweeps all load files into partitions in parallel.
*
      OPENPATH CommonHashFileDirectory:"/stgETL_CTRL_PARTITIONS" TO F.ETL_CTRL_PARTITIONS Else
         Call DSLogFatal("Unable to open stgETL_CTRL_PARTITIONS", FunctionName)
      End

      OPENSEQ WorkFileName TO InputFileHandle Then
         If DEBUGGING Then Call DSLogInfo("Splitting file: ":InputFileName, FunctionName)

         SELECT F.ETL_CTRL_PARTITIONS
*
* Load the table into memory.  Build a series of arrays to hold months and file names.  Since a
* table can have a logical name "ORDER_DETAILS", but span a mainline and history table
* "ORDER_DETAILS_MAIN" and "ORDER_DETAILS_HIST", we include both a logical and physical name
* for the table that holds that month partition.  A month may be in the main table today, but next
* week is moved via transportable tablespace/partition exchanging into the history table.  All
* subsequent loads need to be directed there.
*
         cntr = 0
         Done = @FALSE
         Loop
            Readnext ETL_CTRL_PARTITIONSkey Else Done = @TRUE
         Until Done Do
            Read row From F.ETL_CTRL_PARTITIONS,ETL_CTRL_PARTITIONSkey Then
               YYYYMM = FIELD(ETL_CTRL_PARTITIONSkey, @TM, 1)
               LogicalTableName = FIELD(ETL_CTRL_PARTITIONSkey, @TM, 2)
               IF LogicalTableName = TableToProcess Then
                  PhysicalTableName = UPCASE(TRIM(row<1>))
                  PartitionName = UPCASE(TRIM(row<2>))
                  Readonly = row<3>
                  Begin Case
                     Case Readonly = "Y"
                        FileNameToUse = PartitionName:"-READONLY.":LoadType
                     Case UsePartitionName
                        FileNameToUse = PartitionName:".":LoadType
                     Case @TRUE
                        FileNameToUse = PhysicalTableName:".":LoadType
*
* If using DataStage to do loads, we need to make sure the files are there for the jobs
*
                        UNIXcmd = "touch ":OutputFileBaseName:"/":FileNameToUse
                        If DEBUGGING Then Call DSExecute("UNIX", UNIXcmd, ScreenOutput, SystemReturnCode)
                  End Case
                  cntr += 1
                  OutputFileDates<cntr> = YYYYMM
                  OutputFileNames<cntr> = FileNameToUse
               End
            End
         Repeat
*
* Parse the pipe-delimited data file.  Get the YYYY-MM-DD business date and find out what file it goes into.
* Find the file pointer and write to it the data.  If the YYYYMM doesn't have an existing file pointer,
* then go to the routine that creates and opens one.
*
         Done = @FALSE
         OutputFileHandleCount = 0
         Loop
         Until Done Do
            READSEQ line From InputFileHandle Then
               DateCheck = FIELD(line, "|", DateColumnLocation)
               YYYYMM = DateCheck[1,4]:"-":DateCheck[6,2]
               Locate YYYYMM IN OutputFileDates SETTING PTR Then
                  OutputFileHandlePtr = OutputFileHandlePointers<PTR>
                  If OutputFileHandlePtr = "" Then Gosub CreateOutputFileHandle
                  WRITESEQ line TO OutputFileHandles(OutputFileHandlePtr) Else
                     Call DSLogFatal("Could not update file", FunctionName)
                  End
               End
            End Else
               Done = @TRUE
            End
         Repeat
         For HandlePtr = 1 To OutputFileHandleCount
            WEOFSEQ OutputFileHandles(HandlePtr)
            CLOSESEQ OutputFileHandles(HandlePtr)
         Next HandlePtr
      End Else
         Call DSLogWarn("Unable to open file: ":WorkFileName, FunctionName)
      End

      ErrorCode = 0

      Goto Fini


CreateOutputFileHandle:

*
*  OutputFileNames<PTR> does not have a handle, so open the file else create it then open it
*
      NewFileName = OutputFileBaseName:"/":OutputFileNames<PTR>
      ValidOpen = @FALSE
      OPENSEQ NewFileName TO OutputFileHandle Then
         ValidOpen = @TRUE
      End Else
         If DEBUGGING Then Call DSLogInfo("Creating output file: ":NewFileName, FunctionName)
         CREATE OutputFileHandle Then ValidOpen = @TRUE
      End
      If NOT(ValidOpen) Then
         Call DSLogFatal("Could not create file: ":NewFileName, FunctionName)
      End
      If DEBUGGING Then Call DSLogInfo("Created output file: ":NewFileName, FunctionName)

      OutputFileHandleCount += 1
      OutputFileHandles(OutputFileHandleCount) = OutputFileHandle
      OutputFileHandlePtr = OutputFileHandleCount

      For i = 1 To cntr
         If OutputFileNames<i> = OutputFileNames<PTR> Then OutputFileHandlePointers<i> = OutputFileHandlePtr
      Next i

      Return

Fini:
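
For reference, a hypothetical InputArg for this routine might look like the following; the paths, column ordinal and flags are invented, per the format described in the comments at the top:

Code: Select all

/project/hash,/data/load/order_details.ins,PARTITION,14,DEBUG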
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Very nice, Ken. Thanks for coming back and giving us some closure on this - and for sharing from your library once again.
kcbland wrote:I'll summarize Oracle DBA design work for data warehouses here briefly. Large fact tables are partitioned into month ranges. ALL indexes are locally managed, meaning that no index spans data outside of its own partition. There is not a global primary key. The primary key is locally managed, and must include the partitioning column as part of the primary key definition...
Now I just need to find a real Oracle DBA. :wink: I work primarily with Sybase folks who were 'forced' to work with Oracle. Don't get me wrong - they are great people - it's just that they have little to no Oracle experience, and their only DW experience was from the 'failed' Sybase warehouse we are working on replacing. The Modelers I work with say they are followers of Kimball, but without strong DBA support they are kind of left flapping in the breeze when it comes to trying to structure some of the nuances of a large DW, among them things like the partitioning and indexing methodologies you mention above. It's been... interesting... to say the least, but as just the "ETL guy" I'm somewhat further down the food chain than I'm used to.

Sorry, didn't mean to turn this into a little diatribe. We'll see how things will change as they've just outsourced their DBA work to another company and are transitioning out (according to rumors) all but one of our folks. Nice Christmas present, eh? Our little group hasn't had any exposure to this other company so we have no idea what to expect going forward. Things will probably go from 'interesting' to 'even more interesting'. :roll: Time will tell.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

chulett wrote:Things will probably go from 'interesting' to 'even more interesting'. :roll: Time will tell.
But they still pay you well, I hope? :wink: :mrgreen: :mrgreen: :mrgreen:
Developer of DataStage Parallel Engine (Orchestrate).
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

When they pay me. :? Yet another 'funny' story - they recently replaced their old, 'outdated' mostly manual invoice processing system with a new spanky web-based one. There is a lovely Portal with an automagic process behind the scenes to make all of our lives easier and hooks to Brio reports so we can track our progress thru the system. Problem is - now it takes longer to get from Invoice to Check than it ever did the old way and it seems to have... a 'problem' or two. :roll: Oh, and when you go to track your invoices, you can put in any invoice number you can think of and see it in glorious Technicolor - even if it ain't yours! Looking for your invoice #12? Find every invoice #12 ever submitted by anyone! What could be better! Hah! I laugh in your general direction!

Where'd you get the greenies?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

chulett wrote:Where'd you get the greenies?
Click on "View more Emoticons."

:|

Sad story.

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Teej wrote:Click on "View more Emoticons."
More as in two more? Ok... thanks for Mr Green. :mrgreen:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Penalties

Post by ray.wurlod »

chulett wrote:When they pay me. :? Yet another 'funny' story - they recently replaced their old, 'outdated' mostly manual invoice processing system with a new spanky web-based one. There is a lovely Portal with an automagic process behind the scenes to make all of our lives easier and hooks to Brio reports so we can track our progress thru the system. Problem is - now it takes longer to get from Invoice to Check than it ever did the old way and it seems to have... a 'problem' or two. :roll: Oh, and when you go to track your invoices, you can put in any invoice number you can think of and see it in glorious Technicolor - even if it ain't yours! Looking for your invoice #12? Find every invoice #12 ever submitted by anyone! What could be better! Hah! I laugh in your general direction!

Where'd you get the greenies?
I used to feel slightly guilty about adding a penalty clause to my invoices (interest on late payments based on the rates I have to pay to the Australian Tax Office - related to the ten year bond rate).
Since then I've seen the penalties a certain large RGB(0,0,255) company imposes - interest higher than my rate, and PER MONTH!!!
Don't feel guilty any more. 8)
Don't worry too much about late payments, either - they end up paying more.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.