Parsing a text file
Posted: Thu Nov 02, 2000 5:52 pm
I have a pipe ("|") delimited flat text file containing multiple records. Each record starts with at least 12 columns of customer info. After that, each record can have an indefinite number of columns which, taken in groups of 3, represent "other info". Each record is terminated with a carriage return/line feed.
My problem:
I need to parse out the first 12 columns of customer info and write them to a new text file. I then need to write every group of 3 columns after the first 12 to a second new text file, each group prefixed with Col1 (the Cust ID). Remember, each record can have anywhere from 15 columns up to an unknown maximum.
Sample:
1IDA|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18
1IDB|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27
1IDC|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24
1IDD|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39
Expected Results:
Text File1
1IDA|2|3|4|5|6|7|8|9|10|11|12
1IDB|2|3|4|5|6|7|8|9|10|11|12
1IDC|2|3|4|5|6|7|8|9|10|11|12
1IDD|2|3|4|5|6|7|8|9|10|11|12
Text File2
1IDA|13|14|15
1IDA|16|17|18
1IDB|13|14|15
1IDB|16|17|18
1IDB|19|20|21
1IDB|22|23|24
1IDB|25|26|27
1IDC|13|14|15
1IDC|16|17|18
1IDC|19|20|21
1IDC|22|23|24
1IDD|13|14|15
1IDD|16|17|18
1IDD|19|20|21
1IDD|22|23|24
1IDD|25|26|27
1IDD|28|29|30
1IDD|31|32|33
1IDD|34|35|36
1IDD|37|38|39
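For reference, the per-record rule behind these expected results could be sketched as follows. This is only a sketch in DataStage BASIC, assuming the built-in Field() and DCount() functions are the right tools here; the literal record and the names NumCols, ColNo and OtherLines are placeholders made up for illustration.

```
* Sketch only: the record is hard-coded so the splitting rule is easy to see.
Irec = "1IDA|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18"

NumCols  = DCount(Irec, "|")        ;* number of pipe-delimited columns
CustID   = Field(Irec, "|", 1)      ;* column 1
CustInfo = Field(Irec, "|", 1, 12)  ;* columns 1 to 12, delimiters included

* Collect one output line per group of 3 columns after the first 12.
OtherLines = ""
For ColNo = 13 To NumCols Step 3
   * The 4th argument of Field() asks for that many consecutive columns.
   OtherLines<-1> = CustID : "|" : Field(Irec, "|", ColNo, 3)
Next ColNo
```

In a real routine, CustInfo would go to the first output file and each entry of OtherLines to the second, one WriteSeq per line.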
My plan is to read the source file in a batch or routine using the OpenSeq and ReadSeq commands.
I don't expect anybody to write this script for me, but I do have a few questions:
----Is it best for me to be using the OpenSeq, ReadSeq and WriteSeq commands?
----How is control maintained among multiple FileVars that are open at the same time? Do I have to issue a new ReadSeq command every time I want to read from or write to a particular FileVar?
----The way I've designed the script so far, I'm not sure the code will ever recognize the end-of-line characters, given the nested loops I'm using.
----Is there a basic command to recognize EOF or EOL? E.g.
DO WHILE NOT EOF
ENDDO
----Is there a way in DataStage to trace/debug a batch or routine while it's running?
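What follows is a very preliminary, untested framework using the OpenSeq, ReadSeq and WriteSeq commands.
Ans = "OK"
* Source file: give up if it is locked or missing.
PathName2 = SOURCEFILE
OpenSeq PathName2 To FileVar2 Locked
   FilePresent = @True
   Ans = "ERROR"
   GoTo ExitBatch
End Then
   FilePresent = @True
End Else
   FilePresent = @False
   Ans = "ERROR"
   GoTo ExitBatch
End
* Output file 1: the 12 customer columns.
PathName3 = OUTTEXTFILE1
OpenSeq PathName3 To FileVar3 Locked
   Ans = "ERROR"
   GoTo ExitBatch
End Then
   FilePresent = @True
   WeofSeq FileVar3   ;* file already exists: truncate it before writing
End Else
   FilePresent = @False   ;* not there yet; I believe the first WriteSeq creates it
End
* Output file 2: Cust ID plus each group of 3 "other info" columns.
PathName4 = OUTTEXTFILE2
OpenSeq PathName4 To FileVar4 Locked
   Ans = "ERROR"
   GoTo ExitBatch
End Then
   FilePresent = @True
   WeofSeq FileVar4   ;* file already exists: truncate it before writing
End Else
   FilePresent = @False   ;* not there yet; I believe the first WriteSeq creates it
End
***********************************************
Loop
   * ReadSeq returns one line per call, without the line terminator.
   ReadSeq Irec From FileVar2 Then
      pos1 = 0     ;* current character position in the record
      Col1 = 0     ;* number of delimiters seen so far
      CustID = ""
      RecLen = Len(Irec)
      Loop
         pos1 += 1
      Until pos1 > RecLen Do
         If Irec[pos1,1] = "|" Then
            Col1 += 1
            * Everything before the 1st delimiter is the Cust ID.
            If Col1 = 1 Then CustID = Irec[1, pos1 - 1]
            * Everything before the 12th delimiter is the customer info.
            * (Records with fewer than 13 columns never get here.)
            If Col1 = 12 Then
               CustInfo = Irec[1, pos1 - 1]
               GoSub WriteCust
               GoSub ReadOther
               Exit
            End
         End
      Repeat
   End Else
      Exit   ;* end of file (or the read failed)
   End
Repeat
CloseSeq FileVar2
CloseSeq FileVar3
CloseSeq FileVar4
GoTo ExitBatch
***********************GoSubs
WriteCust:
   WriteSeq CustInfo To FileVar3 Else
      Ans = "ERROR"
   End
Return
ReadOther:
   *****Read following data 3 columns at a time and write to file
   GroupStart = pos1 + 1   ;* first character after the 12th delimiter
   DelimCount = 0
   For p = GroupStart To RecLen + 1
      * The end of the record counts as a closing delimiter.
      If p > RecLen Or Irec[p,1] = "|" Then
         DelimCount += 1
         If DelimCount = 3 Or p > RecLen Then
            If p > GroupStart Then
               OtherInfo = Irec[GroupStart, p - GroupStart]
               GoSub WriteOther
            End
            GroupStart = p + 1
            DelimCount = 0
         End
      End
   Next p
Return
WriteOther:
   WriteSeq CustID : "|" : OtherInfo To FileVar4 Else
      Ans = "ERROR"
   End
Return
**************************
ExitBatch:
--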
Michael Feckler mfeckler@onel.com
Minneapolis, MN
952-996-9145