reading variable length data
Hi,
I have a sequential file containing variable-length data; the number of columns is not fixed from one line to the next.
For example:
1|12345|abcd|abef|1a
2|3456|
3|123456|abcde|abef|1a|12457
I would like to use PX to convert this data as follows:
1,12345
1,abcd
1,abef
1,1a
2,3456
3,123456
3,abcde
3,abef
3,1a
3,12457
Thanks in advance.
regards,
varsha
One way to do this would be:
1. Read each row as a single column.
2. Process each row through a custom routine which
a. use count() to count the '|' delimiters and determine the number of fields in the row -- vN
b. use field() to grab the first column's value -- vFirstValue
c. use field() in a loop from 2 to the field count to gather each subsequent field value -- vNextValue
d. use writeseq() to write vFirstValue:",":vNextValue:char(10) to a Sequential File on each pass through the loop.
I can think of at least one other way of doing this: use convert() to change '|' to @TM, then parse... (I like MV fields.)
I'm sure there are many other ways to accomplish this as well, but the one illustrated above is quite simple, I think.
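The steps above can be sketched outside DataStage as well; here is a rough Python analogue (count(), field() and writeseq() are DataStage BASIC functions, so split() and a list of output lines stand in for them here):

```python
# Rough Python analogue of the routine above: split each '|'-delimited
# line, keep the first field as the key, and emit one "key,value" line
# per remaining field (standing in for the writeseq() calls).

def normalize(lines):
    out = []
    for line in lines:
        fields = line.rstrip("\n").split("|")   # field() equivalent
        key = fields[0]                          # vFirstValue
        for value in fields[1:]:                 # loop over fields 2..N
            if value:                            # skip empty trailing fields
                out.append(key + "," + value)
    return out

rows = ["1|12345|abcd|abef|1a", "2|3456|", "3|123456|abcde|abef|1a|12457"]
for row in normalize(rows):
    print(row)
```

On the sample input this prints exactly the ten "key,value" rows requested in the question, one per line.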
"Coding is FUNdamental"
Lance Short
"infinite diversity in infinite combinations"
***
"The absence of evidence is not evidence of absence."
Varsha,
Lance is correct. EE expects input rows to be of an identical format -- the same number of columns for each row. Since your input rows have different numbers of columns (assuming you use the '|' as a delimiter), you'll need to use a different delimiter (',' perhaps) to treat each row as if it had the same number of columns -- in this case, just one.
Furthermore, you wish to output more rows than are in the input dataset. There aren't really any Parallel stages that provide this functionality exactly as you have presented it. Therefore, you should write a custom routine as Lance suggests. The char(10) is the line feed character and will effectively create additional output rows for you.
You should probably create a Server job to do this manipulation, as the custom code that Lance has presented is much more readily developed using the traditional Server platform rather than the newer Parallel framework.
Alan
All you want and more can be found in the Parallel Job Developer's Guide manual.
The section on data types in Chapter 2, and the chapters on Split Vector and Make Vector stages will be a good starting point.
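The vector approach amounts to reading each line as a single column, splitting the tail after the key into a variable-length vector, and then pivoting that vector vertically, one output row per element. A rough Python sketch of the concept (an illustration only, not PX stage code):

```python
# Conceptual sketch of Split Vector + vertical pivot: the first field
# is the key, the rest form a variable-length vector, and the vector
# is pivoted so each element becomes its own (key, element) row.

def pivot_vector(line):
    key, *vector = line.rstrip("\n").split("|")
    return [(key, element) for element in vector if element]

print(pivot_vector("3|123456|abcde|abef|1a|12457"))
```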
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Hence my opening "If you can figure out some way of loading everything after field 1 into a variable-length vector". I won't be able to devote any time to it; my current gig is server-only (so I don't have PX to play with).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Lance, a vector is a close analogy to a multi-valued field. A vector of subrecords is a close analogy to an associated set of multi-valued fields. Does that help any?
For COBOL folks, a fixed length vector corresponds roughly to an OCCURS clause, and a variable length vector corresponds roughly to an OCCURS DEPENDING ON clause.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I have tried Lance's suggestion before; it works well. However, performance suffers as the incoming data grows larger. But anyway, it is one of the solutions.
lshort wrote:One way to do this would be:
1. Read each row as a single column.
2. Process each row through a custom routine which
a. use count() to count the '|' delimiters and determine the number of fields in the row -- vN
b. use field() to grab the first column's value -- vFirstValue
c. use field() in a loop from 2 to the field count to gather each subsequent field value -- vNextValue
d. use writeseq() to write vFirstValue:",":vNextValue:char(10) to a Sequential File on each pass through the loop.
I can think of at least one other way of doing this: use convert() to change '|' to @TM, then parse... (I like MV fields.)
I'm sure there are many other ways to accomplish this as well, but the one illustrated above is quite simple, I think.
"Coding is FUNdamental"