parse files and loop through each row

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
bernicetsai
Participant
Posts: 7
Joined: Fri Jul 11, 2003 5:04 pm

parse files and loop through each row

Post by bernicetsai »

I am trying to decide if datastage is an efficient tool to parse the following text as opposed to doing it in Perl:

|sessionID=acnielsen3-716-1-834|.q211=1|.q354=0|.q175=0|.q176=0|.q124=0|

My file contains many of the above formatted rows. The number of pipes in one row is not fixed (dynamic). What I want to do is to loop through each key-value pair and dispatch them to different tables based on the keys (SessionID or .q).

If I use pipe as a delimiter in the sequential file, I will end up with more than 500 columns (have to take the max). In addition to that, each pipe has to be further divided by the "=". Also I can't find a way to loop through EACH key-value pair.

Does anyone have any idea how I can approach this problem?
Any suggestions are highly appreciated.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

perl, no doubt about it. Unstructured or dynamic structure data is not well handled by DataStage's sequential stage. You're better off pre-parsing this file.

Good luck!

Kenneth Bland
tonystark622
Premium Member
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

If you replace all the pipe chars with @VM, you can write them to a hash file and read them back out "normalized". On the output link you have to define the column that has this data in it as multi-value (type = 'MV') and change the "Normalize on" field at the top of the tab to the field name of this field.

When the rows get read out of the hash file you'll see:

sessionID=acnielsen3-716-1-834
.q211=1
.q354=0
.q175=0
.q176=0
.q124=0
You can separate the name,value pairs by using the Field function and specifying "=" as the field separator.
Name column derivation:
Field(InputColName, "=", 1)

Value column derivation:
Field(InputColName, "=", 2)

Hope this helps,
Tony
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Replacing the pipe characters with @VM (a system variable) can be achieved with the Convert function. Process the whole line as a single column, apply Convert("|",@VM,InLink.ColName) as a transform function and write the result into a hashed file using sessionID as the key value.
Use "normalize on" in the Output link of the hashed file (or UV) stage.

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
Post Reply