Page 1 of 1

parse files and loop through each row

Posted: Fri Jul 11, 2003 6:19 pm
by bernicetsai
I am trying to decide if datastage is an efficient tool to parse the following text as opposed to doing it in Perl:

|sessionID=acnielsen3-716-1-834|.q211=1|.q354=0|.q175=0|.q176=0|.q124=0|

My file contains many of the above formatted rows. The number of pipes in one row is not fixed (dynamic). What I want to do is to loop through each key-value pair and dispatch them to different tables based on the keys (SessionID or .q).

If I use pipe as a delimiter in the sequential file, I will end up with more than 500 columns (have to take the max). In addition to that, each pipe has to be further divided by the "=". Also I can't find a way to loop through EACH key-value pair.

Does anyone have any idea how I can approach this problem?
Any suggestions are highly appreciated.

Posted: Fri Jul 11, 2003 6:38 pm
by kcbland
perl, no doubt about it. Unstructured or dynamic structure data is not well handled by DataStage's sequential stage. You're better off pre-parsing this file.

Good luck!

Kenneth Bland

Posted: Sat Jul 12, 2003 6:11 am
by tonystark622
If you replace all the pipe chars with @VM, you can write them to a hash file and read them back out "normalized". On the output link you have to define the column that has this data in it as multi-value (type = 'MV') and change the "Normalize on" field at the top of the tab to the field name of this field.

When the rows get read out of the hash file you'll see:

sessionID=acnielsen3-716-1-834
.q211=1
.q354=0
.q175=0
.q176=0
.q124=0
You can separate the name,value pairs by using the Field function and specifying "=" as the field separator.
Name column derivation:
Field(InputColName, "=", 1)

Value column derivation:
Field(InputColName, "=", 2)

Hope this helps,
Tony

Posted: Sat Jul 12, 2003 4:42 pm
by ray.wurlod
Replacing the pipe characters with @VM (a system variable) can be achieved with the Convert function. Process the whole line as a single column, apply Convert("|",@VM,InLink.ColName) as a transform function and write the result into a hashed file using sessionID as the key value.
Use "normalize on" in the Output link of the hashed file (or UV) stage.

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518