Page 1 of 1

Splitting a record into variable number of records

Posted: Wed Dec 31, 2003 2:30 am
by viji
I need to split a record into 'n' records, but this 'n' varies for every record in my input flat file.
Can you please help me to achieve this in datastage (I understand that I can use pivot stage to split records, but only if 'n' is known & same for all records).

Is writing a user defined routine for a better option for this?

Posted: Wed Dec 31, 2003 3:50 am
by roy
Hi,
basically if your n variable is totally dynamic and so is the final number of row types, a routine sounds the best way.
be aware that nls issues, if you use nls, might occur using routines so you might want to prepare your file to the right nls prior to your routine, you can achieve this by using a format of varchar(max_row_len) and convert the entire row if possible to the right nls, or anything similar that will get you to the right nls you want.

there is also the CFF stage but I don't know if you can use it in your case (check it out).

IHTH

Posted: Wed Dec 31, 2003 4:00 pm
by ray.wurlod
The Pivot stage can handle a variable N, but you must provide definitions for the maximum possible N. Any not represented will be either NULL or not assigned (can't recall which), which you can test for using IsNull() or Unassigned() functions.
Search the forum for a technique that involves a design like:

Code: Select all

  stream  ---->  SeqFile stage ---->  stream
What you write into the sequential file is a single column containing the line of source with embedded "end of line" characters (Char(10) on UNIX). When you read from the same sequential file, it is magically transmogrified into multiple lines based on these characters! And it's FAST.

(edited after Roy's post)

Posted: Thu Jan 01, 2004 8:30 am
by mhester
it is magically transmogrified into multiple lines based on these characters! And it's FAST.
I'm not sure doing it this way is either bizarre or fantastic, but is pretty cool :-)

Regards,

Posted: Thu Jan 01, 2004 2:58 pm
by roy
Hi,
Sorry Ray, you ment Char(10), or as it is known LF (Line Feed) on unix, and on windows Char(13) + Char(10) concatenated that makes the CRLF (Carriage Retun, Line Feed)
I think it is time to ask for my icecream :wink: .

come to think of it, if you could use varchar(max_length_of_row_in_characters) as a default table definition and use a DS routine that splits the line/row depending on your logic, that means embeds the Char(10) on unix in the proper places and simply write the new line to a sequential file.
this will get the job done in case the output scheme is the same and only the resulting number of rows varies from the input row.

this was covered several time here before, then again you didn't state if this is your case.

IHTH

Posted: Thu Jan 01, 2004 3:50 pm
by ray.wurlod
mhester wrote:
it is magically transmogrified into multiple lines based on these characters! And it's FAST.
I'm not sure doing it this way is either bizarre or fantastic, but is pretty cool :-)

Regards,
I originally learned this technique from Jim Tsimis, to whom credit must go for inventing the technique, in the days when DataStage was young - probably version 1.2 or thereabouts.