Splitting a record into variable number of records

viji · Post by **viji** » Wed Dec 31, 2003 2:30 am

I need to split a record into 'n' records, but this 'n' varies for every record in my input flat file.
Can you please help me to achieve this in datastage (I understand that I can use pivot stage to split records, but only if 'n' is known & same for all records).

Is writing a user defined routine for a better option for this?

roy · Post by **roy** » Wed Dec 31, 2003 3:50 am

Hi,
basically if your n variable is totally dynamic and so is the final number of row types, a routine sounds the best way.
be aware that nls issues, if you use nls, might occur using routines so you might want to prepare your file to the right nls prior to your routine, you can achieve this by using a format of varchar(max_row_len) and convert the entire row if possible to the right nls, or anything similar that will get you to the right nls you want.

there is also the CFF stage but I don't know if you can use it in your case (check it out).

IHTH

ray.wurlod · Post by **ray.wurlod** » Wed Dec 31, 2003 4:00 pm

The Pivot stage can handle a variable N, but you must provide definitions for the maximum possible N. Any not represented will be either NULL or not assigned (can't recall which), which you can test for using IsNull() or Unassigned() functions.
Search the forum for a technique that involves a design like:

Code: Select all

  stream  ---->  SeqFile stage ---->  stream

What you write into the sequential file is a single column containing the line of source with embedded "end of line" characters (Char(10) on UNIX). When you read from the same sequential file, it is magically transmogrified into multiple lines based on these characters! And it's FAST.

(edited after Roy's post)

mhester · Post by **mhester** » Thu Jan 01, 2004 8:30 am

it is magically transmogrified into multiple lines based on these characters! And it's FAST.

I'm not sure doing it this way is either bizarre or fantastic, but is pretty cool

Regards,

roy · Post by **roy** » Thu Jan 01, 2004 2:58 pm

Hi,
Sorry Ray, you ment Char(10), or as it is known LF (Line Feed) on unix, and on windows Char(13) + Char(10) concatenated that makes the CRLF (Carriage Retun, Line Feed)
I think it is time to ask for my icecream

.

come to think of it, if you could use varchar(max_length_of_row_in_characters) as a default table definition and use a DS routine that splits the line/row depending on your logic, that means embeds the Char(10) on unix in the proper places and simply write the new line to a sequential file.
this will get the job done in case the output scheme is the same and only the resulting number of rows varies from the input row.

this was covered several time here before, then again you didn't state if this is your case.

IHTH

ray.wurlod · Post by **ray.wurlod** » Thu Jan 01, 2004 3:50 pm

mhester wrote:
it is magically transmogrified into multiple lines based on these characters! And it's FAST.
I'm not sure doing it this way is either bizarre or fantastic, but is pretty cool

Regards,

I originally learned this technique from Jim Tsimis, to whom credit must go for inventing the technique, in the days when DataStage was young - probably version 1.2 or thereabouts.