Splitting a record into variable number of records

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
viji
Participant
Posts: 3
Joined: Thu Aug 14, 2003 2:41 am

Splitting a record into variable number of records

Post by viji »

I need to split a record into 'n' records, but this 'n' varies for every record in my input flat file.
Can you please help me to achieve this in datastage (I understand that I can use pivot stage to split records, but only if 'n' is known & same for all records).

Is writing a user defined routine for a better option for this?
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
basically if your n variable is totally dynamic and so is the final number of row types, a routine sounds the best way.
be aware that nls issues, if you use nls, might occur using routines so you might want to prepare your file to the right nls prior to your routine, you can achieve this by using a format of varchar(max_row_len) and convert the entire row if possible to the right nls, or anything similar that will get you to the right nls you want.

there is also the CFF stage but I don't know if you can use it in your case (check it out).

IHTH
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The Pivot stage can handle a variable N, but you must provide definitions for the maximum possible N. Any not represented will be either NULL or not assigned (can't recall which), which you can test for using IsNull() or Unassigned() functions.
Search the forum for a technique that involves a design like:

Code: Select all

  stream  ---->  SeqFile stage ---->  stream
What you write into the sequential file is a single column containing the line of source with embedded "end of line" characters (Char(10) on UNIX). When you read from the same sequential file, it is magically transmogrified into multiple lines based on these characters! And it's FAST.

(edited after Roy's post)
Last edited by ray.wurlod on Thu Jan 01, 2004 3:49 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

it is magically transmogrified into multiple lines based on these characters! And it's FAST.
I'm not sure doing it this way is either bizarre or fantastic, but is pretty cool :-)

Regards,
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
Sorry Ray, you ment Char(10), or as it is known LF (Line Feed) on unix, and on windows Char(13) + Char(10) concatenated that makes the CRLF (Carriage Retun, Line Feed)
I think it is time to ask for my icecream :wink: .

come to think of it, if you could use varchar(max_length_of_row_in_characters) as a default table definition and use a DS routine that splits the line/row depending on your logic, that means embeds the Char(10) on unix in the proper places and simply write the new line to a sequential file.
this will get the job done in case the output scheme is the same and only the resulting number of rows varies from the input row.

this was covered several time here before, then again you didn't state if this is your case.

IHTH
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

mhester wrote:
it is magically transmogrified into multiple lines based on these characters! And it's FAST.
I'm not sure doing it this way is either bizarre or fantastic, but is pretty cool :-)

Regards,
I originally learned this technique from Jim Tsimis, to whom credit must go for inventing the technique, in the days when DataStage was young - probably version 1.2 or thereabouts.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply