I need to split a record into 'n' records, but this 'n' varies for every record in my input flat file.
Can you please help me to achieve this in datastage (I understand that I can use pivot stage to split records, but only if 'n' is known & same for all records).
Is writing a user defined routine for a better option for this?
Splitting a record into variable number of records
Moderators: chulett, rschirm, roy
Hi,
basically if your n variable is totally dynamic and so is the final number of row types, a routine sounds the best way.
be aware that nls issues, if you use nls, might occur using routines so you might want to prepare your file to the right nls prior to your routine, you can achieve this by using a format of varchar(max_row_len) and convert the entire row if possible to the right nls, or anything similar that will get you to the right nls you want.
there is also the CFF stage but I don't know if you can use it in your case (check it out).
IHTH
basically if your n variable is totally dynamic and so is the final number of row types, a routine sounds the best way.
be aware that nls issues, if you use nls, might occur using routines so you might want to prepare your file to the right nls prior to your routine, you can achieve this by using a format of varchar(max_row_len) and convert the entire row if possible to the right nls, or anything similar that will get you to the right nls you want.
there is also the CFF stage but I don't know if you can use it in your case (check it out).
IHTH
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
The Pivot stage can handle a variable N, but you must provide definitions for the maximum possible N. Any not represented will be either NULL or not assigned (can't recall which), which you can test for using IsNull() or Unassigned() functions.
Search the forum for a technique that involves a design like:
What you write into the sequential file is a single column containing the line of source with embedded "end of line" characters (Char(10) on UNIX). When you read from the same sequential file, it is magically transmogrified into multiple lines based on these characters! And it's FAST.
(edited after Roy's post)
Search the forum for a technique that involves a design like:
Code: Select all
stream ----> SeqFile stage ----> stream
(edited after Roy's post)
Last edited by ray.wurlod on Thu Jan 01, 2004 3:49 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I'm not sure doing it this way is either bizarre or fantastic, but is pretty coolit is magically transmogrified into multiple lines based on these characters! And it's FAST.
Regards,
Mike Hester
mhester@petra-ps.com
mhester@petra-ps.com
Hi,
Sorry Ray, you ment Char(10), or as it is known LF (Line Feed) on unix, and on windows Char(13) + Char(10) concatenated that makes the CRLF (Carriage Retun, Line Feed)
I think it is time to ask for my icecream .
come to think of it, if you could use varchar(max_length_of_row_in_characters) as a default table definition and use a DS routine that splits the line/row depending on your logic, that means embeds the Char(10) on unix in the proper places and simply write the new line to a sequential file.
this will get the job done in case the output scheme is the same and only the resulting number of rows varies from the input row.
this was covered several time here before, then again you didn't state if this is your case.
IHTH
Sorry Ray, you ment Char(10), or as it is known LF (Line Feed) on unix, and on windows Char(13) + Char(10) concatenated that makes the CRLF (Carriage Retun, Line Feed)
I think it is time to ask for my icecream .
come to think of it, if you could use varchar(max_length_of_row_in_characters) as a default table definition and use a DS routine that splits the line/row depending on your logic, that means embeds the Char(10) on unix in the proper places and simply write the new line to a sequential file.
this will get the job done in case the output scheme is the same and only the resulting number of rows varies from the input row.
this was covered several time here before, then again you didn't state if this is your case.
IHTH
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I originally learned this technique from Jim Tsimis, to whom credit must go for inventing the technique, in the days when DataStage was young - probably version 1.2 or thereabouts.mhester wrote:I'm not sure doing it this way is either bizarre or fantastic, but is pretty coolit is magically transmogrified into multiple lines based on these characters! And it's FAST.
Regards,
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.