Multiple rows from single column input

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
phanee_k
Charter Member
Charter Member
Posts: 68
Joined: Thu Nov 20, 2003 11:02 pm

Multiple rows from single column input

Post by phanee_k »

Hi,
I want to create multiple rows, by passing a single column as input.
Tried with pivot stage.but couldnt make it.
Any suggestions on this plz..

rgds
Phani
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

What the heck do you mean :?:

How is the single column composed of data? Is it a nested array/delimited column that you need to flatten, propagating other columns.

Is this what you're talking about? Ex:
  • KEY|1,2,3,4|DESC <CR>
Turns into:
  • KEY|1|DESC <CR>
    KEY|2|DESC <CR>
    KEY|3|DESC <CR>
    KEY|4|DESC <CR>
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
phanee_k
Charter Member
Charter Member
Posts: 68
Joined: Thu Nov 20, 2003 11:02 pm

Post by phanee_k »

Hi,
I need the output as u have mentioned
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Okay, so now please give us an indication as to the average number of source rows, and the average "normalization" that will occur per row. What I'm looking for is you to tell me something like: "There's 1 million source rows per run, and each row will average about 10 rows output when normalized. There's 42 columns of data that will propagate, and 1 column to use for the normalization."
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
phanee_k
Charter Member
Charter Member
Posts: 68
Joined: Thu Nov 20, 2003 11:02 pm

Post by phanee_k »

Hi,
Actual problem in like this

I have 1 lakh records with 10 fields per record.
With in in one column i have 10 comma separated values.
I need to transform these 10 comma separated values into 10 different rows in the target.


Ex

Empno| Empid |Remarks
1020|2034|8990,20608,4567,6789,56576,9876,4343,4343434,4343434,7089789


the output should be

Empno column1 column2
1020 1 8990
2 20608
3 4567
4 6789
5 56576
6 9876
7 4343
8 4343434
9 4343434
10 7089789

How can it be possible with data stage
Its typical requirement i need to do it

Thanks for the Help.
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

Should the employee number be in every row? and what do you do with the empid?

Regards,

Michael Hester
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Sounds like the first skill you need to acquire is the skill of specifying a requirement with sufficient detail that you can actually be helped! Examples are good - just look at Michael's responses.
You could, of course, hire experienced consultants to perform these tasks the first time. Most of the good ones try to ensure that some mentoring takes place; that there is knowledge transfer so that you can apply the skills once the consultant has departed. (Those who are only in it for the money, on the other hand, might be singularly secretive about their techniques, but won't really last long in an open market. IMHO)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
phanee_k
Charter Member
Charter Member
Posts: 68
Joined: Thu Nov 20, 2003 11:02 pm

Post by phanee_k »

Hi,
Empno can be in multiple rows, if the column is normalizing into multiple rows .
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Rather than give you a solution, I'm going to give you some techniques.

If you have something like this:

Code: Select all

1000|A,B,C,D,E
and want to end up with this:

Code: Select all

1000|1|A
1000|2|B
1000|3|C
1000|4|D
1000|5|E
you can do something fairly simple using a user-defined function. You pass all the columns to propagate as a delimited array in one argument, along with the explode-by nested array column in a second argument, to a function. In the function, you will count the number of elements in the explode-by array. You will then loop, and create an array of the output in a denormalized form. Lastly, you will convert the array row delimiters into <LF> characters. This string is then the return variable for the function. Back in the DS job, you will output the result of the function to a sequential text file. Because you passed one row in to the function, but the function returned multiple rows, you end up with more data rows in your output file then the job actually wrote, because of the embedded LFs.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Except that, being on Windoze, you'll need embedded <CR><LF>.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

Based on -

Code: Select all

1000|A,B,C,D,E
If there are a finite number of comma separated values and you are not familiar or proficient with BASIC then you can treat this like an occurs in COBOL and read one record in and then interrogate position 1, 2, etc... in your constraint and have the correct number of output links. Therefore, the constraint for link_1_out would send the following if a value is in the first position -

column1 = 1000
column2 = A

link_2_out would send -

column1 = 1000
column2 = B

etc...

The above will only work if the number of comma separated values is known and will not change. If this is not the case or if there can be more than 5 - 10 then I recommend using Ken's method.

Regards,

Michael Hester
Post Reply