Parameterize Pivot logic

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ds_search2008
Participant
Posts: 75
Joined: Thu Nov 27, 2008 10:12 am

Parameterize Pivot logic

Post by ds_search2008 »

There is a scenario where the input row looks like this (the third number/data pair follows the same naming pattern as the first two):

Col1 | col#_1 | col_data_1 | col#_2 | col_data_2 | col#_3 | col_data_3
100  | 1      | Z          | 2      | Y          | 3      | X


Using the Pivot stage I'm able to achieve this output:

C1  C2  C3  C4
100  1   Z   A
100  2   Y   A
100  3   X   A

There are various inputs where the paired columns may extend up to 'n' pairs, i.e.
col#_1 ... col#_n
and
col_data_1 ... col_data_n

The logic and output pattern above are correct, but I need to make this job work for any number 'n' of col# and col_data columns. With the Pivot stage, the required columns must be hard-coded in Pivot --> Properties --> Output --> Derivation. I need a single job whose logic handles a varying number of col# and col_data columns across different inputs.

Could you please suggest some options?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Hard-coding for each particular N is your only option in a parallel job.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ds_search2008
Participant
Posts: 75
Joined: Thu Nov 27, 2008 10:12 am

Post by ds_search2008 »

The number of col#_n and col_data_n columns defined in the input varies. If I hard-code these columns in the Pivot stage, the job works for one input but fails for another. Therefore, I'm looking for an option that makes a single job suitable for all the possible inputs.

Say, the job should work for an input file with columns col#1, col#2, col#3, or for an input file with only col#1 and col#2, and so on.
Similarly for the col_data columns...
Can you please share your ideas?
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

You could do this in the BASIC language and store it as a routine. I don't think you can handle a variable number of columns in a DataStage Server or PX job.
Kandy
_________________
Try and try again… you will succeed at last!!
Aruna Gutti
Premium Member
Posts: 145
Joined: Fri Sep 21, 2007 9:35 am
Location: Boston

Post by Aruna Gutti »

I am not sure how many fields you have to handle in your job.

I worked on a job like this, where I had different inputs with different numbers of fields. The way I handled it was to pad every input row out to the expected maximum number of columns across all input sources, then funnel them into a Pivot stage.

Hope this helps,

Aruna.
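Outside DataStage, Aruna's pad-to-the-maximum approach can be sketched with a one-line awk filter on a pipe-delimited file. This is only an illustration under assumed inputs: the delimiter, the sample row, and MAX=7 (a key column plus three number/data pairs) are assumptions, not from the original job.

```shell
# Pad every pipe-delimited row out to a fixed field count (MAX=7 here:
# one key column plus three number/data pairs) so all input sources
# share one schema before the funnel. Sample data is assumed.
printf '100|1|Z\n' |
awk -F'|' -v OFS='|' -v MAX=7 '{
    for (i = NF + 1; i <= MAX; i++) $i = ""   # append empty trailing fields
    print
}'
# → 100|1|Z||||
```

Rows that already have MAX fields pass through unchanged, so the same filter can sit in front of every source.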
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

If the source is a sequential file, you can use awk in the first stage itself to re-organize the data.
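A minimal sketch of that awk re-organization, assuming a pipe-delimited row of the form key|num_1|data_1|num_2|data_2|... — it unpivots one input row into one output row per pair, for any number of pairs (the delimiter and sample data are assumptions):

```shell
# Unpivot: emit key|num|data once per (num, data) pair, regardless of
# how many pairs the row carries. Sample row matches the thread's example.
printf '100|1|Z|2|Y|3|X\n' |
awk -F'|' -v OFS='|' '{
    for (i = 2; i < NF; i += 2)   # walk the (num, data) pairs
        print $1, $i, $(i + 1)
}'
# → 100|1|Z
#   100|2|Y
#   100|3|X
```

Because awk loops over however many fields each row has, this handles the variable 'n' that the Pivot stage's hard-coded derivations cannot.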
ds_search2008
Participant
Posts: 75
Joined: Thu Nov 27, 2008 10:12 am

Post by ds_search2008 »

Thanks