
Vertical Pivot Dynamic Array Size

Posted: Tue May 03, 2016 3:08 pm
by irshanuddin
So we have some data coming in where we need to do a vertical pivot and write it to a sequential file. But I would like to find out if I can make the file only as wide as the widest record, which means the array size for the Pivot stage would need to be dynamic.
I think I can work out what it should be using an Aggregator and counting the maximum number of records per grouping, but is there a way to pass this number to the Pivot stage on the fly?
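
To illustrate with made-up vehicle data (the column names and values are just examples), the input and the output I am after would look something like this:

Input (one row per vehicle per key):
K1,car,VIN0001
K1,truck,VIN0002
K2,van,VIN0003

Desired output (one row per key, only as wide as that key needs):
K1,car,VIN0001,truck,VIN0002
K2,van,VIN0003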

Thanks!

Posted: Wed May 04, 2016 1:17 pm
by irshanuddin
Anyone?

Posted: Wed May 04, 2016 2:48 pm
by chulett
Others will need to chime in here, but I'm not aware of any way to make that Array Size in the stage 'dynamic', so I'm thinking you'll need to handle whatever passes for the maximum number of records to pivot to columns. I wonder if transformer looping would be a better option here?

And then it seems to me you'll only be able to handle the "file only as wide as the widest record" part by outputting the file as a single long string with an appropriately built set of delimited values in it.

Posted: Wed May 04, 2016 5:55 pm
by ray.wurlod
Or a Column Export stage, with its output appropriately right-trimmed of delimiter characters.
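
For example, a derivation along these lines in a downstream Transformer would strip the trailing delimiters (assuming a comma delimiter; the link and column names here are hypothetical):

Trim(InLink.ExportedRow, ',', 'T')

The 'T' option removes trailing occurrences of the given character.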

Posted: Fri May 06, 2016 2:38 am
by ShaneMuir
As per chulett's suggestion, this can be done relatively easily with a transformer and looping.

You will, however, need to be able to provide the number of columns required to the output (and any column names, if required). Using that number of columns, you just build up a delimited list of values for each key column and output it when required. If you output to an external target stage, you can use a simple awk command to generate any number of differently structured files at the same time.

Posted: Wed Aug 31, 2016 12:58 pm
by irshanuddin
This is how I ended up resolving this issue based on Craig's input. (Thanks!)

I created the looping in the transformer manually. I need to familiarize myself with the built-in functionality, so for now I went old school and created the looping using stage variables.
I created a counter variable that incremented each time the same key came through, and kept appending hardcoded column-header values with the counter on the end, to build a field with values such as vehicle1,vehicle_vin1, etc. I stored this in another variable and appended it to the next value if the key stayed the same. So if the same key came through again, the next value for this field would be vehicle1,vehicle_vin1,vehicle2,vehicle_vin2, etc.
And the same thing again for the actual values of these records.
So the output of this transformer is two columns: one with all the headers for each key and one with all the values for each key.
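
As a rough sketch, the stage variable derivations looked something like this (the link and column names are made up, and stage variables evaluate top to bottom, so the previous-key variable has to come last):

svIsNewKey: InLink.PolicyKey <> svPrevKey
svCounter: If svIsNewKey Then 1 Else svCounter + 1
svHeaders: If svIsNewKey Then 'vehicle' : svCounter : ',vehicle_vin' : svCounter Else svHeaders : ',vehicle' : svCounter : ',vehicle_vin' : svCounter
svValues: If svIsNewKey Then InLink.Make : ',' : InLink.Vin Else svValues : ',' : InLink.Make : ',' : InLink.Vin
svPrevKey: InLink.PolicyKey

The two output columns are then derived from svHeaders and svValues; the fully accumulated row for each key is the one that matters downstream.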

After all the records have been cycled through, I created a fork design: one stream sends the data values through as-is, and another stream gets only the largest header.

One stream does a sort on a static field and then takes a max on the counter and a max on the header, with the Preserve Type property set since this is a varchar field.
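
Roughly, the Aggregator on that stream was set up along these lines (the column names are illustrative):

Grouping key: the static field
counter column -> Aggregation type: Max
header column -> Aggregation type: Max, Preserve Type = True (varchar)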

After this, I created some dummy fields so that both streams appear to have the same input, and passed both to a Funnel.
I also added a field called sort to both streams, assigning 1 to the header stream's sort field and a higher number to the second stream's sort field, so that the funnel always picks the header as the first record.
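
So the two streams look something like this going into the Funnel (a sketch, with illustrative values):

Header stream: sort = 1
Data stream: sort = 2
Funnel stage: Funnel Type = Sort Funnel, Key = sort (ascending)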

This is then output to a sequential file.
The only property I am unable to provide in the output is text qualifiers for the fields, since that would lump the entire max header into one field and all the data values into one field.

I was unable to use the Column Import stage to do the split because the maximum number of columns will vary.

I hope this helps someone.
Thanks for everyone's help.