
Aggregator and (maybe) partial schema

Posted: Thu May 26, 2011 9:06 pm
by evee1
I have searched through the forum and the DS documentation, but I couldn't find a similar example :(

I have a parallel job that reads the source file using the Sequential File stage with one varchar column, then splits the records into fields using the Column Import stage according to a dynamically passed schema file.
The source file has comma-separated, variable-length fields.
I managed to define a schema and it works OK.
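
To make the setup concrete, here is a rough Python analogue of what the Sequential File + Column Import pair does. Each input record arrives as one string and is split on commas into named fields taken from a schema. This is only an illustration, not DataStage code: the `import_columns` helper and the field names are hypothetical, and a real schema file also carries data types and format properties that this sketch ignores.

```python
from typing import Dict, List


def import_columns(record: str, schema: List[str], delim: str = ",") -> Dict[str, str]:
    """Split one raw record into named fields, in schema order.

    Hypothetical stand-in for a Column Import stage driven by a schema file.
    Values stay strings (no type conversion), and a plain split does not
    handle quoted fields containing commas.
    """
    values = record.split(delim)
    if len(values) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(values)}")
    return dict(zip(schema, values))


# Hypothetical schema: in the real job these names would come from the schema file.
schema = ["id", "name", "region", "date", "amount"]
row = import_columns("42,Smith,NSW,2011-05-26,19.95", schema)
print(row["amount"])  # 19.95
```

Because the field names only exist once the schema is loaded at run time, nothing downstream can refer to `amount` at design time, which is the root of the question below.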

Now I would like to calculate the sum of one column whose position I know, for example the 5th field in the file. Is it possible to use the Aggregator for this?
How do I actually define the calculation if I cannot see the fields? The same question applies to a Transformer.
I have experimented a bit with a partial schema, just to read the field I am interested in, but cannot make it work so far.

Posted: Thu May 26, 2011 9:21 pm
by ray.wurlod
Not possible. You have to name it (in the Sum specification) and therefore you have to be able to see it.

Posted: Thu May 26, 2011 9:27 pm
by evee1
Does that mean I can only use schema files for simply passing fields through?
I suppose the same goes for the Lookup and SCD stages?

Posted: Thu May 26, 2011 9:41 pm
by ray.wurlod
Not at all. The Aggregator has special rules (the same rules as GROUP BY in SQL): you can only pass through grouped columns or columns to which a set function is applied. You can't pass through any other columns, either automatically or explicitly.
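
The GROUP BY rule can be illustrated outside DataStage. In this plain-Python sketch (the `sum_by` helper and the column names are hypothetical), each output row contains only the grouping key and the aggregated value. A column such as `name` has no single value per group, so there is nothing sensible to pass through, and both the key and the summed column have to be named explicitly.

```python
from collections import defaultdict
from typing import Dict, List


def sum_by(rows: List[Dict[str, str]], key: str, value: str) -> List[dict]:
    """Group rows on `key` and sum `value`, like
    SELECT key, SUM(value) FROM rows GROUP BY key.

    Only the grouping column and the aggregate appear in the output;
    every other input column is dropped.
    """
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])
    return [{key: k, f"sum_{value}": total} for k, total in totals.items()]


rows = [
    {"region": "NSW", "name": "Smith", "amount": "10.0"},
    {"region": "NSW", "name": "Jones", "amount": "5.5"},
    {"region": "VIC", "name": "Brown", "amount": "2.0"},
]
print(sum_by(rows, "region", "amount"))
# [{'region': 'NSW', 'sum_amount': 15.5}, {'region': 'VIC', 'sum_amount': 2.0}]
```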

Posted: Thu May 26, 2011 9:54 pm
by evee1
I see.

In this case, I will just extract the relevant portion of the input records into a defined column in a Transformer and pass it to the Aggregator. It will be parameterized, so there is no problem making it generic.
I will still use the schema file to load the data into the database.
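
The workaround can be sketched the same way: the field position is a job parameter, the Transformer step copies that one field into a column with a fixed name, and the Aggregator only ever sees that named column. In a parallel Transformer, a derivation using the `Field()` string function is one possible way to do the extraction. In this Python sketch, `FIELD_POS`, `extract_field` and `sum_field` are hypothetical names, positions are 1-based as in the post ("the 5th field"), and the split assumes no quoted or embedded commas.

```python
from typing import Iterable


def extract_field(record: str, position: int, delim: str = ",") -> str:
    """Return the 1-based `position`-th field of a delimited record.

    Rough analogue of a Transformer derivation that copies one field
    into a fixed, named output column.
    """
    return record.split(delim)[position - 1]


def sum_field(records: Iterable[str], position: int) -> float:
    """Total one numeric field across all records.

    This mirrors an Aggregator with no grouping key: one sum for the
    whole input, computed on the single named column.
    """
    return sum(float(extract_field(r, position)) for r in records)


FIELD_POS = 5  # hypothetical job parameter: which field to total

records = [
    "1,Smith,NSW,2011-05-26,10.25",
    "2,Jones,VIC,2011-05-26,4.75",
]
print(sum_field(records, FIELD_POS))  # 15.0
```

Keeping the position as a parameter is what makes the job generic: the column name the Aggregator sees never changes, only which input field feeds it.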

I will be dealing with lookups and SCDs in the not-too-distant future. But at least I know I should be able to use schema files then :).

Thanks.