
Posted: Thu Jun 13, 2013 7:05 am
by chulett
Probably related to how it writes out records, as noted in the documentation they are "delivered in a single column" - a LongVarChar. However, what happens if you sort the output by City first? Or have you?

Posted: Thu Jun 13, 2013 8:13 am
by verify
No change. It still gives the same result.

Posted: Thu Jun 13, 2013 8:26 am
by eostic
I remember doing this and realizing that it needed everything in a single large column. ...so I concatenated all the values together, row after row, followed by CRLFs, and then used an aggregator to get the "last" value for each key (each "city").

The output derivation stays red, but it works (a simple example, but it should give you the idea)...

outputLink.wholeLine : inputLink.myColumn : char(13) : char(10)

If you send this to a sequential stage, you'll see it grow in a cascading sort of fashion...

row1
row1 row2
row1 row2 row3

etc.

...aggregate it in another downstream stage by the key and take the last value.....

...send those sorted rows (one per key) into the Folder Stage.
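
Outside DataStage, the same accumulate-then-take-last pattern can be sketched in Python; the sample rows and names here are made up purely for illustration:

# Rough Python sketch of the Transformer + Aggregator pattern above.
# Sample data is hypothetical: (city, row) pairs, already sorted by city.
rows = [
    ("Boston", "row1"),
    ("Boston", "row2"),
    ("Chicago", "row3"),
]

running = {}     # city -> growing CRLF-delimited concatenation
last_value = {}  # city -> the "last" (i.e. complete) value per key

for city, line in rows:
    # Transformer step: append each row plus char(13) : char(10)
    running[city] = running.get(city, "") + line + "\r\n"
    # Aggregator "last" step: later rows overwrite earlier ones,
    # so what survives per key is the full concatenation
    last_value[city] = running[city]

for city, whole_line in last_value.items():
    print(city, repr(whole_line))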

Ernie

Posted: Thu Jun 13, 2013 9:21 am
by chulett
Thought about this on the long drive to work... and was going to suggest the same thing. Thanks for the save, Ernie. :wink:

Posted: Thu Jun 13, 2013 4:07 pm
by greggknight
Not quite sure what you are trying to do, but I would use a sequential file > transformer > three sequential files as output, and put a constraint on the first column to divert the records to the appropriate file (a.txt, b.txt, c.txt).
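
In rough Python terms (the input file name and the comma delimiter are assumptions, not from the original job):

# Sketch of the three-fixed-files approach: one "constraint" per output.
# Assumes the routing key is the first comma-delimited field of input.txt.
with open("input.txt") as src, \
        open("a.txt", "w") as fa, \
        open("b.txt", "w") as fb, \
        open("c.txt", "w") as fc:
    targets = {"a": fa, "b": fb, "c": fc}
    for line in src:
        key = line.split(",", 1)[0]
        if key in targets:  # the constraint: divert to the matching file
            targets[key].write(line)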

I don't know why you are using a Folder stage. A Folder stage is used to read numerous files within a folder and process the data in each file.

The record definition for this is:

filename   VarChar(100)
record     LongVarChar(999999)

Posted: Thu Jun 13, 2013 5:06 pm
by chulett
It also supports being a target, dynamically changing the output filename based on data on the input link. That's the draw here, where the number of files needed and their names are not known ahead of time.
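
A rough Python sketch of what that buys you (the sample rows and the .txt suffix are assumptions):

# The output file name comes from the data itself, so the set of files
# does not have to be known when the job is designed. Append mode per
# record keeps the sketch short; a real script would cache open handles.
rows = [("London", "r1"), ("Paris", "r2"), ("London", "r3")]
for city, record in rows:
    with open(city + ".txt", "a") as out:
        out.write(record + "\n")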

Posted: Fri Jun 14, 2013 12:49 am
by verify
Since the data is very large, I went with a UNIX script instead, and it worked fine.
Thanks everyone for your valuable suggestions. :)