Page 1 of 1

Add column to input

Posted: Tue Mar 24, 2015 11:14 pm
by kumarjit
Hello All.

I've a source text file with data like below:

Code: Select all

Field1
A
A
B
C
C
D
With this data, I have to generate a temporary dataset, as below:

Code: Select all

Field1 Field2
A      Y 
A      Y
B      N
C      Y
C      Y
D      N
Field2 is populated based on the following logic:
1.If the corresponding Field1 value has duplicates, the Y
2. Else N

The way I know to do this is(maybe a bit crude)

Code: Select all

1. Create a parallel job as the one below
             Seq file2
               |
               |
            Aggregator(record count, group by Field1)
               |
               |
Seq file1-----Join-----------------Seq File2(write the join output)
             based on Field1

2. Execute a Unix command as post job sub routine to add a column which performs the following: 
a. For record where record count is >1 add a new field with value Y
b. For record where record count is =1 add a new field with value N
But I'm to create a dataset output, and not any text file, and Unix commands do not work on dataset files.
As because I'm to introduce a new column, can't Column Generator stage serve this purpose, without using any Transformer/Sort?


Please help.

Regards,
Kumarjit.

Posted: Wed Mar 25, 2015 6:54 am
by udayanguha
You can use a sort stage to generate key change column. Then in the transformer through stage variables check for change in key change column and assign value accordingly.
If key change column is '0', assign Y. If previous key change column was 0 and current is '1', assign 'Y' else 'N'

Posted: Wed Mar 25, 2015 7:34 am
by kumarjit
If you had checked the last few lines of my post, you might remember that I'm trying to achieve this goal WITHOUT USING TRANFORMER/SORT STAGES.....

Anywayz, thanks for your feed.

Regards.

Posted: Wed Mar 25, 2015 3:34 pm
by ray.wurlod
Are you permitted to use a Modify stage?

If yes use a column generator to generate "Y" for all rows then use the Modify stage to convert the NULL from left outer join into "N". And/or use a fork/join to split the streams based on the result of the join (or lookup).

Posted: Wed Mar 25, 2015 8:50 pm
by kumarjit
Thanks Ray, but I was not able to view you full post as it's Premium Content . :)
However, I will try and change the design to extent I was able to see in your post.

Regards,
Kumarjit.

Posted: Thu Mar 26, 2015 8:41 am
by AshishDevassy
Is there a reason that you dont wish to use transformer ?

Posted: Fri Mar 27, 2015 10:18 pm
by kumarjit
AshishDevassy wrote:Is there a reason that you dont wish to use transformer ?
I intend not to load the job, when the same can be achieved by other lightweight stages like column generator and/or modify stages.

Regards.

Posted: Fri Mar 27, 2015 11:02 pm
by kumarjit
What I'm trying to do is:

Code: Select all


           Seq file2 
               | 
               | 
            Aggregator(record count, group by Field1) 
               | 
               | 
Seq file1-----Join-----------------Column Generator Stage--------------------------Taget Dataset
				   (column to generate=F2, column method=Explicit)   

In the Mapping tab of the column generator stage, add the following as the derivation for the output field F2
If(input.count=1 then'N' else 'Y')


But, can I create such derivations against an output column of the column generator stage?

Please advise.

Regards.

Posted: Sun Mar 29, 2015 3:56 pm
by ray.wurlod
kumarjit wrote:I intend not to load the job, when the same can be achieved by other lightweight stages like column generator and/or modify stages.
You are relying on out-of-date knowledge. These days (since about version 8.7) the Transformer stage is no less efficient than most other stages, sometimes it's more efficient (for example than the Filter stage).

Posted: Tue Mar 31, 2015 12:57 am
by kumarjit
ray.wurlod wrote:You are relying on out-of-date knowledge. ...
I'm afraid to admit that its true to some sense. But if there are not more than 1K rows in the input, should I be trying something as time consuming as a transformer?

Please advise.

Regards.

Posted: Tue Mar 31, 2015 1:48 am
by priyadarshikunal
What makes you think transformer is a time consuming stage. The weight of transformer has decreased over time and its not an expensive stage anymore. Now its even lighter than filter and switch stages. If you can combine work of 2 or more stages in transformer, it may give you better result as well.

I think you were not able to see the complete reply from Ray.

Posted: Tue Mar 31, 2015 1:51 am
by priyadarshikunal
In addition, Join and Aggregation needs sorted as well as partitioned data, so it will insert a sort under the covers as well.

Posted: Wed Apr 01, 2015 5:43 am
by kumarjit
I'm not a premium member, and I'm not able to view Ray's posts.
Anyways, thanks to all of you for your time and suggestions.

Warm Regards.

Posted: Wed Apr 01, 2015 8:01 am
by qt_ky
Well by all means, sign up. It's incredibly affordable.