Hi
I have an input file that contains the fields Processing Month, Customer Name, Place, ... I have to set the Actual Processing Month to the most repeated Processing Month in the input file.
For example:
Source
-------
Proc Mth,Name,Place
-------------------------------
200405,aaaa,abc
200406,bbbb,xyz
200405,cccc,sdf
200405,dddd,lkj
200404,eeee,rst
200404,ffff,wer
The output should be:
Act ProcMth,Proc Mth,Name,Place
----------------------------------------------------------------------
200405,200405,aaaa,abc
200405,200406,bbbb,xyz
200405,200405,cccc,sdf
200405,200405,dddd,lkj
200405,200404,eeee,rst
200405,200404,ffff,wer
Can anybody suggest a smart way to do this?
Regards
Manoj
Finding the value that repeats the maximum in a file
-
- Participant
- Posts: 23
- Joined: Mon Jul 04, 2005 6:25 am
Manoj,
You will have to make 2 passes through your source file no matter which solution path you take. If your file doesn't have millions of rows, so that lookup performance isn't of paramount importance, then I would solve this with 3 jobs: a Sequencer and 2 server jobs.
(0) Write a sequence to call (a) and then (b).
(a) Get the most repeated month from the file. You could use an Aggregator stage or Transformer stage variables to get this. Write the single value to a hashed file with the key = 1 and the data = your string.
(b) Use this hashed file as a lookup, loading it into memory and always using the constant "1" as the key to read the lookup data value.
If runtime performance is hugely important, then I would modify (a) to write to a sequential file and write my own function to return this value from the file. I would pass that value as a parameter to job (b), which then no longer needs a lookup because it already has the value as a parameter.
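For anyone who wants to see the two-pass logic outside DataStage, here is a minimal Python sketch of the same idea. It is only an illustration, not a DataStage job: the function names (`read_rows`, `most_repeated_month`, `prepend_actual_month`) are made up for this example, and it assumes the data has no header row and that the processing month is the first column. If two months tie for the highest count, this sketch picks the one that appears first in the file, which may or may not be what you want.

```python
import csv
import io
from collections import Counter

# Sample data from the question (header and separator lines left out).
SAMPLE = """200405,aaaa,abc
200406,bbbb,xyz
200405,cccc,sdf
200405,dddd,lkj
200404,eeee,rst
200404,ffff,wer
"""


def read_rows(text):
    """Parse comma-separated text into a list of rows, skipping blank lines."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def most_repeated_month(rows):
    """Pass 1: count each processing month (first column) and return the most frequent."""
    counts = Counter(row[0] for row in rows)
    # most_common(1) returns a one-element list of (value, count).
    return counts.most_common(1)[0][0]


def prepend_actual_month(rows, actual_month):
    """Pass 2: put the actual processing month in front of every row."""
    return [[actual_month] + row for row in rows]


if __name__ == "__main__":
    rows = read_rows(SAMPLE)
    actual = most_repeated_month(rows)
    for out_row in prepend_actual_month(rows, actual):
        print(",".join(out_row))
```

Here `most_repeated_month` plays the role of job (a) and `prepend_actual_month` the role of job (b). Passing `actual` as a plain argument corresponds to the job-parameter variant rather than the hashed-file lookup.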
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom