Page 1 of 1

Select First Record only

Posted: Mon Feb 27, 2006 12:26 pm
by srinagesh
Hi !

My Source File is as follows

Code: Select all

Site    ReplenishmentDate       ReplenishmentQty
--------------------------------------------------------
A          12-march-06                    1200
B          15-march-06                    1500
A          17-march-06                    2200
C          18-march-06                    1600
C          19-march-06                    1700
I need to take first records only (based on Site)

ie; My output should be

Code: Select all

Site    ReplenishmentDate       ReplenishmentQty
--------------------------------------------------------
A          12-march-06                    1200
B          15-march-06                    1500
C          18-march-06                    1600
If I use a hash file, with key as Site.. I am getting the Last record (as shown below)

Code: Select all

Site    ReplenishmentDate       ReplenishmentQty
--------------------------------------------------------
B          15-march-06                    1500
A          17-march-06                    2200
C          19-march-06                    1700
Please suggest me a way of acheiving this.

PS: Storing the data into a temporary database table with PrimaryKey as site and logic as "truncate and insert" is not an option. The requirement is such that we can use database at this stage.

Regards
Nagesh

Posted: Mon Feb 27, 2006 12:40 pm
by kumar_s
Hi,
There are several way of acheiving this. Pls do search the forum.
You are grouping based on the Site and getting the max of rest of the columns.
One of the way would be to add another column to flag out the change in group.
Set 2 stage variable in transformer like
Change = If Previous_col = Input_link.Site Then 0 Else 1
Previous_col = Input_link.Site
Assign Change to a column say Flag.
Pass the sorted data to the transformer and select only the records for which Flag =1.

Posted: Mon Feb 27, 2006 12:43 pm
by kcbland
Change your date to ISO format YYYY-MM-DD, pass thru a Sort stage and sort descending on site, date, then write to hash file. Last row into hash file with just site as key will be the oldest date for that site.

Posted: Mon Feb 27, 2006 5:48 pm
by rasi
If you are reading from database then do the order by in database SQL and then use a stage variable to hold the last record and compare with the current record. If it is same then do not send to output.

Posted: Mon Feb 27, 2006 6:39 pm
by vmcburney
There is a FAQ How do I remove duplicate rows from a data source? This will give you a few options.

Re: Select First Record only

Posted: Mon Feb 27, 2006 7:13 pm
by chowdary
Nagesh

First Sort the data on that particular column using sort stage and then in the transformer you can also use Transform function RowProcCompareWithPreviousValue to ignore duplicate values.
The routine will check the supplied value with the previous value.
If its same, then 1 is returned else 0 is returned. you can use this routing in the Stage variable.

I think this helps

Chowdary
[/b]

Posted: Tue Feb 28, 2006 4:15 am
by srinagesh
Hi !

I think my example has misled you a little.
I want to be able to select the First Rows only for a given Site.. Irrespective of the Date.

ie., If the file is

Code: Select all

Site    Date                  Qty
------------------------------------
A       2006-02-17        1300
A       2006-02-16        1500
A       2006-02-18        1200
Then the output should be

Code: Select all

Site    Date                  Qty
------------------------------------
A       2006-02-17        1300
For this reason I cant use any sort or any other feature.

Please suggest me an alternative

Posted: Tue Feb 28, 2006 4:38 am
by Poornagirija
hi srinagesh, :D
Use aggregator stage along with Xfm.
Make Site(field) as group
for the rest of columns make derivation as FIRST.

Hope that will give your answer :?

Posted: Tue Feb 28, 2006 6:32 am
by loveojha2
You can achieve it through Stage Variables and Constraints

Create Two Stage Variables

Code: Select all

Prev_Key: Derivation Curr_Key (Initiallize with a value say '-1')
Curr_Key: Derivation Src.Site

Constraint for the Output Link

Code: Select all

Prev_Key<>Curr_Key
Hope this helps.

Posted: Tue Feb 28, 2006 8:03 am
by srinagesh
Thanx Guys,

It works.


-Nagesh

Posted: Wed Mar 01, 2006 12:00 am
by djm
Belatedly throwing in an approach using UNIX ...

Code: Select all

sort -s -k1,1 yourfile |
  awk '
    BEGIN \
    {
      prev_key = "";
    }
    ( $1 != prev_key ) \
    {
      print $0;
    }
    {
      prev_key = $1;
    }
    ' > firstrowforkey
Caveats:
1) requires your sort command to have the "-s" option
2) requires my off-the-cuff awk script to be correct.

David

Posted: Wed Mar 01, 2006 2:08 am
by djm
And if your flavour of unix does not have "-s" for stable sorting, you may be able to use "sort -k1,1 -u" instead. According to the man pages I have seen, this will suppress duplicates based solely on the key field. Unfortunately, the man page for sort gives no indication which of the lines will be output (e.g. always the first, always the last, always random).

Unscientific testing on a far too small a sample set yielded a behaviour of the first instance being output - your actual mileage may vary! Does anyone know definitively?

David.

Posted: Wed Mar 01, 2006 4:52 pm
by gateleys
srinagesh wrote:Hi !

I think my example has misled you a little.
I want to be able to select the First Rows only for a given Site.. Irrespective of the Date.

ie., If the file is

Code: Select all

Site    Date                  Qty
------------------------------------
A       2006-02-17        1300
A       2006-02-16        1500
A       2006-02-18        1200
Then the output should be

Code: Select all

Site    Date                  Qty
------------------------------------
A       2006-02-17        1300
For this reason I cant use any sort or any other feature.

Please suggest me an alternative
Use Aggregator. Group by Site. For Date and Qty, use the Derivation 'First'.

gateleys