DSXchange

Posted: **Mon Feb 27, 2006 12:26 pm**

Hi !

My Source File is as follows

Site    ReplenishmentDate       ReplenishmentQty
--------------------------------------------------------
A          12-march-06                    1200
B          15-march-06                    1500
A          17-march-06                    2200
C          18-march-06                    1600
C          19-march-06                    1700

I need to take first records only (based on Site)

ie; My output should be

Code: Select all

Site    ReplenishmentDate       ReplenishmentQty
--------------------------------------------------------
A          12-march-06                    1200
B          15-march-06                    1500
C          18-march-06                    1600

If I use a hash file, with key as Site.. I am getting the Last record (as shown below)

Code: Select all

Site    ReplenishmentDate       ReplenishmentQty
--------------------------------------------------------
B          15-march-06                    1500
A          17-march-06                    2200
C          19-march-06                    1700

Please suggest me a way of acheiving this.

PS: Storing the data into a temporary database table with PrimaryKey as site and logic as "truncate and insert" is not an option. The requirement is such that we can use database at this stage.

Regards
Nagesh

Posted: **Mon Feb 27, 2006 12:40 pm**

Hi,
There are several way of acheiving this. Pls do search the forum.
You are grouping based on the Site and getting the max of rest of the columns.
One of the way would be to add another column to flag out the change in group.
Set 2 stage variable in transformer like
Change = If Previous_col = Input_link.Site Then 0 Else 1
Previous_col = Input_link.Site
Assign Change to a column say Flag.
Pass the sorted data to the transformer and select only the records for which Flag =1.

Posted: **Mon Feb 27, 2006 12:43 pm**

Change your date to ISO format YYYY-MM-DD, pass thru a Sort stage and sort descending on site, date, then write to hash file. Last row into hash file with just site as key will be the oldest date for that site.

Posted: **Mon Feb 27, 2006 5:48 pm**

If you are reading from database then do the order by in database SQL and then use a stage variable to hold the last record and compare with the current record. If it is same then do not send to output.

Posted: **Mon Feb 27, 2006 6:39 pm**

There is a FAQ How do I remove duplicate rows from a data source? This will give you a few options.

Posted: **Mon Feb 27, 2006 7:13 pm**

Nagesh

First Sort the data on that particular column using sort stage and then in the transformer you can also use Transform function RowProcCompareWithPreviousValue to ignore duplicate values.
The routine will check the supplied value with the previous value.
If its same, then 1 is returned else 0 is returned. you can use this routing in the Stage variable.

I think this helps

Chowdary
[/b]

Posted: **Tue Feb 28, 2006 4:15 am**

Hi !

I think my example has misled you a little.
I want to be able to select the First Rows only for a given Site.. Irrespective of the Date.

ie., If the file is

Code: Select all

Site    Date                  Qty
------------------------------------
A       2006-02-17        1300
A       2006-02-16        1500
A       2006-02-18        1200

Then the output should be

Code: Select all

Site    Date                  Qty
------------------------------------
A       2006-02-17        1300

For this reason I cant use any sort or any other feature.

Please suggest me an alternative

Posted: **Tue Feb 28, 2006 4:38 am**

hi srinagesh, :D
Use aggregator stage along with Xfm.
Make Site(field) as group
for the rest of columns make derivation as FIRST.

Hope that will give your answer

Posted: **Tue Feb 28, 2006 6:32 am**

You can achieve it through Stage Variables and Constraints

Create Two Stage Variables

Code: Select all

Prev_Key: Derivation Curr_Key (Initiallize with a value say '-1')
Curr_Key: Derivation Src.Site

Constraint for the Output Link

Code: Select all

Prev_Key<>Curr_Key

Hope this helps.

Posted: **Tue Feb 28, 2006 8:03 am**

Thanx Guys,

It works.

-Nagesh

Posted: **Wed Mar 01, 2006 12:00 am**

Belatedly throwing in an approach using UNIX ...

Code: Select all

sort -s -k1,1 yourfile |
  awk '
    BEGIN \
    {
      prev_key = "";
    }
    ( $1 != prev_key ) \
    {
      print $0;
    }
    {
      prev_key = $1;
    }
    ' > firstrowforkey

Caveats:
1) requires your sort command to have the "-s" option
2) requires my off-the-cuff awk script to be correct.

David

Posted: **Wed Mar 01, 2006 2:08 am**

And if your flavour of unix does not have "-s" for stable sorting, you may be able to use "sort -k1,1 -u" instead. According to the man pages I have seen, this will suppress duplicates based solely on the key field. Unfortunately, the man page for sort gives no indication which of the lines will be output (e.g. always the first, always the last, always random).

Unscientific testing on a far too small a sample set yielded a behaviour of the first instance being output - your actual mileage may vary! Does anyone know definitively?

David.

Posted: **Wed Mar 01, 2006 4:52 pm**

srinagesh wrote:Hi !

I think my example has misled you a little.
I want to be able to select the First Rows only for a given Site.. Irrespective of the Date.

ie., If the file is
Code: Select all
Site    Date                  Qty
------------------------------------
A       2006-02-17        1300
A       2006-02-16        1500
A       2006-02-18        1200
Then the output should be
Code: Select all
Site    Date                  Qty
------------------------------------
A       2006-02-17        1300
For this reason I cant use any sort or any other feature.

Please suggest me an alternative

Use Aggregator. Group by Site. For Date and Qty, use the Derivation 'First'.

gateleys

DSXchange

Select First Record only

Select First Record only

Re: Select First Record only