
selecting one record out of 2 records

Posted: Wed May 06, 2015 9:14 pm
by vamsi.4a6
I have data as shown below. The source is a sequential file.

source,destination,distance
chennai,hyderabad,500
hyderabad,chennai,500
chennai,bangalore,600
bangalore,chennai,600
chennai,bangalore,600

I have to select 1 record out of 2 records. Any record is fine.
chennai,hyderabad,500
hyderabad,chennai,500


I need an algorithm for how to proceed, and then I will think about the logic.

Posted: Wed May 06, 2015 10:07 pm
by ray.wurlod
YOU need to provide the algorithm about which record to preserve. Then use a Remove Duplicates stage on partitioned, sorted data.
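To illustrate why Ray specifies partitioned, sorted data, here is a minimal Python sketch (not DataStage code) of what a Remove Duplicates stage does under hash partitioning: rows with equal keys land in the same partition, each partition is sorted on the key, and the first row of each key group is kept. The function name `remove_duplicates_partitioned` and the partition count are hypothetical. The key is still yours to choose; with (source, destination) as the key, only the exact repeat chennai,bangalore,600 is removed and the reversed pairs survive.

```python
from itertools import groupby


def remove_duplicates_partitioned(rows, key, num_partitions=4):
    """Hash-partition rows on key, sort each partition on key, keep the first row per key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # Equal keys always hash to the same partition, so duplicates end up together.
        partitions[hash(key(row)) % num_partitions].append(row)
    kept = []
    for part in partitions:
        part.sort(key=key)  # stable sort keeps input order within a key
        for _, group in groupby(part, key=key):
            kept.append(next(group))  # retain the first row of each key group
    return kept


rows = [
    ("chennai", "hyderabad", 500),
    ("hyderabad", "chennai", 500),
    ("chennai", "bangalore", 600),
    ("bangalore", "chennai", 600),
    ("chennai", "bangalore", 600),
]
# Key on (source, destination): the reversed rows are NOT treated as duplicates.
print(remove_duplicates_partitioned(rows, key=lambda r: (r[0], r[1])))
```

Output order depends on which partition each key hashes to, just as a parallel job does not guarantee output order across partitions.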

Posted: Sat May 09, 2015 2:26 am
by vamsi.4a6
I need to select one record out of 2 records. Any record is fine. The same goes for the rest of the records. Not sure how to proceed; can anybody help with this?

chennai,hyderabad,500
hyderabad,chennai,500

Posted: Sat May 09, 2015 8:02 am
by chulett
:!: Simply repeating your post doesn't help any of us understand it better. Your example doesn't seem to match up to your words. "1 out of 2 records" means what exactly - every other record? Every even? Every odd? Something else entirely? All your example shows are the first two records being 'selected'.

So, until you can properly explain what it is you need, no-one will be able to help you without guessing. Never mind the fact that once you do, the answer may become clear to you as well! :wink:

Posted: Sun May 10, 2015 5:22 am
by ray.wurlod
Create a VarChar column derived as Start:End and another derived as End:Start. Use these to effect your comparisons and duplicate removal.
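A minimal Python sketch of the two derived columns, assuming rows of (Start, End, Distance). `add_concat_keys` and `is_reverse_pair` are hypothetical names. In DataStage `:` concatenates with no separator; the `|` separator used here is an added assumption that stops, say, "AB" + "C" and "A" + "BC" from producing the same key. Two rows describe the same route in opposite directions when one row's Start:End equals the other's End:Start.

```python
def add_concat_keys(rows, sep="|"):
    """Return (Start, End, Start_End, End_Start, Distance) for each (Start, End, Distance) row."""
    return [(start, end, start + sep + end, end + sep + start, distance)
            for start, end, distance in rows]


def is_reverse_pair(a, b):
    """True when row a's Start_End equals row b's End_Start, i.e. b is a reversed copy of a."""
    return a[2] == b[3]


rows = add_concat_keys([
    ("Bangalore", "Mumbai", 1500),
    ("Hyderabad", "Delhi", 2000),
    ("Delhi", "Hyderabad", 2000),
    ("Mumbai", "Bangalore", 1500),
])
print(is_reverse_pair(rows[0], rows[3]))  # True
print(is_reverse_pair(rows[0], rows[1]))  # False
```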

Posted: Sun May 10, 2015 6:51 am
by chulett
See here.

Posted: Sun May 10, 2015 5:04 pm
by ray.wurlod
Yes, I thought I'd answered this question already.

Mirabile dictu, I came up with the same answer!

Posted: Thu Jun 25, 2015 5:48 pm
by vamsi_4a6
I have concatenated as mentioned below

Start,End,Start_End,End_Start,Distance
Bangalore,Mumbai,BangaloreMumbai,MumbaiBangalore,1500
Hyderabad,Delhi,HyderabadDelhi,DelhiHyderabad,2000
Delhi,Hyderabad,DelhiHyderabad,HyderabadDelhi,2000
Mumbai,Bangalore,MumbaiBangalore,BangaloreMumbai,1500

How do I use the above Start_End and End_Start columns for comparisons?

Posted: Thu Jun 25, 2015 6:21 pm
by stuartjvnorton
Come on, mate. At least try to work it out on your own...

Posted: Thu Jun 25, 2015 9:17 pm
by vamsi_4a6
I am stuck on how to compare two records within a group and select one record out of it.

Posted: Thu Jun 25, 2015 10:30 pm
by ray.wurlod
Are you the same user as vamsi.4a6, who has over 330 posts?

Anyway, and irrespective of that, think about the present problem this way: what is the purpose of constructing Start:End and End:Start columns?

The answer to that is that, if they are the same, then the records belong in the same group.

From that you can use whatever technique you like, for example the LastRowInGroup() function or "remembering" stage variables, to identify a single record from that group.

And I guess you'll need to get yourself a premium membership.
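One way to turn Ray's hint into a grouping, sketched in Python under stated assumptions: use the alphabetically smaller of Start:End and End:Start as the group key (so both rows of a reversed pair share it), sort on that key, and keep the first row of each group by remembering the previous key, the way a stage variable would in a Transformer. `one_per_group` and the `|` separator are hypothetical; this illustrates the "remembering" technique, not the LastRowInGroup() route.

```python
def one_per_group(rows, sep="|"):
    """Keep one row per unordered (start, end) pair using a remembered previous key."""
    keyed = []
    for start, end, distance in rows:
        start_end = start + sep + end
        end_start = end + sep + start
        # Both rows of a reversed pair produce the same group key.
        keyed.append((min(start_end, end_start), (start, end, distance)))
    keyed.sort(key=lambda kr: kr[0])  # like sorting on the group key upstream
    kept = []
    prev_key = None  # plays the role of a stage variable holding the previous key
    for group_key, row in keyed:
        if group_key != prev_key:  # first row of a new group
            kept.append(row)
        prev_key = group_key
    return kept


rows = [
    ("Bangalore", "Mumbai", 1500),
    ("Hyderabad", "Delhi", 2000),
    ("Delhi", "Hyderabad", 2000),
    ("Mumbai", "Bangalore", 1500),
]
print(one_per_group(rows))  # [('Bangalore', 'Mumbai', 1500), ('Hyderabad', 'Delhi', 2000)]
```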

Posted: Thu Jun 25, 2015 10:39 pm
by stuartjvnorton
vamsi_4a6 wrote: I am stuck on how to compare two records within a group and select one record out of it.
Come on, man. You've got a blog.
Seriously, order the names alphabetically in a transformer and then remove duplicates.

e.g.:

Here There 666
There Here 666

Becomes

Here There 666
Here There 666

Now pick one.


I want my 2 minutes back.
Sheesh.
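Stuart's approach as a minimal Python sketch: put the two names in each row into alphabetical order, then drop rows that have become identical. `normalize_and_dedup` is a hypothetical name, and the in-memory `set` stands in for the sort plus Remove Duplicates stages you would use in a job. Here the whole row (both names and the distance) is the duplicate key.

```python
def normalize_and_dedup(rows):
    """Swap each pair into alphabetical order, then keep the first row per (a, b, distance)."""
    seen = set()
    kept = []
    for start, end, distance in rows:
        a, b = sorted((start, end))  # "There Here" becomes "Here There"
        row = (a, b, distance)
        if row not in seen:
            seen.add(row)
            kept.append(row)
    return kept


print(normalize_and_dedup([("Here", "There", 666), ("There", "Here", 666)]))
# [('Here', 'There', 666)]
```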

Posted: Wed Jan 06, 2016 4:02 am
by vamsi_4a6
Can anybody help with how to proceed?

Posted: Wed Jan 06, 2016 9:33 am
by chulett
Honestly? Reread all of the previous replies. Try something. Then come back and ask specific questions based on the results, if you still have any. Questions, that is... not results. :wink:

Posted: Wed Jan 06, 2016 9:45 pm
by naveenkumar.ssn
vamsi_4a6 wrote: Can anybody help with how to proceed?
Hi,

Step 1: First sort the values you want to compare within each row.
Example:
a b 100
b a 200

After you sort the above records it would be like the below:
a b 100
a b 200

Step 2: Use the Remove Duplicates stage, with those 2 columns as the key columns, to keep either the first or the second record.

I hope you understood!

Reply back if not.

Regards
Naveen
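Naveen's two steps differ from a whole-row dedup in that only the two name columns form the key, so rows with different distances still collapse to one, and you choose whether the first or last row of each group survives (the Remove Duplicates stage offers the same First/Last choice). A minimal Python sketch, where `dedup_on_pair` and its `retain` parameter are hypothetical:

```python
from itertools import groupby
from operator import itemgetter


def dedup_on_pair(rows, retain="first"):
    """Sort the two names within each row, then keep the first or last row per name pair."""
    if retain not in ("first", "last"):
        raise ValueError("retain must be 'first' or 'last'")
    # Step 1: sort the two names inside each row, so "b a" becomes "a b".
    normalized = [tuple(sorted((start, end))) + (distance,) for start, end, distance in rows]
    pair = itemgetter(0, 1)  # the two key columns
    normalized.sort(key=pair)  # stable sort keeps input order within each pair
    kept = []
    # Step 2: one group per name pair; keep its first or last row.
    for _, group in groupby(normalized, key=pair):
        members = list(group)
        kept.append(members[0] if retain == "first" else members[-1])
    return kept


rows = [("a", "b", 100), ("b", "a", 200)]
print(dedup_on_pair(rows, retain="first"))  # [('a', 'b', 100)]
print(dedup_on_pair(rows, retain="last"))   # [('a', 'b', 200)]
```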