Page 1 of 2

Finding Similar records & assigning Groups

Posted: Fri Nov 04, 2011 7:48 pm
by DSFreddie
Hi All,

Requesting your thoughts/expertise to arrive in a breakthrough for one of my project requirement in DataStage.

The requirement is as follows,

I have 2 files A (Master File) & B (New Record/Update to existing records file).

I need to combine both these files(I will use Join Stage here) & based on certain Key fields(like SSN etc), I need to group the records & assign a specific group ID to each of the distinct record set.

For e.g :
File A
-----------
100 ABC 1001
200 ASD 2001
300 3001

File B
------------
100 ACE 2002
200 ASD 2003
400 XYZ 2005

Output should be as follows,
File C
------------
100 ABC 1001 G1 (group ID)
100 ACE 2002 G1
200 ASD 2001 G2
200 ASD 2003 G2
300 3001 G3
400 XYZ 2005 G4

Can you pls shed some light on how we can accomplish this in DataStage. Your thoughts will be really helpful to take it forward.

Posted: Fri Nov 04, 2011 8:51 pm
by ray.wurlod
Funnel the sources then Sort on the key field generating a Key Change column. In a downstream transformer stage use the Key Change field to trigger an increment in the Group ID.

Posted: Sun Nov 06, 2011 10:49 am
by DSFreddie
Hi Ray,

Thanks for your reply & input. I tried the solution you gave. This is what I got.

File C
------------
100 ABC 1001 1 (keyChange)
100 ACE 2002 1
200 ASD 2001 1
200 ASD 2003 0
300 3001 1
400 XYZ 2005 1
400 XYZ 2005 0
500 DEF 2000 1
500 DEF 2000 0

I somehow couldnt group the records using a transformer stage. (since the keychange values are 1/0. Can you pls be more specific how i can handle it through Transformer ?

Also, this process will run once in a week & in-case of a similar key record, it should be grouped to the same Group the key records were grouped previously.

Thanks in advance for your valuable inputs.
Freddie

Posted: Sun Nov 06, 2011 11:06 am
by chulett
That key change column is a boolean. When it is true (1) you are at the beginning of a new group and you should increment your "group id". While it is false (0) you should continue to use the current group id. Use stage variables to manage that.

Posted: Sun Nov 06, 2011 5:25 pm
by DSFreddie
Hi Chulett,

Thanks for your reply. Unfortunately, I am not able to read your full message. ( & i know a premium membership is required to view it :(

Can someone posts the full message (or) post your thoughts for this issue.

Thank You,
Freddie

Posted: Sun Nov 06, 2011 6:25 pm
by ray.wurlod
Why not get a premium membership? It's less than 30c per day, and the revenue helps to keep DSXchange alive.

Posted: Sun Nov 06, 2011 6:56 pm
by SURA
Nope, that is not fare. The solution is already provided to you.

1. Funnel it
2. sort it based on key
3. Create stage variable count,sv1,sv2, sv3 in order
4. count should be int and initial value as 0
5.Pass the sv2 to sv1
5. keycol = sv2
6. sv1 = sv2
7. In count variable, if sv3 then count + 1 else 0


In the TFM you can make your condition to get the expected result.

Let me know, if i missed / something wrong.

DS User

Logic Issue - Grouping distinct records (thru Transformer)

Posted: Sun Nov 06, 2011 7:13 pm
by DSFreddie
Sure Ray. I am planning to upgrade my membership in the near future.

While trying to use the stage variables to check the Key Change values & assigning Group_ID's, I could assign distinct values to all 1's & 0's repectively. But my logic is not working in the scenario where i should be asssigning distinct group _Id's for each of the groups.

Can you pls help me resolve this logic.

Thanks in advance
Freddie

Posted: Sun Nov 06, 2011 7:27 pm
by DSFreddie
Thanks a lot Sura for your inputs.

I did the first 2 steps. In the transformer, I didnt quite get what you explained using the stage variables.

I am creating 4 stage variables in the order COUNT,SV1,SV2 & SV3.

COUNT (first Stage variable) value = 0
What should I be doing for the rest ? (SV1,SV2 & SV3)

In the end, you stated to make the conditions to get the expected result. Can you pls give me more detail on it ?
Sorry, if i am asking basic questions here.

Thanks,
Freddie

Posted: Sun Nov 06, 2011 7:28 pm
by ray.wurlod
Sure.

Show us what you've tried. And explain in what way is does not produce the expected outcome.

Posted: Sun Nov 06, 2011 7:38 pm
by chulett
Just as a wee bit of a personal rant - anything using stage variables with names like "sv1" would never pass a peer review. Give them meaningful names, people!

Posted: Sun Nov 06, 2011 7:47 pm
by SURA
You are right. The aim is to give a hint . Anyhow i will change my approach.

DS User

Posted: Sun Nov 06, 2011 8:51 pm
by DSFreddie
Hi Sura/Ray,

Thanks for your inputs. I tried the following in the stage variable,

StageVar : if not(DSLink3.keyChange) then (StageVar) else StageVar+1

In the new field GroupID, i passed StageVar. But it is not giving me the expected output. My output should look as follows,

output
------------
100 ABC 1001 1 (keyChange) G1 (Group ID)
100 ACE 2002 1 G1
200 ASD 2001 1 G2
200 ASD 2003 0 G2
300 3001 1 G3
400 XYZ 2005 1 G4
400 XYZ 2005 0 G4
500 DEF 2000 1 G5
500 DEF 2000 0 G5

Thanks for your help in resolving this logic issue.
Freddie

Posted: Sun Nov 06, 2011 9:40 pm
by ray.wurlod
What IS it giving you? Are you deriving GroupID as

Code: Select all

"G" : StageVar
?

Posted: Sun Nov 06, 2011 10:16 pm
by stuartjvnorton
Try experimenting using different combinations of columns for the sort key.
You'll be amazed how quickly the answer will make itself known when you don't wait for people to tell you the answer.