Page 1 of 1

compare current record with previous & next record

Posted: Wed Nov 12, 2014 10:53 am
by bicap
Hi All,

I need some guidance to develop a logic. This is comparison of current record with previous group and next group.
In the data below. I have Master data , and Trans Data ( currect day transactions).
Process_TS divides into groups. we have 6 groups in total.

The first group records will check in next group based on Bank_KEY + ACCT_ID. If present in next group you can ingore.
But if they are not present then, they wil be treated as deleted with giving the group process_ts into TERMINATION_TS.

if you take example of ACCT_ID = 2226472, it was present in all times, and not at last group, so
it master record is Deleted with that group time.

if you take example of ACCT_ID = 3336507, it was not present at 2014-11-06 11:32:17.104000, so it was deleted but again it got added
at 2014-11-06 18:18:18.184000 , so a new insert should be genereated, but again it was not present at 2014-11-06 20:20:20.202000, so a delete
record should be genereated with Termination_TS

like wise, we need to compare with previous group and next group. How to build in datastage 8.7 ?

Code: Select all

		   ACCT_ID	BANK_KEY	PROCESS_TS					     ROLE	SEQUENCE	TERMINATION_TS
Master	2226472	30001928	2014-09-25 19:23:01.071000		O	1	
Master	2226472	30001928	2014-09-25 19:23:01.071000		D	2	
Master	3336507	30001928	2014-09-25 19:23:01.071000		O	2	
Master	3336507	30001928	2014-09-25 19:23:01.071000		D	1	
Master	4446510	30001928	2014-09-25 19:23:01.071000		D	1	
Master	5557554	30001928	2014-09-25 19:23:01.071000		D	1
	
Transc	2226472	30001928	2014-11-06 11:32:17.104000		O	1	
Transc	2226472	30001928	2014-11-06 11:32:17.104000		D	2	
Transc	4446510	30001928	2014-11-06 11:32:17.104000		D	1	
Transc	5557554	30001928	2014-11-06 11:32:17.104000		D	1	
Transc	6662578	30001928	2014-11-06 11:32:17.104000		O	1
	
Transc	2226472	30001928	2014-11-06 16:22:41.572000		O	1	
Transc	2226472	30001928	2014-11-06 16:22:41.572000		D	2
	
Transc	2226472	30001928	2014-11-06 18:18:18.184000		O	1	
Transc	2226472	30001928	2014-11-06 18:18:18.184000		D	2	
Transc	4446510	30001928	2014-11-06 18:18:18.184000		D	1	
Transc	5557554	30001928	2014-11-06 18:18:18.184000		D	1	
Transc	3336507	30001928	2014-11-06 18:18:18.184000		O	2	
Transc	3336507	30001928	2014-11-06 18:18:18.184000		D	1
	
Transc	2226472	30001928	2014-11-06 19:19:19.194000		O	1	
Transc	2226472	30001928	2014-11-06 19:19:19.194000		D	2	
Transc	4446510	30001928	2014-11-06 19:19:19.194000		D	1	
Transc	5557554	30001928	2014-11-06 19:19:19.194000		D	1	
Transc	3336507	30001928	2014-11-06 19:19:19.194000		O	2	
Transc	3336507	30001928	2014-11-06 19:19:19.194000		D	1	
Transc	6662578	30001928	2014-11-06 19:19:19.194000		O	1
	
Transc	6662578	30001928	2014-11-06 20:20:20.202000		O	1	
						
						
Desired Output --> as below
			
		    ACCT_ID	BANK_KEY	PROCESS_TS					    ROLE	SEQUENCE	TERMINATION_TS
Deleted	3336507	30001928	2014-09-25 19:23:01.071000		O	2			2014-11-06 11:32:17.104000
Deleted	3336507	30001928	2014-09-25 19:23:01.071000		D	1			2014-11-06 11:32:17.104000

Insertd	6662578	30001928	2014-11-06 11:32:17.104001		O	1			
Deleted	4446510	30001928	2014-09-25 19:23:01.071000		D	1			2014-11-06 16:22:41.572000

Deleted	5557554	30001928	2014-09-25 19:23:01.071000		D	1			2014-11-06 16:22:41.572000
Deleted	6662578	30001928	2014-11-06 11:32:17.104001		O	1			2014-11-06 16:22:41.572000

Insertd	4446510	30001928	2014-11-06 18:18:18.184000		D	1			
Insertd	5557554	30001928	2014-11-06 18:18:18.184000		D	1			
Insertd	3336507	30001928	2014-11-06 18:18:18.184000		O	2			
Insertd	3336507	30001928	2014-11-06 18:18:18.184000		D	1			
Insertd	6662578	30001928	2014-11-06 19:19:19.194000		O	1			

Deleted	4446510	30001928	2014-11-06 18:18:18.184000		D	1			2014-11-06 20:20:20.202000
Deleted	5557554	30001928	2014-11-06 18:18:18.184000		D	1			2014-11-06 20:20:20.202000
Deleted	3336507	30001928	2014-11-06 18:18:18.184000		O	2			2014-11-06 20:20:20.202000
Deleted	3336507	30001928	2014-11-06 18:18:18.184000		D	1			2014-11-06 20:20:20.202000
Deleted	2226472	30001928	2014-09-25 19:23:01.071000		O	1			2014-11-06 20:20:20.202000
Deleted	2226472	30001928	2014-09-25 19:23:01.071000		D	2			2014-11-06 20:20:20.202000

Re: compare current record with previous & next record

Posted: Wed Nov 12, 2014 2:53 pm
by ArndW
I think that your problem isn't as complicated as your description makes it seem, but I cannot quite understand the logic rules.
If we use just one key as an example and order the data by key, then Process timestamp and then Sequence, we get

Code: Select all

		   ACCT_ID	BANK_KEY	PROCESS_TS					     ROLE	SEQUENCE	TERMINATION_TS
Master	3336507	30001928	2014-09-25 19:23:01.071000		D	1	
Master	3336507	30001928	2014-09-25 19:23:01.071000		O	2	
Transc	3336507	30001928	2014-11-06 18:18:18.184000		D	1
Transc	3336507	30001928	2014-11-06 18:18:18.184000		O	2	
Transc	3336507	30001928	2014-11-06 19:19:19.194000		D	1	
Transc	3336507	30001928	2014-11-06 19:19:19.194000		O	2	

Desired Output --> as below
			
		    ACCT_ID	BANK_KEY	PROCESS_TS					    ROLE	SEQUENCE	TERMINATION_TS
Deleted	3336507	30001928	2014-09-25 19:23:01.071000		D	1			2014-11-06 11:32:17.104000
Deleted	3336507	30001928	2014-09-25 19:23:01.071000		O	2			2014-11-06 11:32:17.104000

Are you expecting that if you group your data by PROCESS_TS then each of those groups would contain all the records and any records missing are considered DELETED? Does SEQUECE and ROLE mean anything with regards to your rules? (if not, we should remove them from the example to simplify your question)

Re: compare current record with previous & next record

Posted: Wed Nov 12, 2014 3:51 pm
by bicap
Thanks for your reply,

Here is the business logic...
For a given day, we need to identify process_ts's at which this BANK_KEY has transactions, in this example it has
over on below timestamps.

Master 2014-09-25 19:23:01.071000
Transc 2014-11-06 11:32:17.104000
Transc 2014-11-06 16:22:41.572000
Transc 2014-11-06 18:18:18.184000
Transc 2014-11-06 19:19:19.194000
Transc 2014-11-06 20:20:20.202000

now , we need to see that , if you take one ACCT_ID as you said, 3336507 , we have to look in all datetime, if it presents then we need
to ignore it, and if is absent , then we need to generate a DELETE record,

So as per the data...

ACCT_ID BANK_KEY PROCESS_TS ROLE SEQUENCE TERMINATION_TS
Master 2226472 30001928 2014-09-25 19:23:01.071000 O 1
Master 2226472 30001928 2014-09-25 19:23:01.071000 D 2
Master 3336507 30001928 2014-09-25 19:23:01.071000 O 2
Master 3336507 30001928 2014-09-25 19:23:01.071000 D 1
Master 4446510 30001928 2014-09-25 19:23:01.071000 D 1
Master 5557554 30001928 2014-09-25 19:23:01.071000 D 1

Transc 2226472 30001928 2014-11-06 11:32:17.104000 O 1
Transc 2226472 30001928 2014-11-06 11:32:17.104000 D 2
Transc 4446510 30001928 2014-11-06 11:32:17.104000 D 1
Transc 5557554 30001928 2014-11-06 11:32:17.104000 D 1
Transc 6662578 30001928 2014-11-06 11:32:17.104000 O 1

Transc 2226472 30001928 2014-11-06 16:22:41.572000 O 1
Transc 2226472 30001928 2014-11-06 16:22:41.572000 D 2

Transc 2226472 30001928 2014-11-06 18:18:18.184000 O 1
Transc 2226472 30001928 2014-11-06 18:18:18.184000 D 2
Transc 4446510 30001928 2014-11-06 18:18:18.184000 D 1
Transc 5557554 30001928 2014-11-06 18:18:18.184000 D 1
Transc 3336507 30001928 2014-11-06 18:18:18.184000 O 2
Transc 3336507 30001928 2014-11-06 18:18:18.184000 D 1

Transc 2226472 30001928 2014-11-06 19:19:19.194000 O 1
Transc 2226472 30001928 2014-11-06 19:19:19.194000 D 2
Transc 4446510 30001928 2014-11-06 19:19:19.194000 D 1
Transc 5557554 30001928 2014-11-06 19:19:19.194000 D 1
Transc 3336507 30001928 2014-11-06 19:19:19.194000 O 2
Transc 3336507 30001928 2014-11-06 19:19:19.194000 D 1
Transc 6662578 30001928 2014-11-06 19:19:19.194000 O 1

Transc 6662578 30001928 2014-11-06 20:20:20.202000 O 1


Desired Output --> as below


ACCT_ID BANK_KEY PROCESS_TS ROLE SEQUENCE TERMINATION_TS
Deleted 3336507 30001928 2014-09-25 19:23:01.071000 O 2 2014-11-06 11:32:17.104000
Deleted 3336507 30001928 2014-09-25 19:23:01.071000 D 1 2014-11-06 11:32:17.104000

Insertd 3336507 30001928 2014-11-06 18:18:18.184000 O 2
Insertd 3336507 30001928 2014-11-06 18:18:18.184000 D 1

Deleted 3336507 30001928 2014-11-06 18:18:18.184000 O 2 2014-11-06 20:20:20.202000
Deleted 3336507 30001928 2014-11-06 18:18:18.184000 D 1 2014-11-06 20:20:20.202000

---

Initailly, in master file 3336507 records are with process_ts "2014-09-25 19:23:01.071000"
3336507 records were missing at 2014-11-06 11:32:17.104000 , so we have two DELETE records.
3336507 again added at 2014-11-06 18:18:18.184000 , so we have two INSERT records. Now these will be consider
as new master records down the lane for this ACCT_ID, as they got newly added.
3336507 again present in next time group "2014-11-06 19:19:19.194000", so we have to ignore these records with this timestamp.
3336507 records were missing at "2014-11-06 20:20:20.202000 " , so we have two DELETE got identified above.


Hope my explanation suffies in understanding the business requirement.

Posted: Thu Nov 13, 2014 1:47 am
by ArndW
This is not too simple after all, mainly because you need to compare against your MASTER records many times, once for each group of records with the same process timestamp.

Perhaps you could split the data with one stream just getting a list of PROCESS_TS groups and making one record for each MASTER record coupled with the ACCT_ID and BANK_KEY. Use that as the main data to a lookup on the original data on ACCT_ID, BANK_KEY and PROCESS_TS. If the record is not found, then generate a DELETE.

Re: compare current record with previous & next record

Posted: Thu Nov 13, 2014 11:08 am
by gnan_gun
Hi,

We can achieve this by Change Capture stage. However we need to run the job as many groups as many times.

Get all groups in a sequential file and put in loop activity and run the job.

Code: Select all

         Target
           |
           v
Source -->CDC-->Filter(copy/delete/insert)-->target
Thanks
Gnan

Re: compare current record with previous & next record

Posted: Fri Nov 14, 2014 12:59 pm
by bicap
Thanks for your reply,
For example, if you take a bank_key, the accounts associated may vary from 1 to 100. and each has transaction on a given day any number of times. And we do have around 1+ million BANK_KEYS, from source file. We cannot run those number of times. Hope you understand the situation.

Re: compare current record with previous & next record

Posted: Sat Nov 15, 2014 2:48 pm
by qt_ky
bicap wrote:This is comparison of current record with previous group and next group.
If you can think of it in terms of a comparison of the current record with the previous group only, it may simplify the logic.

Resolved....

Posted: Tue Nov 18, 2014 9:01 am
by bicap
Finally achieved in 3 jobs.
In first job we need to find the Insert/Changes in a file.
In Second job, need to find deletes in a file, create Ranges file, combine Master & Trasaction files on BANK_KEYS.
In Third job, need to merge the deletes file with Master + Transaction file and again find for Changes using the Ranges file and do lookup against Master file and create Deletes.


Thanks all for your help.