compare current record with previous & next record

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

bicap
Premium Member
Posts: 95
Joined: Mon May 03, 2010 10:22 pm
Location: India

compare current record with previous & next record

Post by bicap »

Hi All,

I need some guidance to develop a piece of logic: a comparison of the current record with the previous group and the next group.
In the data below I have Master data and Trans data (current-day transactions).
PROCESS_TS divides the records into groups; we have 6 groups in total.

Each record in a group is checked against the next group on BANK_KEY + ACCT_ID. If it is present in the next group you can ignore it.
But if it is not present, it is treated as deleted, with that group's PROCESS_TS written to TERMINATION_TS.

If you take the example of ACCT_ID = 2226472, it was present in every group except the last one, so
its master record is deleted with that group's timestamp.

If you take the example of ACCT_ID = 3336507, it was not present at 2014-11-06 11:32:17.104000, so it was deleted; but it got added again
at 2014-11-06 18:18:18.184000, so a new insert record should be generated. It was then absent again at 2014-11-06 20:20:20.202000, so a delete
record should be generated with that TERMINATION_TS.

Likewise, we need to compare every record with its previous group and next group. How can this be built in DataStage 8.7?

Code:

		   ACCT_ID	BANK_KEY	PROCESS_TS					     ROLE	SEQUENCE	TERMINATION_TS
Master	2226472	30001928	2014-09-25 19:23:01.071000		O	1	
Master	2226472	30001928	2014-09-25 19:23:01.071000		D	2	
Master	3336507	30001928	2014-09-25 19:23:01.071000		O	2	
Master	3336507	30001928	2014-09-25 19:23:01.071000		D	1	
Master	4446510	30001928	2014-09-25 19:23:01.071000		D	1	
Master	5557554	30001928	2014-09-25 19:23:01.071000		D	1
	
Transc	2226472	30001928	2014-11-06 11:32:17.104000		O	1	
Transc	2226472	30001928	2014-11-06 11:32:17.104000		D	2	
Transc	4446510	30001928	2014-11-06 11:32:17.104000		D	1	
Transc	5557554	30001928	2014-11-06 11:32:17.104000		D	1	
Transc	6662578	30001928	2014-11-06 11:32:17.104000		O	1
	
Transc	2226472	30001928	2014-11-06 16:22:41.572000		O	1	
Transc	2226472	30001928	2014-11-06 16:22:41.572000		D	2
	
Transc	2226472	30001928	2014-11-06 18:18:18.184000		O	1	
Transc	2226472	30001928	2014-11-06 18:18:18.184000		D	2	
Transc	4446510	30001928	2014-11-06 18:18:18.184000		D	1	
Transc	5557554	30001928	2014-11-06 18:18:18.184000		D	1	
Transc	3336507	30001928	2014-11-06 18:18:18.184000		O	2	
Transc	3336507	30001928	2014-11-06 18:18:18.184000		D	1
	
Transc	2226472	30001928	2014-11-06 19:19:19.194000		O	1	
Transc	2226472	30001928	2014-11-06 19:19:19.194000		D	2	
Transc	4446510	30001928	2014-11-06 19:19:19.194000		D	1	
Transc	5557554	30001928	2014-11-06 19:19:19.194000		D	1	
Transc	3336507	30001928	2014-11-06 19:19:19.194000		O	2	
Transc	3336507	30001928	2014-11-06 19:19:19.194000		D	1	
Transc	6662578	30001928	2014-11-06 19:19:19.194000		O	1
	
Transc	6662578	30001928	2014-11-06 20:20:20.202000		O	1	
						
						
Desired Output --> as below
			
		    ACCT_ID	BANK_KEY	PROCESS_TS					    ROLE	SEQUENCE	TERMINATION_TS
Deleted	3336507	30001928	2014-09-25 19:23:01.071000		O	2			2014-11-06 11:32:17.104000
Deleted	3336507	30001928	2014-09-25 19:23:01.071000		D	1			2014-11-06 11:32:17.104000

Insertd	6662578	30001928	2014-11-06 11:32:17.104001		O	1			
Deleted	4446510	30001928	2014-09-25 19:23:01.071000		D	1			2014-11-06 16:22:41.572000

Deleted	5557554	30001928	2014-09-25 19:23:01.071000		D	1			2014-11-06 16:22:41.572000
Deleted	6662578	30001928	2014-11-06 11:32:17.104001		O	1			2014-11-06 16:22:41.572000

Insertd	4446510	30001928	2014-11-06 18:18:18.184000		D	1			
Insertd	5557554	30001928	2014-11-06 18:18:18.184000		D	1			
Insertd	3336507	30001928	2014-11-06 18:18:18.184000		O	2			
Insertd	3336507	30001928	2014-11-06 18:18:18.184000		D	1			
Insertd	6662578	30001928	2014-11-06 19:19:19.194000		O	1			

Deleted	4446510	30001928	2014-11-06 18:18:18.184000		D	1			2014-11-06 20:20:20.202000
Deleted	5557554	30001928	2014-11-06 18:18:18.184000		D	1			2014-11-06 20:20:20.202000
Deleted	3336507	30001928	2014-11-06 18:18:18.184000		O	2			2014-11-06 20:20:20.202000
Deleted	3336507	30001928	2014-11-06 18:18:18.184000		D	1			2014-11-06 20:20:20.202000
Deleted	2226472	30001928	2014-09-25 19:23:01.071000		O	1			2014-11-06 20:20:20.202000
Deleted	2226472	30001928	2014-09-25 19:23:01.071000		D	2			2014-11-06 20:20:20.202000
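
In pseudocode, the rule I am after looks roughly like this (a Python sketch of the intent only, not a DataStage job; the record layout is simplified and the names are mine):

Code:

from collections import defaultdict

# Each record is (acct_id, bank_key, process_ts, role, sequence);
# 'records' holds the master and transaction rows for one BANK_KEY.

def detect_changes(records):
    # Split the rows into groups by PROCESS_TS.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[2]].append(rec)
    timestamps = sorted(groups)

    # 'live' maps (bank_key, acct_id) to the rows that currently
    # represent that account (master rows, or the latest inserts).
    live = {}
    for rec in groups[timestamps[0]]:
        live.setdefault((rec[1], rec[0]), []).append(rec)

    output = []
    for ts in timestamps[1:]:
        current = {(r[1], r[0]) for r in groups[ts]}

        # Account vanished: emit Deleted rows for its live rows,
        # with this group's PROCESS_TS as TERMINATION_TS.
        for key in list(live):
            if key not in current:
                for r in live.pop(key):
                    output.append(('Deleted',) + r + (ts,))

        # Account (re)appeared: emit Insertd rows; these become the
        # new master rows for the account from here on.
        for key in current:
            if key not in live:
                rows = [r for r in groups[ts] if (r[1], r[0]) == key]
                live[key] = rows
                for r in rows:
                    output.append(('Insertd',) + r)
    return output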
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Re: compare current record with previous & next record

Post by ArndW »

I think your problem isn't as complicated as your description makes it seem, but I cannot quite understand the logic rules.
If we use just one key as an example and order the data by key, then PROCESS_TS, then SEQUENCE, we get:

Code:

		   ACCT_ID	BANK_KEY	PROCESS_TS					     ROLE	SEQUENCE	TERMINATION_TS
Master	3336507	30001928	2014-09-25 19:23:01.071000		D	1	
Master	3336507	30001928	2014-09-25 19:23:01.071000		O	2	
Transc	3336507	30001928	2014-11-06 18:18:18.184000		D	1
Transc	3336507	30001928	2014-11-06 18:18:18.184000		O	2	
Transc	3336507	30001928	2014-11-06 19:19:19.194000		D	1	
Transc	3336507	30001928	2014-11-06 19:19:19.194000		O	2	

Desired Output --> as below
			
		    ACCT_ID	BANK_KEY	PROCESS_TS					    ROLE	SEQUENCE	TERMINATION_TS
Deleted	3336507	30001928	2014-09-25 19:23:01.071000		D	1			2014-11-06 11:32:17.104000
Deleted	3336507	30001928	2014-09-25 19:23:01.071000		O	2			2014-11-06 11:32:17.104000

Are you expecting that if you group your data by PROCESS_TS, each of those groups should contain all the records, and any records missing from a group are considered DELETED? Do SEQUENCE and ROLE mean anything with regard to your rules? (If not, we should remove them from the example to simplify your question.)
bicap
Premium Member
Posts: 95
Joined: Mon May 03, 2010 10:22 pm
Location: India

Re: compare current record with previous & next record

Post by bicap »

Thanks for your reply,

Here is the business logic...
For a given day, we need to identify the PROCESS_TS values at which this BANK_KEY has transactions. In this example it has
transactions at the timestamps below.

Master 2014-09-25 19:23:01.071000
Transc 2014-11-06 11:32:17.104000
Transc 2014-11-06 16:22:41.572000
Transc 2014-11-06 18:18:18.184000
Transc 2014-11-06 19:19:19.194000
Transc 2014-11-06 20:20:20.202000

Now, as you said, if we take one ACCT_ID, 3336507, we have to look at every timestamp: if it is present there we
ignore it, and if it is absent we need to generate a DELETE record.

So, as per the data posted above...



Desired output for 3336507 --> as below

Code:

		    ACCT_ID	BANK_KEY	PROCESS_TS					    ROLE	SEQUENCE	TERMINATION_TS
Deleted	3336507	30001928	2014-09-25 19:23:01.071000		O	2			2014-11-06 11:32:17.104000
Deleted	3336507	30001928	2014-09-25 19:23:01.071000		D	1			2014-11-06 11:32:17.104000

Insertd	3336507	30001928	2014-11-06 18:18:18.184000		O	2
Insertd	3336507	30001928	2014-11-06 18:18:18.184000		D	1

Deleted	3336507	30001928	2014-11-06 18:18:18.184000		O	2			2014-11-06 20:20:20.202000
Deleted	3336507	30001928	2014-11-06 18:18:18.184000		D	1			2014-11-06 20:20:20.202000

---

Initially, in the master file, the 3336507 records carry PROCESS_TS "2014-09-25 19:23:01.071000".
The 3336507 records were missing at 2014-11-06 11:32:17.104000, so we get two DELETE records.
3336507 was added again at 2014-11-06 18:18:18.184000, so we get two INSERT records. From then on these are treated
as the new master records for this ACCT_ID, as they were newly added.
3336507 is present again in the next time group, "2014-11-06 19:19:19.194000", so we ignore the records with that timestamp.
The 3336507 records were missing at "2014-11-06 20:20:20.202000", which produces the two DELETE records identified above.


I hope my explanation suffices to convey the business requirement.
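
Put differently, per account it comes down to presence transitions between consecutive timestamps. A tiny illustrative trace for 3336507 (Python; the timestamps are abbreviated and the names are mine):

Code:

# Whether ACCT_ID 3336507 is present in each PROCESS_TS group, in order.
present = [
    ('2014-09-25 19:23:01', True),   # master
    ('2014-11-06 11:32:17', False),  # missing -> DELETE master rows
    ('2014-11-06 16:22:41', False),  # still missing -> nothing new
    ('2014-11-06 18:18:18', True),   # back -> INSERT new master rows
    ('2014-11-06 19:19:19', True),   # still present -> ignore
    ('2014-11-06 20:20:20', False),  # missing again -> DELETE
]

prev = None
for ts, here in present:
    if prev is True and not here:
        print('DELETE, TERMINATION_TS =', ts)
    if prev is False and here:
        print('INSERT, PROCESS_TS =', ts)
    prev = here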
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

This is not so simple after all, mainly because you need to compare against your MASTER records many times: once for each group of records with the same process timestamp.

Perhaps you could split the data, with one stream just building the list of PROCESS_TS groups and generating one record per MASTER record for each group, carrying the ACCT_ID and BANK_KEY. Use that as the main input to a lookup against the original data on ACCT_ID, BANK_KEY and PROCESS_TS. If the record is not found, generate a DELETE.
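
In rough Python terms the idea looks like this (an illustrative sketch only, assuming the data fits in memory; the function and field names are mine, not DataStage stages):

Code:

def find_deletes(all_records):
    # all_records: list of dicts with keys
    # source, acct_id, bank_key, process_ts.

    # Stream 1: the distinct PROCESS_TS groups.
    group_ts = sorted({r['process_ts'] for r in all_records})

    # Stream 2: one probe row per master account per group.
    master_keys = {(r['acct_id'], r['bank_key'])
                   for r in all_records if r['source'] == 'Master'}

    # Lookup reference: which (acct_id, bank_key, process_ts) occur.
    present = {(r['acct_id'], r['bank_key'], r['process_ts'])
               for r in all_records}

    deletes = []
    for ts in group_ts:
        for acct_id, bank_key in master_keys:
            # A lookup failure means the account is absent from this
            # group: generate a DELETE with TERMINATION_TS = ts.
            if (acct_id, bank_key, ts) not in present:
                deletes.append((acct_id, bank_key, ts))
    return deletes

Note that this naive form flags an absent account in every group it is missing from, and it does not handle re-inserts (the 3336507 case), which is exactly what makes the problem less simple than it first looks.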
gnan_gun
Participant
Posts: 30
Joined: Thu Aug 02, 2007 5:31 am
Location: Mumbai, India

Re: compare current record with previous & next record

Post by gnan_gun »

Hi,

We can achieve this with a Change Capture stage. However, we would need to run the job once for each group.

Get all the groups into a sequential file, put it in a loop activity, and run the job for each group.

Code:

         Target
           |
           v
Source -->CDC-->Filter(copy/delete/insert)-->target
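
In Python terms the per-group loop would be roughly as follows (a sketch of the idea only; the real job would use the Change Capture and Filter stages, and the names here are mine):

Code:

def run_per_group(master_keys, transaction_groups):
    # master_keys: set of (acct_id, bank_key) from the master file.
    # transaction_groups: dict of PROCESS_TS -> set of keys in that group.
    before = master_keys
    for ts in sorted(transaction_groups):   # the loop activity
        after = transaction_groups[ts]
        deletes = before - after            # CDC delete rows
        inserts = after - before            # CDC insert rows
        # A Filter stage would route copy/insert/delete rows here;
        # deletes would get TERMINATION_TS = ts.
        yield ts, inserts, deletes
        before = after                      # next iteration's before-image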
Thanks
Gnan
bicap
Premium Member
Posts: 95
Joined: Mon May 03, 2010 10:22 pm
Location: India

Re: compare current record with previous & next record

Post by bicap »

Thanks for your reply.
For example, for a given BANK_KEY the number of associated accounts may vary from 1 to 100, and each account can have any number of transactions on a given day. And we have around 1+ million BANK_KEYs in the source file; we cannot run the job that many times. I hope you understand the situation.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Re: compare current record with previous & next record

Post by qt_ky »

bicap wrote: This is comparison of current record with previous group and next group.
If you can think of it in terms of a comparison of the current record with the previous group only, it may simplify the logic.
Choose a job you love, and you will never have to work a day in your life. - Confucius
bicap
Premium Member
Posts: 95
Joined: Mon May 03, 2010 10:22 pm
Location: India

Resolved....

Post by bicap »

Finally achieved it in 3 jobs.
In the first job, we find the inserts/changes in a file.
In the second job, we find the deletes in a file, create a Ranges file, and combine the Master & Transaction files on BANK_KEY.
In the third job, we merge the deletes file with the Master + Transaction file, find the changes again using the Ranges file, do a lookup against the Master file, and create the deletes.


Thanks all for your help.