Comparing 2 datasets

raj4756
Participant
Posts: 17
Joined: Thu Feb 26, 2004 9:07 am

Post by raj4756 »

Hi All,

What is the most efficient way to compare two datasets, apart from the diff and change capture operators in DataStage? Is there a UNIX command that can do that?

Thanks.

Raj
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

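UNIX has several comparison tools, such as diff and cmp. They are very primitive and can give misleading results if used incorrectly. diff is only meaningful on text files; cmp compares any files byte by byte, but by default it only reports the first difference.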
Mamu Kim
raj4756
Participant
Posts: 17
Joined: Thu Feb 26, 2004 9:07 am

Post by raj4756 »

Kim,

So what would be your suggestion: the diff operator, change capture, or something else?

Let me know.

Thanks.

Raj
cohesion
Participant
Posts: 8
Joined: Wed Feb 18, 2004 3:32 pm
Location: Canada

Post by cohesion »

raj4756 wrote:
Kim,

So what would be your suggestion: the diff operator, change capture, or something else?

Let me know.

Thanks.

Raj
Hi Raj,

I'm assuming you're trying to find an efficient way to capture source data changes. Based on the way you're asking, it doesn't seem like you need to detect changes at the field level. If that is the case, why not use a simple DataStage routine to do record-by-record comparisons? Simply write all the data you need to compare to sequential files using standard delimiters, then compare those files.
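For illustration only, here is a minimal sketch of that record-by-record idea in Python rather than in a DataStage routine. It assumes both datasets have already been written out as delimited text files, one record per line; the function name `compare_records` and the file paths are hypothetical. It treats each line as one record and ignores row order, so it reports whole records that differ, not individual fields.

```python
from collections import Counter


def compare_records(path_a, path_b):
    """Return (only_in_a, only_in_b) as Counters of whole records.

    Assumption: each line of the file is one complete record.
    Row order is ignored; duplicate records are counted.
    """
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        count_a = Counter(line.rstrip("\n") for line in fa)
        count_b = Counter(line.rstrip("\n") for line in fb)
    # Counter subtraction keeps only positive counts, i.e. the surplus records.
    only_in_a = count_a - count_b
    only_in_b = count_b - count_a
    return only_in_a, only_in_b
```

If both returned counters are empty, the two files hold the same records with the same multiplicities. Whether ignoring row order is acceptable depends on what the test is meant to prove.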
R. Michael Pickering
Senior Architect
Cohesion Systems Consulting Inc.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Raj

I think we need to know a little more detail in order to point you in the right direction. The diff command will work but it is crude. There may be a better solution if we knew more about what you are trying to accomplish.
Mamu Kim
raj4756
Participant
Posts: 17
Joined: Thu Feb 26, 2004 9:07 am

Post by raj4756 »

We want to do parallel testing on 20 different .ds files generated by DataStage versions 6.0 and 7.0 for the same batch day, so I need to make sure the data in the two sets of files is the same.
Please let me know if you have any more questions.

Thanks.

Raj
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Raj

The diff command would work great for this. Try it.
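As a rough sketch of a diff-style check across a whole batch of files, something like the following could be used. It assumes each dataset from both versions has first been exported to a plain text file with the same file name in two separate directories (the export step is not shown), and the function name `diff_directories` and the directory layout are hypothetical. Lines are sorted before comparison, which is an assumption that row order does not matter.

```python
import difflib
from pathlib import Path


def diff_directories(dir_old, dir_new):
    """Map file name -> unified-diff lines for text exports that differ.

    Assumption: both directories hold plain text exports with matching names.
    Files that compare equal are left out of the result.
    """
    results = {}
    for old_path in sorted(Path(dir_old).iterdir()):
        if not old_path.is_file():
            continue
        new_path = Path(dir_new) / old_path.name
        if not new_path.exists():
            results[old_path.name] = ["missing in " + str(dir_new)]
            continue
        # Sort so that a different row order alone is not reported.
        old_lines = sorted(old_path.read_text(encoding="utf-8").splitlines())
        new_lines = sorted(new_path.read_text(encoding="utf-8").splitlines())
        delta = list(
            difflib.unified_diff(
                old_lines, new_lines, fromfile="old", tofile="new", lineterm=""
            )
        )
        if delta:
            results[old_path.name] = delta
    return results
```

An empty result would mean every file in the first directory has a counterpart with the same records in the second. Files that exist only in the second directory are not reported by this sketch.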
Mamu Kim
vzoubov
Participant
Posts: 28
Joined: Tue Feb 05, 2002 12:30 pm
Location: Boston, MA

Post by vzoubov »

kduke wrote:
Raj

The diff command would work great for this. Try it.
Raj,

The change capture stage would also work just fine for comparing two datasets.

Vitali.