Compare Two Values in Sequential File

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

iq_etl
Premium Member
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Compare Two Values in Sequential File

Post by iq_etl »

We're running 9.1 with parallel jobs. However, we use a generic sequence that kicks of the parallel jobs to load the tables. This sequence gets the row count on the table before and after the table is loaded using server jobs in Job Activities and these counts are written to two separate sequential files.

I want an additional Job Activity that contains a job that takes in the two sequential files and compares the count on one with the other. Being a server job, I can't do this in a Transformer stage as they don't allow for two inputs in server. Should this be a parallel job? Should I use server, but with a Link Collector or Merge stage? I haven't had any success there. Should these be datasets instead of sequenial files?

So, what's the best way to have a (currently server) job take in and compare one row of data from two different datasources?

This sounds easy, so I must be missing the obvious.

Thanks!
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

You would use the server Merge stage to join two sequential files in a server job.

Alternatively, you could copy one sequential file to a hashed file and then have a reference input along with the stream input into a transformer stage.

Mike
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Well, not super obvious if you've never done it before... :-) Two more options, both of which don't require writing a new job.

First option:
Use two Execute Command stages to output the contents of the sequential files to "the screen" with echo commands. This will put the contents of the file in an activity variable for each stage. Then use a Nested Condition stage to compare the results together and to change execution paths if the two items don't match.

Second option:
Write a small shell script that will compare the two row counts and set an exit code to indicate whether the row counts match. Execute the shell script from an Execute Command stage and then branch accordingly based on the exit code.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
iq_etl
Premium Member
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Post by iq_etl »

If possible, I'd like to handle this within Datastage, so that rules out the shell script, unfortunately.

1) Merge Stage: I understand the Merge stage requires both inputs to have the same number of rows and outputs the same number. Currently, I have sequential file 1 (SF1) with essentially one column PreRowCount. SF2 has PostRowCount. if I use the Merge stage I'd combine them into one new column, but it would have two rows. Then, in a following transformer stage I can do my comparison using row 1 vs row 1, correct?

2) Hash Input: I didn't think the Transformer stage will take a reference link in a server job.

3) Execute Commands: Are these in Transformer stages? Looks like Basic functions.

After this job compares the two, I want to pass a flag back up to the sequence that will be evaluated in a Nested Condition on whether to send the team an email or not. Will I need to pass this flag from the server job back out to the sequence through jobstatus or something?

Thanks again for everyone's input!
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Execute Commands and Nested Conditions are Job Sequence stages.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
iq_etl
Premium Member
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Post by iq_etl »

Ha! I see that plain as day now. I'll give this a shot.

Thanks!
iq_etl
Premium Member
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Post by iq_etl »

asorrell wrote:Use two Execute Command stages to output the contents of the sequential files to "the screen" with echo commands. This will put the contents of the file in an activity variable for each stage. Then use a Nested Condition stage to compare the results together and to change execution paths if the two items don't match.
asorrell,

I've got the sequential file exactly how I want it. It contains three records. The records are:

1) pre row count
2) post row count
3) 'Y' or 'N' (if an email needs to be sent)

So, how can I use that Execute Command activity to evaluate that third record so that the following Nested Condition can trigger a Notification Activity if the value is 'Y'? That is, how do I echo the value to the 'screen'?

Would it be
Command: echo
Parameters: <path to file and file name>

Again, how would I isolate that third value?

The DS documentation on Execute Command isn't super descriptive:
http://pic.dhe.ibm.com/infocenter/iisin ... oracc.html

Thanks!
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Looks like Andy was assuming you'd be getting two separate counts (or simply counting two files) via two Execute Command stages and then the Nested Condition stage would be the one doing the 'evaluating' of the results, deciding if they matched or not.

If you've got everything in one file with three records then you can still as noted simply echo or cat the file in a single Execute Command stage and then check the results. Command Output is captured in a dynamic array so you can use array notation to access specific values in the result. Off the top of my head it's labelled as $CommandOutput so:

Code: Select all

ExecuteCommandStageName.$CommandOutput<3>

would hold the third record. You could then check for 'Y' or 'N' and branch accordingly with (or without) a Nested Condition stage.
-craig

"You can never have too many knives" -- Logan Nine Fingers
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Having the counts included in the data is critical information that you left out of your initial problem description.

It doesn't matter what commands you use to get the count, as long as it is output to "the screen". You can play with various piped combinations of the head and tail commands to isolate each of the specific line counts. Once you get it to work at the UNIX level, then just put each command in a separate stage. It doesn't matter which line you put it on (command / parameters) since they are just concatenated together and executed.
Last edited by asorrell on Tue Dec 10, 2013 12:34 pm, edited 1 time in total.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
iq_etl
Premium Member
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Post by iq_etl »

asorrell wrote:Having the counts included in the data is critical information that you left out of your initial problem description.
I apologize. I didn't have that data all in one source until this morning.

To both, having never written anything to 'the screen' yet. Do I do this in the 'Command:' line of the ExeCommand tab, then check the $CommandOutout<3> in the trigger?

So...

ExecCommand tab
Command: echo <activitystagename>.$CommandOutput<3>
Parameters: <pathtosequentialfile>

Triggers tab
Expression: $CommandOutput<3> = 'Y'

Is that right? I don't think "$CommandOutput<3> = 'Y'" is an expression.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Just put the actual commands to be executed in the "Command" portion of the Execute Command stage.

For example (note that head / tail syntax varies per O/S):

ecstage_FirstLine:
Command: head -n 1 /opt/IBM/test

ecstage_SecondLine:
Command: head -n 2 /opt/IBM/test | tail -n 1

That (on AIX) will output the first and second line so it can be captured by each of the stages in their $CommandOutput activity variable.

In your nested condition you can do whatever testing you'd like. For instance, Trigger:
ecstage_FirstLine.$CommandOutput = ecstage_SecondLine.$CommandOutput

You should always use the ellipses (...) button to insert the Activity variable from a pull-down list so it is spelled correctly.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
iq_etl
Premium Member
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Post by iq_etl »

With the Execute Command stage having it's own trigger tab, do I even need a Nested Condition? This will be the last action I want in the sequence. Check and see if an email (Notification Activity) needs to be sent.

Also:
ecstage_FirstLine:
Command: head -n 1 /opt/IBM/test

ecstage_SecondLine:
Command: head -n 2 /opt/IBM/test | tail -n 1
Should ecstage_SecondLine be:
Parameters: head -n 2 /opt/IBM/test | tail -n 1

I'm running windows OS.

EDIT: Also, I assume '/opt/IBM/test' means '/<path>/<path>/<filename>' is that correct?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

iq_etl wrote:I don't think "$CommandOutput<3> = 'Y'" is an expression.
It certainly is... it evaluates to true or false. Use the helper to build the expression, even if you have to edit it later to complete it. As I noted, you need to include the name of the stage in the variable so it knows which Command Ouput to be looking at.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

iq_etl wrote:With the Execute Command stage having it's own trigger tab, do I even need a Nested Condition?
Need? No, which is why I noted you could do all this with or without that stage.
-craig

"You can never have too many knives" -- Logan Nine Fingers
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

iq_etl wrote:I'm running windows OS.
Your first post is flagged for OS: UNIX so my examples supplied UNIX commands. Replace with Windows equivalents as appropriate.

And yes - substitute a Windows path as well.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
Post Reply