Test for Existence (was: Time conversion)

Archive of postings to DataStageUsers@Oliver.com. This forum intended only as a reference and cannot be posted to.

Moderators: chulett, rschirm

Locked
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Test for Existence (was: Time conversion)

Post by admin »

Because of the DataStage environment being free of data types, a simpler test for existence is simply to test the variable. For example: If input1.column2 Then "exists" Else "does not exist" A NULL always takes the ELSE path since we cant assert that its true.

If the data item contains a string, the Len() function may be even more efficient. For example: If Len(link1.column2) Then "exists" Else "does not exist"
Len(@NULL) always returns 0.


Enrol on "Programming with DataStage BASIC"
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

Hey Ray, you just made a big booboo. If you use the code:

If variable Then "Exists" Else "Not Exists"

You just made the classic mistake of overlooking that the value of ZERO evaluates as not existing. This is WRONG! Also, NULL evaluates as @FALSE only because it is indeterminate. If you code this way, you will mess up in the relational world where theres lots of @NULL, and if you code If
NOT(x) Then "Not Exists" Else "Exists" your logic gives the wrong answer when x=@NULL!

The correct code should be:
x=TRIM(Arg1)
If ISNULL(x) OR x="" Then
Ans = @FALSE
End Else
Ans = @TRUE
End

Thanks,
-Ken





ray.wurlod@Informix.Com on 21-Sep-2001 02:06



Please respond to datastage-users@oliver.com

To: datastage-users
cc:
Subject: Test for Existence (was: Time conversion)


Because of the DataStage environment being free of data types, a simpler test for existence is simply to test the variable. For example: If input1.column2 Then "exists" Else "does not exist" A NULL always takes the ELSE path since we cant assert that its true.

If the data item contains a string, the Len() function may be even more efficient. For example: If Len(link1.column2) Then "exists" Else "does not exist"
Len(@NULL) always returns 0.


Enrol on "Programming with DataStage BASIC"
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

If you express it as a single line:

Ans = Not(If IsNull(Arg1) Or Trim(Arg1) = "")

then it can be put into DS as a Transform rather than a Routine, which is a Good Thing if youre going to use this code in Transformer stages a lot. Avoid that CALL/RETURN overhead where not needed!

Cheers.........Len

> -----Original Message-----
> From: Ken_2_Bland@sbphrd.com [mailto:Ken_2_Bland@sbphrd.com]
> Sent: Friday, September 21, 2001 3:38 PM
> To: datastage-users@oliver.com
> Subject: Re: Test for Existence (was: Time conversion)
>
>
>
> Hey Ray, you just made a big booboo. If you use the code:
>
> If variable Then "Exists" Else "Not Exists"
>
> You just made the classic mistake of overlooking that the
> value of ZERO
> evaluates as not existing. This is WRONG! Also, NULL
> evaluates as @FALSE
> only because it is indeterminate. If you code this way, you
> will mess up
> in the relational world where theres lots of @NULL, and if
> you code If
> NOT(x) Then "Not Exists" Else "Exists" your logic gives the
> wrong answer
> when x=@NULL!
>
> The correct code should be:
> x=TRIM(Arg1)
> If ISNULL(x) OR x="" Then
> Ans = @FALSE
> End Else
> Ans = @TRUE
> End
>
> Thanks,
> -Ken
>
>
>
>
>
> ray.wurlod@Informix.Com on 21-Sep-2001 02:06
>
>
>
> Please respond to datastage-users@oliver.com
>
> To: datastage-users
> cc:
> Subject: Test for Existence (was: Time conversion)
>
>
> Because of the DataStage environment being free of data
> types, a simpler
> test for existence is simply to test the variable.
> For example: If input1.column2 Then "exists" Else "does not exist"
> A NULL always takes the ELSE path since we cant assert that
> its true.
>
> If the data item contains a string, the Len() function may be
> even more
> efficient.
> For example: If Len(link1.column2) Then "exists" Else "does
> not exist"
> Len(@NULL) always returns 0.
>
>
> Enrol on "Programming with DataStage BASIC"
>
>
>
>
>
>
>
>
>
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

Ken,

I admit it, but then I always use Ans=Not(Not(Len(Arg1))) as my Exists function. Put the rest down to carelessness. In fact, just Ans=Len(Arg1) would do, since 0 and "" are regarded as false, and any other non-null value is regarded as true in a Boolean context by DataStage. By the way, have you tested your Exists code with " " (that is, a single space - or, for that matter, any number of spaces and nothing else) as the argument? However, lets not get into a philosophical discussion about whether a space exists. You will also find that your function returns "false" if the data is made up of any trimmable character (such as tab). The difficulty is the Trim() function, of course. Whether or not data should be trimmed is a separate argument from a test for existence, which must be answered on a case by case basis. Your business rules, whatever they are, have to be implemented.

Regards,
Ray
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

Theres an expression out there: Theres a million
ways to skin a consultant. Any way you choose to do
it, the underlying necessity of a library of routines
still exists. This thread started as a request for
help in BASIC.

My recommendation is to develop this library, and code
it per the business rules of the client. Youve seen
the bag of tricks Ive been handing out. That is a
helpful starting point for many clients. They can
build from there according to their needs. If you
poke thru the folders in Manager after the import, you
see a bunch of functions and routines that handle
dates, conversions, etc. All clients end of
developing these, so the sooner you get started on
these the more successful youll be. And dont let
tech support tell you how much it slows down jobs, as modularity and reusability should be higher priorities when writing code.

Thanks,
-Ken




--- Ray Wurlod wrote:
> Ken,
>
> I admit it, but then I always use
> Ans=Not(Not(Len(Arg1))) as my Exists
> function. Put the rest down to carelessness.
> In fact, just Ans=Len(Arg1) would do, since 0 and ""
> are regarded as false,
> and any other non-null value is regarded as true in
> a Boolean context by
> DataStage.
> By the way, have you tested your Exists code with "
> " (that is, a single
> space - or, for that matter, any number of spaces
> and nothing else) as the
> argument? However, lets not get into a
> philosophical discussion about
> whether a space exists.
> You will also find that your function returns
> "false" if the data is made up
> of any trimmable character (such as tab). The
> difficulty is the Trim()
> function, of course. Whether or not data should be
> trimmed is a separate
> argument from a test for existence, which must be
> answered on a case by case
> basis. Your business rules, whatever they are, have
> to be implemented.
>
> Regards,
> Ray
>
>


__________________________________________________
Do You Yahoo!?
Get email alerts & NEW webcam video instant messaging with Yahoo! Messenger. http://im.yahoo.com
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

As I understand it, a Transform becomes in-line code when the Transformer
stage is compiled, whereas a Routine maintains a separate existence,
complete with call overhead.
Therefore, maintaining a library of Transforms (where possible) and Routines
(where not) would be better value than maintaining a library of Routines
alone.

_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

I place a higher priority on modularity, reusability, and test-ability. The Transform is a snippet of code included at compile time. Any changes to this transform require a recompile of all jobs that use it. In addition, you do not have the "Test" button as you do in a routine. If you seek to learn more about Universe/DataStage BASIC, the Routine is the easiest way to learn about this language via the Test button. You can write multiple lines of internally documented code. If your routine grows in complexity, you still have the Test button, as well as many lines of internal documentation. Ultimately, this routines can be defined in the Batch jobs as callable functions (DEFFUN), making this a truly resuable and modular library.

The mistake a lot of people make, including tuning experts, is to look to not use Routines because of the external call overhead. I submit that the job design itself, as part of the larger process, has so many tuning opportunities that sacrificing modularity, reusability, and test-ability should not be considered. Take for instance the job design that reads a source database, transforms, and loads the target database. The largest throughput inhibitor in this design will 99% of the time be the database. Squeezing the code, reducing the number of transformer stages, and other code tunings will NOT have the impact on throughput as splitting this job into three smaller jobs, adding a bulk-loader stage, optimizing the SQL to use parallelism during query, etc.

Good luck!
-Ken





greg_smallhorn@hotmail.com on 24-Sep-2001 01:28



Please respond to datastage-users@oliver.com

To: datastage-users
cc:
Subject: RE: Test for Existence (was: Time conversion)


As I understand it, a Transform becomes in-line code when the Transformer stage is compiled, whereas a Routine maintains a separate existence, complete with call overhead. Therefore, maintaining a library of Transforms (where possible) and Routines (where not) would be better value than maintaining a library of Routines alone.

_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

Does anyone know (for certain) whether the call overhead for a Routine is
incurred once (and cached), or once per row?

_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

Greg, the overhead is minor compared to an ETL architecture approach that doesnt use bulk-loading. As far as caching is concerned, the function is declared in the beginning of the DataStage job code and cached in memory. You incur the overhead of BASIC calling this function externally to the code, as compared to how the Transform works. In that case a transform is more like a macro, and at compile time is substituted in the code. Of course a simple 1 line function using all internal BASIC functions is going to have next to zero overhead, but youre only talking milliseconds.

Since a Routine can be many lines (Ive seen some thousands of lines long) they get a bad reputation for performance. Trust a guy who trains a lot of people on DataStage and dont worry about the overhead. You can shave milliseconds tweaking the BASIC, but your load strategy probably has minutes that can shaved.

Good luck!
-Ken





greg_smallhorn@hotmail.com on 24-Sep-2001 18:27



Please respond to datastage-users@oliver.com

To: datastage-users
cc:
Subject: RE: Test for Existence (was: Time conversion)


Does anyone know (for certain) whether the call overhead for a Routine is incurred once (and cached), or once per row?

_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp
Locked