Implicit versus Explicit Conversions

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Implicit versus Explicit Conversions

Post by BI-RMA »

Moderator: split from this topic. Check it out if you want to see what started all of this. Not really necessary however, just play on through...
chandra.shekhar@tcs.com wrote:But it is better to use conversion functions.
Why? DataStage should make the decision to apply the same conversion function implicitly - at compile time. Using the conversion function explicitly adds the possibility of manually introduced errors in the function's format description.

So the actual conversion at runtime should not be any faster with explicit conversion, the derivation is more complicated, and you won't even see a difference in the job score whether or not you use the conversion function.

Just some thoughts...
"It is not the lucky who are grateful.
It is the grateful who are happy." Francis Bacon
chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

What if a junk value, spaces, or a null value comes from the file?
That's a real possibility, since the source is a Sequential File.
Last week I raised a topic about the same thing; kindly check Craig's comment:
viewtopic.php?t=149264&highlight=
That's why I think it's better to use conversion functions where the source is a file.
Thanx and Regards,
ETL User
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

What Craig says is more than just using the conversion function, though.

He advises checking for a valid format first, using conversion functions for valid strings, and handling errors for invalid ones.

From my point of view, the first part is the more important point here. For valid strings, implicit conversion will always work.

NULL-handling is not an issue, by the way, at least not on version 8.5, whether you choose legacy NULL-handling or not. I would have expected a job to abort when using the explicit StringToTimestamp function with NULL input, but it didn't: the result was NULL for both implicit and explicit conversion. This won't work in a stage variable with legacy NULL-handling enabled in 8.5, or, for that matter, on any version before 8.5.
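For reference, the two derivations being compared have roughly this shape in a Transformer (column names are hypothetical; the format string shown is DataStage's default timestamp format):

```
-- Explicit: conversion function with a format string
-- (a typo in this format string is exactly the kind of manually
-- introduced error mentioned above)
out.ts = StringToTimestamp(in.ts_str, "%yyyy-%mm-%dd %hh:%nn:%ss")

-- Implicit: assign the string straight to a timestamp output column;
-- DataStage inserts the equivalent conversion at compile time
out.ts = in.ts_str
```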
"It is not the lucky who are grateful.
It is the grateful who are happy." Francis Bacon
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Still - and this is just my opinion - with something as strongly typed as a DataStage PX job is under the covers, I consider it a Best Practice to not let it do any implicit conversions. Especially with a flat file source.

Heck, Informatica is about half-way between a Server and a PX job from a type-handling standpoint, and even their 'Velocity' best-practice methodology advises against implicit conversions, even though the tool will happily do them for you automagically. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

And my opinion is: if the job score created when using either explicit or implicit conversion in DataStage is bit-identical - why should I bother?

I checked by comparing in UltraEdit the OSH_DUMP of two versions of the same job: one using explicit conversion, the other the implicit variant.
"It is not the lucky who are grateful.
It is the grateful who are happy." Francis Bacon
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

The job score doesn't show you what is happening within the transformer performing the conversion, only its inputs and outputs. So while the scores may be identical, the operations could be considerably different.

The outstanding questions are:
1) Is the source data guaranteed to be 100% valid and in the given format?
2) How should invalid/missing dates and timestamps be handled?

Implicit conversion will simply return an invalid date or timestamp value (shown as all '*' characters in a Peek or view data). Downstream consumers will need to handle it accordingly.
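One way to avoid shipping such invalid values downstream is the validate-first pattern Craig described earlier: test the string, convert only when the test passes. As a sketch (stage-variable and column names are hypothetical, and IsValid() is assumed here to check against the default date format):

```
-- Stage variable: test the string before any conversion
svValidDate = IsValid("date", in.dt_str)

-- Output derivation: convert valid strings, handle the rest explicitly
out.dt = If svValidDate Then StringToDate(in.dt_str) Else SetNull()
```

Rows failing the check could equally be routed to a reject link rather than set to NULL.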

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

jwiles wrote:Implicit conversion will simply return an invalid date or timestamp value (shown as all '*' characters in a Peek or view data). Downstream consumers will need to handle accordingly.
Hi James,

What does explicit conversion do? In both cases the result is identical.

Explicit or implicit conversion does not make the difference. The important thing is that you have to make sure your input is in the correct format before converting - with either method.

But when your input is in the correct format, implicit conversion is going to work just fine. And the mechanism used for the conversion is identical, too (the modify routine timestamp_from_string).

So my claim still is: you only need explicit conversion when the format of the input string differs from the DataStage default format.
"It is not the lucky who are grateful.
It is the grateful who are happy." Francis Bacon
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

From a purely conversion-only point of view, I agree: no difference. It's largely a personal preference and habit in my case.

Re making sure the input is in the correct format: I push back for cleaner data where and when I can, but if I have to go to the effort in my job of formatting and validating the data, then I'm likely to use explicit conversion as part of my logic--maybe out of habit, maybe because I prefer to see that in the logic. I put it in the same vein as explicitly defining partitioning and sorting rather than relying upon auto-insertion: I know what the logic is doing, and I hope (because it's explicitly defined) that it more clearly shows others what is happening as well. The latter is especially true with DS newcomers who have yet to learn all the nuances of when DS's implicit conversion occurs (decimal-to-string and the leading space/trailing dot, for instance, to reference a recent and oft-repeated thread topic) and its submit-time optimizations.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

I too support explicit conversion, as it gives a clear idea of what the program is supposed to do, which makes it easier for newcomers to understand when they start supporting the application. It also gives you better control over what is happening and avoids the implicit conversion warnings.

It's not only about having the job do its work in the most efficient manner; you should also consider the maintainability of the jobs/applications you develop.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

Hi everybody,

now this has become an interesting discussion about the Dos and Don'ts in DataStage datatype-conversion. Sorry pattemk that this has gone way beyond your original question.

This is all about Chandra's comment that using explicit conversion functions is better than implicit conversion - without giving any reason for that comment whatsoever. I wanted to prove that implicit conversion - in cases where it is explicitly allowed by DataStage, as documented in the manual - is neither better nor worse than explicit conversion using conversion functions: because it does exactly the same thing.

You may prefer one or the other because you like the looks of one better, or because you personally find one easier to maintain than the other. But that has nothing to do with the task at hand. As James pointed out: it is a matter of preference.

Your mention of implicit conversion warnings is beside the point here. The argument centered on the fact that we have already made sure the input string is well-formed and valid. In those cases you won't see conversion warnings.

So my answer to pattemk is (once again): make sure your input is in a valid format using the function IsValid(), and if it returns true you can do either. The verified date or timestamp format would be enough documentation for me to make it clear you are doing an implicit conversion to the data type you tested for. Technically speaking, there is no difference.
"It is not the lucky who are grateful.
It is the grateful who are happy." Francis Bacon
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

BI-RMA wrote:The argument centered on the fact that we have already made sure the input-string is well-formed and valid.
That certainly was never part of my "argument" as I was making a generic statement. :wink:

I'm also about to split this into your own thread as the original conversation came off the rails about two replies in. Thankfully it was addressed elsewhere.

... and done.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

Glad that I started this :P
@BI-RMA
I was specifically talking about date conversion.
For String To Decimal, Decimal To Integer, etc., implicit conversions may be OK.
When I said that explicit conversion is better, I meant what Priyadarshi is trying to say.
Optimum performance, or letting DataStage do its job, is not the only thing we should care about.
In my view, maintainability of the jobs and easier understanding for the production support guys are also part of good development.
I think using the functions (especially for dates) gives a clearer picture of the incoming data.
Thanx and Regards,
ETL User
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

BI-RMA wrote:Your mentioning of implicit conversion warnings is not to the point here. The argument centered on the fact that we have already made sure the input-string is well-formed and valid. In these cases You won't see conversion warnings.
You might not see the implicit conversion warnings in a Transformer, but anywhere else there will be a warning.

If you only need to change the data type, why use a Transformer at all, just to avoid warnings? Why not use a Modify stage instead? (Granted, other stages don't implicitly convert from string to date, though other conversions are possible, and a Modify stage won't help if you need if-else logic.)

But that's not the point I was trying to make. I suggested using explicit conversion for the sake of maintainability, and yes, personal preference is another point.
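As a sketch of the Modify-stage alternative mentioned above (column names are hypothetical, and the exact specification syntax should be checked against the Parallel Job Developer Guide for your version), an explicit string-to-date conversion with no Transformer might look like:

```
dt_col:date = date_from_string[%yyyy-%mm-%dd](dt_str)
```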
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

I guess the matter is solved so marking the topic as resolved. :)
Thanx and Regards,
ETL User