ray.wurlod wrote:
I agree that there's a need for something like the SDK suite for parallel jobs; such a suite is indicative of a mature product - it took some years before it was developed for server jobs.
Exactly: the whole idea of using an ETL tool is to bring down development time. In my experience, the problem with an external function is testing it comprehensively before publishing it [exception handling and such]. I remember someone asking in another post how to get the full month name from an input date value... you end up writing a case statement like "if month = 1 then January", and so on.
ray.wurlod wrote:
DataStage BASIC does not really have data types in the conventional sense; it uses a structure called a DATUM that can change "data type" on the fly. Clearly there are overheads involved (but it is not true to say that server requires two bytes for Char - it can be more, depending on a number of factors it probably will be -for example any DATUM needs to carry a REMOVE pointer and a hint mechanism for the EXTRACT function).
If the internal memory allocation for a data type is not fixed [like, say, 2 bytes for a char, 1 byte for an int], how can it read any input? Is a DATUM something like "read until you find the terminating character '\0'"? I think '\0' applies to strings only.
I thought Unicode characters could be represented in 2 bytes. Could you please be kind enough to elaborate more on DATUM?
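For what it is worth, the '\0' convention mentioned above is exactly how plain C strings work: the length is not stored anywhere, it is discovered by scanning for the terminator. A minimal sketch in ordinary C (nothing DataStage-specific, and `c_strlen` is just an illustrative name):

```c
#include <stddef.h>

/* The length of a C string is found by scanning for the '\0'
   terminator; nothing in memory records the length up front. */
static size_t c_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

A DATUM, by contrast, would have to carry its own descriptive information alongside the bytes (such as the REMOVE pointer and EXTRACT hint Ray mentions), which is presumably where the overhead comes from.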
A web definition of Unicode
==================
"A 16-bit character encoding scheme allowing characters from Western European, Eastern European, Cyrillic, Greek, Arabic, Hebrew, Chinese, Japanese, Korean, Thai, Urdu, Hindi and all other major world languages, living and dead, to be encoded in a single character set"
ray.wurlod wrote:
DataStage BASIC can pass data to and fro to C functions through interfaces such as the General Call Interface which is essentially a scratch pad for converting between the strongly-typed C environment and the DATUM-based DataStage BASIC environment.
Any call interface should at least have the data-type mappings/conversions correct.
Example:
Suppose I have the string "abcde" in some system, say UNIX, and let us assume that system allocates 2 bytes per char. In total it will take 5+1 characters' worth of storage (1 extra for the terminating character).
Now, if DataStage tries to read that data assuming chars are 1 byte, then it reads the 2 bytes representing 1 char on that system as 2 characters and produces 2 chars, won't it?
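To make the concern concrete, here is a small hypothetical sketch in plain C. The 2-bytes-per-char layout and both counting functions are my own illustration, not any actual DataStage interface: "ab" stored with 2 bytes per character occupies 4 bytes, and a reader that assumes 1-byte chars sees 4 "characters" instead of 2.

```c
#include <stddef.h>

/* "ab" laid out with 2 bytes per character (UTF-16LE style: low byte
   first, high byte zero for plain ASCII letters). */
static const unsigned char wide_ab[] = { 'a', 0x00, 'b', 0x00 };

/* A reader that assumes 1 byte per character takes every byte as one
   "char" (and might even stop early, mistaking a zero high byte for
   a '\0' terminator). */
static size_t chars_if_1_byte(const unsigned char *buf, size_t nbytes)
{
    (void)buf;          /* only the byte count matters here */
    return nbytes;
}

/* A reader that knows the real width counts byte pairs instead. */
static size_t chars_if_2_bytes(const unsigned char *buf, size_t nbytes)
{
    (void)buf;
    return nbytes / 2;
}
```

So yes, unless the interface agrees on the character width, the same buffer yields a different character count on each side.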
ray.wurlod wrote:
This conversion overhead is the main issue when using BASIC Transformer stage in parallel jobs.
Except when using Server Shared Containers or BASIC Transformer stages, and when passing parameter values from a job sequence to a parallel job, there is not much call for passing values between the two environments.
I am sorry, I am confused here: which are the two environments?
Is it not true that every time an external function is called, we incur an overhead?
A general question [maybe hijacking the thread]
=============
Why is there no standalone literal for NULL in a parallel job, i.e. the literal that represents "unknown" values? It is painful that the only equivalent is setNull(), which unfortunately returns an int8.
As an example, I have written a C function to implement trim functionality. I have passed test strings like space(8):string, string:space(8), space(8), etc. But when I try to pass a NULL, it does not let me, because there is no way to represent a NULL in a parallel job.
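For reference, a trim routine along the lines described might look like the sketch below. This is my own illustration of the idea, not the actual function from the job, and `ex_trim` is a made-up name. Note that on the C side a NULL input is trivial to handle; the difficulty is purely that the parallel job design offers no literal to pass one in.

```c
#include <stdlib.h>
#include <string.h>

/* Strip leading and trailing spaces; caller frees the result.
   A NULL input is mapped straight to NULL. */
static char *ex_trim(const char *in)
{
    if (in == NULL)
        return NULL;

    const char *start = in;
    while (*start == ' ')           /* skip leading spaces */
        start++;

    const char *end = in + strlen(in);
    while (end > start && end[-1] == ' ')   /* back over trailing spaces */
        end--;

    size_t n = (size_t)(end - start);
    char *out = malloc(n + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, start, n);
    out[n] = '\0';
    return out;
}
```

The space(8):string style test cases above exercise the leading, trailing, and all-spaces branches; the NULL branch is the one that cannot be reached from the job design.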
So I have created one more dummy C function which returns NULL:
// C return value
return NULL;
My call goes like Exfn_Trim(Exfn_givemeNULL) for every input string. The worst part is that when I return NULL from my C routine and check ISNULL(Exfn_Trim(Exfn_givemeNULL)), it does not recognize it.
It treats the NULL return value from the external C function like an empty string "".
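That matches a distinction C itself makes but the call interface apparently collapses: a NULL pointer and an empty string are different things on the C side. One points nowhere; the other points at a real '\0' byte. A tiny sketch (the helper names are mine):

```c
#include <stddef.h>

/* A null pointer: there is no string at all. */
static int is_c_null(const char *s)
{
    return s == NULL;
}

/* An empty string: a valid pointer to a lone '\0' byte. */
static int is_empty(const char *s)
{
    return s != NULL && s[0] == '\0';
}
```

If the engine maps a returned NULL pointer to "", that distinction is lost before ISNULL() ever gets to see it, which would explain the behaviour described above.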
If it will help the discussion, I can paste my C functions here.
I think this non-availability of "good old" NULL is a serious limitation. Why can't it understand nulls?
Thanks for your patience.
Regards,
Prabu