Diff between Server Job and Parallel Job

Post questions here relating to DataStage Enterprise/PX Edition in areas such as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod

A DATUM is, in C terminology, a structure. One of the items in that structure is an indicator of what kind of item the DATUM currently holds. For example, an integer is stored in four bytes. A float is stored in eight bytes as an IEEE 754 double: a sign bit, an 11-bit biased ("shifted") exponent and a 52-bit mantissa. A connection to an ODBC data source is held as a pointer to the connection handle allocated by SQLAllocConnect(). And so on.

Without NLS, a Char(1) value is stored as a single byte within the structure. With NLS enabled, the character takes between one and four bytes, because DataStage uses the UTF-8 encoding of Unicode. A DATUM might also hold a file handle, a pointer to a subroutine, DataStage's internal representation of NULL (one byte, value 0x80), or an "unassigned variable" (nothing in the data area). As mentioned earlier, other elements in the structure support statements and functions such as REMOVE and EXTRACT.

It all works; it's been around for more than 20 years. Don't worry about it. The call interfaces (ICI and GCI) and the supplied NLS maps do have the data type mappings correct; if you believe that this is not the case, I invite you to take it up with IBM.
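To make the "structure with a type indicator" idea concrete, here is a minimal C sketch of a tagged union. The tag names, field layout and helper functions are hypothetical and for illustration only; this is not DataStage's actual DATUM layout, and the extra elements that support REMOVE and EXTRACT are left out. The `double_fields` helper shows how the eight-byte float breaks down, assuming the platform's `double` is IEEE 754 binary64 (true on mainstream compilers).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type tags: say what the data area currently holds. */
typedef enum {
    DT_UNASSIGNED, /* "unassigned variable": nothing in the data area */
    DT_NULL,       /* internal NULL marker */
    DT_INTEGER,    /* four-byte integer */
    DT_FLOAT,      /* eight-byte IEEE 754 double */
    DT_CHAR,       /* one character: 1 byte without NLS, 1 to 4 UTF-8 bytes with NLS */
    DT_HANDLE      /* opaque pointer, e.g. an ODBC connection handle */
} datum_type;

/* Hypothetical DATUM: a tag plus a union; only the member matching the tag is valid. */
typedef struct {
    datum_type type;
    union {
        int32_t i;
        double f;
        struct {
            unsigned char bytes[4]; /* UTF-8 never needs more than four bytes */
            unsigned char len;
        } ch;
        unsigned char null_marker;  /* the post describes NULL as one byte, 0x80 */
        void *handle;
    } data;
} datum;

static datum datum_from_int(int32_t v) {
    datum d;
    d.type = DT_INTEGER;
    d.data.i = v;
    return d;
}

static datum datum_from_float(double v) {
    datum d;
    d.type = DT_FLOAT;
    d.data.f = v;
    return d;
}

static datum datum_null(void) {
    datum d;
    d.type = DT_NULL;
    d.data.null_marker = 0x80;
    return d;
}

static datum datum_from_utf8_char(const unsigned char *s, unsigned char len) {
    datum d;
    assert(len >= 1 && len <= 4);
    d.type = DT_CHAR;
    memset(d.data.ch.bytes, 0, sizeof d.data.ch.bytes);
    memcpy(d.data.ch.bytes, s, len);
    d.data.ch.len = len;
    return d;
}

static datum datum_from_handle(void *h) {
    datum d;
    d.type = DT_HANDLE;
    d.data.handle = h;
    return d;
}

/* Split an IEEE 754 double into sign (1 bit), biased exponent (11 bits)
   and fraction/mantissa (52 bits). */
static void double_fields(double v, unsigned *sign, unsigned *exponent, uint64_t *fraction) {
    uint64_t bits;
    memcpy(&bits, &v, sizeof bits); /* copy the raw bit pattern without aliasing issues */
    *sign = (unsigned)(bits >> 63);
    *exponent = (unsigned)((bits >> 52) & 0x7FF);
    *fraction = bits & ((UINT64_C(1) << 52) - 1);
}
```

The tag is what lets one fixed-size structure carry an integer, a double, a handle or a marker such as NULL: code that reads a DATUM checks `type` first and only then touches the matching union member. For example, `double_fields(1.0, ...)` yields sign 0, biased exponent 1023 and fraction 0.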

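BTW, if you visit the Unicode Consortium web site you will get the full story about Unicode. Unicode itself is a character set; its code points can be encoded in 8-, 16- or 32-bit code units (UTF-8, UTF-16 and UTF-32). UTF-8 and UTF-16 are variable-length encodings, while UTF-32 is fixed-length.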
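As a quick illustration of the fixed-length versus variable-length distinction, this C sketch counts how many code units each encoding form needs for a single Unicode code point. The function names are illustrative only; the thresholds are the standard UTF-8, UTF-16 and UTF-32 boundaries, and surrogate code points (U+D800 to U+DFFF) or values above U+10FFFF are treated as invalid and return 0.

```c
#include <assert.h>
#include <stdint.h>

/* True for values that are not valid Unicode scalar values. */
static int invalid_cp(uint32_t cp) {
    return (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF;
}

/* UTF-8: 1 to 4 bytes per code point (variable-length). */
static int utf8_units(uint32_t cp) {
    if (invalid_cp(cp)) return 0;
    if (cp <= 0x7F) return 1;
    if (cp <= 0x7FF) return 2;
    if (cp <= 0xFFFF) return 3;
    return 4;
}

/* UTF-16: one 16-bit unit, or two (a surrogate pair) above U+FFFF (variable-length). */
static int utf16_units(uint32_t cp) {
    if (invalid_cp(cp)) return 0;
    return cp <= 0xFFFF ? 1 : 2;
}

/* UTF-32: always exactly one 32-bit unit (fixed-length). */
static int utf32_units(uint32_t cp) {
    if (invalid_cp(cp)) return 0;
    return 1;
}
```

For instance, 'A' (U+0041) takes one UTF-8 byte, 'é' (U+00E9) takes two, '€' (U+20AC) takes three and U+1F600 takes four, which is why an NLS Char(1) can occupy anywhere from one to four bytes.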
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.