Data Representations...

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


Adam_Clone
Participant
Posts: 26
Joined: Fri Apr 08, 2005 12:58 am

Data Representations...

Post by Adam_Clone »

Hi
I'd like to get some information about how the DS backend engines manage data from different platforms, such as 32-bit and 64-bit.
When data comes from heterogeneous databases, how are the data representations made compatible? If data from a 64-bit platform is to be warehoused in a repository on a 32-bit machine, will there be any data loss, say when the data comes from a sequential file?
I know the question is a bit obscure, but I believe it's at least understandable with a little reasoning.
...Regards....
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Currently DataStage is a 32-bit application. You need to access data through 32-bit drivers for the relevant database.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Adam_Clone
Participant
Posts: 26
Joined: Fri Apr 08, 2005 12:58 am

Further clarification...

Post by Adam_Clone »

Thanks.
That has shed some light, but may I ask: on a platform like XP or a desktop Linux, how are large numbers (those that need more than 32 bits, say from a 64-bit platform used for scientific applications) stored when the data is to be "ETL'd" into something like a sequential file on a 32-bit platform such as desktop Linux? I hope you've understood what I asked.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

The 16, 32 & 64-bit designations relate more to internal machine-level pointers, offsets & code and don't relate (directly) to the data types. Thus, even on an 8-bit machine you can have a 64-bit number represented. The database datatypes are largely machine independent. Of course, on "wider" machines many operations can be performed with a single word (for which an 8-bit machine might need 4 words), so they can be significantly faster; but as far as the ETL process is concerned, the machine bus width or word size is completely transparent.
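
To illustrate the point in general terms, here is a minimal Python sketch (not DataStage code; the helper names are made up for the example). It shows that a number needing more than 32 bits survives both a text representation, as in a delimited sequential file, and a fixed 8-byte binary representation, regardless of the word size of the machine doing the work. What decides whether the value fits is the declared field width, not the platform.

```python
import struct


def pack_int64(value):
    """Pack an integer into exactly 8 bytes, big-endian, on any platform."""
    return struct.pack(">q", value)


def unpack_int64(raw):
    """Read back an 8-byte big-endian signed integer."""
    return struct.unpack(">q", raw)[0]


def fits_int32(value):
    """Return True if the value fits a signed 32-bit field."""
    try:
        struct.pack(">i", value)
        return True
    except struct.error:
        return False


big = 2**40 + 12345  # needs more than 32 bits

# Text form, as it would appear in a delimited sequential file.
as_text = str(big)
assert int(as_text) == big

# Binary form with an explicit width and byte order.
raw = pack_int64(big)
assert len(raw) == 8
assert unpack_int64(raw) == big

# Loss only becomes a risk if the target field is declared too narrow.
assert not fits_int32(big)
```

The explicit ">" prefix selects a standard size and byte order, so the packed bytes come out the same on a 32-bit and a 64-bit host.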
Adam_Clone
Participant
Posts: 26
Joined: Fri Apr 08, 2005 12:58 am

Clarification...

Post by Adam_Clone »

Hey
But machine representations of data differ across platforms.
For instance, when I was developing an encryption algorithm in Java on Win 98, the decrypted text (a set of integers representing ASCII) was 16-bit, while the same program on XP gave 32 bits.
Those representational differences are the ones I am talking about. I have learned that the drivers handling connectivity to the different databases manage them implicitly. Can I get some details about that?
...Regards....
Adam_Clone
Participant
Posts: 26
Joined: Fri Apr 08, 2005 12:58 am

Clarification...contd...

Post by Adam_Clone »

It was giving 32-bit output on Linux as well.
But types are handled by the Java Virtual Machine, no matter what the platform. My actual doubt is this: when a join is done between two or more tables from heterogeneous databases, how is representational consistency preserved? Is the data converted by the relevant drivers into the representation of the platform where it is warehoused?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Adam,

When you feed data to DataStage you define the metadata (i.e. column definitions), and that is where you specify what format is coming in. The full complement of data conversions is available either directly through DataStage built-ins or via the OCONV and ICONV functions.
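
As a rough illustration of the idea behind ICONV and OCONV: ICONV takes an external (display) string and converts it to an internal storage form, and OCONV converts the internal form back to an external string. The Python sketch below mimics that for dates only, assuming the internal date is a day count with day 0 at 31 December 1967. The function names and format strings are invented for the example and are not the DataStage BASIC API or its conversion codes.

```python
from datetime import date, datetime, timedelta

# Assumed reference point for the internal day count (day 0).
DAY_ZERO = date(1967, 12, 31)


def to_internal_date(text, fmt="%Y-%m-%d"):
    """ICONV-like step: external date string -> internal day number."""
    parsed = datetime.strptime(text, fmt).date()
    return (parsed - DAY_ZERO).days


def to_external_date(days, fmt="%d/%m/%Y"):
    """OCONV-like step: internal day number -> external date string."""
    return (DAY_ZERO + timedelta(days=days)).strftime(fmt)


# Same internal value, two different external presentations.
internal = to_internal_date("1968-01-01")
assert internal == 1
assert to_external_date(internal) == "01/01/1968"
assert to_external_date(internal, "%Y-%m-%d") == "1968-01-01"
```

The internal value is just a number, so it does not depend on how the source system chose to display the date; only the input and output format declarations change.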
Adam_Clone
Participant
Posts: 26
Joined: Fri Apr 08, 2005 12:58 am

Post by Adam_Clone »

Arnd,
I know that the incoming type is set when the stage is defined, but what I asked is this: say data is coming from a DB2 database on a mainframe. When it is to be warehoused on, say, a desktop Unix system with less precision, the new representation may well be inadequate, right? What about that?
Adam_Clone
Participant
Posts: 26
Joined: Fri Apr 08, 2005 12:58 am

....contd

Post by Adam_Clone »

...and what are ICONV and OCONV? Are they used for conversions during extraction by the drivers used for connectivity?