compressed character fields

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

hitmanthesilentassasin
Participant
Posts: 150
Joined: Tue Mar 13, 2007 1:17 am

compressed character fields

Post by hitmanthesilentassasin »

Hi,

We are getting files with compressed character fields from assembler programs (these are not COMP-3 fields), and when we try to read them through DataStage they show up as junk characters. The character set is EBCDIC. To read the fields properly we currently have to read them as ASCII (we define this in the OSH after the job is compiled) and then append a byte for each character using a C program (when I try to append a byte with the prefix byte option, it still doesn't work). Is there an NLS we can define so that we don't have to declare the fields as ASCII and append the byte ourselves, and DataStage does it all by itself?

Can we define our own NLS? If yes, how?
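
To illustrate why such fields show up as junk: a character map translates each byte to one character, whereas a field holding two digits per byte has to be read nibble by nibble. The following is a minimal Python sketch, not DataStage code. It assumes Python's built-in `cp037` codec as a stand-in for the EBCDIC character set (the actual code page of these files is not stated in the thread), and the sample bytes are hypothetical.

```python
# Character digits in EBCDIC: one byte per digit (0xF0..0xF9).
char_field = bytes([0xF1, 0xF2, 0xF3, 0xF4])
# The same digits stored two per byte (hypothetical sample, no sign nibble).
packed_field = bytes([0x12, 0x34])

# Decoding as EBCDIC text works for the character field...
as_text = char_field.decode("cp037")

# ...but maps each packed byte to an unrelated character.
as_junk = packed_field.decode("cp037")

# Reading the nibbles directly recovers the digits instead.
as_digits = packed_field.hex()

print(repr(as_text), repr(as_junk), as_digits)
```

Under these assumptions no character map can fix the field, because the problem is the storage format rather than the code page.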
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

"compressed character fields"? :?

OK, so they're not COMP-3, but what exactly are they? I've never heard of them, and the only occurrence of that particular string that Google returns an exact match on is your post here. Can you get a more "official" name / description for what is in those fields? How exactly were they "compressed"?
-craig

"You can never have too many knives" -- Logan Nine Fingers
hitmanthesilentassasin
Participant
Posts: 150
Joined: Tue Mar 13, 2007 1:17 am

Post by hitmanthesilentassasin »

Well, this is my understanding of the fields. By "compressed characters" I mean the following: normally one character is stored in one byte, in both ASCII and EBCDIC. Here they are storing two digits in one byte. To do that, they store the data in binary format so that each digit takes only one nibble, and the saved nibble holds another digit. They have also converted the values to hex, which means one nibble can store a value up to 15 (F) and one byte can store a value up to 255 (FF), i.e. two digits stored in one byte. This makes reading these files a tedious process, so I wanted to know if any of you have come across such a scenario or can shed any light on how to proceed.

Trying to read the field as binary has not helped either.

Thanks!!!!
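
The packing described above (two decimal digits per byte, one per nibble, no sign nibble) can be sketched in Python as follows. This is an illustration of the format as described in the post, not a DataStage routine; the function name `unpack_unsigned_packed` and the sample bytes are made up for the example, and it assumes every nibble holds a decimal digit 0-9.

```python
def unpack_unsigned_packed(raw: bytes) -> int:
    """Decode packed digits with no sign nibble: two decimal digits per byte."""
    value = 0
    for byte in raw:
        high, low = byte >> 4, byte & 0x0F  # split the byte into its two nibbles
        if high > 9 or low > 9:
            raise ValueError(f"non-decimal nibble in byte 0x{byte:02X}")
        value = value * 100 + high * 10 + low
    return value


# Hypothetical 4-byte field holding the digits 12 34 56 78.
print(unpack_unsigned_packed(bytes([0x12, 0x34, 0x56, 0x78])))
```

A 4-byte field decoded this way holds up to eight digits, which is the space saving the post refers to.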
munch9
Premium Member
Posts: 34
Joined: Fri Sep 15, 2006 7:26 am

Post by munch9 »

From your description it sounds like it could be either packed decimal or binary, but you say you have tried both of those.

Do you have an example of the raw data (with hex values) and the value it should be interpreted as? That might make it easier for people here to identify.
hitmanthesilentassasin
Participant
Posts: 150
Joined: Tue Mar 13, 2007 1:17 am

Post by hitmanthesilentassasin »

The other scenario, which I had missed, is that there is an OCCURS DEPENDING ON clause as well. The OCCURS DEPENDING ON column is again in hex packed decimal format, which the Complex Flat File stage fails to read.
hitmanthesilentassasin
Participant
Posts: 150
Joined: Tue Mar 13, 2007 1:17 am

Post by hitmanthesilentassasin »

I started this thread thinking there could be some NLS that I am overlooking. It seems there is no such NLS, so I think it's better to write a custom stage to read the data. Can someone please guide me on where to start?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So... no interest in posting the examples as requested?
-craig

"You can never have too many knives" -- Logan Nine Fingers
hitmanthesilentassasin
Participant
Posts: 150
Joined: Tue Mar 13, 2007 1:17 am

Post by hitmanthesilentassasin »

Well, I really can't get the data out of the client's network, so I can't post it.
munch9
Premium Member
Posts: 34
Joined: Fri Sep 15, 2006 7:26 am

Post by munch9 »

A couple of example fields may suffice, so, provided you can read it on your screen, you could type it into a post rather than copy it from your client's network.

It would probably be easier for you in the long run if it can be identified and handled by a CFF rather than you re-inventing the wheel in a custom stage.
hitmanthesilentassasin
Participant
Posts: 150
Joined: Tue Mar 13, 2007 1:17 am

Post by hitmanthesilentassasin »

Well, below is the sample data. They call it packed unsigned (there is no sign nibble, which is what makes it different from COMP-3).

In the byte representation, the data for a 4-byte field looks like this:

01357
02468

Sorry - I tried to post the data but couldn't.
munch9
Premium Member
Posts: 34
Joined: Fri Sep 15, 2006 7:26 am

Post by munch9 »

I can't seem to read this data using a CFF. However:

If you read the data with a Sequential File stage, define the column as binary with a length equal to the number of bytes.
In a BASIC Transformer, use the transform DataTypePicComp3Unsigned(<column>) in the column derivation.
Define the output column as integer (or decimal, as you require) with the appropriate length.

Your example of
01357
02468

Returns 12345678

Hope that helps unless/until someone can find a better way.
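
One way to see how the two rows of the example could yield 12345678 is to read them as a vertical hex display, with the high nibble of each byte in the first row and the low nibble directly below it. That reading is an assumption (the thread does not confirm how the sample was typed), and the Python sketch below only illustrates the nibble pairing; it is not what the DataStage transform does internally. The helper name `read_vertical_hex` is made up for the example.

```python
def read_vertical_hex(high_row: str, low_row: str) -> bytes:
    """Pair each column of a two-row hex display into one byte."""
    if len(high_row) != len(low_row):
        raise ValueError("rows must have the same length")
    return bytes(int(h + l, 16) for h, l in zip(high_row, low_row))


raw = read_vertical_hex("01357", "02468")
value = int(raw.hex())  # digits read straight from the nibbles, no sign nibble
print(raw.hex(), value)
```

Under this reading the first column contributes a leading 00 byte, which does not change the numeric value.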
hitmanthesilentassasin
Participant
Posts: 150
Joined: Tue Mar 13, 2007 1:17 am

Post by hitmanthesilentassasin »

Yes, it's a non-standard way of doing it: they have taken out the sign nibble and stored the digits without one. It doesn't work when I eliminate signed=?.

Is there any other trick to handle this?
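
The difference the missing sign nibble makes can be sketched as follows. This is a minimal Python illustration of standard COMP-3 decoding, not DataStage's implementation: a standard decoder treats the last nibble as the sign, so on the unsigned layout it misreads the final digit. The function name `decode_comp3` and the sample bytes are hypothetical.

```python
def decode_comp3(raw: bytes) -> int:
    """Decode standard COMP-3: every nibble is a digit except the last, which is the sign."""
    nibbles = [n for b in raw for n in (b >> 4, b & 0x0F)]
    *digits, sign = nibbles
    if sign < 0x0A:
        raise ValueError(f"invalid sign nibble 0x{sign:X}")
    if any(d > 9 for d in digits):
        raise ValueError("non-decimal digit nibble")
    magnitude = int("".join(str(d) for d in digits) or "0")
    # C and F (also A, E) mean positive; D (also B) means negative.
    return -magnitude if sign in (0x0B, 0x0D) else magnitude


print(decode_comp3(bytes([0x12, 0x3C])))  # standard COMP-3 encoding of +123

try:
    decode_comp3(bytes([0x12, 0x34, 0x56, 0x78]))  # unsigned layout: last nibble is a digit
except ValueError as err:
    print(err)
```

A decoder for the unsigned layout therefore has to treat every nibble as a digit rather than reserving the last one.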
hitmanthesilentassasin
Participant
Posts: 150
Joined: Tue Mar 13, 2007 1:17 am

Post by hitmanthesilentassasin »

Finally, I have managed to get some dummy data out of the client's network. Below are the file details.

01  XXXXX.
    05  XXXXX-ID    PIC X(1).
    05  XXXXX-NAME  PIC X(1).
    05  XXXXX-XC    PIC X(1).
    05  XXXXX-SC    OCCURS 0 TO 255 TIMES
                    DEPENDING ON XXXXX-XC.
        10  XXXXX-SC-TAMT  PIC X(2).
        10  XXXXX-SC-TDT   PIC X(2).


The file is a variable-blocked file, laid out like this:

BDW: 0407 (stored as a hex value)
RDW: 0403 (stored as a hex value)
XXXXX-ID --> 10 (although this is defined as a string, it holds packed values without the sign nibble)
XXXXX-NAME --> 20 (although this is defined as a string, it holds packed values without the sign nibble)
XXXXX-SC --> FF (hex value)
XXXXX-SC-TAMT --> 100 (255 times, in packed format without the sign nibble)
XXXXX-SC-TDT --> 200 (255 times, in packed format without the sign nibble)

Here is the link to the sample file:
http://rapidshare.com/files/400142047/packed.html
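
As a starting point for a custom reader, the layout and sample values above can be sketched in Python. Several things here are assumptions rather than facts from the thread: that the BDW and RDW follow the usual 4-byte layout (a 2-byte big-endian length that includes the descriptor itself, followed by 2 bytes of zeros), that the FF value is the one-byte binary count held in XXXXX-XC, and that the packed fields contain only decimal digits. The function names are made up for the example. With those assumptions the numbers are consistent: 3 + 255 × 4 = 1023 data bytes, plus 4 for the RDW gives 1027 (hex 0403), plus 4 for the BDW gives 1031 (hex 0407).

```python
import struct


def unpack_digits(raw: bytes) -> int:
    """Two decimal digits per byte, no sign nibble."""
    return int(raw.hex())


def parse_block(block: bytes) -> list[dict]:
    """Split one variable-blocked block into records and decode each one."""
    (block_len,) = struct.unpack_from(">H", block, 0)  # BDW length includes the BDW
    records, pos = [], 4
    while pos < block_len:
        (rec_len,) = struct.unpack_from(">H", block, pos)  # RDW length includes the RDW
        body = block[pos + 4 : pos + rec_len]
        count = body[2]  # XXXXX-XC read as a plain binary byte (assumption)
        groups = [
            {
                "tamt": unpack_digits(body[3 + 4 * i : 5 + 4 * i]),
                "tdt": unpack_digits(body[5 + 4 * i : 7 + 4 * i]),
            }
            for i in range(count)
        ]
        records.append({
            "id": unpack_digits(body[0:1]),
            "name": unpack_digits(body[1:2]),
            "groups": groups,
        })
        pos += rec_len
    return records


# Sample block rebuilt from the values listed above (not the real file).
body = bytes([0x10, 0x20, 0xFF]) + bytes([0x01, 0x00, 0x02, 0x00]) * 255
record = struct.pack(">HH", len(body) + 4, 0) + body
block = struct.pack(">HH", len(record) + 4, 0) + record

parsed = parse_block(block)
print(block[:2].hex(), record[:2].hex(), len(parsed[0]["groups"]), parsed[0]["groups"][0])
```

If the BDW and RDW are stripped when the file is transferred off the mainframe, the record boundaries would have to be derived from the count byte alone.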