Page 1 of 2

HowTo extract data from binary file?

Posted: Tue Jan 02, 2007 5:35 am
by KadetG
Hi, All

I have file like that:
00 00 00 01 31 39 34 30 │ 2D 30 31 2D 31 32 00 09 ☺1940-01-12 ○
00 00 00 0C 00 0C FF FF │ EF 88 00 1A 46 72 69 64 ♀ ♀яяп_ →Frid
61 79 20 20 20 46 72 69 │ 00 02 00 02 00 02 33 48 ay Fri ☻ ☻ ☻3H
31 39 34 30 2D 30 31 2D │ 31 32 31 39 34 30 2D 30 1940-01-121940-0
31 2D 31 33 00 01 30 31 │ FA 1A 4A 61 6E 75 61 72 1-13 ☺01ъ→Januar
79 20 20 4A 61 6E 51 31 │ 20 20 46 4F 55 52 54 48 y JanQ1 FOURTH
20 51 55 41 52 54 45 52 │ 20 20 51 31 31 39 34 30 QUARTER Q11940
31 39 34 30 4E 4E 32 30 │ 30 33 2D 31 32 2D 30 34 1940NN2003-12-04
2D 31 37 2E 33 34 2E 32 │ 32 2E 33 35 34 33 32 38 -17.34.22.354328
32 30 30 33 2D 31 32 2D │ 30 34 2D 31 37 2E 33 34 2003-12-04-17.34
2E 32 32 2E 33 35 34 33 │ 32 38 46 49 57 30 30 31 .22.354328FIW001
33 20 0D 0A 00 00 00 02 │ 31 39 34 30 2D 30 31 2D 3 ♪◙ ☻1940-01-
I have structure of my file. For example
"DAT_PER_KEY " POSITION( 00001:00004) INTEGER
"DY_DT " POSITION( 00005:00014) DATE EXTERNAL
"DY_NUM_IN_WK " POSITION( 00015:00016) SMALLINT
"DY_NUM_IN_MO " POSITION( 00017:00020) INTEGER
"DY_NUM_IN_QTR " POSITION( 00021:00022) SMALLINT...

and I try to load it by CFF. I check any parameters and datatypes combinations and don`t can fined correct ones. I retrieve 0 in first field If I use integer for DAT_PER_KEY with length 4 and ''☺1940-01-12" for second field :(
What I do incorect?

Thenkx

Posted: Tue Jan 02, 2007 7:01 am
by DSguru2B
Seems like a dos file with that box character before date. Try to convert it to unix style or try to read it as a dos style file. How are you getting this file. If its being ftp'd directly from mainframe, then request them to transfer it in ascii so that the conversions are made for the appropriate target OS, which is unix in your case.

Posted: Tue Jan 02, 2007 7:39 am
by KadetG
Hi

this is file from mainframe in ebcdic.

Posted: Tue Jan 02, 2007 1:54 pm
by ray.wurlod
Use data type Char(10) for the DY_DT field and convert it to a date subsequently. Or use data type Date.

If the encoding is EBDCIC (which I don't believe it is, since 0x31 is 49 and Char(49) is 1 in ASCII) then you can specify translation in the Sequential File stage.

Let's look at your first record
00 00 00 01 is the integer 1 DAT_PER_KEY
31 39 34 40 2D 30 31 2D 31 32 is "1940-01-12" DY_DT
00 09 is the small integer 9 DY_NUM_IN_WK
00 00 00 0C is the integer 12 DY_NUM_IN_MO

...but then there are many more fields in your record. Your metadata is not complete. It may also be the case (though impossible to say without seeing more of the data) that your record is fixed width with no line terminator.

This looks like a time dimension table, with many possible representations of the date.

Go find the rest of the field definitions and you may find this file easier to read.

Posted: Wed Jan 03, 2007 1:13 am
by KadetG
Yes! You are rigth. I think I can read the data as ASCII. But datastage is not have the "data" type in CFF.

How I can read the integer? I have some information about record:
DAT_PER_KEY = 1 (00 00 00 01)
DY_DT = 1940-01-12 (31 39 34 30 2D 30 31 2D 31 32 )
DY_NUM_IN_WK = 5 (00 09)
DY_NUM_IN_MO = 12 (00 00 00 0C)
DY_NUM_IN_QTR = 12 (00 0C)
DY_IN_EPOCH_NUM = -21903 (FF FF EF 88 ).

Re: HowTo extract data from binary file?

Posted: Wed Jan 03, 2007 1:23 am
by sudeepmantri
Hi, first of all u got to convert the binary file to ascii, in oreder to read data from it. In the format tab of the Sequential file property dialogue box set record length as fixed,character set as EBCDIC and add a property "Export EBCDIC as ascii". This will export the binary data in Ascii form.


Thanks n Regards
Sudeep

Posted: Wed Jan 03, 2007 3:06 am
by ray.wurlod
It's NOT EBCDIC. It's perfectly readable and the code points correspond to ASCII.

"I have some information about the record" is simply not good enough. You must have ALL of the information about the record.

Once you have this you can read it with a CFF stage or with a Sequential File stage. Create the table definition in the Repository, then load it into whichever stage type you prefer.

DY_IN_EPOCH_NUM may be an unsigned integer; you can specify uint32 rather than int32 as the data type in the record schema, or unsigned in the extended column properties in the table definition.

Posted: Wed Jan 03, 2007 4:19 am
by KadetG
2sudeepmantri

I done that. But I have don`t correct the DY_NUM_IN_WK (it equal 9 but should be 5), DY_IN_EPOCH_NUM (it equal -4216 but should be -21903) and date filed by ''. I define it as char 10. :(

Posted: Wed Jan 03, 2007 12:53 pm
by us1aslam1us
KadetG wrote:2sudeepmantri

I done that. But I have don`t correct the DY_NUM_IN_WK (it equal 9 but should be 5), DY_IN_EPOCH_NUM (it equal -4216 but should be -21903) and date filed by ''. I define it as char 10. :(
What do you mean by this? As suggested by Ray use CFF stage and that will probably resolve your issue.

Sam

Posted: Wed Jan 03, 2007 4:48 pm
by ray.wurlod
The "" character is how your data viewer represents a non-printable character, for example tab or esc.

As for the "date filled by ''", look again at the hex dump to see what these characters actually are. If they're all 00 this suggests that NULL is represented by ten ASCII NUL characters in a Char(10) field.

Posted: Thu Jan 04, 2007 1:19 am
by KadetG
Hi All

I told about that file with woman who give me it. She said that it file in ebcdic. She can unload the data from mainframe database only in unicode and ebcdic.

I understand that if I read the date in ASCII then file is ASCII file. But strange that DY_NUM_IN_WK = 5 (00 09) and DY_IN_EPOCH_NUM = -21903 (FF FF EF 88 ). That information extracted from mainframe. And woman said that it bacouse the data in ebcdic. Datastage extract DY_NUM_IN_WK = 9 and DY_IN_EPOCH_NUM = -4216.

Posted: Thu Jan 04, 2007 7:44 am
by DSguru2B
Ask her to ndm the file (ftp the file) in ascii mode to you. That can be done.

Posted: Thu Jan 04, 2007 3:38 pm
by ray.wurlod
The FTP utility used to transfer the file may have effected translation from EBCDIC to ASCII.

Posted: Fri Jan 05, 2007 1:26 am
by KadetG
Hi

Sorry... how I can use that?

Posted: Fri Jan 05, 2007 9:44 am
by us1aslam1us
As DsGuru suggested ask the lady to FTP the file in Ascii mode.