HowTo extract data from binary file?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

KadetG
Participant
Posts: 30
Joined: Mon Nov 06, 2006 12:43 pm

HowTo extract data from binary file?

Post by KadetG »

Hi, All

I have file like that:
00 00 00 01 31 39 34 30 │ 2D 30 31 2D 31 32 00 09 ☺1940-01-12 ○
00 00 00 0C 00 0C FF FF │ EF 88 00 1A 46 72 69 64 ♀ ♀яяп_ →Frid
61 79 20 20 20 46 72 69 │ 00 02 00 02 00 02 33 48 ay Fri ☻ ☻ ☻3H
31 39 34 30 2D 30 31 2D │ 31 32 31 39 34 30 2D 30 1940-01-121940-0
31 2D 31 33 00 01 30 31 │ FA 1A 4A 61 6E 75 61 72 1-13 ☺01ъ→Januar
79 20 20 4A 61 6E 51 31 │ 20 20 46 4F 55 52 54 48 y JanQ1 FOURTH
20 51 55 41 52 54 45 52 │ 20 20 51 31 31 39 34 30 QUARTER Q11940
31 39 34 30 4E 4E 32 30 │ 30 33 2D 31 32 2D 30 34 1940NN2003-12-04
2D 31 37 2E 33 34 2E 32 │ 32 2E 33 35 34 33 32 38 -17.34.22.354328
32 30 30 33 2D 31 32 2D │ 30 34 2D 31 37 2E 33 34 2003-12-04-17.34
2E 32 32 2E 33 35 34 33 │ 32 38 46 49 57 30 30 31 .22.354328FIW001
33 20 0D 0A 00 00 00 02 │ 31 39 34 30 2D 30 31 2D 3 ♪◙ ☻1940-01-
I have structure of my file. For example
"DAT_PER_KEY " POSITION( 00001:00004) INTEGER
"DY_DT " POSITION( 00005:00014) DATE EXTERNAL
"DY_NUM_IN_WK " POSITION( 00015:00016) SMALLINT
"DY_NUM_IN_MO " POSITION( 00017:00020) INTEGER
"DY_NUM_IN_QTR " POSITION( 00021:00022) SMALLINT...

and I try to load it by CFF. I check any parameters and datatypes combinations and don`t can fined correct ones. I retrieve 0 in first field If I use integer for DAT_PER_KEY with length 4 and ''☺1940-01-12" for second field :(
What I do incorect?

Thenkx
Alex
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Seems like a dos file with that box character before date. Try to convert it to unix style or try to read it as a dos style file. How are you getting this file. If its being ftp'd directly from mainframe, then request them to transfer it in ascii so that the conversions are made for the appropriate target OS, which is unix in your case.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
KadetG
Participant
Posts: 30
Joined: Mon Nov 06, 2006 12:43 pm

Post by KadetG »

Hi

this is file from mainframe in ebcdic.
Alex
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Use data type Char(10) for the DY_DT field and convert it to a date subsequently. Or use data type Date.

If the encoding is EBDCIC (which I don't believe it is, since 0x31 is 49 and Char(49) is 1 in ASCII) then you can specify translation in the Sequential File stage.

Let's look at your first record
00 00 00 01 is the integer 1 DAT_PER_KEY
31 39 34 40 2D 30 31 2D 31 32 is "1940-01-12" DY_DT
00 09 is the small integer 9 DY_NUM_IN_WK
00 00 00 0C is the integer 12 DY_NUM_IN_MO

...but then there are many more fields in your record. Your metadata is not complete. It may also be the case (though impossible to say without seeing more of the data) that your record is fixed width with no line terminator.

This looks like a time dimension table, with many possible representations of the date.

Go find the rest of the field definitions and you may find this file easier to read.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
KadetG
Participant
Posts: 30
Joined: Mon Nov 06, 2006 12:43 pm

Post by KadetG »

Yes! You are rigth. I think I can read the data as ASCII. But datastage is not have the "data" type in CFF.

How I can read the integer? I have some information about record:
DAT_PER_KEY = 1 (00 00 00 01)
DY_DT = 1940-01-12 (31 39 34 30 2D 30 31 2D 31 32 )
DY_NUM_IN_WK = 5 (00 09)
DY_NUM_IN_MO = 12 (00 00 00 0C)
DY_NUM_IN_QTR = 12 (00 0C)
DY_IN_EPOCH_NUM = -21903 (FF FF EF 88 ).
Alex
sudeepmantri
Participant
Posts: 54
Joined: Wed Oct 25, 2006 11:07 pm
Location: Hyderabad

Re: HowTo extract data from binary file?

Post by sudeepmantri »

Hi, first of all u got to convert the binary file to ascii, in oreder to read data from it. In the format tab of the Sequential file property dialogue box set record length as fixed,character set as EBCDIC and add a property "Export EBCDIC as ascii". This will export the binary data in Ascii form.


Thanks n Regards
Sudeep
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It's NOT EBCDIC. It's perfectly readable and the code points correspond to ASCII.

"I have some information about the record" is simply not good enough. You must have ALL of the information about the record.

Once you have this you can read it with a CFF stage or with a Sequential File stage. Create the table definition in the Repository, then load it into whichever stage type you prefer.

DY_IN_EPOCH_NUM may be an unsigned integer; you can specify uint32 rather than int32 as the data type in the record schema, or unsigned in the extended column properties in the table definition.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
KadetG
Participant
Posts: 30
Joined: Mon Nov 06, 2006 12:43 pm

Post by KadetG »

2sudeepmantri

I done that. But I have don`t correct the DY_NUM_IN_WK (it equal 9 but should be 5), DY_IN_EPOCH_NUM (it equal -4216 but should be -21903) and date filed by ''. I define it as char 10. :(
Alex
us1aslam1us
Charter Member
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

KadetG wrote:2sudeepmantri

I done that. But I have don`t correct the DY_NUM_IN_WK (it equal 9 but should be 5), DY_IN_EPOCH_NUM (it equal -4216 but should be -21903) and date filed by ''. I define it as char 10. :(
What do you mean by this? As suggested by Ray use CFF stage and that will probably resolve your issue.

Sam
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The "" character is how your data viewer represents a non-printable character, for example tab or esc.

As for the "date filled by ''", look again at the hex dump to see what these characters actually are. If they're all 00 this suggests that NULL is represented by ten ASCII NUL characters in a Char(10) field.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
KadetG
Participant
Posts: 30
Joined: Mon Nov 06, 2006 12:43 pm

Post by KadetG »

Hi All

I told about that file with woman who give me it. She said that it file in ebcdic. She can unload the data from mainframe database only in unicode and ebcdic.

I understand that if I read the date in ASCII then file is ASCII file. But strange that DY_NUM_IN_WK = 5 (00 09) and DY_IN_EPOCH_NUM = -21903 (FF FF EF 88 ). That information extracted from mainframe. And woman said that it bacouse the data in ebcdic. Datastage extract DY_NUM_IN_WK = 9 and DY_IN_EPOCH_NUM = -4216.
Alex
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Ask her to ndm the file (ftp the file) in ascii mode to you. That can be done.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The FTP utility used to transfer the file may have effected translation from EBCDIC to ASCII.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
KadetG
Participant
Posts: 30
Joined: Mon Nov 06, 2006 12:43 pm

Post by KadetG »

Hi

Sorry... how I can use that?
Alex
us1aslam1us
Charter Member
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

As DsGuru suggested ask the lady to FTP the file in Ascii mode.
I haven't failed, I've found 10,000 ways that don't work.
Thomas Alva Edison(1847-1931)
Post Reply