Reading in a binary file

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

EJRoufs
Participant
Posts: 73
Joined: Tue Aug 19, 2003 2:12 pm
Location: USA


Post by EJRoufs »

For my first time ever using DataStage, I am trying to read in a binary file. Is there an easy way to do that? I'm guessing (hoping) there is, but still looking for the way. So thought I'd throw this question out there while I continue my search.

Thanks for any info! :)
Eric
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Define "binary". Are you talking about an EBCDIC file? See the CFF stage.

Remember, a Server Sequential stage expects to either read a delimited row ending with a line terminator (CR/LF, LF, user-defined) or a fixed width row with optional line terminator (read to line terminator or via set block size).

Reading a binary file, you're probably not dealing with "rows" but with blocks of characters. If you can read N chars at a time and deal with them, then you can use the Sequential stage. However, some character values are reserved by DS, so you run a risk in doing so.

If your data cannot be mapped into rows with columns, please clarify what you need to do.
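
To illustrate the "read N chars at a time" idea outside of DataStage, here is a minimal Python sketch. The 6-byte block size and the sample bytes are made up for the example; the point is only that a byte-oriented reader passes every byte value through untouched, including ones a text-oriented reader might treat as terminators.

```python
import io

def read_blocks(stream, block_size):
    """Yield successive blocks of up to block_size bytes from a binary stream."""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        yield block

# Hypothetical data: two 6-byte records containing NUL, 0xFF, LF and CR,
# which a line-oriented reader could misinterpret.
sample = io.BytesIO(b"AB\x00\xff01CD\x0a\x0d02")
blocks = list(read_blocks(sample, 6))
for block in blocks:
    print(block.hex(" "))
```

With a real file you would open it with `open(path, "rb")` instead of using `io.BytesIO`; the last block may be shorter than the block size if the file length is not an exact multiple.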
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
EJRoufs

Post by EJRoufs »

Nope, not EBCDIC. I've read in EBCDIC files from the mainframe before with no problems... I just convert it to ASCII with a Transformer stage. This is different. They are sending me a server file (ASCII), but they said it was a Binary file, so it basically looks like garbage to me.
Eric
kcbland

Post by kcbland »

What's the structure of the binary file, did they tell you how to read it?
Kenneth Bland

EJRoufs

Post by EJRoufs »

Nope. That's normal with a lot of the files I get. I just have to start taking a look at them and kind of figure it out.... basically, I'm "winging" it. :) And this one is no different, except that I can't tell what anything is, because it looks like garbage. ;)
Eric
kcbland

Post by kcbland »

Well, how do you know what constitutes a "row" of data, when it begins and ends, what attributes make up a column and meaning, etc? Sounds like you need to crack a few heads. :x

You could invent a Karnac (sp?) stage that figures out the answer without knowing the question. Sounds like a Douglas Adams story to me.
Kenneth Bland

EJRoufs

Post by EJRoufs »

I am usually lucky enough that most of our data starts out with a Production Date or such, so if I can find those dates, I can usually determine row sizes by that.

The Karnac Stage definitely would be useful. Could make millions! :)
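
The date-hunting trick described above can be roughed out in a few lines of Python. This is only a sketch under loud assumptions: it assumes the dates are stored as ASCII text in YYYYMMDD form and that every record starts with one, neither of which the thread confirms. The function name and sample records are hypothetical.

```python
import re
from collections import Counter

# Assumed date layout: ASCII YYYYMMDD in the 2000s. Adjust for the real file.
DATE_PATTERN = rb"20\d{2}(?:0[1-9]|1[0-2])(?:0[1-9]|[12]\d|3[01])"

def guess_record_length(data, pattern=DATE_PATTERN):
    """Return the most common distance between date-like matches, or None."""
    offsets = [m.start() for m in re.finditer(pattern, data)]
    gaps = Counter(b - a for a, b in zip(offsets, offsets[1:]))
    if not gaps:
        return None
    return gaps.most_common(1)[0][0]

# Hypothetical data: three 16-byte records, each starting with a date.
records = [
    b"20060301" + bytes([0, 1, 2, 3, 4, 5, 6, 7]),
    b"20060302" + bytes([9, 9, 9, 9, 0, 0, 0, 0]),
    b"20060303" + bytes([255, 254, 0, 0, 1, 1, 1, 1]),
]
data = b"".join(records)
length = guess_record_length(data)
print(length)
```

A guess like this is only a starting point. Binary fields can contain byte runs that happen to look like dates, so the result should be checked against the file size and, ideally, against the real layout from the people who produced the file.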
Eric
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Surely "they" produced the file using some kind of metadata or design. Get "them" to supply the file layout(s) to you.

"Binary" is OK if the data are fixed-width columns. For example, an integer will occupy four bytes. You can declare it as Char(4) and use Oconv() to turn it into a string of ASCII decimal digits. Or a CFF stage can handle binary data directly.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
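
As a language-neutral illustration of ray.wurlod's four-byte integer example, here is a small Python sketch that turns a 4-byte field into a string of decimal digits. It is not the Oconv() approach itself. The byte order and signedness are assumptions that depend on the hardware and language that wrote the file, and `decode_int32` is a hypothetical helper name.

```python
def decode_int32(raw, byteorder="big", signed=True):
    """Convert a 4-byte binary field to a string of decimal digits."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    return str(int.from_bytes(raw, byteorder=byteorder, signed=signed))

# The same four bytes give different values depending on the assumed byte order.
field = b"\x00\x00\x01\x00"
as_big = decode_int32(field, "big")
as_little = decode_int32(field, "little")
negative = decode_int32(b"\xff\xff\xff\xfe", "big")
print(as_big, as_little, negative)
```

Trying both byte orders on a field whose value you can guess (a year, a small count) is a quick way to find out which one the source system used.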
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

It usually helps quite a lot when you know which hardware and which language generated the file. With COBOL, for example, when you have "garbage" you can try an EBCDIC conversion to see if it makes more sense, and if that doesn't work you know that the binary is probably going to be some form of COMPutational format. If you don't see the typical binary patterns for signed numbers then you are stuck (unless you assume that the text is in some UTF-8 encoding for a dead language that some professor managed to convince the ISO committee to include in the character set).
If it is PL/1 you will have excess-128 numeric representations for integers and the first words of VarChar strings, and floating-point values are usually easy to detect by their typical format for the exponent portion.
But the only thing to do is to try to get your source metadata - there is no Rosetta Stone stage in 7.x.
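
The "try an EBCDIC conversion and see if it makes more sense" test can be approximated with a short Python sketch. It assumes code page 037 (Python's `cp037` codec); the actual EBCDIC variant depends on the source system. The printable-character ratio is a crude heuristic chosen for this example, not a reliable detector.

```python
import string

PRINTABLE = set(string.printable)

def printable_ratio(text):
    """Fraction of characters that are printable ASCII."""
    if not text:
        return 0.0
    return sum(ch in PRINTABLE for ch in text) / len(text)

def looks_more_like_ebcdic(raw, codec="cp037"):
    """Compare readability of the bytes read as-is versus decoded as EBCDIC."""
    as_is = raw.decode("latin-1")   # maps each byte to one character
    as_ebcdic = raw.decode(codec)   # assumed EBCDIC code page
    return printable_ratio(as_ebcdic) > printable_ratio(as_is)

# Hypothetical samples: the same text in EBCDIC and in ASCII.
ebcdic_sample = "PRODUCTION DATE 20060301".encode("cp037")
ascii_sample = b"PRODUCTION DATE 20060301"
print(looks_more_like_ebcdic(ebcdic_sample), looks_more_like_ebcdic(ascii_sample))
```

If neither reading produces recognisable text, that points toward packed or binary numeric fields, and you are back to needing the source metadata.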
jdmiceli
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Post by jdmiceli »

Is it possible that the file is UUEncoded? It would look like garbage.
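
One quick way to check the uuencode theory: a uuencoded file normally starts with a `begin <mode> <filename>` line and finishes with `end`. The Python sketch below looks for that header and decodes the body with the standard `binascii.a2b_uu` call. The helper names and the sample payload are made up for the example.

```python
import binascii
import re

# Header of a uuencoded file: "begin", an octal mode, and a file name.
HEADER = re.compile(rb"^begin [0-7]{3,4} \S+")

def looks_uuencoded(data):
    """True if the first line looks like a uuencode header."""
    lines = data.splitlines()
    return bool(lines) and HEADER.match(lines[0]) is not None

def uudecode(data):
    """Decode the body lines between the header and the 'end' line."""
    out = bytearray()
    for line in data.splitlines()[1:]:
        if line.strip() in (b"", b"`"):
            continue  # zero-length terminator line
        if line == b"end":
            break
        out += binascii.a2b_uu(line)
    return bytes(out)

# Hypothetical sample built with the matching encoder call.
payload = b"\x00\x01binary\xff"
sample = b"begin 644 data.bin\n" + binascii.b2a_uu(payload) + b"`\nend\n"
print(looks_uuencoded(sample), uudecode(sample) == payload)
```

Note that a uuencoded file is plain printable text, so if the file shows unprintable characters in an editor, uuencoding is unlikely to be the explanation.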

Don't know if that helps,
Bestest,

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"