Importing PDF's into SQL Database

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
ewartpm
Participant
Posts: 97
Joined: Wed Jun 25, 2003 2:15 am
Location: South Africa
Contact:

Importing PDF's into SQL Database

Post by ewartpm »

:? Hi guys

Is it possible, using DataStage, to store a PDF document in a SQL Server database. If it is possible, what stage would I use to do this?

Many Thanks.
Mark Ewart-Phipps
Centerfield Software (Pty) Ltd
Tel +27 11 462 7003
Fax +27 11 462 7392
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

You DO NOT want to EVER pass blobs, clobs, whatever thru an ETL tool. I hope I'm clear enough on this matter.

You should be passing tokens such as a filename in its place. This means that you don't have to juggle megabytes of data and pass it around. Think about the underlying mechanics of what's going on. You cannot use bulk loaders, you cannot use sequential (text files remember) files. Every byte of data has to pass thru the ETL pipeline. You do not have, and IMHO, should have, the ability to clobs/blobs in the tool.

You're going to have to deal with manually moving this type of data into the database. I would suggest preserving the token (filename) as a distinct column, and then after-load updating the data to move clobs/blobs into the database table.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I second Ken's remarks. :idea:

If you must do it, pre-encode it with something that converts it to a text-compatible encoding, use Sequential File stages, don't attempt to transform it, and decode the resulting text file prior to bulk loading into SQL Server. Yuk.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

SQL Server has the ntext field which can accept text with up to 2 ^ 30 - 1 characters, which is just over 1 billion for those of you who, like me, can't compute those numbers in our heads. This should be sufficient if you want to encode your PDF files into a text stream. Something you can do quite easily in .Net. It's a pretty common technique when you want to stick a binary file, such as a PDF, into an XML object.

You have also got the BLOB data type but again you need something external to the ETL tool to read and write the BLOB. Have a look at this Microsoft support article on using .Net to do this:
http://support.microsoft.com/?id=326502

You can also leave your PDFs on the filesystem and put a link to them in the SQL Server database. This is the easiest approach but you've got issues with security and access.
ewartpm
Participant
Posts: 97
Joined: Wed Jun 25, 2003 2:15 am
Location: South Africa
Contact:

Post by ewartpm »

Thanks guys, for the comments, it's much appreciated. I think I will go with the 'link' option and then take it from there.
Mark Ewart-Phipps
Centerfield Software (Pty) Ltd
Tel +27 11 462 7003
Fax +27 11 462 7392
Post Reply