Reading excel file

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
sonia jacob
Participant
Posts: 122
Joined: Mon Jul 05, 2004 1:33 pm
Location: MA

Reading excel file

Post by sonia jacob »

Hi all,

My requirement is to read an excel file on a weekly basis, do data transformations and load it to a database. Now the catch to it is,
(1) the name of the excel file changes and (2) so does the names of tabs in them. But I use the folder stage to rename the file and write the file to another folder on a weekly basis. This takes care of my DSN connection setting, I mean I do not have to change it always. For the changing tab names, luckily only the date portion of the tab changes, hence I paramaterized it.
But I am faced with another issue, (3) the source file has all these HEADER and TRAILER rows, some of the tabs has just one row as the header but some others have about 5 to 6 rows as header. As a temporary fix I set the print area of the excel worksheet appropriately so that DS reads from the 2nd or the 7th only (once the print area is set it appears as a different table to Datastage). But I am not convinced that this would work as a long term solution.

Please can anybody give me better ideas for any or all of the issues listed above

Note : The excel file have multiple worksheets in them.
Thanks.
Regards
Sonia Jacob
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
If you can get a file/table/another spread sheet inside the excel file with info regarding from which row to read which sheet, then you might have a better automated solution, setting a constraint in the transformer to pass only records bigger then the parameter rows to skip.

In any case you need the print area defined to read the sheets.
So Unless you can some how figure out a logic for DS to distinguish which rows to process inside the print area via transformer constraints, it will rpocess all of them.

If you can get the csv files of the excell files for processing you can use that as input and figure out unwanted rows by number of fields or length and so on (filtering them by using a single varchar(2048) as your input schema to a file and then reading form the file you filtered with specific table definition).

Otherwise I doubt there is a magical solution for guessing from what row you need to process the data.

The solution you use seems OK to me, keeping in mind it may need maintanance along time (as most things does).

You might want to mention are there any constraints like: you can't manipulate the excell work book in any way and so on.

theese are from the top of my mind.

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Manually looking at the sheets, how do you identify which are header, line items and trailer?
sonia jacob
Participant
Posts: 122
Joined: Mon Jul 05, 2004 1:33 pm
Location: MA

Post by sonia jacob »

Sainath.Srinivasan wrote:Manually looking at the sheets, how do you identify which are header, line items and trailer?

One of the worksheet looks somthing like this-


Header : File Name Date
Column Heading : Col1 Col2 Col3 Col4 Col5

-----
Data that goes into each of these colums Col1, Col2 .....
------


Trailer : No. of records, Total Amount

I do not need the header and the trailer for my data processing.
Regards
Sonia Jacob
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If you read the ODBC specification for the Excel ODBC driver you will see that it requires that the worksheet has column headings and is otherwise in tabular format.

That is, there is no scope for header and trailer rows.

Can you pre-process the worksheet (perhaps into a worksheet with a standard name) so that the ODBC driver's requirements are met?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sonia jacob
Participant
Posts: 122
Joined: Mon Jul 05, 2004 1:33 pm
Location: MA

Post by sonia jacob »

ray.wurlod wrote:If you read the ODBC specification for the Excel ODBC driver you will see that it requires that the worksheet has column headings and is otherwise in tabular format.

That is, there is no scope for header and trailer rows.

Can you pre-process the worksheet (perhaps into a worksheet with a standard name) so that the ODBC driver's requirements are met?

I am reading the excel file using a folder stage just to change the file name. I know that I could manipulate the data column of the folder stage, when reading a .txt file, to skip data. But I do not think the same would work for an excel file as it has multiple worksheets, each with a different set of HEADER and TRAILER set of rows
:cry:
Regards
Sonia Jacob
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

Sonia,

Excel is a very good spreadsheet and has some database functionality. For to use these functionalities you have to have a high level of discipline on the users, who fill these files. If you have a 99% or less disciplined team you will get trouble with loading Database-like-Tables from Excel. This is due to the (for all other need of excel welcome) flexible features.

In all other worlds (I had same stress when loading excel to SAS with SAS programms) you have the same problems.

If you have any chance to use something else then Excel: Use it and switch to it as quick as you can.

I switched all people to MS-Access and since then I have no more loading problems.

You will have some programming investment compared to Excel but you will harvest very soon your profit.

Even if you get the problem solved with excel, you are never sure, that in 8 month somebody misses the 100 % and a new problem returns and you will have a new hang.

Kind regards
Wolfgang
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

This is a 36000 feet view and method of approach to your situation.

If you know which sheet will contain which columns (defn), you can include to constraint to skip rows until you find that column.

Similarly, ignore as trailer when you do not have certain column entries.

But you need to supply more information to give a detailed method
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Just Thinking I mentioned all alternatives givven so far in my original post except for using acces but that could be similar for using csv files instead excell.

(Guess My skills for passing ideas are poor :oops: or am I being ignored :shock: - probably wrote to much :?: J/K lmao, felt I had to write this :wink: )

Do post what path you finaly choose :)

p.s. lmao= laughing my XXX off ; J/K = Just Kidding
and if you wonder, though was not mentioned, roflmao is same as lmao just having a rolling on floor prefix ;)

Good luck,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
Post Reply