Schema files - generation

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
TonyInFrance
Premium Member
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Schema files - generation

Post by TonyInFrance »

Dear all,

I have table definitions for around a hundred IBM DB2 tables. All of these are CREATE TABLE statements. I need to generate schema files for each of these. Is there anyway this can be done other than manually?

Copying the schema definitions from the 'Table Definitions' section in the Designer client is not the best way since a lot of modifications need to be incorporated thereafter. I am specifically looking for any tool that might exist which I don't know of.

Your help would be greatly appreciated as always.

Tony
Tony
BI Consultant - Datastage
eph
Premium Member
Premium Member
Posts: 110
Joined: Mon Oct 18, 2010 10:25 am

Post by eph »

Hi,

Why don't you use a script calling orchdbutil (which will fail for LOB columns) for each table? or you could build a script to get each table description in a file, then building the table definition by parsing them.

I've started to build such script, but it's still a work in progress (for 1 year :)) as it seems to "eat" the first line. It requires a script to run a query on your database.

Code: Select all

#!/products/python2.5/bin/python

import os, subprocess, sys

def parseLine(test):
        columnname = test.strip().split(' ')[0].strip()
        notnull = test.find("NOT NULL")

        if (notnull < 0):
                init = len(columnname)+1
                end = len(test)
                typetmp = test[init:end].strip().split('(')
                type = typetmp[0]
                nullable = """ nullable """
        else:
                init = notnull+8
                end = len(test)
                typetmp = test[init:end].strip().split('(')
                type =  typetmp[0]
                nullable = """ """

        validtypes = {"VARCHAR":"ustring","VARCHAR2":"ustring","NVARCHAR2":"ustring","CHAR":"ustring","NCHAR":"ustring","NUMBER":"decimal","DATE":"timestamp","TIMESTAMP":"timestamp"}

        if (validtypes.has_key(type)):
                newtype = validtypes[type]
                if (len(typetmp)==2):
                        typelen = typetmp[1].strip().split(')')[0].split(' ')[0]
                        if newtype.find("VAR"):
                                max = "max="
                        else:
                                max = ""
                        typelen = "["+max+typelen+"]"
                else:
                        typelen = """ """
                return columnname + " : " + nullable + newtype + typelen + ";"
        else:
                return "@"+columnname

def writeOsh(filename,filename2):
        f=open(filename,"r")
        fs=open(filename2,"w")
        fs.write("record\n")
        fs.write("(\n")
        i=0
        for line in f:
                if (line.strip()!=""):
                        if i<2:
                                i=i+1
                        else:
                                newline = parseLine(line)
                                if (newline[0]=="@"):
                                        print "The table contains a colunm ("+newline[1:]+") which can't be used directly within datastage (CLOB,NCLOB...)"
                                        continue
                                #print newline
                                fs.write(newline+"\n")
        fs.write(")\n")
        f.close
        fs.close

####MAIN####
path = os.getcwd()

env = sys.argv[1]
table = sys.argv[2]
tableQuery = "DESCRIBE "+table
filename = path+"/"+table+".schema"
filename2 = path+"/"+table+".osh"
runsqlpath = path+"/RunSql2.ksh"

command = ["/bin/ksh", runsqlpath, env, tableQuery, filename]
ret = subprocess.Popen(command)
ret.wait()

writeOsh(filename,filename2)

print

Code: Select all

[dstage-dev@slxd2001:~/EPH/scripts_python]$ cat PS_E1_TRANSCO_PARM.osh
record
(
DESCR50 :  ustring[max=50];
E1_VWNAME_PSFT :  ustring[max=15];
E1_DTTM_PSFT :  nullable timestamp[max=6];
E1_DATASET :  ustring[max=50];
E1_DTTM_DSTG :  nullable timestamp[max=6];
E1_TRANSCO_USE :  ustring[max=1];
LASTUPDOPRID :  ustring[max=30];
LASTUPDDTTM :  nullable timestamp[max=6];
)
[dstage-dev@slxd2001:~/EPH/scripts_python]$ cat PS_E1_TRANSCO_PARM.schema
 Name                                                              Null?    Type
 ----------------------------------------------------------------- -------- --------------------------------------------
 E1_CODE_PARAMETRE                                                 NOT NULL VARCHAR2(20 CHAR)
 DESCR50                                                           NOT NULL VARCHAR2(50 CHAR)
 E1_VWNAME_PSFT                                                    NOT NULL VARCHAR2(15 CHAR)
 E1_DTTM_PSFT                                                               TIMESTAMP(6)
 E1_DATASET                                                        NOT NULL VARCHAR2(50 CHAR)
 E1_DTTM_DSTG                                                               TIMESTAMP(6)
 E1_TRANSCO_USE                                                    NOT NULL VARCHAR2(1 CHAR)
 LASTUPDOPRID                                                      NOT NULL VARCHAR2(30 CHAR)
 LASTUPDDTTM                                                                TIMESTAMP(6)
 DESCRLONG                                                                  CLOB


Eric
Last edited by eph on Mon Jan 21, 2013 7:00 am, edited 1 time in total.
TonyInFrance
Premium Member
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Post by TonyInFrance »

Thanks Eric.

I am trying to get my head around the script. Might be easier said than done since don't know Python at all.

Greatly appreciate you help...:-)

Regards
Tony
BI Consultant - Datastage
eph
Premium Member
Premium Member
Posts: 110
Joined: Mon Oct 18, 2010 10:25 am

Post by eph »

First, I realize that I assumed in my answer that you wanted a file and not an import in DS as table definition (which is something different).

If you know that most of you tables don't have LOB columns, you can use orchdbutil and process manually for the remaining ones.

I updated the script, which is now working for Oracle databases (there are two lines of header for an sqlplus DESCRIBE command output, not 3). The RunSql2.ksh script takes 3 parameters, one for the connection (which database), the second one for the query, the last one for the output filename.

First thing is to get a description of the table in a file (in this case using sqlplus + DESCRIBE). Then to parse it to build the table definition, replacing database types and properties (null/not null) by the correct ones. You can also replace CLOB by using unbounded ustring I suppose.

This is just an example, but I guess you can build our proper script in the language you prefer.

Eric
Post Reply