Dataset Schemas

Bolman · Post by **Bolman** » Thu Oct 26, 2006 5:36 pm

Hi all.

I am experimenting with datasets and RCP (DS 7.5.1a).

Is there a way to extract the schema from a dataset (or its header) after runtime using an Orchestrate command line?

ray.wurlod · Post by **ray.wurlod** » Thu Oct 26, 2006 10:39 pm

Yes, the orchadmin utility has a subcommand for this task.
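
A minimal sketch of what that could look like, going by orchadmin's own help text (`describe -s` prints the schema, and `ls` is listed as shorthand for it). The dataset name `MyDataSet.ds` is a placeholder and the wrapper function `schema_of` is hypothetical. The sketch assumes the parallel engine environment is already set up in the shell, and it falls back to printing the command when orchadmin is not on the PATH.

```shell
#!/bin/sh
# schema_of is a hypothetical helper, not part of DataStage.
# $1 = path to the dataset descriptor (.ds) file
schema_of() {
    if command -v orchadmin >/dev/null 2>&1; then
        # "describe -s" prints the schema stored in the dataset descriptor
        orchadmin describe -s "$1"
    else
        # orchadmin not found: only show the command that would be run
        echo "would run: orchadmin describe -s $1"
    fi
}

# MyDataSet.ds is a placeholder; pass a real descriptor path as the first argument
schema_of "${1:-MyDataSet.ds}"
```

Redirecting the output to a file keeps a copy of the schema text, though any banner or message lines that orchadmin prints may need trimming before the text is reused as a schema file.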

tagnihotri · Post by **tagnihotri** » Sat Oct 28, 2006 7:04 am

I would suggest being careful about changing the schema. It may sound crazy, but reusing the schema definition from an existing dataset is really useful in flexible or dynamic jobs.

Chuah · Post by **Chuah** » Sun Oct 29, 2006 6:41 pm

ray.wurlod wrote:Yes, the orchadmin utility has a subcommand for this task. ...

Hi Ray,
Where can I get a copy of the orchadmin guide, assuming there is one?

Chin

ray.wurlod · Post by **ray.wurlod** » Sun Oct 29, 2006 8:36 pm

Your support provider should be able to supply the manuals at no cost. You also receive them when attending IBM training classes. There may be a link from this forum to where they can be downloaded; you could try a Search.

ray.wurlod · Post by **ray.wurlod** » Sun Oct 29, 2006 8:43 pm

If you run orchadmin with no command-line options, or with the -help option, it prints a syntax summary.

Code:

$ bin/orchadmin
##I TFCN 000001 20:40:38(000) <main_program>
Ascential DataStage(tm) Enterprise Edition 7.5
Copyright (c) 2004, 1997-2004 Ascential Software Corporation.
All Rights Reserved



NAME
     orchadmin - delete, copy, describe and dump ORCHESTRATE files

SYNOPSIS
     orchadmin command [ -options... ] descriptor-files...

     orchadmin command [-help]     # prints help message for one command

     orchadmin [-help]             # prints help message for all commands

     orchadmin -f command-file     # executes commands from specified file

     orchadmin -                   # executes commands from standard input

DESCRIPTION
     orchadmin executes commands which delete, copy, and describe
     ORCHESTRATE files.  These commands may be given on the command
     line or read from a file or the standard input.

     command             delete, copy, describe, dump or check.

     -f command-file     Path of a file containing orchadmin commands.
                         The file may have multiple commands separated
                         by semicolons.  A command may be spread over
                         multiple lines.  C and C++ style comments and
                         csh style quotation marks are allowed.

     -                   Read commands from the standard input as if it
                         were a command file.

     -help | -h          Write usage information to the standard output.

    In addition there are the following NLS related options:

    -input_charset map-name     Specifies the encoding of option values.
    -output_charset map-name    Specifies the encoding of orchadmin output.
    -os_charset map-name        Specifies the encoding of data passed to or
                                received from the operating system via
                                "char *".
    -escaped            Allows command line characters to be presented
                        in a two-byte Unicode hex format.


COMMAND: copy | cp  source-descriptor-file  target-descriptor-file

     Copy the schema, contents and preserve-partitioning flag of the
     specified ORCHESTRATE file dataset.  If the preserve-partitioning
     flag is set, the copy will have the same number of partitions and
     record order as the original.  If the target file already exists,
     it will be truncated first.  If the preserve-partitioning flag of
     the source file is set and the target file already exists, it must
     have the same number of partitions as the source file.

     The copy command has no options.  A warning message is issued if
     the target does not already exist.  This is a bug, not a feature.


COMMAND: delete | del | rm  [ -options... ]  descriptor-files...

     Delete the specified descriptor files and all of their data files.

OPTIONS:
     -f             Force.  Proceed even if some partitions of the
                    dataset are on nodes that are inaccessible from the
                    current configuration file.  This will leave orphan
                    data files on those nodes.  They must be deleted by
                    some other means.

     -x             Use the system config file rather than the one
                    stored in the dataset.

EXAMPLE:
     Delete all datasets in the current directory that end in ".ds".

          orchadmin rm *.ds


COMMAND: truncate  [ -options... ]  descriptor-files...

     Remove data from the specified datasets.

OPTIONS:
     -f             Force.  Proceed even if some partitions of the
                    dataset are on nodes that are inaccessible from the
                    current configuration file.  This will leave orphan
                    data files on those nodes.  They must be truncated by
                    some other means.

     -x             Use the system config file rather than the one
                    stored in the dataset.

     -n segment     Leave this many segments.  The default is 0.

EXAMPLE:
     Truncate big.ds to 10 segments:

          orchadmin truncate -n 10 big.ds

     Remove all data from small.ds:

          orchadmin truncate small.ds


COMMAND: dump  [ -options... ]  descriptor-files...

     Dump the specified ORCHESTRATE parallel files as text to the
     standard output.  If no options are specified, all records are
     dumped in order from the first record of the first partition to
     the last record of the last partition.  Each field value is
     followed by a space, and each record is followed by a newline.
     Specific top-level fields may be dumped with the -field option.

OPTIONS:
     -field name    Dump the specified top-level field.  The default is
                    to dump all fields.  This option can occur multiple
                    times.  Each occurrence adds to the list of fields.

     -name          Precede each value by its field name and a colon.

     -n numrec      Limit the number of records dumped per partition.
                    The default is not to limit.

     -part N        Dump only the specified partition.  The default is
                    to dump all partitions.

     -p period      Dump every N'th record in a partition, starting
                    with the first record not skipped (see -skip).
                    The period must be greater than 0.  The default
                    is 1.

     -skip N        Skip the first N records in each partition.  The
                    default is 0.

     -x             Use the system config file rather than the one
                    stored in the dataset.

     If an option occurs multiple times, the last one takes effect.
     The -field option is an exception: each occurrence adds to the
     list of fields to be dumped.

EXAMPLES:
     Dump all records of all partitions of a parallel file named
     small.ds.  Precede each value by its field name and a colon.

          orchadmin dump -name small.ds

     Dump the value of the customer field of the first 99 records
     of partition 0 of big.ds.

          orchadmin dump -part 0 -n 99 -field customer big.ds


COMMAND: describe | lp | lf | ls | ll  [ -options... ]  descriptor-files...

     Print a report about each of the specified parallel files.
     lp = describe -p; lf = describe -f;  ls = describe -s;  ll = describe -l

OPTIONS:
     -p             List partitioning information (except for datafile info).

     -c             Print the stored config file, if any.

     -f             List the data files.

     -s             Print the schema.

     -x             Use the system config file rather than the one
                    stored in the dataset.

     -e             Describe segments individually.

     -v             Describe all segments, valid or otherwise

     -d             Print numbers exactly, not in pretty form

     -l             Means -p -f -s -e -v -c.

EXAMPLE:
     List the partitioning info, data files and schema of file1 and file2.

          orchadmin ll file1 file2


COMMAND: diskinfo [ -a | -np nodepool | -n node... ] diskpool

     Print a report about the specified disk pool

OPTIONS:
     -a             Print information for all nodes

     -np            Print information for just the specified node pool

     -n             Print information for the specified nodes

     -q             Print summary of information only

     If no options are supplied, the default node pool is used.

EXAMPLE:
     Describe disk pool pool1 in node pool bignodes

          orchadmin diskinfo -np bignodes pool1


COMMAND: check

Check the configuration file for any problems.  This command has
no options.

$

Bolman · Post by **Bolman** » Mon Oct 30, 2006 4:35 pm

Great, this works, thanks!

Now that I have the schema for the dataset (which was created on the fly using RCP), I want to write the dataset's contents out to a readable sequential file.

I have read in the Orchestrate manual that I can go from sequential file to dataset using this command:
$ osh "import -file inFile0.data -file inFile1.data -schema $example1_schema > outDS.ds "

But can we go the other way, from dataset to sequential file?

Thanks in advance.

ray.wurlod · Post by **ray.wurlod** » Mon Oct 30, 2006 8:43 pm

That should be the export operator. Get a copy of the Orchestrate Operators manual.

Bolman · Post by **Bolman** » Mon Oct 30, 2006 11:35 pm

I have the manual and have followed your recommendation to try export rather than import, but I still can't get it working.

I have tried pointing my osh "export..." at both the dataset header file and the actual data files, and have tried different combinations of options on the command line, without success.

This is my basic osh command line: osh "export -fileset 'DataSet.ds' -schemafile 'SchemaFile.sch' > OutputFile.dat "

The latest message I get is " In file set "DataSet.ds": Parsing a dataset, expecting a fileset.. "

The Osh manual says that you can do this with a dataset, but then the osh command references a fileset... confused!!
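
One route that stays within the orchadmin help text quoted earlier in this thread is `orchadmin dump`, which writes a dataset's records to standard output as text (each field value followed by a space, each record by a newline). The sketch below redirects that output into a file. `DataSet.ds` and `OutputFile.dat` are the names from the command line above, and the helper `dump_to_text` is hypothetical. This is only a sketch of a plain-text dump, not a replacement for the export operator when a specific delimited or fixed-width layout is needed.

```shell
#!/bin/sh
# dump_to_text is a hypothetical helper, not part of DataStage.
# $1 = dataset descriptor (.ds) file, $2 = text file to write
dump_to_text() {
    if command -v orchadmin >/dev/null 2>&1; then
        # -name precedes each value with its field name and a colon
        orchadmin dump -name "$1" > "$2"
    else
        # orchadmin not found: only show the command that would be run
        echo "would run: orchadmin dump -name $1 > $2"
    fi
}

dump_to_text "${1:-DataSet.ds}" "${2:-OutputFile.dat}"
```

Adding `-n numrec` limits the number of records dumped per partition, which is handy for checking the output on a large dataset before dumping all of it.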
