one record for each value which are seperated by delimitter

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
RAJARP
Participant
Posts: 147
Joined: Thu Dec 06, 2007 6:46 am
Location: Chennai

one record for each value which are seperated by delimitter

Post by RAJARP »

Hi all,
I am having data in the following format.

EMP_ID| EMP_ROLE| EMP_SKILLS|
1| PM|c;c++;UNIX|
2|SPM|Java;PHP|
3|GPM|PERL|

I want the data in the following format

EMP_ID| EMP_ROLE|EMP_SKILLS|
1|PM |c|
1|PM| c++|
1|PM|UNIX|
2|SPM|Java|
2|SPM| PHP|
3|GPM|PERL|

That is, i want one record for each value in the last columns which are separated by semicolons.
Your suggestions would be of great help.

Thanks in advance,
Regards,
Raja R P
keshav0307
Premium Member
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

if EMP_SKILLS is last column then, you can do the pivoting in Shell script.
RAJARP
Participant
Posts: 147
Joined: Thu Dec 06, 2007 6:46 am
Location: Chennai

Post by RAJARP »

Thanks keshav,for your reply.
EMP_SKILLS is not last column and can you explain about pivoting in shell script as am not aware of it.


Note:The data in EMP_SKILLS won't be consistent.(i.e.) It may have any number of values seperated by semicolon or even nulls
Plagvreugd
Participant
Posts: 9
Joined: Tue Sep 20, 2005 2:10 am

Post by Plagvreugd »

If the number of 'columns' in the field EMP_SKILLS is not very high you can achieve your goal within a normal Transform-stage by using the field-function.

- Make as many outpulinks as the maximum number of 'columns' you need to extract from the field EMP_FIELDS.
- Output the necessary columns and the first Field of EMP_SKILLS (the part before the first semicolon) to the first output-link
- Output the necessary columns and the second Field of EMP_SKILLS (the part between the first and the second semicolon) to the second output-link with the constraint that this second field is not empty (or that the number of semicolons >= 1)
etc.


After the transformer you can use a Funnel-stage to merge all the results from the several output-links.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Use a Column Import stage to parse the records into separate columns that can be pivoted.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
RAJARP
Participant
Posts: 147
Joined: Thu Dec 06, 2007 6:46 am
Location: Chennai

Post by RAJARP »

Hi all,
Thanks for all your suggestions.I have found one more way which is easy for my case.

Source file--->Temp table(oracle enterprise stage)===>job 1

Temp Table(oracle enterprise stage)--->target file==>job 2


In the job 2 oracle enterprise stage i have used 'CONNECT BY LEVEL' oracle concept to make the data in the desired format.

Best regards,
Raja R P
RAJARP
Participant
Posts: 147
Joined: Thu Dec 06, 2007 6:46 am
Location: Chennai

Post by RAJARP »

Hi,
The solution which i have got using 'CONNECT BY LEVEL' was working fine when i executed the job with sample records(30 records totally).
But yesterday i tried running the job with full load (170,000 records).In toad the query which i have written is taking only 3 minutes and 58 secs to run.But the same query was running for 4 hrs in an oracle Enterprise stage without any improvement( i had to abort the parallel job).Please find my query which am using

Code: Select all

SELECT
    CU_ID,
    BSNS_FUNC_CD,
    CLASS_CODE,
    CHARACTERISTIC_NAME,
    TRIM(
        SUBSTR(CHAR_VAL,INSTR(CHAR_VAL,';',1,LEVEL) +1,INSTR(CHAR_VAL,';', 1,LEVEL+1)-INSTR(CHAR_VAL,';',1,LEVEL) -1)
        ) AS CHARACTERISTIC_VALUE,    
    MASTER_CLASS_FLAG,
    CHAR_VAL_ORIG,
    LEVEL
FROM
   (
    SELECT
        TRIM(CU_ID)                                               AS CU_ID,
        TRIM(BSNS_FUNC_CD)                             AS BSNS_FUNC_CD,
        TRIM(CLASS_CODE)                                  AS CLASS_CODE,
        TRIM(CHARACTERISTIC_NAME)              AS CHARACTERISTIC_NAME,
        ';'||TRIM(CHARACTERISTIC_VALUE)||';'     AS CHAR_VAL,
        TRIM(MASTER_CLASS_FLAG)                  AS MASTER_CLASS_FLAG,
        TRIM(CHARACTERISTIC_VALUE)            AS CHAR_VAL_ORIG
    FROM DEV.TABLE_CUID
   )
CONNECT BY LEVEL <= LENGTH(CHAR_VAL)-LENGTH(REPLACE(CHAR_VAL,';',''))-1
I want to know why a query which istaking 4 minutes in toad is taking more than 4 hrs in datastage's oracle enterprise stage.
Please somebody throw some light on this.

Regards,
Raja R P
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Toad <> DataStage. While a query is a query they execute in quite different environments between the two tools. Toad is all about returning you the first few rows while OE will need to process all. What you should be worried about is the explain plan which is not something anyone here can check for you. Make sure the query is as optimal as it can be based on your table structures, volumes, stats, etc. If you need some help with that, involve your DBA.
-craig

"You can never have too many knives" -- Logan Nine Fingers
RAJARP
Participant
Posts: 147
Joined: Thu Dec 06, 2007 6:46 am
Location: Chennai

Post by RAJARP »

Chulet,
Thanks for your respnose.

regards,
Raja R P
Post Reply