one record for each value which are seperated by delimitter

RAJARP · Post by **RAJARP** » Sun Aug 09, 2009 1:59 am

Hi all,
I am having data in the following format.

EMP_ID| EMP_ROLE| EMP_SKILLS|
1| PM|c;c++;UNIX|
2|SPM|Java;PHP|
3|GPM|PERL|

I want the data in the following format

EMP_ID| EMP_ROLE|EMP_SKILLS|
1|PM |c|
1|PM| c++|
1|PM|UNIX|
2|SPM|Java|
2|SPM| PHP|
3|GPM|PERL|

That is, i want one record for each value in the last columns which are separated by semicolons.
Your suggestions would be of great help.

Thanks in advance,
Regards,
Raja R P

keshav0307 · Post by **keshav0307** » Sun Aug 09, 2009 3:13 am

if EMP_SKILLS is last column then, you can do the pivoting in Shell script.

RAJARP · Post by **RAJARP** » Sun Aug 09, 2009 3:55 am

Thanks keshav,for your reply.
EMP_SKILLS is not last column and can you explain about pivoting in shell script as am not aware of it.

Note:The data in EMP_SKILLS won't be consistent.(i.e.) It may have any number of values seperated by semicolon or even nulls

Plagvreugd · Post by **Plagvreugd** » Sun Aug 09, 2009 12:21 pm

If the number of 'columns' in the field EMP_SKILLS is not very high you can achieve your goal within a normal Transform-stage by using the field-function.

- Make as many outpulinks as the maximum number of 'columns' you need to extract from the field EMP_FIELDS.
- Output the necessary columns and the first Field of EMP_SKILLS (the part before the first semicolon) to the first output-link
- Output the necessary columns and the second Field of EMP_SKILLS (the part between the first and the second semicolon) to the second output-link with the constraint that this second field is not empty (or that the number of semicolons >= 1)
etc.

After the transformer you can use a Funnel-stage to merge all the results from the several output-links.

ray.wurlod · Post by **ray.wurlod** » Sun Aug 09, 2009 5:21 pm

Use a Column Import stage to parse the records into separate columns that can be pivoted.

RAJARP · Post by **RAJARP** » Tue Aug 11, 2009 2:50 am

Hi all,
Thanks for all your suggestions.I have found one more way which is easy for my case.

Source file--->Temp table(oracle enterprise stage)===>job 1

Temp Table(oracle enterprise stage)--->target file==>job 2

In the job 2 oracle enterprise stage i have used 'CONNECT BY LEVEL' oracle concept to make the data in the desired format.

Best regards,
Raja R P

RAJARP · Post by **RAJARP** » Sat Aug 22, 2009 7:02 am

Hi,
The solution which i have got using 'CONNECT BY LEVEL' was working fine when i executed the job with sample records(30 records totally).
But yesterday i tried running the job with full load (170,000 records).In toad the query which i have written is taking only 3 minutes and 58 secs to run.But the same query was running for 4 hrs in an oracle Enterprise stage without any improvement( i had to abort the parallel job).Please find my query which am using

Code: Select all

SELECT
    CU_ID,
    BSNS_FUNC_CD,
    CLASS_CODE,
    CHARACTERISTIC_NAME,
    TRIM(
        SUBSTR(CHAR_VAL,INSTR(CHAR_VAL,';',1,LEVEL) +1,INSTR(CHAR_VAL,';', 1,LEVEL+1)-INSTR(CHAR_VAL,';',1,LEVEL) -1)
        ) AS CHARACTERISTIC_VALUE,    
    MASTER_CLASS_FLAG,
    CHAR_VAL_ORIG,
    LEVEL
FROM
   (
    SELECT
        TRIM(CU_ID)                                               AS CU_ID,
        TRIM(BSNS_FUNC_CD)                             AS BSNS_FUNC_CD,
        TRIM(CLASS_CODE)                                  AS CLASS_CODE,
        TRIM(CHARACTERISTIC_NAME)              AS CHARACTERISTIC_NAME,
        ';'||TRIM(CHARACTERISTIC_VALUE)||';'     AS CHAR_VAL,
        TRIM(MASTER_CLASS_FLAG)                  AS MASTER_CLASS_FLAG,
        TRIM(CHARACTERISTIC_VALUE)            AS CHAR_VAL_ORIG
    FROM DEV.TABLE_CUID
   )
CONNECT BY LEVEL <= LENGTH(CHAR_VAL)-LENGTH(REPLACE(CHAR_VAL,';',''))-1

I want to know why a query which istaking 4 minutes in toad is taking more than 4 hrs in datastage's oracle enterprise stage.
Please somebody throw some light on this.

Regards,
Raja R P

chulett · Post by **chulett** » Sat Aug 22, 2009 7:10 am

Toad <> DataStage. While a query is a query they execute in quite different environments between the two tools. Toad is all about returning you the first few rows while OE will need to process all. What you should be worried about is the explain plan which is not something anyone here can check for you. Make sure the query is as optimal as it can be based on your table structures, volumes, stats, etc. If you need some help with that, involve your DBA.

RAJARP · Post by **RAJARP** » Sat Aug 22, 2009 8:09 am

Chulet,
Thanks for your respnose.

regards,
Raja R P