Hi,
What I am trying to do is the equivalent of
select
event_session_id,
first(ref_qry_strng_text),
first(agent),
first(page_id),
first(start_dt_tm),
last(start_dt_tm),
first(timediff),
web_site_id,
first(clnt_ipa_num),
sum(page_cnt),
first(link_key),
sum(tot_file_size_num),
last(nav_path)
from
sessn_event
group by
web_site_id,
event_session_id
using a set of transformers and remove duplicate stages.
My input is made up of the following fields, in a dataset.
jsessionid
ref_qry_stng_text
agent
start_dt_tm
end_dt_tm
timediff
clnt_ipa_num
link_key
PAGE_ID
WEB_SITE_ID
page_cnt
SIZE_FILE_NUM
NAV_PATH
Giving the example of what I do in the job.
Input
-----------------------------------------------------------------------------------------------------------
"0005WAXIVD2YTQFIOERSFE4AVCZB4IV0",
"
https://www1.net.hsbc.com/code/pro/en_U ... stid=17494",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
"2005-09-22 15:26:37",
"0",
"-4",
"217.45.222.121",
"2",
"5236",
"38",
"0",
" 00000000000000000000.",
"5236"
--------------------------------------------------------------------------------------------------
"001SW1IDO3JTJQFIOERCFFWAVCZB4IV0",
"-",
"Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows 98; .NET CLR 1.1.4322)",
"2005-09-22 13:53:51",
"2005-09-22 13:54:08",
"-4",
"172.201.177.223",
"1",
"5234",
"37",
"1",
" 00000000000000000000.",
"5234"
---------------------------------------------------------------------------------------------------
"001SW1IDO3JTJQFIOERCFFWAVCZB4IV0",
"
https://www1.net.hsbc.com/code/public/e ... stid=16250",
"Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows 98; .NET CLR 1.1.4322)",
"2005-09-22 13:54:08",
"2005-09-22 13:54:09",
"-4",
"172.201.177.223",
"1",
"5234",
"37",
"0",
" 00000000000000000097.",
"5234-5234"
------------------------------------------------------------------------------------------------------
"001SW1IDO3JTJQFIOERCFFWAVCZB4IV0",
"
https://www1.net.hsbc.com/code/public/e ... stid=16250",
"Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows 98; .NET CLR 1.1.4322)",
"2005-09-22 13:54:09",
"2005-09-22 13:54:14",
"-4",
"172.201.177.223",
"1740363",
"5266",
"30",
"1",
" 00000000000000000000.",
"5234-5234-5266"
----------------------------------------------------------------------------------------------------------
"001SW1IDO3JTJQFIOERCFFWAVCZB4IV0",
"-",
"Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows 98; .NET CLR 1.1.4322)",
"2005-09-22 13:54:14",
"2005-09-22 13:54:14",
"-4",
"172.201.177.223",
"1740363",
"5267",
"30",
"0",
" 00000000000000000000.",
"5234-5234-5266-5267"
---------------------------------------------------------------------------------------------------------
"001SW1IDO3JTJQFIOERCFFWAVCZB4IV0",
"-",
"Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows 98; .NET CLR 1.1.4322)",
"2005-09-22 13:54:14",
"2005-09-22 13:54:19",
"-4",
"172.201.177.223",
"1740363",
"5268",
"30",
"0",
" 00000000000000000000.",
"5234-5234-5266-5267-5268"
-----------------------------------------------------------------------------------------------------------
"001SW1IDO3JTJQFIOERCFFWAVCZB4IV0",
"-",
"Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows 98; .NET CLR 1.1.4322)",
"2005-09-22 13:54:19",
"2005-09-22 14:12:52",
"-4",
"172.201.177.223",
"1740363",
"5268",
"30",
"0",
" 00000000000000000000.",
"5234-5234-5266-5267-5268-5268"
--------------------------------------------------------------------------------------------------------------
"001SW1IDO3JTJQFIOERCFFWAVCZB4IV0",
"-",
"Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows 98; .NET CLR 1.1.4322)",
"2005-09-22 14:12:52",
"0",
"-4",
"172.201.177.223",
"1740363",
"5277",
"30",
"0",
" 00000000000000000000.",
"5234-5234-5266-5267-5268-5268-5277"
------------------------------------------------------------------------------------------------------------------
"004CRI4FMNNH1QFIWESCFF4AVBZB4IV0",
"
https://www2.net.hsbc.com/code/pro/en_U ... ozilla/4.0 (compatible; MSIE 6.0; Windows 98)",
"2005-09-22 20:38:08",
"0",
"-4",
"154.20.4.123",
"2","5236",
"38",
"1",
" 00000000000000000000.",
"5236"
---------------------------------------------------------------------------------------------------------------------
So the output for the jsessionid "001SW1IDO3JTJQFIOERCFFWAVCZB4IV0" would be
"001SW1IDO3JTJQFIOERCFFWAVCZB4IV0", - group by jsessionid
"
https://www1.net.hsbc.com/code/public/e ... stid=16250", -first(ref_qry_strng_text)
"Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows 98; .NET CLR 1.1.4322)",- first(agent)
"5234",-first(page_id)
"2005-09-22 13:53:51", - first(start_dt_tm)
"2005-09-22 14:12:52", - last(start_dt_tm)
"-4",-first(timediff)
"172.201.177.223",-first(clnt_ipa_num)
"2",-sum(page_cnt)
"30",-group by
"1",-first(link_key)
" 00000000000000000097.",- sum(tot_file_size_num)
"5234-5234-5266-5267-5268-5268-5277" - last(nav_path)
1. I sort the input data based on the web_site_id, jsessionid, then the start_dt_tm.
2. Later I split the records based the requirment of first and last, those which are first I pass it through a seperate link, those which are last I pass it
through a seperate link, and for those values which needs to be summed for a particular web_site_id, session_id, start_dt_tm are summed in the
transformer by comparing previous value and next value. For those values which are to be considered only as first the other values are set to zero(numeric),
and ''(varchar) values.
3. I send the values which are first through the remove duplicates and take up the first value.
4. I send the values which are last thorugh the remove duplicates and take up the last value.
5. I combine both the first and last values using a funnel.
6. Later since I want the grouped values based on web_site_id, jsessionid and start_dt_tm, For those values which are char, I concatenate first value with last value, and if numeric, I add the first value with next value.