Wednesday, 18 September 2013

Group values with common domain and page values

Following up on my previous question, Parsing URI parameter and keyword
value pairs, I would like to group URLs that share the same domain and
page name. The URLs may have the same or different parameters and/or
respective values. For each group, the URL/page value should be printed
once, followed by all of its parameter and keyword values. I am looking
for an answer that uses Python to parse, group and print the values. I
have not been able to find one via Google or SO.
Example source of URLs with various parameters and values:
www.domain.com/page?id_eve=479989&adm=no
www.domain.com/page?id_eve=47&adm=yes
www.domain.com/page?id_eve=479
domain.com/cal?view=month
domain.com/cal?view=day
ww2.domain.com/cal?date=2007-04-14
ww2.domain.com/cal?date=2007-08-19
www.domain.edu/some/folder/image.php?l=adm&y=5&id=2&page=http%3A//support.domain.com/downloads/index.asp&unique=12345
blog.news.org/news/calendar.php?view=day&date=2011-12-10
www.domain.edu/some/folder/image.php?l=adm&y=5&id=2&page=http%3A//.domain.com/downloads/index.asp&unique=12345
blog.news.org/news/calendar.php?view=month&date=2011-12-10
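As a rough sketch of the parsing step, each line above can be split at the first "?" into a page part and a query part, and the query part split on "&" and "=". The function name split_url is hypothetical, not from the original question. Plain string methods are used here rather than urllib's parse_qsl, on the assumption that values should stay percent-encoded (for example page=http%3A//...), as they do in the sample data.

```python
def split_url(url):
    """Split a scheme-less URL into (page, [(key, value), ...]).

    Values are left exactly as written, so percent-encoding is preserved.
    """
    # Everything before the first "?" is the domain plus page.
    page, _, query = url.strip().partition("?")
    pairs = []
    for item in query.split("&"):
        if not item:
            continue  # URL had no query string, or a stray "&"
        key, _, value = item.partition("=")
        pairs.append((key, value))
    return page, pairs


page, pairs = split_url("www.domain.com/page?id_eve=479989&adm=no")
print(page)
for key, value in pairs:
    print("%s=%s" % (key, value))
```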
Example output I am looking for: each URL/page, followed by a list of the
parameter/value combinations from all of the URLs that share that domain
and page.
www.domain.com/page
id_eve=479989
id_eve=47
id_eve=479
adm=no
adm=yes

domain.com/cal
view=month
view=day

ww2.domain.com/cal
date=2007-04-14
date=2007-08-19

www.domain.edu/some/folder/image.php
l=adm
l=adm
id=2
id=2
page=http%3A//.domain.com/downloads/index.asp
page=http%3A//support.domain.com/downloads/index.asp
