Friday, 16 August 2013

VB.net link extraction using HtmlAgilityPack

My program extracts the links from any given HTML, but it does not display
the complete URLs. This is the code:
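' Assumes Imports HtmlAgilityPack at the top of the file (for the unqualified HtmlNode)
Dim htmlDoc As New HtmlAgilityPack.HtmlDocument()
htmlDoc.LoadHtml(WebSource)
' SelectNodes returns Nothing when no <cite> element matches
Dim citeNodes = htmlDoc.DocumentNode.SelectNodes("//cite")
If citeNodes IsNot Nothing Then
    For Each link As HtmlNode In citeNodes
        If link.InnerText.Contains("index.php") Then
            ListBox1.Items.Add(link.InnerText)
        End If
    Next
End If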
The expected output is:
http://www.site1.com/index.php/test/sample_test
http://www.site2.com/index.php/aaaa/bbbbb/sssss
http://www.site3.com/index.php/zzzz_z/ssss_f
http://www.site4.com/index.php/teest/
http://www.site5.com/index.php/sample_url/test=1
Instead, it displays shortened URLs with dots in them. For example, this URL:
http://www.site5.com/index.php/sample_url/test=1
is displayed like this:
http://www.site5.com/index.php...test..1
What is causing this? I am really confused.
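
One possible explanation, offered as an assumption because the question does not say which page is being scraped: some pages, search-result pages in particular, put an abbreviated, display-only form of the address inside <cite>, with the middle replaced by dots, while the full address sits in the href attribute of a nearby <a> element. If that is the case here, InnerText can only ever return the shortened text. A minimal sketch that reads href instead is shown below; the module name, ExtractLinks, and the sample markup are hypothetical and not taken from the original program.

```vbnet
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module LinkExtractionSketch
    ' Returns the href value of every anchor whose target contains "index.php".
    Function ExtractLinks(html As String) As List(Of String)
        Dim results As New List(Of String)()
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim anchors = doc.DocumentNode.SelectNodes("//a[@href]")
        If anchors Is Nothing Then Return results

        For Each anchor As HtmlNode In anchors
            ' Decode entities such as &amp; that may appear inside attribute values.
            Dim href As String = HtmlEntity.DeEntitize(anchor.GetAttributeValue("href", ""))
            If href.Contains("index.php") Then results.Add(href)
        Next
        Return results
    End Function

    Sub Main()
        ' Hypothetical markup: the <cite> text is abbreviated, the href is complete.
        Dim sample As String =
            "<a href=""http://www.site5.com/index.php/sample_url/test=1"">" &
            "<cite>http://www.site5.com/index.php...test..1</cite></a>"

        For Each url As String In ExtractLinks(sample)
            Console.WriteLine(url)
        Next
    End Sub
End Module
```

In the original program, the same idea would mean adding the href value to ListBox1 instead of link.InnerText. If the real page wraps the target address in a redirect URL, the href would still need further parsing, which this sketch does not attempt.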
