Wednesday, 28 August 2013

The variable returns null all the time

My program is a web crawler. I'm trying to download images from a website.
In my web crawler code I have:
try
{
    HtmlAgilityPack.HtmlDocument doc =
        TimeOut.getHtmlDocumentWebClient(mainUrl, false, "", 0, "", "");
    if (doc == null)
    {
        if (wccfg.downloadcontent == true)
        {
            retwebcontent.retrieveImages(mainUrl);
        }
        failed = true;
        wccfg.failedUrls++;
        failed = false;
    }
For example, when doc is null, mainUrl contains:
http://members.tripod.com/~VanessaWest/bundybowman2.jpg
Execution then jumps to the retrieveImages method in the other class:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using HtmlAgilityPack;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml.Linq;
using System.Net;
using System.Web;
using System.Threading;
using DannyGeneral;
using GatherLinks;
namespace GatherLinks
{
    class RetrieveWebContent
    {
        HtmlAgilityPack.HtmlDocument doc;
        string imgg;
        int images;
        public RetrieveWebContent()
        {
            images = 0;
        }
        public List<string> retrieveImages(string address)
        {
            try
            {
                doc = new HtmlAgilityPack.HtmlDocument();
                System.Net.WebClient wc = new System.Net.WebClient();
                List<string> imgList = new List<string>();
                doc.Load(wc.OpenRead(address));
                HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
                if (imgs == null) return new List<string>();
                foreach (HtmlNode img in imgs)
                {
                    if (img.Attributes["src"] == null)
                        continue;
                    HtmlAttribute src = img.Attributes["src"];
                    imgList.Add(src.Value);
                    if (src.Value.StartsWith("http") ||
                        src.Value.StartsWith("https") ||
                        src.Value.StartsWith("www"))
                    {
                        images++;
                        string[] arr = src.Value.Split('/');
                        imgg = arr[arr.Length - 1];
                        //imgg = Path.GetFileName(new Uri(src.Value).LocalPath);
                        //wc.DownloadFile(src.Value, @"d:\MyImages\" + imgg);
                        wc.DownloadFile(src.Value, "d:\\MyImages\\" + Guid.NewGuid() + ".jpg");
                    }
                }
                return imgList;
            }
            catch
            {
                Logger.Write("There Was Problem Downloading The Image: " + imgg);
                return null;
            }
        }
    }
}
Now I'm using a breakpoint and stepping through line by line. After this line executes:
HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
the variable imgs is null. The next line, which checks for null, then returns
an empty list, so the method does nothing else.
How can I fix this so that it downloads the image from
http://members.tripod.com/~VanessaWest/bundybowman2.jpg ?
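
One likely reason imgs is null: the address above ends in .jpg, so it probably points directly at an image file, not at an HTML page, and there would be no img elements for the XPath query to match. Below is a minimal sketch of handling that case, assuming direct image links can be recognized by their file extension. The class DirectImageHelper, its method names, and the extension list are hypothetical and not part of the original code.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper: the names and the extension list are illustrative assumptions.
static class DirectImageHelper
{
    static readonly string[] ImageExtensions = { ".jpg", ".jpeg", ".png", ".gif", ".bmp" };

    // Returns true when the URL's path ends with one of the known image extensions.
    public static bool IsDirectImageUrl(string address)
    {
        Uri uri;
        if (!Uri.TryCreate(address, UriKind.Absolute, out uri))
            return false;

        string ext = Path.GetExtension(uri.AbsolutePath);
        foreach (string known in ImageExtensions)
        {
            if (string.Equals(ext, known, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }

    // Downloads the file at the address into targetFolder and returns the local path.
    public static string DownloadDirectImage(string address, string targetFolder)
    {
        Uri uri = new Uri(address);
        string fileName = Path.GetFileName(uri.LocalPath);
        string localPath = Path.Combine(targetFolder, fileName);
        using (WebClient wc = new WebClient())
        {
            wc.DownloadFile(uri, localPath);
        }
        return localPath;
    }
}
```

With something like this, retrieveImages could call IsDirectImageUrl(address) first and download the file directly, falling back to the HTML parsing only for other addresses. Checking the Content-Type header of the response would probably be more robust, since not every image URL carries a file extension.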
