Yahoo Domain Result Grabber
I released my PHP Google Grabber script about a month ago and it was a big hit, even spawning Python and Groovy versions. Obtaining the number of pages indexed in Google by simply providing a domain name (or multiple domains, if you loop the function) can save you a lot of time. I run this script on a monthly basis to keep track of my customers' websites. Many of them use CMSes we've built, so I get to take a peek at how they're doing SEO-wise.
Although Yahoo! isn't nearly as relevant as Google in the search department, Yahoo! is still the most visited website on the internet. Since I already had the basic framework of the code built (from my Google Grabber), I thought it might be beneficial to take a few moments to Yahoo!ize it.
The PHP Code
/* return the page and inlink counts for a domain */
function get_yahoo_results($domain = 'davidwalsh.name')
{
    // get the result content
    $content = file_get_contents('http://siteexplorer.search.yahoo.com/search?p=site:http://'.$domain);
    // parse to get results; strip spaces and parentheses, e.g. "(204)" becomes "204"
    // the pipe is escaped so it matches a literal "|" instead of acting as regex alternation
    $pages = str_replace(array(' ',')','('),'',get_match('/Pages (.*) \|/isU',$content));
    $inlinks = str_replace(array(' ',')','('),'',get_match('/Inlinks (.*) /isU',$content));
    $return = array();
    $return['pages'] = $pages ? $pages : 0;
    $return['inlinks'] = $inlinks ? $inlinks : 0;
    // return result
    return $return;
}
/* helper: runs the regex and returns the first captured group, or '' when nothing matches */
function get_match($regex,$content)
{
    return preg_match($regex,$content,$matches) ? $matches[1] : '';
}

The Usage
$domains = array('davidwalsh.name','digg.com','yahoo.com','cnn.com','dzone.com','some-domain-that-doesnt-exist.com');
foreach($domains as $domain)
{
    $result = get_yahoo_results($domain);
    echo $domain,': ',$result['pages'],' pages, ',$result['inlinks'],' inlinks',"\n";
}
//davidwalsh.name: 204 pages, 518 inlinks
//digg.com: 20,700,000 pages, 14,300,000 inlinks
//yahoo.com: 1,290,000,000 pages, 4,650,000 inlinks
//cnn.com: 7,510,000 pages, 1,090,000 inlinks
//dzone.com: 776,000 pages, 15,000 inlinks
//some-domain-that-doesnt-exist.com: 0 pages, 0 inlinks

Much like my Google Grabber, you may need to adjust how the script connects to Yahoo! depending on your hosting environment; for example, file_get_contents() can only open remote URLs when allow_url_fopen is enabled. cURL may be the best option for you.
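If file_get_contents() can't open remote URLs on your host, a cURL-based fetch is one way around it. The following is a minimal sketch, not part of the original script: fetch_page() is a hypothetical helper name, and the timeout and user-agent string are assumed values you may want to change. It uses cURL when the extension is loaded and falls back to file_get_contents() otherwise, so it could stand in for the file_get_contents() call inside get_yahoo_results().

```php
<?php
// Hypothetical helper (not from the original script): fetch a page body,
// preferring cURL and falling back to file_get_contents().
// Returns an empty string on failure so the regex helpers simply find no match.
function fetch_page($url, $timeout = 10)
{
    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the body instead of echoing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);     // follow HTTP redirects
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // seconds to wait for a connection
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);        // seconds allowed for the whole request
        // Assumed user-agent string; some sites reject requests that send none
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; ResultGrabber)');
        $body = curl_exec($ch);
        // The handle is released when $ch goes out of scope
        return $body === false ? '' : $body;
    }
    // Fallback: for http:// URLs this requires allow_url_fopen to be enabled
    $body = @file_get_contents($url);
    return $body === false ? '' : $body;
}
```

For example, `$content = fetch_page('http://siteexplorer.search.yahoo.com/search?p=site:http://'.$domain);`. Returning an empty string on failure means the regexes find nothing and both counts fall back to 0, just as they do for a domain with no results.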
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)