Yahoo Domain Result Grabber

I released my PHP Google Grabber script about a month ago and it was a big hit, even spawning Python and Groovy versions. Obtaining the number of pages indexed in Google by simply providing a domain name (or multiple, if you loop the function) can save you a lot of time. I run this script on a monthly basis to keep track of my customers' websites -- many of them use CMSs we've built, so I get to take a peek at how they're doing SEO-wise.

Although Yahoo! isn't nearly as relevant as Google in the search department, Yahoo! is still the most visited website on the internet. Since I already had the basic framework of the code built (from my Google Grabber), I thought it might be beneficial to take a few moments to Yahoo!ize it.

The PHP Code

/* return result number */
function get_yahoo_results($domain = 'davidwalsh.name')
{
	// get the result content
	$content = file_get_contents('http://siteexplorer.search.yahoo.com/search?p=site:http://'.$domain);

	// parse to get results, stripping spaces and parentheses around each count
	// (the pipe is escaped so it matches literally instead of acting as alternation)
	$pages = str_replace(array(' ',')','('),'',get_match('/Pages (.*) \|/isU',$content));
	$inlinks = str_replace(array(' ',')','('),'',get_match('/Inlinks (.*) /isU',$content));

	// default to 0 when nothing was found
	$return = array();
	$return['pages'] = $pages ? $pages : 0;
	$return['inlinks'] = $inlinks ? $inlinks : 0;

	// return result
	return $return;
}

/* helper: does the regex */
function get_match($regex,$content)
{
	// return the first captured group, or an empty string when there is no match
	return preg_match($regex,$content,$matches) ? $matches[1] : '';
}

The Usage

$domains = array('davidwalsh.name','digg.com','yahoo.com','cnn.com','dzone.com','some-domain-that-doesnt-exist.com');
foreach($domains as $domain)
{
	$result = get_yahoo_results($domain);
	echo $domain,': ',$result['pages'],' pages, ',$result['inlinks'],' inlinks',"\n";
}
 
//davidwalsh.name: 204 pages, 518 inlinks 
//digg.com: 20,700,000 pages, 14,300,000 inlinks 
//yahoo.com: 1,290,000,000 pages, 4,650,000 inlinks 
//cnn.com: 7,510,000 pages, 1,090,000 inlinks 
//dzone.com: 776,000 pages, 15,000 inlinks 
//some-domain-that-doesnt-exist.com: 0 pages, 0 inlinks
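
If you track these numbers from month to month, it may help to store them as integers rather than as formatted strings like "20,700,000". Here is a minimal sketch of one way to do that; `count_to_int` is a hypothetical helper of my own naming, not part of the original script.

```php
<?php
/* hypothetical helper: turn a formatted count such as "20,700,000" into an integer */
function count_to_int($count)
{
	// strip everything that is not a digit (thousands separators, stray whitespace)
	$digits = preg_replace('/\D/', '', (string) $count);

	// an empty result means there was no number at all
	return $digits === '' ? 0 : (int) $digits;
}
```

You could then call `count_to_int($result['pages'])` before saving each value, which makes the month-over-month comparison a plain subtraction.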

Much like my Google Grabber, you may need to adjust the method of connecting to Yahoo! based on your hosting environment. If your host disables remote URLs in file_get_contents, cURL may be the best option for you.
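
For reference, a cURL-based fetch might look something like the sketch below. The function name `fetch_with_curl` and the 10-second default timeout are my own assumptions, not part of the original script; the idea is simply to swap it in for the `file_get_contents` call inside `get_yahoo_results`.

```php
<?php
/* hypothetical helper: fetch a URL with cURL, returning the body or false on failure */
function fetch_with_curl($url, $timeout = 10)
{
	// bail out early if the cURL extension is not installed
	if (!function_exists('curl_init')) {
		return false;
	}

	$ch = curl_init($url);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the body instead of printing it
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);     // follow redirects
	curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // seconds allowed for connecting
	curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);        // seconds allowed for the whole request

	// curl_exec gives back the response body, or false if the request failed
	return curl_exec($ch);
}
```

Because a failed request returns false, just as `file_get_contents` does, the regex helpers would find no match and the script would report 0 pages and 0 inlinks for that domain.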

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)