
Twitter Search parser in PHP

Twitter

I’m feeling a bit uncomfortable right now, like I’m about to change clothes in a public dressing room whose walls don’t go all the way to the floor. Other than the occasional unsettling dream, I’ve never really thought about what it would be like to take that part of you which is always hidden and private and put it on public display. There really shouldn’t be anything to fear, right? I’ve seen others’, and mine looks similar to theirs. Maybe mine has some weird lumps or is a little misshapen. Ah hell, no one ever reads this blog anyway. Time to lift the hood and show you all my soft and squishy goodness! Here’s some PHP code I wrote (and partly Frankensteined) to parse and display a Twitter search.

<?php
$intervalNames = array('second', 'minute', 'hour', 'day', 'week', 'month',  'year');
$intervalSeconds = array( 1,        60,     3600,  86400, 604800, 2630880, 31570560);
// Tweets with these strings will be skipped - you may add to this list below
$badWords = array('poodles', 'carebears', 'beiber');
$doc = new DOMDocument();
$doc->load('http://search.twitter.com/search.atom?q=poliorketika'); // fetch the search results as Atom XML
foreach ($doc->getElementsByTagName('entry') as $node) { // one <entry> node per tweet
	$tweetInfo = array ( 
		'userlink' => $node->getElementsByTagName('uri')->item(0)->nodeValue,
		'username' => $node->getElementsByTagName('name')->item(0)->nodeValue,
		'content' => $node->getElementsByTagName('content')->item(0)->nodeValue,
		'image' => $node->getElementsByTagName('link')->item(1)->getAttribute('href'),
		'date' => $node->getElementsByTagName('published')->item(0)->nodeValue
	);
	$time = 'just now';
	$secondsPassed = time() - strtotime($tweetInfo['date']); // age of the tweet in seconds
	if ( $secondsPassed > 0 ) {
		for ( $j = count($intervalSeconds) - 1; $j >= 0; $j-- ) { // try the largest interval first
			$crtIntervalName = $intervalNames[$j];
			$crtInterval = $intervalSeconds[$j];
			if ($secondsPassed >= $crtInterval) {
				$value = floor($secondsPassed / $crtInterval);
				if ($value > 1) {
					$crtIntervalName .= 's'; // pluralize: "2 hours", but "1 hour"
				}
				$time = $value . ' ' . $crtIntervalName . ' ago';
				break;
			}
		}
	}
	$skipTweet = false;
	foreach ( $badWords as $badWord ) {
		if ( stristr ( $tweetInfo['content'], $badWord ) !== false ) {
			$skipTweet = true;
			break;
		}
	}
	if ( !$skipTweet ) {
		echo '<div style="width:580px; float:left; border:1px solid #dcdcdc; padding:6px; margin:4px">';
		echo '<div style="width:54px; float:left">';
		echo '<a href="', htmlspecialchars($tweetInfo['userlink']), '" target="_top">';
		echo '<img src="', htmlspecialchars($tweetInfo['image']), '" alt="', htmlspecialchars($tweetInfo['username']), '" style="width:48px; height:48px" />';
		echo '</a>';
		echo '</div>';
		echo '<div style="width:500px; float:left">';
		echo '<a href="', htmlspecialchars($tweetInfo['userlink']), '" target="_top">';
		echo htmlspecialchars($tweetInfo['username']), '</a>: ', $tweetInfo['content']; // content is printed as-is so its markup renders
		echo '<br /><span style="color:#cdcdcd">posted ', $time, '</span>';
		echo '</div>';
		echo '</div>';
	}
}
?>

There, that wasn’t so bad, was it? It must not have been too disgusting, since you haven’t run back home to Google yet. Here’s a brief rundown of how it works. First we set up two arrays used to turn a tweet’s age into a readable time interval, plus an array of bad words. We don’t want to show tweets containing these words on our blog, do we? Modify that array to hold whatever words you find offensive. Line 7 (counting the opening <?php as line 1) loads the Twitter search results as Atom XML. You can learn all about the different kinds of Twitter searches in the Twitter API documentation, or you can simply replace the string “poliorketika” with whatever you want to search for.
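
If the term you swap in for “poliorketika” contains spaces, a hashtag, or an @ sign, you need to URL-encode it before it goes into the query string. Here’s a minimal sketch of one way to do that with PHP’s built-in http_build_query(). The buildSearchUrl() helper is a hypothetical name I made up for illustration. It is not part of the script above or of any Twitter library.

```php
<?php
// Hypothetical helper: build the search URL from a plain search term.
// http_build_query() percent-encodes the term, so spaces, '#' and '@'
// don't break the query string.
function buildSearchUrl(string $term): string
{
    $base = 'http://search.twitter.com/search.atom'; // endpoint used in the post
    return $base . '?' . http_build_query(['q' => $term]);
}

echo buildSearchUrl('poliorketika'), "\n"; // http://search.twitter.com/search.atom?q=poliorketika
echo buildSearchUrl('#php tips'), "\n";    // http://search.twitter.com/search.atom?q=%23php+tips
```

You could pass the returned string to $doc->load() in place of the hard-coded URL. By default, http_build_query() encodes a space as + and # as %23.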

At line 8 we start our foreach loop through the Twitter search results and pull out the values we’ll need to show each tweet: userlink, username, content, image, and date. Lines 16 – 31 calculate how much time has passed since the tweet was tweeted by the twitterer (don’t be frightened by my technical terminology). Lines 32 – 38 check whether the tweet contains any bad words. Actually, I just realized I should probably move lines 16 – 31 inside the if statement at line 39, because there is no point in calculating the time of a tweet that contains bad words and won’t show in our list. I’m not perfect, dammit! Ah well, I’ll worry about not being perfect some other time.
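
Here’s a rough sketch of that reordering. It assumes the bad-word check and the time calculation are pulled out into two hypothetical helper functions, containsBadWord() and timeAgo(). It reuses the same interval tables as the script above. It also uses stripos() instead of stristr(), which gives the same answer for a simple “is it in there?” test. timeAgo() takes an elapsed number of seconds, so you can try it without hitting Twitter.

```php
<?php
// Returns true if $content contains any of $badWords (case-insensitive).
function containsBadWord(string $content, array $badWords): bool
{
    foreach ($badWords as $badWord) {
        if (stripos($content, $badWord) !== false) {
            return true;
        }
    }
    return false;
}

// Turns an elapsed number of seconds into e.g. "3 hours ago",
// using the same interval tables as the main script.
function timeAgo(int $secondsPassed): string
{
    $intervalNames   = ['second', 'minute', 'hour', 'day', 'week', 'month', 'year'];
    $intervalSeconds = [1, 60, 3600, 86400, 604800, 2630880, 31570560];

    if ($secondsPassed <= 0) {
        return 'just now';
    }
    // Walk from the largest interval down and use the first one that fits.
    for ($j = count($intervalSeconds) - 1; $j >= 0; $j--) {
        if ($secondsPassed >= $intervalSeconds[$j]) {
            $value = intdiv($secondsPassed, $intervalSeconds[$j]);
            $name  = $intervalNames[$j] . ($value > 1 ? 's' : '');
            return $value . ' ' . $name . ' ago';
        }
    }
    return 'just now'; // not reached for positive input
}

// Filter first, then compute the time only for tweets that will be shown.
$badWords = ['poodles', 'carebears', 'beiber'];
$content  = 'Reading about siege engines';
if (!containsBadWord($content, $badWords)) {
    echo timeAgo(5400), "\n"; // prints "1 hour ago"
}
```

Inside the foreach, you would call containsBadWord() on $tweetInfo['content'] first. Then call timeAgo(time() - strtotime($tweetInfo['date'])) only for the tweets that pass.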

Lines 39 – 52 are where we show the tweets. And yes, I’m well aware that I should put classes on the <div> tags and move all the style properties into the style sheet, but inline styles are easier for blogging purposes. You can edit the styles here to display the tweets however you wish. By default, this search only returns the 15 most recent tweets. There are plenty of other ways to do this, too. Using AJAX would be nice, since you could set it up to load new tweets automatically without refreshing the page, but this approach was suitable for my purposes. If you wish to complain about anything, feel free. There is a complaint form at the bottom of this post. Happy tweet harvesting!
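
For anyone who’d rather do it the proper way, here’s a sketch of the same markup using classes, with the inline styles moved into stylesheet rules. The function name renderTweet() and the class names (tweet, tweet-avatar, tweet-body, tweet-time) are hypothetical, so use whatever fits your theme. The user-supplied values go through htmlspecialchars(), while the tweet content is echoed as-is, just like in the script above.

```php
<?php
// Hypothetical class-based version of the display code: the markup carries
// class names and the presentation lives in the stylesheet.
function renderTweet(array $tweetInfo, string $time): string
{
    $link = htmlspecialchars($tweetInfo['userlink']);
    $user = htmlspecialchars($tweetInfo['username']);
    $img  = htmlspecialchars($tweetInfo['image']);

    return '<div class="tweet">'
        . '<div class="tweet-avatar"><a href="' . $link . '" target="_top">'
        . '<img src="' . $img . '" alt="' . $user . '" /></a></div>'
        . '<div class="tweet-body"><a href="' . $link . '" target="_top">' . $user . '</a>: '
        . $tweetInfo['content'] // left unescaped, as in the original script
        . '<br /><span class="tweet-time">posted ' . htmlspecialchars($time) . '</span>'
        . '</div></div>';
}

/* Matching stylesheet rules, equivalent to the inline styles in the script:
.tweet            { width: 580px; float: left; border: 1px solid #dcdcdc; padding: 6px; margin: 4px; }
.tweet-avatar     { width: 54px; float: left; }
.tweet-avatar img { width: 48px; height: 48px; }
.tweet-body       { width: 500px; float: left; }
.tweet-time       { color: #cdcdcd; }
*/

echo renderTweet([
    'userlink' => 'http://twitter.com/example',
    'username' => 'example',
    'image'    => 'http://example.com/avatar.png',
    'content'  => 'Hello <b>world</b>',
], '2 hours ago');
```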

This entry was posted on Wednesday, July 28th, 2010 at 11:35 am and is filed under PHP Scripts.

3 Responses to Twitter Search parser in PHP

  • John C says: May 17, 2011 at 10:59 am

    Great solution. I am trying to count the number of tweets. Counting the array $tweetInfo only returns 5, which is the number of elements in the array. How could I calculate the number of tweets returned by the query (i.e. http://search.twitter.com/search.atom?q=poliorketika)? Thanks!

  • Πολιορκητικα says: May 17, 2011 at 11:24 am

    I wrote this so long ago I can't remember where this code lives to test any potential solutions. However, the content you get back from Twitter is just XML with each tweet result in an "entry" node. You should be able to count how many "entry" nodes there are. Something like this before the foreach at line 8:

    $tweets = $doc->getElementsByTagName("entry");
    $tweetcount = $tweets->length;

    The variable $tweetcount should contain the number of "entry" nodes.

    Hope this helps. If not, let me know and I can try to track down my original code and find a working solution.

  • Πολιορκητικα says: May 17, 2011 at 11:58 am

    I was able to test this and it does work. However, keep in mind that the default number of tweets returned per page appears to be 15. If you would like to return more than 15 per page, add "rpp=#" to your query as follows:
    http://search.twitter.com/search.atom?q=enmity&rpp=100

    This will return the 100 most recent tweets. To get the next 100 matching tweets, add "page=2" to the query:
    http://search.twitter.com/search.atom?q=enmity&rpp=100&page=2

    For more information on the Twitter search API look here:
    http://dev.twitter.com/doc/get/search
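
Putting those two replies together, here’s a small sketch that counts the entry nodes in a feed and builds paged search URLs with rpp and page. The feed is a made-up three-entry sample so the example runs without a network connection. searchPageUrl() is a hypothetical helper, and the sketch assumes that page=1 returns the same results as leaving page off.

```php
<?php
// Made-up sample feed so the example runs offline.
$sampleFeed = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>first</title></entry>
  <entry><title>second</title></entry>
  <entry><title>third</title></entry>
</feed>
XML;

$doc = new DOMDocument();
$doc->loadXML($sampleFeed); // with the live feed this would be $doc->load($url)
$tweetcount = $doc->getElementsByTagName('entry')->length;
echo $tweetcount, " entries\n"; // prints "3 entries"

// Hypothetical helper: URL for one page of results for $term.
function searchPageUrl(string $term, int $rpp, int $page): string
{
    return 'http://search.twitter.com/search.atom?'
        . http_build_query(['q' => $term, 'rpp' => $rpp, 'page' => $page]);
}

for ($page = 1; $page <= 2; $page++) {
    echo searchPageUrl('enmity', 100, $page), "\n";
}
```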


