<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>cpierce.org &#187; Centos Linux</title>
	<atom:link href="http://www.cpierce.org/category/os/centoslinux/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.cpierce.org</link>
	<description>Chris Lee Pierce</description>
	<lastBuildDate>Fri, 13 Jan 2012 03:36:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Aggregating RSS Feeds</title>
		<link>http://www.cpierce.org/2009/01/aggregating-rss-feeds/</link>
		<comments>http://www.cpierce.org/2009/01/aggregating-rss-feeds/#comments</comments>
		<pubDate>Tue, 06 Jan 2009 01:31:36 +0000</pubDate>
		<dc:creator>cpierce</dc:creator>
				<category><![CDATA[Bash]]></category>
		<category><![CDATA[Centos Linux]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[OS]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Ubuntu Linux]]></category>
		<category><![CDATA[aggregation]]></category>
		<category><![CDATA[crontab]]></category>
		<category><![CDATA[rss]]></category>

		<guid isPermaLink="false">http://www.cpierce.org/?p=65</guid>
		<description><![CDATA[Pull from several RSS feeds on a high traffic site for too long and you&#8217;ll wonder if there is a better way. Fortunately for you there is. Aggregating your RSS feeds solves several problems for both you and the source of the RSS. First it reduces the bandwidth required from both the source site and [...]]]></description>
			<content:encoded><![CDATA[<p>Pull from several RSS feeds on a high-traffic site for long enough and you&#8217;ll wonder if there is a better way.  Fortunately, there is.  Aggregating your RSS feeds solves several problems for both you and the source of the RSS.  First, it reduces the bandwidth required from both the source site and your site.  Imagine a site that gets several requests per hour.  Now imagine this site pulling from another site via RSS every time a client loads the page.  The result is the same data getting pulled over and over again.  There is a better way!<br />
<span id="more-65"></span><br />
RSS aggregation software is available for a variety of operating systems and languages.  The problem is that many of these packages have rather large footprints and put extra strain on already busy servers.  If you host on Linux you already have the tools required to do aggregation.  Here are the things you will need:</p>
<ul>
<li>Access to crontab</li>
<li>wget installed on your server</li>
<li>PHP</li>
</ul>
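<p>Before going further, it can help to confirm that these tools are actually on the server.  The snippet below is a minimal sketch, not part of the original setup: <code>missing_tools</code> is a hypothetical helper name, and it only checks that each command can be found on the PATH, not that you have permission to use it.</p>

```shell
#!/bin/bash
# Print the name of each tool passed as an argument that is not on the PATH.
# missing_tools is a hypothetical helper name used for illustration only.
missing_tools() {
    local tool
    for tool in "$@"; do
        # command -v succeeds only when the shell can resolve the command
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Check the three requirements listed above.
missing="$(missing_tools crontab wget php)"
if [ -n "$missing" ]; then
    echo "Missing: $missing"
else
    echo "All required tools found"
fi
```

<p>If anything is reported as missing, install it with your distribution&#8217;s package manager before continuing.</p>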
<p>First let&#8217;s look at the command that makes this all possible and go into a little detail about how it works.  RSS feeds are XML documents, served over HTTP for the most part by the same web servers that serve HTML pages.  A sample of RSS can be seen below:</p>
<div class="codecolorer-container xml default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="xml codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;title<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>Ice Rink Shiner<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/title<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;link<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>http://www.cpierce.org/2009/01/ice-rink-shiner/<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/link<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;comments<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>http://www.cpierce.org/2009/01/ice-rink-shiner/#comments<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/comments<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;pubDate<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>Mon, 05 Jan 2009 21:20:11 +0000<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/pubDate<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;dc:creator<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>admin<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/dc:creator<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></div></div>
<p>This is an excerpt from my RSS feed here at <a href="http://cpierce.org/feed">http://cpierce.org/feed</a>.  If we simply wanted to pull this one feed to our server we could use wget as follows:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #666666; font-style: italic;">#!/bin/bash</span><br />
<span style="color: #000000; font-weight: bold;">/</span>usr<span style="color: #000000; font-weight: bold;">/</span>bin<span style="color: #000000; font-weight: bold;">/</span><span style="color: #c20cb9; font-weight: bold;">wget</span> <span style="color: #660033;">--tries</span>=<span style="color: #000000;">2</span> <span style="color: #660033;">--dns-timeout</span>=<span style="color: #000000;">5</span> <span style="color: #660033;">--connect-timeout</span>=<span style="color: #000000;">5</span> <span style="color: #660033;">--no-check-certificate</span> <span style="color: #ff0000;">&quot;http://www.cpierce.org/feed/&quot;</span> <span style="color: #660033;">-O</span> <span style="color: #000000; font-weight: bold;">/</span>var<span style="color: #000000; font-weight: bold;">/</span>www<span style="color: #000000; font-weight: bold;">/</span>html<span style="color: #000000; font-weight: bold;">/</span>cpierce.org.xml</div></div>
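<p>One caveat with the command above: wget truncates the file named by -O as soon as it starts, so a failed download can leave an empty file where a good copy used to be.  A possible way around this is to download to a temporary file and only replace the cached copy on success.  The following is a sketch under assumptions, not the method used in this article: <code>fetch_feed</code> and the <code>WGET</code> override variable are hypothetical names introduced here.</p>

```shell
#!/bin/bash
# Path to wget; the WGET override is an assumption added for illustration.
WGET="${WGET:-/usr/bin/wget}"

# fetch_feed URL DEST (hypothetical helper)
# Download URL to a temporary file next to DEST and replace DEST only when
# wget succeeds and the download is non-empty. On failure DEST is untouched.
fetch_feed() {
    local url="$1" dest="$2" tmp
    tmp="$(mktemp "${dest}.XXXXXX")" || return 1
    if "$WGET" --quiet --tries=2 --dns-timeout=5 --connect-timeout=5 \
            --no-check-certificate "$url" -O "$tmp" && [ -s "$tmp" ]; then
        # mktemp creates the file readable only by its owner, so open it up
        # for the web server before moving it into place.
        chmod 644 "$tmp"
        mv -f "$tmp" "$dest"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example call (same URL and destination as the command above):
# fetch_feed "http://www.cpierce.org/feed/" /var/www/html/cpierce.org.xml
```

<p>Because the move only happens after a successful download, the previous copy of the feed stays in place when the source host is unreachable.</p>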
<p>Now that the RSS feed is on our own server we don&#8217;t have to rely on the speed of the source during page loads.  We can also still provide user content even if the source host is down.  We could simply run the bash script above every time we wanted to pull a new copy of the feed, but we are looking for a more automated way of doing this.  Let&#8217;s start by upgrading our bash script to PHP so that we can easily pull multiple RSS feeds at once.  Here is the example /var/www/html/rss/rss_feed.php code:</p>
<div class="codecolorer-container php default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="php codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #000000; font-weight: bold;">&lt;?php</span><br />
&nbsp; &nbsp; &nbsp;<span style="color: #666666; font-style: italic;">// we start with a simple function that allows us to run command line scripts from php</span><br />
&nbsp; &nbsp; &nbsp;<span style="color: #000000; font-weight: bold;">function</span> syscmd<span style="color: #009900;">&#40;</span><span style="color: #000088;">$cmd</span><span style="color: #339933;">,</span> <span style="color: #000088;">$output</span><span style="color: #339933;">=</span><span style="color: #009900; font-weight: bold;">false</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">!</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$prun</span><span style="color: #339933;">=</span><a href="http://www.php.net/popen"><span style="color: #990000;">popen</span></a><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;(<span style="color: #006699; font-weight: bold;">$cmd</span>)2&gt;&amp;1&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;r&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #b1b100;">return</span> <span style="color: #cc66cc;">126</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #b1b100;">while</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">!</span><a href="http://www.php.net/feof"><span style="color: #990000;">feof</span></a><span style="color: #009900;">&#40;</span><span style="color: #000088;">$prun</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #000088;">$buffer</span><span style="color: #339933;">=</span><a href="http://www.php.net/fgets"><span style="color: #990000;">fgets</span></a><span style="color: #009900;">&#40;</span><span style="color: #000088;">$prun</span><span style="color: #339933;">,</span><span style="color: #cc66cc;">10000</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$output</span><span style="color: #009900;">&#41;</span> <span style="color: #b1b100;">print</span> <a href="http://www.php.net/nl2br"><span style="color: #990000;">nl2br</span></a><span style="color: #009900;">&#40;</span><span style="color: #000088;">$buffer</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #b1b100;">return</span> <a href="http://www.php.net/pclose"><span style="color: #990000;">pclose</span></a><span style="color: #009900;">&#40;</span><span style="color: #000088;">$prun</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #009900;">&#125;</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #666666; font-style: italic;">// we need a place to store these files we are going to be pulling (this path must be writable from your httpd</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #000088;">$path</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">'/var/www/html/rss/'</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #666666; font-style: italic;">// now we need an array that will hold our file name (the key) and our rss feed url (the value)</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #000088;">$feeds</span> <span style="color: #339933;">=</span> <a href="http://www.php.net/array"><span style="color: #990000;">array</span></a><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'cpierce.org'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #0000ff;">'http://www.cpierce.org/feed'</span><span style="color: #339933;">,</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0000ff;">'jbcrawford.net'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #0000ff;">'http://www.jbcrawford.net/feed'</span><span style="color: #339933;">,</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0000ff;">'jstownsley.com'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #0000ff;">'http://www.jstownsley.com/feed'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #666666; font-style: italic;">// now we need to loop through the array $feeds and pull each rss feed to our local $path.</span><br />
&nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #b1b100;">foreach</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$feeds</span> <span style="color: #b1b100;">as</span> <span style="color: #000088;">$name</span><span style="color: #339933;">=&gt;</span><span style="color: #000088;">$url</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; syscmd<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/usr/bin/wget --tries=2 --dns-timeout=5 --connect-timeout=5 --no-check-certificate &quot;'</span><span style="color: #339933;">.</span><span style="color: #000088;">$url</span><span style="color: #339933;">.</span><span style="color: #0000ff;">'&quot; -O '</span><span style="color: #339933;">.</span><span style="color: #000088;">$path</span><span style="color: #339933;">.</span><span style="color: #000088;">$name</span><span style="color: #339933;">.</span><span style="color: #0000ff;">'.xml'</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #009900;">&#125;</span></div></td></tr></tbody></table></div>
<p>We can test this by loading http://www.site.com/rss/rss_feed.php in our browser.  This is also a handy way to manually refresh an RSS feed before the scheduled time.  Once this is all working you&#8217;ll have XML files in your specified path.  Just one thing left to do: schedule the script to run using &#8216;crontab -e&#8217;:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000;">15</span> &nbsp;<span style="color: #000000;">0</span>-<span style="color: #000000;">23</span><span style="color: #000000; font-weight: bold;">/</span><span style="color: #000000;">4</span> &nbsp; &nbsp; <span style="color: #000000; font-weight: bold;">*</span> &nbsp; <span style="color: #000000; font-weight: bold;">*</span> &nbsp; <span style="color: #000000; font-weight: bold;">*</span> &nbsp; <span style="color: #000000; font-weight: bold;">/</span>usr<span style="color: #000000; font-weight: bold;">/</span>bin<span style="color: #000000; font-weight: bold;">/</span><span style="color: #c20cb9; font-weight: bold;">wget</span> <span style="color: #660033;">--delete-after</span> http:<span style="color: #000000; font-weight: bold;">//</span>www.site.com<span style="color: #000000; font-weight: bold;">/</span>rss<span style="color: #000000; font-weight: bold;">/</span>rss_feed.php <span style="color: #000000; font-weight: bold;">&gt;/</span>dev<span style="color: #000000; font-weight: bold;">/</span>null <span style="color: #000000;">2</span><span style="color: #000000; font-weight: bold;">&gt;&amp;</span><span style="color: #000000;">1</span></div></div>
<p>This tells cron to run the job every 4 hours, at 15 minutes past the hour (I do this so everything isn&#8217;t scheduled at the top of the hour).  If you need to add other RSS feeds, simply add them to your array.  Each feed is saved under its array key, so the cpierce.org feed is available at http://www.site.com/rss/cpierce.org.xml.</p>
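<p>To see exactly which times of day the crontab entry matches, the minute field (15) and the hour field (0-23/4, meaning every fourth hour starting at 0) can be expanded by hand.  The short sketch below does that with seq; <code>cron_run_times</code> is a hypothetical name used only for this illustration.</p>

```shell
#!/bin/bash
# List the times of day matched by minute field 15 and hour field 0-23/4.
# cron_run_times is a hypothetical helper name.
cron_run_times() {
    local hour
    # seq 0 4 23 yields every fourth hour from 0 up to at most 23
    for hour in $(seq 0 4 23); do
        printf '%02d:15\n' "$hour"
    done
}

cron_run_times
```

<p>This prints 00:15, 04:15, 08:15, 12:15, 16:15 and 20:15, so the feeds are refreshed six times a day.</p>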
]]></content:encoded>
			<wfw:commentRss>http://www.cpierce.org/2009/01/aggregating-rss-feeds/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

