<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>MrBrown blob &#187; awk</title>
	<atom:link href="http://charles.lescampeurs.org/tag/awk/feed" rel="self" type="application/rss+xml" />
	<link>http://charles.lescampeurs.org</link>
	<description>random bits.</description>
	<lastBuildDate>Sat, 10 Apr 2010 09:02:57 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Finding the total size of a set of files with awk</title>
		<link>http://charles.lescampeurs.org/2009/04/17/finding-the-total-size-of-a-set-of-files-with-awk</link>
		<comments>http://charles.lescampeurs.org/2009/04/17/finding-the-total-size-of-a-set-of-files-with-awk#comments</comments>
		<pubDate>Fri, 17 Apr 2009 09:12:22 +0000</pubDate>
		<dc:creator>CharlyBr</dc:creator>
				<category><![CDATA[Command line]]></category>
		<category><![CDATA[awk]]></category>

		<guid isPermaLink="false">http://charles.lescampeurs.org/?p=185</guid>
		<description><![CDATA[If you need to sum the total size of files in a directory or matching a pattern, an easy solution is to use awk. I needed to calculate this total for a set of javascript files, I used this command line: $ find App/ -name '*.js' -exec ls -l \{\} \; &#124; awk '{sum+=$5} END [...]]]></description>
			<content:encoded><![CDATA[
<p>If you need to sum the total size of files in a directory or matching a pattern, an easy solution is to use <em>awk</em>.</p>
<p>I needed to calculate this total for a set of JavaScript files, so I used this command line:</p>
<pre>$ find App/ -name '*.js' -exec ls -l \{\} \; | awk '{sum+=$5} END {print sum}'
1929403</pre>
<p>For a human-readable result, you can divide the sum and use printf to format it:</p>
<pre>$ find App/ -name '*.js' -exec ls -l \{\} \; | awk '{sum+=$5} END {printf("%.2fM\n", sum/1024/1024)}'
1.84M</pre>
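<p>The same idea can be wrapped in a small reusable function. This is only a sketch: the names <em>total_size</em> and <em>total_size_mb</em> are hypothetical, and it assumes owner and group names contain no spaces, so that the size stays in the fifth column of <em>ls -l</em>. It uses <em>-exec ... {} +</em> so that find hands many files to each ls call instead of starting one ls per file.</p>

```shell
# Hypothetical helper: sum the sizes of regular files under a directory
# whose names match a glob pattern, and print the total in bytes.
total_size() {
    dir=$1
    pattern=$2
    # "-exec ls -l {} +" passes many files to each ls call.
    # Field 5 of "ls -l" is the size in bytes (assumes owner and group
    # names contain no spaces).
    find "$dir" -type f -name "$pattern" -exec ls -l {} + |
        awk '{ sum += $5 } END { print sum + 0 }'
}

# Same total, formatted in megabytes (1024 * 1024 bytes) with two decimals.
total_size_mb() {
    total_size "$1" "$2" | awk '{ printf "%.2fM\n", $1 / 1024 / 1024 }'
}

# Usage (directory and pattern taken from the post):
#   total_size App/ '*.js'
#   total_size_mb App/ '*.js'
```

<p>Printing <em>sum + 0</em> makes the function output 0 instead of an empty line when no file matches.</p>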
]]></content:encoded>
			<wfw:commentRss>http://charles.lescampeurs.org/2009/04/17/finding-the-total-size-of-a-set-of-files-with-awk/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Memory usage by group of processes</title>
		<link>http://charles.lescampeurs.org/2009/03/13/memory-usage-by-group-of-processes</link>
		<comments>http://charles.lescampeurs.org/2009/03/13/memory-usage-by-group-of-processes#comments</comments>
		<pubDate>Fri, 13 Mar 2009 14:14:26 +0000</pubDate>
		<dc:creator>CharlyBr</dc:creator>
				<category><![CDATA[Benchmarks]]></category>
		<category><![CDATA[Command line]]></category>
		<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[awk]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[shell]]></category>

		<guid isPermaLink="false">http://charles.lescampeurs.org/?p=173</guid>
		<description><![CDATA[While monitoring a http/php server, I needed to do some statistics about php-cgi memory usage. Playing with memory_limit in PHP, we wanted to know the average memory usage per php-cgi process. This is easily calculated with our best friend awk. First, get the number of php running processes: # ps aux &#124; grep php-cgi &#124; [...]]]></description>
			<content:encoded><![CDATA[
<p>While monitoring an HTTP/PHP server, I needed to gather some statistics about php-cgi memory usage.</p>
<p>Playing with <em>memory_limit</em> in PHP, we wanted to know the average memory usage per php-cgi process. This is easily calculated with our best friend awk.</p>
<p>First, get the number of running php-cgi processes:</p>
<pre># ps aux | grep php-cgi | grep -v grep | wc -l
126</pre>
<p>Then, use awk to calculate the average memory usage for these processes, dividing the sum by the count found above:</p>
<pre># ps aux | grep php-cgi | grep -v grep | awk 'BEGIN{s=0;}{s=s+$6;}END{print s/126;}'
33987.8</pre>
<p>The number used in the calculation is the RSS field reported by ps. The ps manual page says:</p>
<blockquote><p>rss: resident set size, the non-swapped physical memory that a task has used (in kiloBytes)</p></blockquote>
<p>You can also calculate the total memory used by all php-cgi processes:</p>
<pre># ps aux | grep php-cgi | grep -v grep | awk 'BEGIN{s=0;}{s=s+$6;}END{print s;}'
4302028</pre>
<p>If you need to watch the trend of this average memory usage, a little shell loop does the trick:</p>
<pre># while [ 1 ]; do ps aux | grep php-cgi | grep -v grep | awk 'BEGIN{s=0;}{s=s+$6;}END{print s/126;}'; sleep 2; done
34401.3
34405.1
34408.4
34409.4
34414.2
34417</pre>
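<p>The process count does not have to be hard-coded: awk can count the matching lines itself. The function below is a sketch under a few assumptions. The name <em>avg_rss</em> is hypothetical, it expects the usual <em>ps aux</em> layout where RSS is field 6 and the command starts at field 11, and it matches the process name against field 11 only, so the awk process running the function never matches itself.</p>

```shell
# Hypothetical helper: read "ps aux" output on stdin and report the count,
# average RSS and total RSS of processes whose command contains a name.
avg_rss() {
    # $1: process name to look for in the command (field 11 of "ps aux")
    awk -v name="$1" '
        # Skip the header line, keep lines whose command contains the name.
        NR > 1 && index($11, name) { sum += $6; n++ }
        END {
            if (n > 0) {
                printf "%d processes, average RSS %.1f KB, total %d KB\n", n, sum / n, sum
            } else {
                print "no matching process"
            }
        }'
}

# Usage:
#   ps aux | avg_rss php-cgi
#   while true; do ps aux | avg_rss php-cgi; sleep 2; done
```

<p>Because the count is taken from the same ps snapshot as the sum, the average stays consistent when processes start or stop between two runs.</p>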
]]></content:encoded>
			<wfw:commentRss>http://charles.lescampeurs.org/2009/03/13/memory-usage-by-group-of-processes/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>About user agent strings</title>
		<link>http://charles.lescampeurs.org/2008/09/22/about-user-agent-strings</link>
		<comments>http://charles.lescampeurs.org/2008/09/22/about-user-agent-strings#comments</comments>
		<pubDate>Mon, 22 Sep 2008 05:48:12 +0000</pubDate>
		<dc:creator>CharlyBr</dc:creator>
				<category><![CDATA[Logs]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[awk]]></category>
		<category><![CDATA[bandwidth]]></category>
		<category><![CDATA[user agent]]></category>

		<guid isPermaLink="false">http://charles.lescampeurs.org/?p=73</guid>
		<description><![CDATA[I was surprised when I saw the length of the Chrome user agent string last week: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.X.Y.Z Safari/525.13 And in our logs: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13 a user agent string of 119 characters. It looks [...]]]></description>
			<content:encoded><![CDATA[
<p>I was surprised when I saw the length of the Chrome user agent string last week:</p>
<pre>Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.X.Y.Z Safari/525.13</pre>
<p>And in our logs:</p>
<pre>Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13</pre>
<p>That is a user agent string of 119 characters. It looks like quite a waste of space, but is Google Chrome the only one? Surprisingly, Chrome is far from the worst.</p>
<p>The longest strings from one of our log files:</p>
<ul>
<li>641 characters: <em>Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4 GoogleToolbarFF 3.0.20070420 GoogleToolbarFF 3.0.20070420 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525</em></li>
<li>337 characters: <em>Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; DA4BB049-ADVLOVER|0001|DSL; C:\DOCUME~1\everey\CONFIG~1\Temp\; C:\DOCUME~1\zulcan\CONFIG~1\Temp\; C:\DOCUME~1\nilfer\CONFIG~1\Temp\; C:\DOCUME~1\mirmor\CONFIG~1\Temp\; C:\DOCUME~1\ASTNU~1\CONFIG~1\Temp\; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 2.0.50727)</em></li>
<li>290 characters: <em>Mozilla/5.0 (Windows; U LupinV2.u2/20080827 LupinV2.u2/20080828 LupinV2.u2/20080829 LupinV2.u2/20080830 LupinV2.u2/20080831 LupinV2.u2/20080902 LupinV2.u2/20080903 LupinV2.u2/20080909 LupinV2.u2/20080911 LupinV2.u2/20080912; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1</em></li>
<li>272 characters: <em>Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; FunWebProducts; SU 3.011; User-agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; http://bsalsa.com) ; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 1.1.4322; .NET CLR 3.5.30428; .NET CLR 3.0.30422)</em></li>
<li>202 characters: <em>Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0; IE7-01NET.COM-1.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; InfoPath.2; IE7-01NET.COM-1.1)</em></li>
</ul>
<h2>The full list</h2>
<ul>
<li>Download the <a href="http://charles.lescampeurs.org/wp-content/uploads/2008/09/agentlog.gz">user agent log</a> (gzip format)</li>
</ul>
<h2>How to extract user agent strings from an HTTP log file?</h2>
<ul>
<li>Print each user agent string with its length:</li>
</ul>
<pre>awk -F\" '{print length($6)" "$6}'  access.log</pre>
<ul>
<li>Print user agent strings that are more than 200 characters long:</li>
</ul>
<pre>awk -F\" '{if (length($6) &gt; 200) print length($6)" "$6}'  access.log</pre>
<p>In these examples, the access.log file has this log format:</p>
<pre>xxx.xxx.xxx.xxx \
www.domain.com - \
[15/Sep/2008:00:00:00 +0200] \
"GET / HTTP/1.1" 200 4242 \
"http://www.domain.com/" \
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1"</pre>
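<p>To see why the user agent is field 6, it helps to print the quoted fields with labels. The sketch below assumes the log format shown above, written on a single line; the function name <em>ua_fields</em> is hypothetical. With a double quote as the field separator, the even-numbered fields are the quoted strings.</p>

```shell
# Hypothetical helper: split each access-log line on double quotes and
# label the quoted fields: $2 request, $4 referer, $6 user agent.
ua_fields() {
    awk -F'"' '{
        printf "request: %s\n", $2
        printf "referer: %s\n", $4
        printf "agent:   %s (%d characters)\n", $6, length($6)
    }'
}

# Usage:
#   head -n 1 access.log | ua_fields
```

<p>Note that a referer or user agent that itself contains a double quote shifts the field numbers, so the result for such a line is unreliable.</p>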
<h2>About bandwidth</h2>
<p>If you take an average user agent string like the Firefox one, you have a 91-character string.</p>
<ul>
<li>Number of entries with a user agent string longer than 120 characters: 249586</li>
</ul>
<pre>awk -F\" '{if (length($6) &gt; 120) print length($6)}' access.log | wc -l</pre>
<ul>
<li>Space wasted by strings longer than 120 characters: 5.67 M</li>
</ul>
<pre>awk -F\" '{if (length($6) &gt; 120) SUM += length($6)-120} END {print SUM/1024/1024" Mo"}'  access.log</pre>
<ul>
<li>Bandwidth wasted per month for this server: 170M&#8230;</li>
</ul>
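<p>The count and the wasted size can be computed in a single pass over the log. This is a sketch with a hypothetical function name, <em>ua_waste</em>. It takes the length threshold as a parameter and, like the commands above, counts one byte per character beyond the threshold.</p>

```shell
# Hypothetical helper: read an access log on stdin and report how many
# user agent strings exceed a length threshold, and by how many bytes.
ua_waste() {
    # $1: length threshold in characters (the post uses 120)
    awk -F'"' -v limit="$1" '
        length($6) > limit { n++; extra += length($6) - limit }
        END { printf "%d entries over %d chars, %d bytes wasted\n", n, limit, extra }'
}

# Usage:
#   ua_waste 120 < access.log
```

<p>If the log file covers one day (an assumption, since the post does not say), multiplying the 5.67 M figure by 30 gives roughly the 170M per month quoted above.</p>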
]]></content:encoded>
			<wfw:commentRss>http://charles.lescampeurs.org/2008/09/22/about-user-agent-strings/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
