About user agent strings

I was surprised when I saw the length of the Chrome user agent string last week:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.X.Y.Z Safari/525.13

And in our logs:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13

That is a user agent string of 119 characters. It looks like quite a waste of space, but is Google Chrome the only offender? Surprisingly, Chrome is far from the worst.

The longest ones from one of our log files:

  • 641 characters: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4 GoogleToolbarFF 3.0.20070420 GoogleToolbarFF 3.0.20070420 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525 GoogleToolbarFF 3.0.20070525
  • 337 characters: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; DA4BB049-ADVLOVER|0001|DSL; C:\DOCUME~1\everey\CONFIG~1\Temp\; C:\DOCUME~1\zulcan\CONFIG~1\Temp\; C:\DOCUME~1\nilfer\CONFIG~1\Temp\; C:\DOCUME~1\mirmor\CONFIG~1\Temp\; C:\DOCUME~1\ASTNU~1\CONFIG~1\Temp\; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 2.0.50727)
  • 290 characters: Mozilla/5.0 (Windows; U LupinV2.u2/20080827 LupinV2.u2/20080828 LupinV2.u2/20080829 LupinV2.u2/20080830 LupinV2.u2/20080831 LupinV2.u2/20080902 LupinV2.u2/20080903 LupinV2.u2/20080909 LupinV2.u2/20080911 LupinV2.u2/20080912; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1
  • 272 characters: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; FunWebProducts; SU 3.011; User-agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; http://bsalsa.com) ; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 1.1.4322; .NET CLR 3.5.30428; .NET CLR 3.0.30422)
  • 202 characters: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0; IE7-01NET.COM-1.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; InfoPath.2; IE7-01NET.COM-1.1)

The full list

How to extract user agent strings from an HTTP log file?

  • Print each user agent string with its length:
awk -F\" '{print length($6)" "$6}'  access.log
  • Print user agent strings that are longer than 200 characters:
awk -F\" '{if (length($6) > 200) print length($6)" "$6}'  access.log
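Using the double quote as the separator works because each log line quotes the request, the referer and the user agent, which makes the user agent the sixth field. Here is a minimal sketch that checks this on a two-line sample file and then ranks the distinct strings by length. The sample addresses and the temporary file are made up for illustration; only the line layout matches our access.log.

```shell
# Build a tiny sample log (made-up addresses, same layout as access.log).
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.1 www.domain.com - [15/Sep/2008:00:00:00 +0200] "GET / HTTP/1.1" 200 4242 "http://www.domain.com/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1"
192.0.2.2 www.domain.com - [15/Sep/2008:00:00:01 +0200] "GET / HTTP/1.1" 200 4242 "http://www.domain.com/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13"
EOF

# Splitting on double quotes: $2 is the request, $4 the referer, $6 the user agent.
fields=$(awk -F\" 'NR == 1 {print "request=" $2; print "referer=" $4; print "agent=" $6}' "$log")
echo "$fields"

# Longest distinct user agent strings first, each prefixed with its length.
top=$(awk -F\" '{print length($6)" "$6}' "$log" | sort -rn | uniq | head -n 5)
echo "$top"

rm -f "$log"
```

On a real log, the last pipeline is one way to get a ranking like the one above. It assumes every line follows the same quoting, so a request or referer containing a literal double quote would shift the fields.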

In those examples, the access.log file has this log format:

xxx.xxx.xxx.xxx \
www.domain.com - \
[15/Sep/2008:00:00:00 +0200] \
"GET / HTTP/1.1" 200 4242 \
"http://www.domain.com/" \
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1"

About bandwidth

If you take an average user agent string, like the Firefox one, you have a 91-character string.

  • Number of entries with a user agent string longer than 120 characters: 249586
awk -F\" '{if (length($6) > 120) print length($6)}' access.log | wc -l
  • Space wasted by strings longer than 120 characters: 5.67 MB
awk -F\" '{if (length($6) > 120) SUM += length($6)-120} END {print SUM/1024/1024" MB"}'  access.log
  • Bandwidth waste per month for this server: 170M…
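
The monthly figure looks like the per-log figure multiplied by about 30 (5.67 × 30 ≈ 170), which suggests the log covers one day. Here is a rough sketch of that arithmetic in one pass. The one-day log and the 30-day month are assumptions, and the two sample entries are synthetic (a 150-character and a 90-character user agent string) so the numbers are easy to check by hand.

```shell
threshold=120   # same cut-off as in the commands above
days=30         # assumption: one log file per day, 30 days per month

# Synthetic sample log: one 150-character and one 90-character user agent.
log=$(mktemp)
awk 'BEGIN {
  ua_long = sprintf("%150s", "");  gsub(/ /, "x", ua_long)
  ua_short = sprintf("%90s", "");  gsub(/ /, "y", ua_short)
  fmt = "192.0.2.1 www.domain.com - [15/Sep/2008:00:00:00 +0200] \"GET / HTTP/1.1\" 200 4242 \"http://www.domain.com/\" \"%s\"\n"
  printf fmt, ua_long
  printf fmt, ua_short
}' > "$log"

# Count the entries above the threshold, sum the excess characters,
# then extrapolate the excess to a month.
report=$(awk -F\" -v limit="$threshold" -v days="$days" '
  length($6) > limit { entries++; excess += length($6) - limit }
  END {
    printf "entries=%d excess_bytes=%d monthly_bytes=%d monthly_MB=%.2f\n",
           entries, excess, excess * days, excess * days / 1024 / 1024
  }' "$log")
echo "$report"

rm -f "$log"
```

Only the long entry exceeds the threshold, by 30 characters, so the sample gives 30 excess bytes per log and 900 per month. Pointing the second awk command at a real access.log gives the same kind of estimate for a server.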