Goaccess: Incorrect parsing of XFF

Created on 5 May 2020 · 7 Comments · Source: allinurl/goaccess

In my development environment, where the XFF header only gives a single IP, I'm able to ignore the first field, use %h in the second field, and goaccess works. In my production environment, the XFF field is usually a comma-separated list (but unquoted). I can't get goaccess to work with this data.

With the following log file contents

2001:8004:e02:48b7:3417:266c:7436:f95e, 64.252.140.97, 10.36.42.175, 10.36.30.137, 52.86.63.69 127.0.0.1 - - [05/May/2020:21:25:27 +0000] 24675 "GET /wiki/en/load.php?modules=skins.minerva.icons.images&image=language-switcher&format=original&lang=en&skin=minerva HTTP/1.0" 200 "https://www.familysearch.org/wiki/en/Ireland_Church_Records" "Mozilla/5.0 (iPad; CPU OS 11_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.0 Mobile/15E148 Safari/604.1" 1250 947
2001:8004:e02:48b7:3417:266c:7436:f95e, 64.252.140.97, 10.36.31.57, 10.36.35.206, 52.86.63.69 127.0.0.1 - - [05/May/2020:21:25:27 +0000] 17451 "GET /wiki/en/load.php?modules=skins.minerva.content.styles.images&image=a.external&format=original&lang=en&skin=minerva HTTP/1.0" 200 "https://www.familysearch.org/wiki/en/Ireland_Church_Records" "Mozilla/5.0 (iPad; CPU OS 11_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.0 Mobile/15E148 Safari/604.1" 1251 672
2001:8004:e02:48b7:3417:266c:7436:f95e, 64.252.140.97, 10.36.52.160, 10.36.35.206, 52.1.214.79 127.0.0.1 - - [05/May/2020:21:25:27 +0000] 17356 "GET /wiki/en/load.php?modules=skins.minerva.icons.images&image=edit-enabled&format=original&lang=en&skin=minerva HTTP/1.0" 200 "https://www.familysearch.org/wiki/en/Ireland_Church_Records" "Mozilla/5.0 (iPad; CPU OS 11_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.0 Mobile/15E148 Safari/604.1" 1245 673
127.0.0.1 127.0.0.1 - - [05/May/2020:21:25:27 +0000] 552 "GET /server-status?auto HTTP/1.1" 200 "-" "-" 167 1485
54.236.1.15, 70.132.60.141, 10.36.17.59, 10.36.30.137, 52.1.214.79 127.0.0.1 - - [05/May/2020:21:25:27 +0000] 770808 "GET /wiki/en/Isle_of_Wight_County,_Virginia_Genealogy HTTP/1.0" 200 "-" "Mozilla/5.0 (compatible; Pinterestbot/1.0; +http://www.pinterest.com/bot.html)" 933 44425
75.109.32.97, 130.176.10.98, 10.36.39.80, 10.36.35.206, 52.1.214.79 127.0.0.1 - - [05/May/2020:21:25:27 +0000] 435658 "GET /wiki/en/Marshall_County,_West_Virginia_Genealogy HTTP/1.0" 200 "https://www.google.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Mobile/15E148 Safari/604.1" 1131 22193
2601:100:8181:bf60:38eb:1c18:72f7:2d6e, 130.176.66.94, 10.36.39.80, 10.36.35.206, 52.86.63.69 127.0.0.1 - - [05/May/2020:21:25:27 +0000] 21831 "GET /wiki/en/img_auth.php/thumb/f/f1/Ohio_State_Flag.jpg/250px-Ohio_State_Flag.jpg HTTP/1.0" 200 "https://www.familysearch.org/wiki/en/Ohio_Land_and_Property" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36" 5086 9011
2601:100:8181:bf60:38eb:1c18:72f7:2d6e, 130.176.66.94, 10.36.42.175, 10.36.35.206, 52.1.214.79 127.0.0.1 - - [05/May/2020:21:25:27 +0000] 21285 "GET /wiki/en/img_auth.php/thumb/8/83/Ohio_company_land_office.jpg/300px-Ohio_company_land_office.jpg HTTP/1.0" 200 "https://www.familysearch.org/wiki/en/Ohio_Land_and_Property" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36" 5105 45407
2601:100:8181:bf60:38eb:1c18:72f7:2d6e, 130.176.66.143, 10.36.42.175, 10.36.30.137, 52.86.63.69 127.0.0.1 - - [05/May/2020:21:25:27 +0000] 19047 "GET /wiki/en/img_auth.php/thumb/9/90/Ohio_Lands.png/300px-Ohio_Lands.png HTTP/1.0" 200 "https://www.familysearch.org/wiki/en/Ohio_Land_and_Property" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36" 5078 72302
2001:8004:e02:48b7:3417:266c:7436:f95e, 64.252.140.97, 10.36.31.57, 10.36.30.137, 52.1.214.79 127.0.0.1 - - [05/May/2020:21:25:27 +0000] 449 "GET /wiki/public_html/Museo_Slab_500.otf HTTP/1.0" 200 "https://www.familysearch.org/wiki/en/Ireland_Church_Records" "Mozilla/5.0 (iPad; CPU OS 11_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.0 Mobile/15E148 Safari/604.1" 1114 62458

Using this command fails with an error:
cat /tmp/access.log | goaccess --log-format '~h{, } %^ %^ %e [%d:%t %z] %D "%r" %s "%R" "%u" %^ %b' --time-format "%T" --date-format "%d/%b/%Y" -

Error
Parsed 1 lines producing the following errors:

Token '0000] 409 "GET /wiki/public_html/Museo_Slab_500.otf HTTP/1.0" 200 "https' doesn't match specifier '%d'

Format Errors - Verify your log/date/time format

bug · log/date/time format · question

All 7 comments

You were pretty close, this should work:

goaccess access.log --log-format='~h{, } %^ %e [%d:%t %^] %D "%r" %s "%R" "%u" %^ %b' --date-format=%d/%b/%Y --time-format=%T

Just discovered that this works too (I manually edited the sample to quote the first field):
cat /tmp/access.log | grep -v ^\"127 | goaccess --log-format '~h{", } %^ %^ [%d:%t %z] %D "%r" %s "%R" "%u" %^ %b' --time-format "%T" --date-format "%d/%b/%Y" -

However, it seems to me that the first token is swallowing the second field in the log file. This is the original Apache LogFormat specification:
"%{X-Forwarded-For}i %h %l %u %t %D \"%r\" %>s \"%{Referer}i\" \"%{User-Agent}i\" %I %O"

There are twelve fields, and %h is separate from X-Forwarded-For. Am I missing something?

Sorry, I don't think I'm following your question. Are you referring to the quote within ~h{", }?

No that's not my question. I'm sorry, let me try to explain.

Given a log entry like
75.109.32.97, 130.176.10.98, 10.36.39.80, 10.36.35.206, 52.1.214.79 127.0.0.1 - - [05/May/2020:21:25:27 +0000] 435658 "GET /wiki/en/Marshall_County,_West_Virginia_Genealogy HTTP/1.0" 200 "https://www.google.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Mobile/15E148 Safari/604.1" 1131 22193

which is produced by Apache having a LogFormat directive of
LogFormat "%{X-Forwarded-For}i %h %l %u %t %D \"%r\" %>s \"%{Referer}i\" \"%{User-Agent}i\" %I %O"

My understanding of Apache is that the XFF field is 75.109.32.97, 130.176.10.98, 10.36.39.80, 10.36.35.206, 52.1.214.79 and that the %h field is 127.0.0.1.
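
To make that field split concrete, here is a minimal Python sketch that separates a line according to the LogFormat above. The regular expression, the `split_fields` helper and the group names are assumptions made for illustration; they are not part of Apache or GoAccess, and the pattern only covers lines shaped like the samples in this thread (an absent XFF header logged as "-" would not match).

```python
import re

# Loose pattern for one IPv4/IPv6 address (illustrative, not a validator).
ADDR = r"[0-9A-Fa-f:.]+"

# Mirrors: %{X-Forwarded-For}i %h %l %u %t %D "%r" %>s "%{Referer}i" "%{User-Agent}i" %I %O
LINE_RE = re.compile(
    rf"^(?P<xff>{ADDR}(?:, {ADDR})*) "   # XFF: addresses joined by ", "
    rf"(?P<host>{ADDR}) "                # %h: the next space-delimited token
    r"(?P<ident>\S+) (?P<user>\S+) "     # %l %u
    r"\[(?P<time>[^\]]+)\] "             # %t
    r"(?P<usec>\d+) "                    # %D
    r'"(?P<request>[^"]*)" '             # "%r"
    r"(?P<status>\d{3}) "                # %>s
    r'"(?P<referer>[^"]*)" '             # "%{Referer}i"
    r'"(?P<agent>[^"]*)" '               # "%{User-Agent}i"
    r"(?P<bytes_in>\d+) (?P<bytes_out>\d+)$"  # %I %O
)


def split_fields(line):
    """Return the twelve fields as a dict; 'xff' becomes a list of addresses."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError("line does not match the assumed LogFormat")
    fields = m.groupdict()
    fields["xff"] = fields["xff"].split(", ")
    return fields


sample = (
    "75.109.32.97, 130.176.10.98, 10.36.39.80, 10.36.35.206, 52.1.214.79 "
    "127.0.0.1 - - [05/May/2020:21:25:27 +0000] 435658 "
    '"GET /wiki/en/Marshall_County,_West_Virginia_Genealogy HTTP/1.0" 200 '
    '"https://www.google.com/" '
    '"Mozilla/5.0 (iPhone; CPU iPhone OS 13_3_1 like Mac OS X) '
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 "
    'Mobile/15E148 Safari/604.1" 1131 22193'
)

fields = split_fields(sample)
```

Under these assumptions, `fields["xff"]` holds the five forwarded addresses and `fields["host"]` holds 127.0.0.1 as its own field.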

Parsing this log record with GoAccess using
--log-format='~h{, } %^ %e [%d:%t %^] %D "%r" %s "%R" "%u" %^ %b' --date-format=%d/%b/%Y --time-format=%T
yields the following:

~h{, } consumes 75.109.32.97, 130.176.10.98, 10.36.39.80, 10.36.35.206, 52.1.214.79 and the 127.0.0.1 (which is a separate field from Apache)
%^ ignores the -
%e consumes the next -
[%d:%t %^] consumes [05/May/2020:21:25:27 +0000]
%D consumes 435658
"%r" consumes "GET /wiki/en/Marshall_County,_West_Virginia_Genealogy HTTP/1.0"
%s consumes 200
"%R" consumes "https://www.google.com/"
"%u" consumes "Mozilla/5.0 (iPhone; CPU iPhone OS 13_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Mobile/15E148 Safari/604.1"
%^ ignores 1131
%b consumes 22193

If my understanding is correct, shouldn't GoAccess see 127.0.0.1 as a separate field and not grab it into the ~h{, } expression?
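
One way to picture the behaviour described above is the following Python sketch. It is an illustration only and is not GoAccess's actual code: it assumes a consumer that treats every character inside the braces (here "," and " ") as a skippable delimiter and keeps going while the next token looks like an IP. The `consume_xff_greedy` and `consume_xff_strict` names are hypothetical.

```python
import ipaddress


def is_ip(token):
    """True if token parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(token)
        return True
    except ValueError:
        return False


def consume_xff_greedy(line, delims=", "):
    """Take IP tokens separated by ANY run of delimiter characters."""
    ips, pos = [], 0
    while pos < len(line):
        start = pos
        while start < len(line) and line[start] in delims:
            start += 1                      # skip delimiter characters
        end = start
        while end < len(line) and line[end] not in delims:
            end += 1                        # read up to the next delimiter
        token = line[start:end]
        if not is_ip(token):
            break
        ips.append(token)
        pos = end
    return ips, line[pos:].lstrip(delims)


def consume_xff_strict(line):
    """Take IP tokens only while they are joined by the exact string ', '."""
    ips, pos = [], 0
    while True:
        end = pos
        while end < len(line) and line[end] not in ", ":
            end += 1
        token = line[pos:end]
        if not is_ip(token):
            break
        ips.append(token)
        pos = end
        if line.startswith(", ", pos):
            pos += 2                        # another XFF entry follows
        else:
            break                           # a bare space ends the XFF field
    return ips, line[pos:].lstrip(" ")


line = (
    "75.109.32.97, 130.176.10.98, 10.36.39.80, 10.36.35.206, 52.1.214.79 "
    "127.0.0.1 - - [05/May/2020:21:25:27 +0000] 435658"
)

greedy_ips, greedy_rest = consume_xff_greedy(line)
strict_ips, strict_rest = consume_xff_strict(line)
```

In this sketch the greedy version returns six addresses, with 127.0.0.1 absorbed into the list, while the strict version stops after five and leaves 127.0.0.1 at the start of the remaining text for %h.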

TL;DR
When parsing an XFF field followed by a field containing an IP address (like the host), the first token swallows the second field in the log file. I believe this is a bug.

In a log parser, to avoid mistakes with complex fields (those with a variable number of columns), I suggest always delimiting them with "".
Think about it: how would the user-agent be parsed without quotes?
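
As a rough illustration of the quoting suggestion, the Python sketch below assumes the XFF field has been wrapped in double quotes (for example by writing \"%{X-Forwarded-For}i\" in the LogFormat, the same way the Referer and User-Agent fields are quoted). The `split_quoted_xff` helper and the shortened sample line are hypothetical.

```python
import re

# With quotes, the XFF field ends at the closing quote, so the spaces and
# commas inside it cannot be confused with the separator before %h.
QUOTED_XFF_RE = re.compile(r'^"(?P<xff>[^"]*)" (?P<host>\S+) ')


def split_quoted_xff(line):
    """Return (list of XFF addresses, %h value) for a line with quoted XFF."""
    m = QUOTED_XFF_RE.match(line)
    if m is None:
        raise ValueError("line does not start with a quoted XFF field")
    ips = [ip.strip() for ip in m.group("xff").split(",")]
    return ips, m.group("host")


quoted_line = (
    '"75.109.32.97, 130.176.10.98, 10.36.39.80, 10.36.35.206, 52.1.214.79" '
    "127.0.0.1 - - [05/May/2020:21:25:27 +0000] 435658"
)

xff_ips, host = split_quoted_xff(quoted_line)
```

The closing quote marks the end of the variable-length field, so no guess about where the list stops is needed.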

I've pushed a commit that should fix this. Please feel free to test it out by building from development and let me know if that works for you.

goaccess xff.log --log-format='~h{, } %^ %^ %e [%d:%t %^] %D "%r" %s "%R" "%u" %^ %b' --date-format=%d/%b/%Y --time-format=%T

It will be pushed out in the upcoming release. Thanks again for reporting it!
