Thoughts on server-side analytics

I wanted to ask your thoughts on server-side analytics (using the log files), say to find an approximate number of page views and unique visitors.

Something like Advanced Web Statistics (AWStats) or GoAccess. As I understand it, you are not collecting any more data than you normally are, and it could be an interesting alternative to the invasive JavaScript solutions.
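
To make it concrete, I imagine it boils down to something like this rough sketch (assuming the standard "combined" log format that Apache and nginx write by default; the log path and the "unique visitor" heuristic are just my assumptions):

    # Rough sketch of log-based analytics, assuming the "combined" log format.
    from collections import Counter

    LOG_PATH = "/var/log/nginx/access.log"  # hypothetical path

    page_views = 0
    visitors = set()          # "unique visitor" approximated as unique IP + user agent
    top_pages = Counter()

    with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
        for line in log:
            parts = line.split('"')
            if len(parts) < 6:
                continue                      # skip malformed or non-combined lines
            ip = parts[0].split()[0]          # client IP is the first field
            request = parts[1]                # e.g. 'GET /blog/post HTTP/1.1'
            user_agent = parts[5]
            path = request.split()[1] if len(request.split()) > 1 else "-"
            if path.endswith((".css", ".js", ".png", ".ico")):
                continue                      # ignore assets, count only pages
            page_views += 1
            visitors.add((ip, user_agent))
            top_pages[path] += 1

    print(f"page views: {page_views}, approx. unique visitors: {len(visitors)}")
    print("top pages:", top_pages.most_common(5))

GoAccess and AWStats obviously do a lot more than this, but as far as I can tell that's the core idea.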

Is it really like that, or am I missing something?

Thanks!


Plain logs are OK, but limited in the amount of info they capture and in how well they can fingerprint visitors. They rely on what browsers say about referrers and user agents, which is easily faked. You can run first-party scripts too, and get people to register and log in, but who has time to combine all that, especially if you’ll be paying someone to improve your search engine results and manage your social media marketing…

Indeed. However, you already get a huge amount of data about users this way (the sketch after this list shows how it falls out of a single log line):

  • Their IP address (allows rough geolocation and reveals the ISP)
  • Their user agent (which may contain their OS and more info about the client)
  • The referrer (shows the last page the user visited; lets you see how the user navigates; may expose the user’s interests)
  • The timestamp (lets you see when a user accesses your content and how long they stay)
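
To make that concrete, here is a rough sketch of pulling exactly those four items out of one combined-format log line (the sample line and the regex are made up for illustration, not taken from any real server):

    import re

    # Sample line in Apache/nginx "combined" format (invented for illustration).
    line = ('203.0.113.7 - - [12/Mar/2024:09:15:02 +0000] "GET /pricing HTTP/1.1" '
            '200 5123 "https://example.org/search?q=cheap+vpn" '
            '"Mozilla/5.0 (X11; Linux x86_64)"')

    pattern = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
        r'"(?P<request>[^"]*)" \S+ \S+ "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
    )

    m = pattern.match(line)
    if m:
        # Exactly the four items listed above, with no JavaScript involved.
        print("IP:        ", m.group("ip"))          # rough geolocation, ISP
        print("Timestamp: ", m.group("timestamp"))   # when the visit happened
        print("Referrer:  ", m.group("referrer"))    # where the visitor came from
        print("User agent:", m.group("user_agent"))  # OS and client details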

If you log full request headers, you get even more information that can be (mis)used for fingerprinting, such as supported languages, supported content types, supported HTTP compression, etc.
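
For example, a server operator could fold those headers into a crude fingerprint in a few lines; this is just a sketch with invented header values, not how any particular tool does it:

    import hashlib

    # Headers a server might see on a single request (values invented for illustration).
    request_headers = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "de-DE,de;q=0.8,en-US;q=0.5,en;q=0.3",
        "Accept-Encoding": "gzip, deflate, br",
    }

    # Concatenate the values in a fixed order and hash them: requests with the same
    # header combination produce the same ID, so this acts as a rough fingerprint
    # even without cookies or JavaScript.
    fingerprint_source = "|".join(
        request_headers.get(name, "")
        for name in ("User-Agent", "Accept", "Accept-Language", "Accept-Encoding")
    )
    fingerprint = hashlib.sha256(fingerprint_source.encode("utf-8")).hexdigest()[:16]

    print("header fingerprint:", fingerprint)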

This logging and fingerprinting is completely invisible to clients, so a user can’t see it. On the other hand, a server operator can easily claim not to track anyone while logging everything, and you can’t check this without access to the server.

I’ve been using Piwik/Matomo for about 5 years, importing Apache logs server-side via cron, and it does the job pretty well.

you are not collecting any more data than you normally are,

This is an interesting point too. When did keeping logs on all visitors become “normal” to start with?

Their IP address, user agent, referrer, timestamp

The first three are easily faked with proxies or add-ons. The last one is easily misinterpreted, especially when people open lots of tabs and go back and forth. GIGO (garbage in, garbage out) makes it all a waste.

Of course, you can change these values; however, the vast majority of people don’t. Besides, a server could use other fingerprinting methods to identify the actual client. Moreover, many web clients, like RSS/Atom feed readers, don’t offer a way to change their user agent at all, and some web browsers, like Privacy Browser (F-Droid), let you change the User-Agent while still adding a fixed request header, so the browser can easily be identified.