Fathom retains IP addresses temporarily for security. How do you protect that data?

A customer recently asked us how being a privacy-first company squares with the data we collect for our analytics. We like being as transparent as possible, especially when it comes to our privacy practices and data collection, so here’s the whole question and our answer.

Fathom retains logs of IP addresses that visit my website for 24 hours before the logs and visitor data are hashed. I was always taught that “what is collected can be intercepted,” so how do I avoid this? I would strongly prefer to be able to tell my visitors that their IP addresses are totally safe. I’m dealing with democracy, human rights, and privacy issues, so I need to be absolutely accurate with my statements on this topic. So my question is this: How can I be certain that the IP addresses you collect for security cannot be intercepted and used against my website visitors?

Great question!

Everyone should be hyper-cautious about the data ANY third-party solution is collecting. Fathom has been built from the ground up to respect every visitor’s data: we collect as little of it as possible, and we anonymize or aggregate whatever we do collect.

Let’s walk you through how this came about and, very specifically, what we do.

First things first, we needed to find a way to protect our customers’ analytics data. We were getting flooded with spam attacks, and we are still hit every weekend. Back in November, we realized that we needed some access log functionality to establish patterns and measure IP activity. The problem with keeping access logs is clear, especially for a third-party service like Fathom, which runs on over 100,000 websites, some of them dealing with highly sensitive traffic (religion, politics, human rights, privacy, etc.). There was absolutely no way we could keep full access logs, even for just 24 hours, because we would then have an inventory of all access from a single IP address across multiple websites. That would be a disaster, so we didn’t pursue that route.

Fortunately, we have the option to keep redacted access logs and have them automatically wiped after 24 hours. This means that we keep records of IP addresses but no information about the websites they visited. So if a government or malicious actor were to get hold of our redacted access logs, they would only see IPs, and they’d have no insight into which websites or pages a visitor viewed. In this respect, Fathom is like a VPN. If a VPN service had only one customer, and authorities or malicious actors got hold of any kind of “connection logs” and “history”, all activity from those VPN IP addresses could be tied to that individual. But if thousands or millions of people use the VPN, you have safety in numbers. That’s exactly how Fathom works and, whilst we’re already running on 100,000+ websites, visitor privacy improves with every additional website that uses us.
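
To make the idea concrete, here is a minimal Python sketch of a redacted access log with a 24-hour wipe. It is an illustration under stated assumptions, not our production code: the names `RedactedEntry`, `redact` and `purge` are hypothetical, and the only details taken from this post are that the log keeps the IP and a to-the-second timestamp, drops the website and page, and is wiped after 24 hours.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

# Retention window stated in the post; everything else here is illustrative.
RETENTION = timedelta(hours=24)


@dataclass(frozen=True)
class RedactedEntry:
    """One redacted log line: an IP and a timestamp, with no site or page."""
    ip: str
    seen_at: datetime


def redact(ip: str, site: str, path: str, seen_at: datetime) -> RedactedEntry:
    """Keep the IP and a to-the-second timestamp; discard site and path."""
    # `site` and `path` are accepted only to show that they are never stored.
    return RedactedEntry(ip=ip, seen_at=seen_at.replace(microsecond=0))


def purge(entries: List[RedactedEntry], now: datetime) -> List[RedactedEntry]:
    """Return only the entries that are still inside the retention window."""
    cutoff = now - RETENTION
    return [entry for entry in entries if entry.seen_at >= cutoff]
```

Because the entry type has no field for the website or page, that information cannot leak from the log even if someone gets a copy of it before the wipe runs.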

Away from the redacted access logs, you then get to an interesting issue. Sure, the access logs may not have the website URLs, but what about the analytics database? Could you correlate browsing activity by matching timestamps in the access logs against timestamps in the database? With exact timestamps on both sides, you absolutely could. So the way we approach this is that the access logs keep “to the second” activity from an IP address, with zero information about the website they visited, and in our analytics database, we round the timestamp to the nearest minute. We receive far more page views in a single minute than we do in a single second, so any one access log entry lines up with a large number of database rows. That means that even if a malicious actor got hold of 24 hours of our redacted access logs and our entire analytics database, they couldn’t match the data. This is incredibly important to us, and it’s the design we’ve put in place. And as I mentioned, we process millions of page views a day across hundreds of thousands of sites, so the sheer volume here helps obfuscate the log data.
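
The rounding step can be sketched in a few lines of Python. This is a simplified illustration rather than our actual code: `round_to_minute` and `candidate_count` are hypothetical names, and rounding 30 seconds and above upwards is an assumption about what “nearest minute” means.

```python
from datetime import datetime, timedelta
from typing import Iterable


def round_to_minute(ts: datetime) -> datetime:
    """Round a timestamp to the nearest minute (30 seconds and above round up)."""
    floored = ts.replace(second=0, microsecond=0)
    if ts - floored >= timedelta(seconds=30):
        floored += timedelta(minutes=1)
    return floored


def candidate_count(log_time: datetime, stored_minutes: Iterable[datetime]) -> int:
    """Count the stored page views that one to-the-second log line could match.

    `stored_minutes` holds timestamps that were already rounded to the minute,
    which is all the analytics database would contain.
    """
    target = round_to_minute(log_time)
    return sum(1 for minute in stored_minutes if minute == target)
```

The higher `candidate_count` is for a given log line, the less an attacker learns by lining the two data sets up: every page view stored under that minute, on any website, is an equally plausible match.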

If a government wants to intercept a “raw IP” accessing the material, it can go to the ISP and request it. So we get into this “which government do you trust?” situation. For example, some EU customers have insisted that we offer a method for them to process all traffic within the EU using EU servers. If we went this route, the US government couldn’t intercept that data, as it wouldn’t have the legal ability to do so (we’re a Canadian company). Other customers want EU citizen data to go through the EU and the rest of the world through the US. We’re currently building something called EU Isolation, and that feature will be able to handle that.

If you haven’t already read it, I recommend our data journey, which lays out exactly what we do with the data that comes in from each and every page view.

I hope we’ve answered your question and you now feel more comfortable relaying this information to your own audience and visitors.

Update: We recently launched EU Isolation, so we now process EU visitor data in the EU, on EU servers owned by an EU company. We do this to be fully GDPR (Schrems II) compliant. We don’t want our website analytics to break the law.

Do you have a question for us that you’d like answered on our blog?
Let us know, and we may pick it for a future post!

Jack Ellis is a technical writer, teacher and software engineer with over 15 years in the game. Throughout his career, he’s built software for media companies, governments, insurance companies and international law firms. Jack has taught thousands of developers how to scale their web applications and databases, and regularly shares his experiences on the Fathom Analytics Blog and in the Laravel tips section. Today, he is the co-founder of Fathom Analytics, a Google Analytics Alternative focusing on GDPR compliance and simplicity.

Posted in technical
