Fathom Analytics status
If you'd like to subscribe to status updates for Fathom, here is our status RSS feed.
No issues reported, woohoo!
AWS Downtime Resolved
Parts of us-east-1 appeared to have fallen offline for around 7 minutes. During this time, our services were only partially available (we use multiple clouds to process traffic). Sorry for the inconvenience here, we run across multiple availability zones and this one was out of our control. Posted at 11:49am PT
LetsEncrypt Bug Resolved
Two customers using our old custom domain system (deprecated last year) ran into issues with SSL certificates being revoked by LetsEncrypt, our old certificate provider.
We looked into it and it was widespread, affecting everyone who hasn't moved to the new custom domain infrastructure. This had a huge impact across the web, and affected 2.7 million websites, not just Fathom customers using the old infrastructure.
As soon as we were made aware of this issue, we started working with Caddy founder, Matt Holt, to come up with a solution and re-issue the certificates.
There may have been a slight drop in traffic for anyone using our old custom domain solution, and we apologize for that. This was out of our hands completely and we moved as swiftly as possible.
On a related note, nobody should be using our old custom domain solution. We have a brand new, globally distributed, rapid-fast custom domain solution that we launched last year. We will likely maintain the old infrastructure for the rest of 2022, but we really do advise that you move ASAP. Posted at 1:03pm PT
Infrastructure upgrades Resolved
We performed some upgrades to our default Fathom script (cdn.usefathom.com) on the 11th October 2021. We moved the ingest endpoint to a new CDN, with added security and global availability (to improve performance worldwide).
For 1-2 hours during this move, whilst DNS was propagating, we had some incorrect configuration on our new CDN which meant that not all pageviews were being tracked. So for customers who aren't using custom domains, you'll see that your pageviews will have dropped during that time. Unfortunately, we have no way of "back-filling" missing pageviews, as we don't keep any kind of access logs.
Again, customers who are using custom domains were not affected. However, folks using our default Fathom script will notice a slight drop during that period. The reason this issue wasn't caught sooner is that we monitor for downtime, not incorrect configurations, and the response wasn't technically broken. The reason some pageviews were collected and some weren't is that global DNS propagation takes time, meaning some of your website visitors were hitting our old infrastructure, whilst others hit the new infrastructure.
Following this, we're going to be implementing changes around testing. We'll now be monitoring end to end, ensuring that the pageview is collected and that it appears in our database. Clearly monitoring for uptime alone isn't enough, and we need full, 24/7, minute by minute checks for the full end-to-end process. When testing manually, we had assumed the DNS had propagated, but it was still hitting the old servers. We apologize to everyone affected here. Posted at 1:30pm PT
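To illustrate the kind of end-to-end check described above, here is a minimal sketch in Python. It is not our actual monitoring code: `send_pageview` and `pageview_exists` are hypothetical callables standing in for "hit the collector" and "look the pageview up in the database", and the marker path is made up for the example. The idea is simply that a check only passes once a uniquely tagged pageview has been found in storage, rather than when the collector returns a healthy-looking response.

```python
import time
import uuid
from typing import Callable


def check_pageview_end_to_end(
    send_pageview: Callable[[str], None],    # hypothetical: sends a pageview for a path
    pageview_exists: Callable[[str], bool],  # hypothetical: looks the path up in storage
    timeout_seconds: float = 60.0,
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Send a uniquely tagged test pageview and confirm it reaches storage."""
    # A unique path means an older test pageview can never satisfy this check.
    marker = f"/__synthetic-check/{uuid.uuid4().hex}"
    send_pageview(marker)

    deadline = clock() + timeout_seconds
    while True:
        if pageview_exists(marker):
            return True   # collected and stored: the full pipeline works
        if clock() >= deadline:
            return False  # the collector may have answered, but nothing was stored
        sleep(poll_interval)
```

Run on a schedule (for example, once a minute), a check like this would flag a misconfiguration where the endpoint responds normally but pageviews are silently dropped.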
Spam/DDoS attack Resolved
Today, we tried out some new rules that were meant to reduce the strictness of our firewall. Unfortunately, the rule change meant huge amounts of spam was able to pass through, and we had to make a snap decision to go with a nuclear option, which was to clear the queue backlog (around 6 million pageviews), as 99% of traffic coming in was spam. Because of this, some sites lost up to 1.5 hours of traffic data.
If you weren't hit with spam, you'll see a tiny blip on your stats for today (between 6PM - 8PM PT). If you were hit with spam, you'll see a huge spike, so please reach out to us.
This isn't our first time dealing with spam, and we've invested a lot of time into spam protection. We'll be rolling out an additional firewall in Version 3 and, of course, we've reverted the rule change. We sincerely apologize for the inconvenience here. We have been under DDoS attacks since November 2020, and the series of events here was unfortunate. Posted at 8:09pm PT
DDoS Attack Resolved
Today’s attack was unique because it was completely unintentional.
There was a problem with a customer’s site, because they had programmed an infinite loop on their event tracking code. So, what would happen is that a visitor would load their page, and then an event would fire itself at a constantly-high rate until the page was closed.
(Making things worse: the page played a popular and very fantastic song that’s 3:08 long, so the page was left open for quite a while by most people.)
Now, we’ve hardened our security a lot since we were first DDoS’ed last year, and our firewall routinely blocks similar attacks every week. However, the issue with this incident is that our security was focused on page collection, not event collection. As of now, we've put additional security in front of event collection to prevent this from happening again.
Fathom did not go offline, but it did create a backlog. Once we isolated and blocked the offending customer’s event (and had them remove the code from their site), our backlog cleared in less than five minutes.
How will this be avoided in the future?
We’re migrating to a new database (finished March 12, 2021) that can easily handle things like this, and it will process backlogs like the above much faster. We’ve now added security checks to events as well. If a similar event happened in the future, our software would automatically block offenders (even if their music tastes are quite acceptable). Let us know if you have any questions. We’re always just an email away. Posted at 9:17pm PT
There’s currently a backlog in our queue due to a targeted attack from a motivated party. All stats are still being collected, but will take longer to show up on your dashboard. We’re working with our 24x7 DDoS AWS team to resolve this currently.
We are working on blocking more attacks like these and building a tool to mitigate them, and it’s almost ready for us to put into action (but not quite yet). Posted at 6:51pm PT
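For the curious, here is a rough sketch of how automatically blocking a runaway event sender could work. This is an illustration in Python, not our production code: the `EventRateGuard` class, the key format, and the thresholds are all assumptions made for the example. It counts recent events per key in a sliding time window and blocks any key that goes over the limit.

```python
from collections import defaultdict, deque


class EventRateGuard:
    """Sliding-window limiter that blocks keys sending events too quickly."""

    def __init__(self, max_events: int, window_seconds: float):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # key -> timestamps of accepted events
        self.blocked = set()               # keys that tripped the limit

    def allow(self, key: str, now: float) -> bool:
        """Return True if the event should be accepted, False if rejected."""
        if key in self.blocked:
            return False

        recent = self._recent[key]
        # Forget accepted events that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()

        if len(recent) >= self.max_events:
            # Too many events inside the window: block this key from now on.
            self.blocked.add(key)
            del self._recent[key]
            return False

        recent.append(now)
        return True


# Hypothetical usage: at most 3 events per second for each site/visitor pair.
guard = EventRateGuard(max_events=3, window_seconds=1.0)
results = [guard.allow("site-a:visitor-1", 0.1 * i) for i in range(5)]
```

In this sketch a blocked key stays blocked until someone removes it from `blocked`; a real system would likely want an expiry or a manual review step so a fixed site can recover.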
Provider issues Resolved
What a day. AWS having issues like this, where it actually makes mainstream news, is unbelievably rare. Having said that, it's given us some things to think about. We can't recall an outage with such an impact since 2012. Following this, it's given us a +100 to look into multi-region availability.
Having said that, it looks like our data collector remained available throughout. Early on in the problem, we saw a large queue backlog, but it was quickly handled. And our current visitors continued to work throughout the day. We are currently working through millions of pageviews in our backlog which need to be aggregated into your dashboard stats. The good news is that it seems like we weren't impacted severely.
One final note we'd like to make regarding the pageview backlog. We've had numerous backlogs with pageviews since our inception. We're pleased to share that we are almost ready to move to Elasticsearch, where we'll see no more backlogs and real-time data reporting. This will be a huge win for us and our customers, and we can't wait to have it completed. Posted at 1:26pm PT
Interestingly, since we identified the attack, our pageview collector seems to have been running just fine. Looking at our database, we can see a large backlog of data, and the issue seems to be that pageviews aren't aggregating to your dashboard. Once AWS fixes the problems they're having, the aggregation will resume.
One thing we want to make clear: this has nothing to do with the recent DDoS attacks we were receiving. It is related to a large scale AWS problem. These regional outages are incredibly rare and, unfortunately, we are in a position where we simply have to wait. As we say, we seem to only have been mildly impacted. Posted at 12:01pm PT
Our service provider is currently having issues and we are looking into it. Posted at 6:43pm PT on Nov 17th
Layer 7 DDoS attack Resolved
We've been under DDoS attacks for a while now, on and off. Today we had our first win. We engaged our 24x7 DDoS team, they identified patterns in the attack and were able to completely eliminate the attack within a few hours. Because of previous protection measures put in place, Fathom was up & down, rather than going offline for a solid chunk of time. This is a big step forward for us in response to these attacks. You can read more about these attacks here. Posted at 12:13pm PT
Layer 7 DDoS attack Resolved
The attack on 14th November lasted around 3 hours, and our data collector was up & down, but we didn't want to update this page until we'd finished implementing our new layer of defence.
As we've said in previous updates, we've been the target of a highly motivated, malicious attack that is looking to damage Fathom.
We've made the following changes:
- We now have a 24x7 DDoS attack mitigation team. This is a highly specialized team who, in the event of a huge DDoS attack against Fathom, will help us absorb it.
- We have developed a new threat checking system within our application. It's invisible, and works to block perceived attacks that get past our firewall.
- We are going to be building new spam protections using machine learning over the next few months.
- All analytics software gets hit with spam. It's not fun for us or for the handful of our customers who were targeted. However, we are taking this attack as a learning opportunity, and we are now developing an advanced spam protection system that will look to a) prevent spam where possible and b) clean up spam gracefully.
As we've said before, we truly appreciate your patience during these attacks and we're sorry that they're happening. Posted at 12:14pm PT
We are under attack again. We have actually been under attack for the past week, on and off, and this is a targeted, motivated attack on our service. We have mitigated some spam, which is sent every single hour, but occasionally we are flooded with hundreds of thousands of concurrent requests.
There is not a lot we can do when under this kind of attack but we are working on solutions and will be bringing in a 24/7 DDoS protection service very soon. Posted at 4:00pm PT
Layer 7 DDoS attacks Resolved
The attack ended November 7th PT. As we stated before, only 0.01% of customers were hit with spam, but all customers were affected.
The attack was intermittent and typically ran in periods of 3-4 hours at random times of the day, not 24/7.
Our system throttled incoming pageviews in defense, so we didn't collect 100% of page views sent to us. We apologize for the inconvenience here. We can assure you that we did everything within our power to absorb this malicious, unprovoked attack on Fathom.
Moving forward, here is how we are responding to the challenge of spam & Layer 7 DDoS attacks:
- We are working with our service providers to see what we can do about this in the future.
- We are now building our own spam detection system that will be complete this week. We already rolled out Version 1 of our spam protection system during the attack and mitigated many millions of additional spam pageviews.
- We knew that this attack would only lead to Fathom becoming a better analytics product. And that's exactly what is happening right now.
And finally, thank you so much to everyone who has shown us support. Every single person affected has been so understanding during this challenging time and we just want to say thank you. We love building this software for you, and we appreciate all of your patience. Posted at 7:34pm PT
The attack is back. We have mitigated the majority of the spam but the dashboard is still having intermittent availability issues.
At this moment in time, we're continuing to work on anti-spam measures, introducing new technology in response to the attack. Posted at 7:18am PT on Nov 7th
We haven't updated this until now because we weren't certain that the attack was over. It seems to have stopped today. Less than 0.01% of customers were affected by the spam attack, but we still saw backlogs.
We've had plans to improve the way we process pageviews (to prevent backlogs & improve aggregation speed) for a while now. Following this attack, we've now prioritized these tasks, and will be rolling out the upgrades as soon as they are ready. Posted at 3:21pm PT on Nov 7th
We mitigated a large chunk of the attack by using pattern matching. Unfortunately, the attack was done via a botnet, and a lot of the traffic looked legitimate, so there was no kind of blocking we could do. The attack has now stopped.
For those who have been targeted as part of this spam attack, please email us, and we will clear the spam on your account.
One of the big problems we had with this attack was that the floods of traffic led to issues with the Current Visitors box and the speed of data aggregation. We know that stopping spam is impossible, and all analytics companies are subject to it, but we want to be able to absorb spam traffic without backlogs.
We are now working on a self-serve spam removal tool and rebuilding our aggregation system to ensure backlogs don't occur in the future. Posted at 7:21pm PT on Nov 6th
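As a rough illustration of what pattern matching against referral spam can look like, here is a small Python sketch. It is not the filter we ran during the attack: the patterns, the `.test` domains, and the `referrer` field name are all hypothetical. It splits pageviews into kept and spam by matching the referrer's hostname against a list of regular expressions.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns for illustration only.
SPAM_REFERRER_PATTERNS = [
    re.compile(r"(^|\.)spam-example\.test$"),  # a known bad domain and its subdomains
    re.compile(r"free-traffic"),               # a suspicious keyword in the hostname
]


def is_spam_referrer(referrer: str, patterns=SPAM_REFERRER_PATTERNS) -> bool:
    """Return True if the referrer's hostname matches any spam pattern."""
    host = (urlparse(referrer).hostname or "").lower()
    return any(pattern.search(host) for pattern in patterns)


def split_pageviews(pageviews, patterns=SPAM_REFERRER_PATTERNS):
    """Split pageview dicts into (kept, spam) based on their 'referrer' field."""
    kept, spam = [], []
    for pageview in pageviews:
        if is_spam_referrer(pageview.get("referrer", ""), patterns):
            spam.append(pageview)
        else:
            kept.append(pageview)
    return kept, spam


kept, spam = split_pageviews([
    {"path": "/", "referrer": "https://news.spam-example.test/offer"},
    {"path": "/pricing", "referrer": "https://example.org/blog"},
    {"path": "/", "referrer": ""},
])
```

As noted above, this approach only catches traffic that follows a recognisable pattern. Botnet traffic that looks legitimate passes straight through a filter like this one.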
The attack has continued into today. The backlog is still happening and some customers are being hit with large amounts of referral spam. We are still actively working on this issue. Thanks so much again for your patience with this matter. Posted at 8:07am PT on Nov 6th
Fathom is being targeted by a malicious attack, sending tons of referral traffic our way. All customers will experience delays in seeing new traffic show up on their dashboards (but that data is still being collected). We appreciate your patience. Posted at 9:38am PT on Nov 5th
We have mitigated the attack, and are working through our backlog. As this has happened before, we are working on additional protections and backlog defences. If you have any referral spam on your dashboard, let us know and we'll purge it for you. Thanks again for your patience. Posted at 11:32am PT on Nov 5th
Dashboard issues Resolved
Everything is back to normal. We were encountering rate limiting from a service that the dashboard relied on (AWS Parameter Store). We have increased the throughput and things are now working as expected. Posted at 7:59am PT
We received reports that the Fathom dashboard wasn't loading. Posted at 7:34am PT
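Our fix here was to increase throughput. A different, complementary way to avoid rate limits on a configuration service is to cache values in the application for a short time, so that most requests never call the service at all. The Python sketch below shows that general technique only; it is not what we deployed, and `CachedParameters` and its `fetch` callable are hypothetical names standing in for whatever client actually reads the parameter.

```python
import time
from typing import Callable, Dict, Tuple


class CachedParameters:
    """Small TTL cache placed in front of a slow or rate-limited lookup."""

    def __init__(
        self,
        fetch: Callable[[str], str],  # hypothetical: reads one parameter by name
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}  # name -> (expiry, value)

    def get(self, name: str) -> str:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now < cached[0]:
            return cached[1]  # still fresh: no call to the backing service
        value = self._fetch(name)
        self._cache[name] = (now + self._ttl, value)
        return value
```

The trade-off is staleness: a changed parameter can take up to `ttl_seconds` to be picked up, so the TTL should be chosen with that in mind.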
Data delay on dashboards. Resolved
The backlog has cleared, your stats are fast again. Posted at 1:02pm PT
There is currently a delay in data appearing on your dashboard. Your data hasn't been lost, it's just delayed. This is due to a customer going viral beyond anything we've seen previously. Posted at 6:07am PT