It cannot be overstated how bad this is. The #1 expectation of a cloud database provider is to keep data safe and recoverable.
I hope, at least for their sake, that they took a backup of everyone's DB that could be restored in another region, but given that they didn't even do a scream test, I doubt they thought about this either.
This must have been forced on them by upper management, because there is no way that nobody along the chain of people required to actually delete the data suggested a scream test. No way someone didn't say "this is a terrible idea, email is not reliable".
Adding Influx right next to GCP on the list of providers I'm never using. Self-hosting is the way, and use ClickHouse.
> The Scream Test is simple – remove it and wait for the screams. If someone screams, put it back. The Scream Test can be applied to any product, service or capability – particularly when there is poor ownership or understanding of its importance.
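In practice that usually means cutting off access first while leaving the data intact, and only deleting after a sustained quiet period. A rough, hypothetical sketch of the idea (the helper functions are placeholders, not real InfluxDB or cloud tooling):

```python
# Hypothetical scream-test rollout for a region shutdown.
# None of these helpers are real InfluxDB APIs; they stand in for whatever
# access-control and support tooling a provider actually has.
import time

REGION = "gcp-belgium"


def disable_access(region: str) -> None:
    """Reject reads/writes for the region but leave all data intact."""
    print(f"[scream test] access disabled for {region}; data untouched")


def restore_access(region: str) -> None:
    print(f"[scream test] access restored for {region}")


def screams_heard(region: str) -> bool:
    """Stand-in for checking support tickets, Slack, and usage alarms."""
    return False  # replace with a real check against your ticketing system


def delete_region(region: str) -> None:
    print(f"deleting {region} (only after backups are confirmed elsewhere)")


def scream_test(region: str, quiet_days: int, poll_seconds: int = 3600) -> None:
    disable_access(region)
    deadline = time.time() + quiet_days * 86400
    while time.time() < deadline:
        if screams_heard(region):
            restore_access(region)
            return  # someone still depends on it; do not delete
        time.sleep(poll_seconds)
    delete_region(region)  # only after a full quiet period with no screams


if __name__ == "__main__":
    # Demo run with a zero-day window so the script finishes immediately.
    scream_test(REGION, quiet_days=0)
```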
I agree. This should be an indication to all current users that they should no longer trust InfluxData with their business.
The CTO seems to have been checked out for a long time (just look at how little developer engagement there is on here) and the CEO seems to have no idea how to run a DBaaS. The fact that nobody else from the company has stepped in to try and defuse this should terrify anyone who has data on InfluxData's cloud.
This is the beginning of the end. It seems like all of the good people have left the company, and being willing to destroy credibility to cut costs is a clear sign that the company is running on fumes.
So, now is the time - find your alternative, whether it's Timescale, QuestDB, VictoriaMetrics, ClickHouse, or just self-hosting.
It's the same "we 'tried'" message they have here. Even worse, this wasn't a regulatory shutdown; it was a lack-of-demand decision. They had 100% control over the timing and means of the shutdown. They didn't even keep backups! They just deleted everything.
Some highlights from the blog. It reads like a "cover my ass" memo to the board rather than an attempt to fix problems for customers.
* > Over the years, two of the regions did not get enough demand to justify the continuation of those regional services.
* In other words, they had no external pressure. They shut this down entirely of their own accord.
* Immediately blames customers for not seeing notifications, explaining "how rigorous" their communication was.
* > via our community Slack channel, Support, and forums, we soon realized that our communication did not register with everyone
* In other words, "we didn't look at any metrics or usage data. How could we have possibly known people were still relying on this?"
* > Our engineering team is looking into whether they can restore the last 100 days of data for GCP Belgium. It appears at this time that for AWS Sydney users, the data is no longer available.
* That's literally unbelievable. They didn't even keep backups! They deleted those too! Even if the region is going down, I'd expect backups to be retained per their SLA.
* Lastly, a waffling "what we could have done better" without any actual commitment to improvement. Insane.
This is pretty much corporate suicide. I really don't understand what they are trying to achieve with this and their attitude in this thread is baffling.
I completely agree with you regarding corporate suicide. The rest of my post is complete speculation.
The least nonsensical explanation I can think of is that they weren't paying their bills: they weren't paying rent, so the landlord locked them out and repossessed their servers, or something similar. (Perhaps they were inspired by Elon Musk's recent antics?)
If that were the case, they would not disclose it. If they did, all of their other customers would immediately begin migrating their data; not tomorrow, not next week, now.
If there were any excuse they would give it. "We were hacked!" "It was a disgruntled ex-employee!" "The datacenter burned down!" "It's those dirty EU data laws!" etc.
Shutting down the data center and deleting all the data (without migrating it) at the exact same time, and that being Plan A? Nah, I don't believe that.
This was announced months in advance (albeit not in a way that could possibly guarantee that most customers would ever discover it) so I don't think your speculation is true. As best I can tell from the information publicly available, they really did shut down the data center and delete all data simply to cut costs with no external push whatsoever.
I agree with your comments about how Influx handled this shutdown.
The several things you might mean by self-hosting have their own pros and cons. The right choice is very context-specific, and assuming that it’s always the right choice is wrong. It certainly can be, though.
As for ClickHouse, that mention seems like a throwaway comment, unless you are advocating a boycott of even the open source InfluxDB due to its corporate author’s behavior and view ClickHouse as the closest alternative.
This incident has nothing to do with the comparison of the open source InfluxDB vs the open source ClickHouse, nor would it impugn the viability of InfluxDB hosted by a more responsible data custodian than Influx the company.
And GCP hasn’t done any similar inadequately notified shutdown of service with immediate and irreversible data loss, as far as I know.
(Disclosure: I have worked for Google in the past, including GCP, but not in over 8 years. I'm speaking only for myself here. I've never worked for Influx or ClickHouse.)