Aside from running Ceph as my day job, I have a 9-node Ceph cluster on Raspberry Pi 4s at home that I've been running for a year now, and I'm slowly starting to move things away from ZFS to this cluster as my main storage.
My setup is individual nodes, with 2.5" external HDDs (mostly SMR), so I actually get slightly better performance than this cluster, and I'm using 4+2 erasure coding for the main data pool for CephFS.
CephFS has so far been incredibly stable and all my Linux laptops reconnect to it after sleep with no issues (in this regard it's better than NFS).
I like this setup a lot better than ZFS now, I'm slowly migrating away from ZFS, and I'm even thinking of setting up a second Ceph cluster. The best thing with Ceph is that I can do maintenance on a node at any time and storage availability is never affected; with ZFS I've always dreaded any kind of upgrade, and any reboot requires an outage. Plus, with Ceph I can add just one disk at a time to the cluster, and disks don't have to be the same size. Also, I can now move the physical nodes individually to a different part of my home, or change switches and network cabling, without an outage. It's a nice feeling.
I want to preface this: I don't already have a strong opinion here, and I'm curious about Ceph. As someone who runs a 6-drive raidz2 at home (w/ ECC RAM), does your Ceph config give you similar data integrity guarantees to ZFS? If so, what are the key points of the config that enable that?
When Ceph migrated from FileStore to BlueStore, that enabled scrubbing and checksumming of the data itself (older, pre-BlueStore versions only verified metadata).
Ceph (by default) does metadata scrubs every 24 hours, and data scrubs (deep-scrub) weekly (configurable, and you can manually scrub individual PGs at any time if that's your thing). I believe the default checksum used is "crc32c", and it's configurable, but I've not played with changing it. At work we get scrub errors on average maybe weekly now, at home I've not had a scrub error yet on this cluster in the past year (I did have a drive that failed and still needs to be replaced).
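To make the checksum part concrete: that crc32c is CRC-32C (Castagnoli). Here is a toy, bit-by-bit Python version of the same polynomial - BlueStore obviously uses an optimized implementation, this is only to show what "checksum on write, verify on deep-scrub" means:

    # Toy CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
    # Real implementations are table-driven or use the SSE4.2/ARMv8 crc32c instruction.
    def crc32c(data: bytes, crc: int = 0) -> int:
        crc ^= 0xFFFFFFFF
        for byte in data:
            crc ^= byte
            for _ in range(8):
                crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
        return crc ^ 0xFFFFFFFF

    # Standard check value for the CRC-32C polynomial.
    assert crc32c(b"123456789") == 0xE3069283

    # The deep-scrub idea: recompute the checksum of stored data and compare it
    # with the checksum that was recorded when the data was written.
    block = b"a chunk of object data"
    recorded = crc32c(block)           # stored alongside the data at write time
    assert crc32c(block) == recorded   # a mismatch here would surface as a scrub error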
My RPi setup certainly does not have ECC RAM as far as I'm aware, but neither does my current ZFS setup (also a 6 drive RAIDZ2).
Nothing stopping you from running Ceph on boxes with ECC RAM; we certainly do that at my job.
I was running glusterfs on an array of ODROID-HC2s ( https://www.hardkernel.com/shop/odroid-hc2-home-cloud-two/ ) and it was fun, but I've since migrated back to just a single big honking box (specifically a threadripper 1920x running unraid). Monitoring & maintaining an array of systems was its own IT job that kinda didn't seem worth dealing with.
My single ZFS box does that kind of throughput with ease (3x mirrored vdevs = 6 disks total), but I'm curious, as the flexibility of Ceph sounds tempting.
I just set up a test cluster at work to test this for you:
4 nodes, each node with 2x SAS SSDs, dual 25Gb NICs (one for front-end, one for back-end replication). The test pool is 3x replicated with Snappy compression enabled.
On a separate client (also with 25Gb) I mounted an RBD image with krbd and ran FIO: I get a consistent 1.4 GiB/s.
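If anyone wants to reproduce something similar, this is roughly the shape of it, sketched in Python around fio. The device path and job parameters are just examples, and the JSON field names are from fio 3.x, so double-check them on older versions:

    import json
    import subprocess

    # Assumes the RBD image is already mapped with krbd, e.g. at /dev/rbd0.
    DEVICE = "/dev/rbd0"

    # Large sequential reads with direct I/O and a deep queue - a streaming
    # throughput job rather than an IOPS job.
    cmd = [
        "fio", "--name=seqread", "--filename=" + DEVICE,
        "--rw=read", "--bs=4M", "--direct=1", "--ioengine=libaio",
        "--iodepth=32", "--runtime=60", "--time_based",
        "--output-format=json",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    job = json.loads(out)["jobs"][0]
    print("sequential read: %.2f GiB/s" % (job["read"]["bw_bytes"] / 2**30))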
For the standard 3x replicated setup, 3 nodes is the minimum for any kind of practical redundancy, but you really want 4, so that after the failure of 1 node all the data can be recovered onto the other 3 and you still have failure resiliency.
For erasure-coded setups (not really suited to block storage, but mainly object storage via radosgw (S3) or CephFS), you need a minimum of k+m nodes and realistically k+m+1. That translates to 6 minimum, but realistically 7 nodes, for k=4, m=2. That's 4 data chunks and 2 redundant chunks, which means you use 1.5x the storage of the raw data (half that of a 3x replicated setup). You can also do k=2, m=1, so 4 nodes in that case.
I would say the minimum is whatever your biggest replication or erasure coding config is, plus 1. So, with just replicated setups, that's 4 nodes, and with EC 4+2, that's 7 nodes. With EC 8+3, which is pretty common for object storage workloads, that's 12 nodes.
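A quick back-of-the-envelope helper for the numbers above (plain arithmetic, nothing Ceph-specific):

    # Raw-capacity overhead and node counts for replicated vs erasure-coded pools.
    def replicated(copies):
        return {"raw_per_usable": copies, "min_nodes": copies, "comfortable": copies + 1}

    def erasure(k, m):
        return {"raw_per_usable": (k + m) / k, "min_nodes": k + m, "comfortable": k + m + 1}

    print("3x replicated:", replicated(3))  # 3.00x raw, 3 nodes min, 4 comfortable
    print("EC 2+1:", erasure(2, 1))         # 1.50x raw, 3 nodes min, 4 comfortable
    print("EC 4+2:", erasure(4, 2))         # 1.50x raw, 6 nodes min, 7 comfortable
    print("EC 8+3:", erasure(8, 3))         # ~1.38x raw, 11 nodes min, 12 comfortable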
Note, a "node" or a failure domain, can be configured as a disk, an actual node (default), a TOR switch, a rack, a row, or even a datacenter. Ceph will spread the replicas across those failure domains for you.
At work, our bigger clusters can withstand a rack going down. Also, the more nodes you have, the less of an impact it is on the cluster when a node goes down, and the faster the recovery.
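For anyone curious what that looks like in practice: the failure domain is part of the CRUSH rule (or the erasure-code profile) that a pool is created with. A rough sketch - the pool and profile names are made up, and the exact commands are worth double-checking against the docs for your release:

    import subprocess

    def ceph(*args):
        # Thin wrapper around the ceph CLI; assumes admin credentials are in place.
        subprocess.run(["ceph", *args], check=True)

    # Replicated pool whose 3 copies land in 3 different racks instead of 3 hosts.
    ceph("osd", "crush", "rule", "create-replicated", "rep_rack", "default", "rack")
    ceph("osd", "pool", "create", "mypool", "replicated", "rep_rack")
    ceph("osd", "pool", "set", "mypool", "size", "3")

    # 4+2 erasure-coded pool spread across hosts (host is the default failure domain).
    ceph("osd", "erasure-code-profile", "set", "ec42", "k=4", "m=2",
         "crush-failure-domain=host")
    ceph("osd", "pool", "create", "ecpool", "erasure", "ec42")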
I started with 3 RPis then quickly expanded to 6, and the only reason I have 9 nodes now is because that's all I could find.
I would love to hear more about your Ceph setup. Specifically, how are you connecting your drives, and how many drives per node? I imagine that with the Pi's limited USB bus bandwidth, your cluster performs more as an archival data store than as real-time read/write storage like the backing block storage for VMs. I have been wanting to build a Ceph test cluster, and it sounds like this type of setup might do the trick.
Each node is completely separate, housed in a good quality aluminum enclosure with a fan, and sitting on top of an external USB Seagate 2.5" portable drive (either 4TB or 5TB), connected via USB 3 cable. I'm pretty sure these drives are SMR, but they've been good to me, and they're fast enough for my needs.
Power is provided either using official RPi power supplies, or a couple of multi-port Anker USB power supplies that I had previously. A limit of 2.5 amps does not seem to cause any issues.
Currently everything is connected to a single switch, but I move things around my office sometimes, and sometimes have the RPis connected to two different switches.
Right now, everything, including the one switch, is connected to a single APC UPS, and that thing is super old, so that's another SPOF.
My clients currently are a few wired desktops and laptops over wifi, all connecting via CephFS. I haven't tested with librbd or krbd, I imagine it wouldn't be fast.
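For anyone wondering what mounting CephFS on a client looks like, it's roughly this (kernel client, older-style syntax; the monitor address, user name, and secret path are placeholders):

    import subprocess

    # Kernel CephFS mount against one of the monitors.
    subprocess.run([
        "mount", "-t", "ceph", "192.168.1.11:6789:/", "/mnt/cephfs",
        "-o", "name=admin,secretfile=/etc/ceph/admin.secret",
    ], check=True)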
The RPis are mostly 8GB, but I do have a couple 4GB, and one RPi 400, which is kind of hilarious.
Everything is running Ubuntu 20.04, Ceph Pacific, and deployed from the first node with cephadm.
I use only Samsung microSD cards, either 32GB or 64GB. I don't think it matters what kind, but getting bigger cards makes me feel like they'll last longer. Most of the nodes have /var on the external drive (on a small partition at the beginning of the drive), but I do have a few where I didn't set that up early on, and haven't gotten around to redoing it.
I partition the drives and set up LVM manually, and tell cephadm to use the specific LV instead of the bare drive.
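Heavily simplified, the per-node provisioning looks something like the sketch below. The device names, VG/LV names, hostname, and IP are placeholders, and the cephadm/orchestrator invocations are worth re-checking against the docs for your release:

    import subprocess

    def sh(*args):
        subprocess.run(list(args), check=True)

    # One-time, on the first node only: bootstrap the cluster with cephadm.
    # sh("cephadm", "bootstrap", "--mon-ip", "192.168.1.11")

    # On each node, the external drive already has a small /var partition and a
    # big data partition; put LVM on the big one by hand.
    sh("pvcreate", "/dev/sda2")
    sh("vgcreate", "ceph-hdd", "/dev/sda2")
    sh("lvcreate", "-l", "100%FREE", "-n", "osd-data", "ceph-hdd")

    # Then, from the admin node, point the orchestrator at the specific LV
    # instead of letting it consume the bare drive.
    sh("ceph", "orch", "daemon", "add", "osd", "pi-node-1:/dev/ceph-hdd/osd-data")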
If you want any kind of performance, definitely set your expectations very low, but for me this works. I can stream at least a pair of 4K movies off this simultaneously, and I also run an instance of Paperless-NG off this over a CephFS mount and haven't had any issues.
I tried using Ceph twice at home. Once was via Proxmox, and it installed and ran perfectly fine, although tbf I didn't load it with much.
The next was via Rook, since I have a Kubernetes cluster, and it was a nightmare. I spent a week or so reading through all the docs I could find before I felt prepared to go through with it, only to have random clock sync issues that Reddit informed me were due to me enabling power savings mode in the BIOS for my nodes.
ZFS's biggest hiccup for me is when I do a kernel update and DKMS borks the upgrade. Other than that, it's been rock-solid. I run a normal and backup node with it, no regrets.
I solved the ZFS DKMS bork issue by moving from CentOS 8 to Debian 11. I've had zero OpenZFS issues since the move. On CentOS it would require work every time a sufficiently large kernel upgrade came in.
Since I'm familiar with RHEL I just swapped some of the Debian default services for RHELish alternatives (Firewalld, Podman, etc.).
> Plus with Ceph I can add just one disk at a time to the cluster and disks don't have to be the same size.
I'd like to note that ZFS now has RAID-Z expansion which allows us to do exactly that! It's an essential feature for home users since it allows us to gradually expand capacity instead of buying up all the storage up front at great cost.
I too researched ceph for this exact reason but was told the hardware requirements were too high for a typical home lab, yet you're running ceph on raspberry pis... I should probably look into ceph once more.
I'm also running Ceph (using the Rook Kubernetes operator) in my homelab. I've been running this setup for 9 months now with 2 cheap HP EliteDesk workstations I picked up on eBay, with 2x 8TB HDDs in each.
Since this setup has run incredibly smoothly so far, I plan on using SolidRun's HoneyComb LX2 as a Ceph node with bigger disks and an NVMe write cache in the future. I looked at the Raspberry Pi 4, but was not too impressed by the single PCIe lane, since I also plan on using NVMe disks as Ceph's metadata storage device (to speed up the hard disk holding the bulk data), and Ceph recommends 10GbE NICs.
The HoneyComb LX2 has 4 built-in 10GbE ports, 16 A72 cores, actual DDR4 RAM slots, a 4-lane PCIe 3.0 M.2 slot, and an open-ended PCIe 3.0 x8 slot (so you can put in a full x16 device) for a max of 8 GB/s of bandwidth.
Since it's an ARM box, it's incredibly energy efficient, which is important since energy prices are rising in my country. Also, it's the only affordable, performant ARM device, at 800 USD.
Man, Ceph really doesn't get enough love. For all the distributed systems hype out there - be it Kubernetes or blockchains or serverless - the ol' rock-solid distributed storage systems have sat in the background, iterating like crazy.
We had a huge Rook/Ceph installation in the early days of our startup before we killed off the product that used it (sadly). It did explode under some rare unusual cases, but I sometimes miss it! For folks who aren't aware, a rough TLDR is that Ceph is to ZFS/LVM what Kubernetes is to containers.
This seems like a very cool board for a Ceph lab - although extremely expensive - and I say that as someone who sells very expensive Raspberry Pi based computers!
Ceph is fantastic. I use it as the storage layer in my homelab. I've done some things that I can only concisely describe as super fucked up to this Ceph cluster, and every single time I've come out the other side with zero data loss, not having to restore a backup.
I think many people (myself included) had been burned by major disasters on earlier clustered storage solutions (like early Gluster installations). Ceph seems to have been under the radar for a bit by the time it got to a more stable/usable point, and came more into the limelight once people started deploying Kubernetes (and Rook, and more integrated/holistic clustered storage solutions).
So I think a big part of Ceph's success (at least IMO) was its timing, and its adoption into a more cloud-first ecosystem. That narrowed the use cases down from what the earliest networked storage software was trying to solve.
We're more and more feeling we made the wrong call with Gluster... The underlying bricks being a POSIX fs felt a lot safer at the time, but in hindsight Ceph or one of the newer ones would probably have been a better choice. So much inexplicable behavior. For your sake I hope the grass really is greener.
Can someone with experience with Ceph and MinIO or SeaweedFS comment on how they compare?
I currently run a single-node SnapRAID setup, but would like to expand to a distributed one, and would ideally prefer something simple (which is why I chose SnapRAID over ZFS). Ceph feels too enterprisey and complex for my needs, but at the same time, I wouldn't want to entrust my data to a simpler project that can have major issues I only discover years down the road.
SeaweedFS has an interesting comparison[1], but I'm not sure how biased it is.
[1]: https://github.com/seaweedfs/seaweedfs#compared-to-ceph
SeaweedFS has problems with large "pools". It's based on an old Facebook paper (Haystack) and is meant as blob storage for distributing large image caches. I found it mediocre at best, as its documentation was lacking, performance was lacking (in my tests), and the multitude of components were hard to get working.
The idea behind it is that every daemon uses one large file as its data store, to skip slow metadata access. There are different ways to access the storage through gateways.
MinIO has changed so much in the last few years that I can't give a competent answer, but compared to SeaweedFS it uses many small local databases. Right now it's deprecating many features like the gateway, and it's split into 2 main components (CLI and server); by comparison, the SeaweedFS deployment is dead simple. I don't know which direction the project is going - it went from a normal open source project to a more business-like deal (from what I saw) - but like I said, I didn't quite follow the process.
Ceph is based on block storage. It offers an object gateway (S3/Swift), a filesystem (CephFS), and block storage (RBD). You can access everything through librados directly as well. For a minimal setup you need a "larger" cluster, but it is the most flexible solution (imho). It uses the most resources as well, but you can do nearly everything you want with it, without limits.
I love it, but when it fails at scale, it can be hard to reason about. Or at least that was the case when I was using it a few years back. Still keen to try it again and see what's changed. I haven't run it since bluestore was released.
Yeah, I've been running a small Ceph cluster at home, and my only real issue with it is the relative scarcity of good conceptual documentation.
I personally learned about Ceph from a coworker and fellow distributed systems geek who's a big fan of the design. So I kind of absorbed a lot of the concepts before I ever actually started using it. There have been quite a few times where I look at a command or config parameter, and think, "oh, I know what that's probably doing under the hood"... but when I try to actually check that assumption, the documentation is missing, or sparse, or outdated, or I have to "read between the lines" of a bunch of different pages to understand what's really happening.
I've run Ceph at two Fortune 50 companies from 2013 to now, and I've not lost a single production object. We've had outages, yes, but not because of Ceph; it was always something else causing cascading issues.
Today I have a few dozen clusters with over 250 PB total storage, some on hardware with spinning rust that's over 5 years old, and I sleep very well at night. I've been doing storage for a long time, and no other system, open source or enterprise, has given me such a feeling of security in knowing my data is safe.
Any time I read about a big Ceph outage, it's always a bunch of things that should have never been allowed in production, compounded by non-existent monitoring, and poor understanding of how Ceph works.
I will once again lament the fact that WD Labs built SBCs that sat with their hard drives to make them individual Ceph nodes, but never took the hardware to production. It seems to me there's still a market for SBCs that could serve a Ceph OSD on a per-device basis, although with ever-increasing density in the storage and hyperconverged space, that's probably more of a small business or prosumer solution.
Yeah, those were really cool. I saw some homelab setups using the ODroid HC2 from Hardkernel in a similar way.
The 2 issues with this setup were that the HC2 used a low-performance ARMv7 processor (ARMv7 being a platform that most software barely supports), and that you couldn't use a flash-based disk as a Ceph BlueStore metadata device, since it only had one SATA port.
Credit where it's due - this is some 18 watt awesomeness at idle. Is it more "practical" than doing a Mini-ITX (or smaller, like one of those super small mini PCs with up to a 5900HX) build and equipping it with one or more NVMe expansion cards? Probably not. But it's cool.
Now, if only there were a new Pi to buy. Isn't it time for the 5? It's been 3 years, for most of which they've been hard to find. Mine broke and I really miss it, because having a full-blown desktop doing little things makes no sense, especially during the summer.
18 W idle is kinda horrible if you just want a small server (granted, this isn't one server, but instead six set-top boxes in one). That's recycled entry-level rack server range, and those come with an ILOM/BMC. Most old-ish fat clients can do <10 W, some <5 W, no problem. If you want a desktop that consumes little power when idle or lightly loaded, just get basically any Intel system with an IGP since 4th gen (Haswell). Avoid Ryzen CPUs with a dGPU if that's your goal; those are gas guzzlers.
1. I would bet at least half of all that wattage is the SSDs.
2. Buddy, you're spewing BS at someone who used to run a Haswell in a really small Mini-ITX case. It was a fine HTPC back in 2014. But now everything, bar my dead Pi, is some kind of Ryzen. All desktops and laptops. The various 4800u/5800u/6800u and lower parts offer tremendous performance at 15W nominal power levels. The 5800H I am writing this message on is hardly a guzzler, especially when compared to Intel's core11/12 parts.
This random drive-by intel shilling really took me by surprise.
If someone is trying to find a Pi, you can try the Telegram bot I made for rpilocator.com. It will notify you as soon as there is stock, with filters for specific Pis and your location/preferred vendor.
The bot is here: https://t.me/rpilocating_bot
Source: https://github.com/sschueller/rpilocatorbot
Would buy this in an instant if it weren't hobbled as hell by the onboard realtek switch. If it had an upstream 2.5/5/10g port it would be instantly 6 times more capable.
Would 6 Pis be able to handle more than 1G? It says that they got around 70 MB/s write and 100 MB/s read. 2.5/5/10G seems like it would be a waste unless I'm overlooking something.
The AXI bus internal to the Pi's SoC is only capable of about 4 Gbps, and it carries DMA, so ~2 Gbps is more or less the hard limit for any kind of combined IO operation like disk<=>network, no matter what kind of hardware you use for disk and network.
So yes, each Pi can easily saturate its own 1 Gbps interface, so a system like Ceph that parallelizes reads and writes among nodes is severely crippled by the onboard switch choking off bandwidth to external clients. For the same reason, you can't easily scale this platform beyond a single board, which puts your clustered system back into a single point of failure.
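To spell out the arithmetic (the ~4 Gbps AXI figure is the one quoted above, not something I've measured on this board):

    # Rough per-Pi and per-board ceilings based on the numbers in this thread.
    axi_gbps = 4.0
    combined_io_gbps = axi_gbps / 2      # disk<->network data crosses the bus twice
    combined_io_mbs = combined_io_gbps * 1000 / 8

    gbe_mbs = 1000 / 8                   # one 1 GbE port, ~125 MB/s line rate
    reported_write_mbs, reported_read_mbs = 70, 100

    print(f"per-Pi combined I/O ceiling: ~{combined_io_mbs:.0f} MB/s")
    print(f"per-Pi 1 GbE line rate:      ~{gbe_mbs:.0f} MB/s")
    print(f"reported cluster figures:     {reported_write_mbs} MB/s write, {reported_read_mbs} MB/s read")
    # So once several Pis serve reads in parallel, the single 1 GbE uplink out of
    # the onboard switch, not the Pis themselves, is what caps an external client.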
> Many people will say "just buy one PC and run VMs on it!", but to that, I say "phooey."
I mean with VM-leaking things like Spectre (not sure how much similar things affect ARM tbh) having physical barriers between your CPUs can be seen as a positive thing.
Sure, it's just that the Raspberry Pi isn't really fast enough for most production workloads. Having a cluster of them doesn't really help, you'd still be better off with a single PC.
As a learning tool, having the ability to build a real hardware cluster in a Mini-ITX case is awesome. I do sort of wonder what the business case for these boards is - I mean, are there actually enough people who want to do something like this... schools, maybe? I still think it's beyond weird that there's so much hardware available for building Pi clusters, but I can't get an ARM desktop motherboard with a PCIe slot, capable of actually being used as a desktop, for a reasonable price.
I think a lot of these types of boards are built with the business case of either "edge/IoT" (which still for some reason causes people to toss money at them since they're hot buzzwords... just need 5G too for the trifecta), or for deploying many ARM cores/discrete ARM64 computers in a space/energy-efficient manner. Some places need little ARM build farms, and that's where I've seen the most non-hobbyist interest in the CM4 blade, Turing Pi 2, and this board.
The future of cloud is Zero Isolation... With all the mitigations slowing things down, and with current energy prices high and rising, having super-small nodes that are always dedicated to one task seems interesting.
Unless you are constrained in space to a single ITX case as in this example, you can get whole x86 machines for <$100 with RAM and storage included.
There is a lot of choice in the <$150 range. You could get eight of these and a cheap 10-port switch for any kind of clustering lab you want to set up.
Here is an example: https://www.aliexpress.com/item/3256804328705784.html?spm=a2...
These are thin clients, but flip an option in the BIOS and it's a regular PC.
Adafruit had some in stock a few minutes ago: https://twitter.com/rpilocator ... I think every Wednesday around 11am ... I almost got one this time, but because they made me set up 2FA I couldn't check out in time.
Ceph is where the action is now.
https://pine64.com/product/pine-a64-lts/
What happened? I remember buying a Pi 3B+ in 2019 for less than 50€.