It’s on APNews too - it’s real
Hardware RAID just works, and for many, that’s good enough. In more advanced systems, all it’s got to handle is a boot partition, and if you’re doing your job as a sysadmin there’s zero important data in there that can’t be easily rebuilt or restored.
I never said I didn’t use software RAID, I just wanted to add information about hardware RAID controllers. Maybe I’m blind, but I’ve never seen a good implementation of software RAID for the EFI partition or boot sector. During boot, most systems I’ve seen will always try to access one partition directly, then fall back to a second in a fixed order, which bypasses the whole concept of a RAID, so the two would need to be kept manually in sync during updates.
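To make that concrete, here’s a minimal sketch of what that manual sync can look like when scripted. The mount points are assumptions, not anyone’s actual layout, and you’d hook it into whatever your update process is:

```python
# Minimal sketch of the "keep two ESPs in sync by hand" workaround described
# above. The mount points below are assumptions - adjust to your own layout -
# and you would run this after every kernel/bootloader update.
import shutil

PRIMARY_ESP = "/boot/efi"     # the ESP the firmware actually boots from
SECONDARY_ESP = "/boot/efi2"  # the copy on the second disk, used as fallback

def mirror_esp(src: str = PRIMARY_ESP, dst: str = SECONDARY_ESP) -> None:
    """Copy the primary ESP over the secondary so both disks stay bootable."""
    shutil.copytree(src, dst, dirs_exist_ok=True)

if __name__ == "__main__":
    mirror_esp()
```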
Because of that, there’s one notable place where I won’t go software - I always use hardware RAID for at minimum the boot disk, because Dell firmware natively understands everything about it from a detect/boot/replace perspective, or doesn’t see the individual member disks at all, which is also fine. All four of my primary servers have a boot disk on either a Startech RAID card similar to a Dell BOSS or an array to boot off of directly on the PERC. It’s only enough space to store the core OS.
Other than that, at home all my other physical devices are hypervisors (VMware ESXi for now, until I can plot a migration), dedicated appliance devices (Synology DSM uses mdadm), or don’t have redundant disks (my firewall, which is backed up to git, and my NUC Proxmox box; both firewalls and the PVE box are running ZFS for its features).
Three of my four ESXi servers run vSAN, which is like Ceph and replaces RAID. Like Ceph and ZFS, it requires using an HBA or passthrough disks for full performance. The last one is my standalone server. Notably, ESXi does not support any software RAID natively that isn’t vSAN, so both of the standalone server’s arrays are hardware RAID.
When it comes time to replace that Synology it’s going to be on TrueNAS
For recovering hardware RAID: your best guarantee of success is a compatible controller with a similar enough firmware version. You might be able to find software that can stitch disk images back together, but that’s a long shot and requires a ton of disk space (which you might not have if it’s your biggest server).
I’ve used dozens of LSI-based RAID controllers in Dell servers (both PERC-branded and LSI-branded) for work and homelab, and they usually recover the old array onto the new controller pretty well. They also generally have a much lower failure rate than the drives themselves (I find myself replacing the cache battery more often than the controller itself).
Only twice, out of the handful of times I’ve done this, have I gone to a RAID controller from a different generation.
As others have pointed out, this is where backups come into play. If you have to replace the server with one from a different generation, you run the risk that the drives won’t import. At that point, you’d have to sanitize the superblock of the array and re-initialize it as a new array, then restore from backup. Now, the array might be just fine and you never notice a difference (like my users that had to replace a failed R815 with an R820), but the results tend to be at the extremes - it either just works or it faults, with no in-between.
Standalone RAID controllers are usually pretty resilient and fail less often than disks, but they are very much NOT infallible, as you are correct to assess. The advantage of software systems like mdadm, ZFS, and Ceph is that they remove the precise hardware compatibility requirements, but by no means do they remove the software compatibility requirements - you’ll still have to do your research and make sure the new version is compatible with the old format, or make sure it’s the same version.
All that said, I don’t trust embedded motherboard RAIDs to the same degree that I trust standalone controllers. A friend of mine about 8-10 years ago ran a RAID-0 on a laptop that got its superblock borked when we tried to firmware-update the SSDs - the array stopped being detected at all. We did manage to recover the data, but it needed multiple times the raw amount of storage to do so.
As a general principle, full-extension rails are probably best sourced from the original vendor rather than trying to use universal rails.
If you have a wall-mounted rack, unless your walls are something sturdier than drywall, physics is working against you. It’s already a pretty intense, heavy cantilever, and putting a server in there that can extend past the front edge is only going to make that worse.
If you want to use full extension rails, you should get a rack that can sit squarely on the floor on either feet or appropriately rated casters. You should also make sure your heaviest items are on the bottom ESPECIALLY if you have full extension rails - it will make the rack less likely to overbalance itself and tip over when the server is extended.
I’m not entirely sure what you’re seeking to accomplish here - are you looking to just impose authorization on a subset of the images? Probably those should be in a non-public bucket for starters.
Looking to only give certain people access to files and also have a nicer UI (a la Google Drive / Photos)? Maybe plain S3 isn’t the play here and a dedicated application is needed for that subset.
Pre-signed URLs may also be useful for what you’re trying to solve. https://docs.min.io/docs/javascript-client-api-reference.html#presignedGetObject
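That link documents the JavaScript client; as a rough sketch of the same idea using the MinIO Python SDK instead (endpoint, keys, bucket, and object names are all placeholders), it looks something like this:

```python
# Rough sketch: hand out a time-limited download link for an object in a
# private bucket, using the MinIO Python SDK. Endpoint, keys, bucket, and
# object names are placeholders, not anyone's real setup.
from datetime import timedelta
from minio import Minio

client = Minio(
    "minio.example.lan",        # placeholder endpoint
    access_key="YOUR-ACCESS-KEY",
    secret_key="YOUR-SECRET-KEY",
    secure=True,
)

# The URL works for one hour, then expires; the bucket itself stays private.
url = client.presigned_get_object(
    "photos", "2023/family/img_0001.jpg", expires=timedelta(hours=1)
)
print(url)
```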
Others have already mentioned various tech to help you out with this - Ceph, ZFS, RAID 50/60, RAID is not a backup, etc.
40 drives is a massive amount. On my system, I have ~58TB (before filesystem overhead) composed of: a 48TB NAS (5x12TB@RAID-5), 42TB of USB backup disks for said NAS (RAID is not a backup), a 3-node vSAN array with 12TB of all-flash storage (3x500GB cache, 6x2TB capacity) at RF2 (so ~6TB usable, since each VM is in an independent RAID-1), and a standalone host with ~4TB@RAID-5 (16 disks spread across 2 RAID-5 arrays, don’t have the numbers off hand).
That’s 5+9+16=30 drives, and the whole rack draws 950w including the switches, which iirc account for ~250-300w (I need to upgrade those to non-PoE versions to save on some juice). Each server on its own draws 112-185w, as measured at iDRAC. It used to be 1100w until I upgraded some of my older servers to newer ones - better power efficiency is one of my build-out design principles.
While you can just throw 40-60 drives in a 4u chassis (both Dell and 45drives/Storinator offer the ability to have this as a DAS or server), that thing will be GIGA heavy fully loaded. Make sure you have enough power (my rack has a dedicated circuit) and that you place the rack on a stable floor surface capable of withstanding hundreds of pounds on four wheels (I think I estimated my rack to be in the 300-500lbs class)
You mentioned wanting to watch videos for knowledge - if you want somewhere to start, watch the series Linus Tech Tips did on the many iterations of their Petabyte Project as a case study in what can go wrong when you have that many drives. Then look into the tech you can use to avoid making the same mistakes Linus did. Many very, very good options are discussed in the other comments here, and I’ve already rambled on far too long.
Other than that, I wish you the best of luck on your NAS journey, friend. Running this stuff can be fun and challenging, but in the end you have a sweet system if you pull it off. :) There’s just a few hurdles to cross since at the 140TB size, you’re basically in enterprise storage land with all the fun problems that come with scale up/out. May your storage be plentiful and your drives stay spinning.
As others said, depends on your use case. There are lots of good discussions here about mirroring vs single disks, different vendors, etc. Some backup systems may want you to have a large filesystem available that would not be otherwise attainable without a RAID 5/6.
Enterprise backups tend to fall along the recommendation called 3-2-1: at least 3 copies of your data, on 2 different types of media, with at least 1 copy off-site.
On my home system, I have 3-2-0 for most data and 4-3-0 for my most important virtual machines. My home system doesn’t have an off-site, but I do have two external hard drives connected to my NAS.
Story time
I had one of my two backup drives fail a few months ago. Literally nothing of value was lost - I just went down to the electronics shop and bought a bigger drive from the same vendor (preserving the one-drive-per-vendor approach). Reformatted the disk, recreated the backup job, then ran the first transfer. Pretty much not a big deal; all the data was still in 2 other places - the source itself, and the NAS primary array.
The most important thing to determine when you plan a backup: think about how valuable the data is to you. That’s how much you might be willing to spend on keeping it safe.
What platform?
Another user said it - what you’re asking for isn’t a backup, it’s just data transfer.
It sounds like you’re looking for a storage backend that hosts all your data and can download data to the client side on the fly.
If your use case is Windows, Nextcloud Desktop may be what you’re looking for. I have a similar setup with my game clips folder. It detects changes and auto-uploads them, while freeing up less recently used data that’s already safely stored server-side. This feature might exist on Mac, but I haven’t tested it.
Backup wise, I capture an rsync of the nextcloud database and filesystem server-side and store it on a different chassis. That then gets backed up again to a USB drive I can grab and run.
Nextcloud also supports external storage, which the server directly connects to: https://docs.nextcloud.com/server/latest/admin_manual/configuration_files/external_storage_configuration_gui.html
I don’t have an immediate answer for you on encryption. I know most of the communication is encrypted in flight for AD, and on-disk passwords are stored hashed unless the “use reversible encryption” field is checked. There are (in Microsoft terms) gMSAs (group-managed service accounts), but other than using one for ADFS (their OAuth provider), I have little knowledge of how they actually work on the inside.
AD also provides encryption key backup services for BitLocker (MS full-partition encryption for NTFS) and the local account manager I mentioned, LAPS. Recovering those keys requires either a global admin account or specific permission delegation. On disk, I know MS has an encryption provider that works with the TPM, but I don’t have any data on whether that system is used (or where the decryptor is located) for these account types with recoverable credentials.
I did read a story recently about a cyber security firm working with an org: they had worked their way all the way to domain admin, but needed a biometric-unlocked Bitwarden to pop the final backup server and “own” the org. They indicated that native Windows encryption was in play, and they managed to break in using a now-patched vulnerability in Bitwarden to recover a decryption key, which became reachable by resetting the domain admin’s password and doing some Windows magic. On my DC at home, all I know is that it doesn’t need my password to reboot, so there’s credential recovery happening somewhere.
Directly to your question about short-term-use passwords: I’m not sure there’s a way to do it out of the box in MS AD without getting into some overcomplicated process. Accounts themselves can have per-OU password expiration policies that are nanosecond-accurate (I know because I once accidentally set a password policy to 365 nanoseconds instead of a year), and you can even set whole-account expiry (which would prevent the user from extending their access by simply changing the expired password). Theoretically, you could design/find a system that interacts with your domain to set, impound/encrypt, and manage the account and password expiration of a given set of users, but that would likely be add-on software.
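For illustration only, here’s a minimal sketch of the whole-account-expiry piece of such add-on glue, assuming the ldap3 Python library; the server, service account, DN, and dates are placeholders, not a real environment:

```python
# Hypothetical sketch: set an AD account to expire at the end of a temporary
# access window using ldap3. Server name, service account, and the target DN
# are placeholders - not a real environment.
from datetime import datetime, timezone
from ldap3 import Server, Connection, NTLM, MODIFY_REPLACE

def to_filetime(dt: datetime) -> int:
    """accountExpires is stored as 100-nanosecond intervals since 1601-01-01 UTC."""
    epoch = datetime(1601, 1, 1, tzinfo=timezone.utc)
    return int((dt - epoch).total_seconds() * 10_000_000)

server = Server("dc01.example.lan", use_ssl=True)
conn = Connection(server, user="EXAMPLE\\svc-provision",
                  password="placeholder", authentication=NTLM, auto_bind=True)

# Account stops authenticating when the temp access window closes,
# even if the user changes their password before then.
expiry = datetime(2024, 1, 31, 23, 59, tzinfo=timezone.utc)
conn.modify("CN=Temp User,OU=Contractors,DC=example,DC=lan",
            {"accountExpires": [(MODIFY_REPLACE, [str(to_filetime(expiry))])]})
```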
Yes I do - MS AD DC
I don’t have a ton of users, but I have a ton of computers. AD keeps them in sync. Plus I can point services like gitea and vCenter at it for even more. Guacamole highly benefits from this arrangement since I can set the password to match the AD password, and all users on all devices subsequently auto-login, even after a password change.
Used to run single domain controller, now I have two (leftover free forever licenses from college). I plan to upgrade them as a tick/tock so I’m not spending a fortune on licensing frequently
With native Windows clients and I believe sssd realmd joins, the default config is to cache the last hash you used to log in. So if you log in regularly to a server it should have an up to date cache should your DC cluster become unavailable. This feature is also used on corporate laptops that need to roam from the building without an always-on VPN. Enterprises will generally also ensure a backup local account is set up (and optionally auto-rotated) in case the domain becomes unavailable in a bad way so that IT can recover your computer.
I used to run a homemade FreeIPA and an MS AD in a cross-forest trust when I started on the directory stuff ~5-6y ago. Windows and Mac were joined to AD, Linux was joined to IPA. (I tried to join Mac to IPA, but there was only a limited LDAP connector, and AD was more painless and less maintenance.) One user to rule them all, still. IPA has loads of great features - I especially enjoyed setting my shell, sudoers rules, and ssh keys from the directory and having them available everywhere instantly.
But I had some reliability problems (which may be resolved by now, I haven’t followed up) with IPA’s update system at the time, so I ended up burning it down and rejoining all the Linux servers to AD. Since then, the only features I’ve lost are centralized sudo and ssh keys (shell can be set in AD if you’re clever). sssd handles six key MS group policies using libini, mapping them into the relevant PAM policies, so you even have some authorization that can be pushed from the DC like in Windows, with some relatively sane defaults.
I will warn - some MS group policies that violate the INI format libini expects (especially service definitions and firewall rules) can coredump libini, so you should put your Linux servers in a dedicated OU with their own group policies and limited settings in the default domain policy.
I’m probably the overkill case because I have AD+vC and a ton of VMs.
RPO 24H for main desktop and critical VMs like vCenter, domain controllers, DHCP, DNS, Unifi controller, etc.
Twice a week for laptops and remote desktop target VMs
Once a week for everything else.
Backups are kept: (may be plus or minus a bit)
The software I have (Synology Active Backup) captures data using incremental backups where possible, but if it loses its incremental marker (system restore in windows, change-block tracking in VMware, rsync for file servers), it will generate a full backup and deduplicate (iirc).
From the many times this has saved me from various bad things happening for various reasons, I want to say the RTO is about 2-6h for a VM to restore and ~18h for a desktop, measured from the point at which I decide to go back to a backup.
Right now my main limitation is my poor quad core Synology is running a little hot on the CPU front, so some of those have farther apart RPOs than I’d like.
He already did step down as CEO of LMG ~2mo ago
Relevant XKCD
https://xkcd.com/386/
Can’t remember if the one I saw was a movie or a TV show, but I want to say the plot device in question (from the instance I remembered) was a small acoustic bug under the caps lock key. At the time I thought that was too far-fetched to be possible. We live in a world…
What’s the difference between horizontal and vertical integration? (I know a few business words but usually not enough to be intelligent, this is a genuine question of confusion)