0xbadcafebee · 13 days ago
Here's 12 Sysadmin/DevOps (they're synonyms now!) challenges, straight from the day job:

  1.  Get a user to stop logging in as root.
  2.  Get all users to stop sharing the same login and password for all servers.
  3.  Get a user to upgrade their app's dependencies to versions newer than 2010.
  4.  Get a user to use configuration management rather than scp'ing config files from their laptop to the server.
  5.  Get a user to bake immutable images w/configuration rather than using configuration management.
  6.  Get a user to switch from Jenkins to GitHub Actions.
  7.  Get a user to stop keeping one file with all production secrets in S3, and use a secrets vault instead.
  8.  Convince a user (and management) you need to buy new servers, because although "we haven't had one go down in years", every one has a faulty power supply, hard drive, network card, RAM, etc., and the hardware's so old you can't find spare parts.
  9.  Get management to give you the authority to force users to rotate their AWS access keys which are 8 years old.
  10. Get a user to stop using the aws root account's access keys for their application.
  11. Get a user to build their application in a container.
  12. Get a user to deploy their application without you.
After you complete each one, you get a glass of scotch. Happy Holidays!

cobertos · 13 days ago
Re: 6. ... Github Actions

GitHub Actions left a bad taste in my mouth after it randomly removed authenticated workers from the pool once they had been offline for ~5 days.

This was after setting up a relatively complex PR workflow (an always-on cheap server starts up a very expensive build server with specific hardware), only to have it break randomly after no PR came in for a few days. There was no indication that this happens, and no workaround from GitHub.

There are better solutions for CI; GitHub's is half-baked.

paulddraper · 12 days ago
This is documented currently (supposed to be 14 days). [1]

That said, I have found runners to be unnecessarily difficult.

But Jenkins has its own quirks, and when I used GitLab, it used ancient docker-machine and outdated AMIs by default.

I think Buildkite has been the only one to make this easy and scalable. But it is meant for self hosted runners.

[1] https://docs.github.com/en/enterprise-cloud@latest/actions/h...

swyx · 13 days ago
bugs happen to all of us. whats your better solution - gitlab?
jagged-chisel · 13 days ago
> … from Jenkins to GitHub Actions.

Oh, good lord why?

0xbadcafebee · 12 days ago
Many, many reasons... the most important of which is, Jenkins is a constant security nightmare and a maintenance headache. But also it's much harder to manage a bunch of random Jenkins servers than GHA. Authentication, authorization, access control, configuration, job execution, networking, etc. Then there's the configuration of things like env vars and secrets, environments, etc that can also scale better. I agree GHA kinda sucks as a user tool, but as a sysadmin Jenkins will suck the life out of you and sap your time and energy that can go towards more important [to the company] tasks.
vachina · 13 days ago
Because sysadmins want to outsource their responsibilities (and job).
n4bz0r · 13 days ago
> Sysadmin/DevOps (they're synonyms now!)

I've notified the authorities and social services.

betaby · 13 days ago
5. and 6. are a matter of taste (trade-offs), the rest is spot on!
daemonologist · 13 days ago
You get me the permissions to do half of this stuff, and I'll do whatever you want.


Waterluvian · 12 days ago
Here’s the first step to all of these that I often see sysadmins stumbling on: communicate in written, non-abstract terms why each of these matters.

Most are obvious to most people. None are obvious to everybody.

Nextgrid · 12 days ago
> Get a user to stop logging in as root.

It really depends if the machine is hosting anything that you don't want some users to access. If the machine is single-purpose and any user is already able to access everything valuable from it (DB with customer data, etc) or trivially elevate to root (via sudo, docker access, etc) then it's just pointless extra typing and security theatre.

panzagl · 12 days ago
I guess no one ever audits your servers.
f1shy · 12 days ago
>> Sysadmin/DevOps (they're synonyms now!)

Is it really like that? Isn't there any Unix/DBA role anymore? I associate DevOps with what in my time we called "operations" and "development". We had 5 teams or so:

1) Developers, who would architect and write code, 2) Operations who would deploy, monitor and address customer complaints, 3) Unix (aka SYS) administrators, who would take care of housekeeping of well, the OS (and web servers/middleware), 4) DBA who would be monitoring and optimizing Oracle/Postgres, and 5) Network admins, who would take care of Load Balancers, Routers, Switches, Firewalls (well, there were 2 security experts for that also)

So I think DevOps would be a mix of 1&2, to avoid the daily wars that would constantly happen "THEY did it wrong!"

Can somebody clear my mind, please!? It seems I was out of it for too long?!

Wilya · 12 days ago
In full-cloud environments, in small/middle companies I've worked at:

Developers handle 1). Devops handle 2)/3)/5). Nobody does 4)

rtp4me · 12 days ago
For 4) - consider PGHero[1] and PGTuner[2] instead of a full-time DBA. We use both in production and they work very well to help track down performance issues with Postgres.

[1] https://github.com/ankane/pghero

[2] https://pgtune.leopard.in.ua/

Edit: For the record, I have worked at a few small companies as the "SysAdmin" guy who did the whole complement of servers, OS, storage, networking, VMs, DB, perf tuning, etc.

technion · 13 days ago
I know it's a common view that sysadmin/devops are the same these days, but with a current sysadmin role nothing you've mentioned sounds relevant. Let me give you my list:

  1.  Patch Microsoft Exchange with only a three-hour outage window.
  2.  Train a user to use OneDrive instead of emailing 50 MB files back and forth.
  3.  Set up eight printers for six users. Deal with 9 GB printer drivers.
  4.  Ask an exec if he would please let you add MFA to their mailbox.
  5.  Sit there calmly while that exec yells like a WWE wrestler about the ways he plans to ruin you in response.
  6.  Debate the cost of a custom mouse pad for one person across three meetings.
  7.  Deploy any standard Windows app that expects everyone to be an administrator without making everyone an administrator.
  8.  Deploy an app that expects UAC disabled without disabling UAC.
  9.  Debug some finance person's 9000-line Excel function.

hnlmorg · 13 days ago
That sounds more like Desktop Support than a SysAdmin role. My condolences if that's the job you landed when interviewing for a SysAdmin role
0xbadcafebee · 12 days ago
I used to have that job, but my title wasn't Sysadmin, it was IT Manager. For companies small enough that they don't have multiple roles, you do both... but for larger companies, the user-side stuff is done by IT, and the server-side stuff is done by a Sysadmin. (And my condolences; having done that combined role, it's not easy, and you don't get paid enough!)
hansmayer · 13 days ago
What you describe sounds more like a MS "Modern Workplace" / IT support in a corporate environment.
dessimus · 10 days ago
>4. Ask an exec if he would please let you add mfa to their mailbox.

Ask?! This is where the org's cyber insurance is your friend. Just have the executive get the provider's clearance on him not having MFA. I'm sure that line item will change his mind, and if not, be sure to accidentally mention those exemptions to the yearly auditors.

stackskipton · 12 days ago
Former Exchange Admin here: 1 is easy. I used to do 70k mailboxes in the middle of the day, but it requires spare hardware or virtualization with headroom.

Deploy new server(s), patch, install Exchange, set up DAGs, migrate everyone's mailboxes, swing the load balancer over to the new servers, uninstall Exchange from the old ones, remove the old ones from Active Directory, delete the servers.

BTW, upgrades now suck because Office365 uses the method above, so the upgrade system never gets good QA from them.

alberth · 13 days ago
I’d be super interested to see solutions to each, just to learn from.
philipwhiuk · 12 days ago
You can deploy tooling (e.g. BeyondTrust / CyberArk for 1&2), but ultimately there's a conversation and a migration plan to be done for each.
athrowaway3z · 13 days ago

  9.  Get management to give you the authority to force users to rotate their AWS access keys which are 8 years old.

Saying "keys which are 8 years old" implies you're worried about the keys themselves, which is just wrong. (Their security state depends on monitoring)

You can definitely make a strong argument that the organization needs practice rotating, so I would advise reframing it as an org-survivability-planning challenge and not a key-security issue.

DoctorOW · 12 days ago
> Get a user to use configuration management rather than scp'ing config files from their laptop to the server.

Damn, this one I'm guilty of. Though, I'm not real Sysadmin/DevOps, I'm just throwing something together and deploying it on a LAN-only VM for security reasons (I don't trust the type of code I would write)

infogulch · 12 days ago
Q: 3. Get a user to upgrade their app's dependencies to versions newer than 2010.

A: Calculate the average age in years of all dependencies calculated by: (max(most recent version release date, date of most recent CVE on library) - used version release date). Sleep for that many seconds before the app starts.
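Taken literally, the formula above could be sketched like this in Python. This is only an illustration: the `Dependency` record, the function names, and the sample data are all hypothetical, and a real version would need to pull release and CVE dates from a package index and a vulnerability feed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass
class Dependency:
    """Hypothetical record; field names are assumptions, not a real API."""
    name: str
    used_release: date                 # release date of the version in use
    latest_release: date               # release date of the newest version
    latest_cve: Optional[date] = None  # date of the most recent CVE, if any


def staleness_years(dep: Dependency) -> float:
    """max(latest release, latest CVE) minus the used version's release, in years."""
    newest = max(dep.latest_release, dep.latest_cve or dep.latest_release)
    return (newest - dep.used_release).days / 365.25


def startup_delay_seconds(deps: Sequence[Dependency]) -> float:
    """Average staleness in years, reinterpreted as a number of seconds."""
    if not deps:
        return 0.0
    return sum(staleness_years(d) for d in deps) / len(deps)


# Made-up sample data, purely for illustration.
deps = [
    Dependency("libfoo", date(2010, 1, 1), date(2020, 1, 1)),
    Dependency("libbar", date(2015, 1, 1), date(2016, 1, 1), latest_cve=date(2019, 1, 1)),
]
delay = startup_delay_seconds(deps)
# An app entry point would call time.sleep(delay) before starting.
```

The point of the design is that the pain scales with neglect: one stale library barely registers, while a whole tree stuck in 2010 makes every start noticeably slower.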

JuniperMesos · 13 days ago
A lot of these problems seem pretty solvable, if you're the admin of the machine (or cloud system) and the user isn't.

If you don't want a user to log in as root, disable the root password (or change it to something only you know) and disable root ssh. If you want people to stop sharing the same login and password across all servers, there's several ways to do it but the most straightforward one seems like it would be to enforce the use of a hardware key (yubikey or similar) for login. If people aren't using configuration management software and are leaving machines in an inconsistent state, again there are several options but I'd look into this NixOS project: https://github.com/nix-community/impermanence + some policy of rebooting the machines regularly.

If you don't like how users are making use of AWS resources and secrets, then set up AWS permissions to force them to do so the correct way. In general if someone is using a system in a bad or insecure way, then after alerting them with some lead time, deliberately break their workflow and force them to come to you in order to make progress. If the thing you suggest is actually the correct course of action for your organization, then it will be worthwhile.
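As a rough illustration of the "disable root ssh" step, here is a minimal Python sketch that checks the text of an `sshd_config` for two settings. `PermitRootLogin` and `PasswordAuthentication` are real OpenSSH directives; everything else (function names, the sample config) is made up for the example. It assumes the first occurrence of a keyword wins, and it deliberately ignores `Include` files and anything after a `Match` block, so treat it as a sanity check rather than a real audit.

```python
from typing import Dict, List


def effective_sshd_options(config_text: str) -> Dict[str, str]:
    """Return the first value seen for each keyword in the global section."""
    options: Dict[str, str] = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        keyword = parts[0].lower()  # sshd keywords are case-insensitive
        if keyword == "match":
            break  # settings after Match are conditional; not handled here
        value = parts[1].strip() if len(parts) > 1 else ""
        options.setdefault(keyword, value)  # keep the first occurrence only
    return options


def root_login_findings(config_text: str) -> List[str]:
    """List settings that are not explicitly locked down."""
    opts = effective_sshd_options(config_text)
    findings = []
    if opts.get("permitrootlogin", "").lower() != "no":
        findings.append("PermitRootLogin is not explicitly 'no'")
    if opts.get("passwordauthentication", "").lower() != "no":
        findings.append("PasswordAuthentication is not explicitly 'no'")
    return findings


# Hypothetical config fragment, for illustration only.
SAMPLE = """\
# hypothetical sshd_config fragment
PermitRootLogin no
PasswordAuthentication yes
PermitRootLogin yes
"""

for finding in root_login_findings(SAMPLE):
    print(finding)
```

Requiring an explicit `no` is a conservative choice: it flags configs that merely rely on the compiled-in default, which can differ between OpenSSH versions.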

philipwhiuk · 12 days ago
None of them are technically hard. All of them are bureaucracy-hard.

If you just do any of this list without the proper migration plan/time, someone senior in the org will complain and you will lose.

skywhopper · 12 days ago
It’s not as easy as “I can technically change this”. If you think it is, you don’t understand the job of a sysadmin.
AstroJetson · 12 days ago
I think the BOFH answer would be “They ride Elevator #2 to sub-basement 3.” Plot twist, there is only sub-basement 2.

Two pints of ale please!

UltraSane · 12 days ago
Best practice is to use IP-restricted keys.
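Assuming this refers to AWS access keys, one common way to get that effect is an IAM policy that denies requests coming from outside known address ranges. The sketch below builds such a policy document in Python. The `aws:SourceIp` and `aws:ViaAWSService` condition keys are real IAM global condition keys; the function name and the CIDR ranges (documentation-only addresses) are placeholders, not anything from the thread.

```python
import ipaddress
import json
from typing import Dict, List


def ip_restricted_deny_policy(allowed_cidrs: List[str]) -> Dict:
    """Build an IAM policy that denies requests from outside allowed_cidrs."""
    for cidr in allowed_cidrs:
        ipaddress.ip_network(cidr)  # raises ValueError on a malformed CIDR
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {
                    # Deny when the caller's IP is outside the allowed ranges...
                    "NotIpAddress": {"aws:SourceIp": list(allowed_cidrs)},
                    # ...unless an AWS service is making the call on your behalf.
                    "Bool": {"aws:ViaAWSService": "false"},
                },
            }
        ],
    }


# Placeholder ranges from the documentation-only address blocks.
policy = ip_restricted_deny_policy(["192.0.2.0/24", "203.0.113.0/24"])
print(json.dumps(policy, indent=2))
```

A stolen key is then much less useful from an attacker's network, though this only helps if the application's egress addresses are stable enough to list.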


melvinodsa · 13 days ago
When I get sad and have nothing to do in the world, maybe hacking on a sad server's problem will seem very interesting.
alexpotato · 12 days ago
We use Sad Servers for evaluating candidates for DevOps/SRE roles and it's phenomenal.

Feedback from candidates is that they find it a bit stressful during the actual interview but love the approach once it's completed.

The interview option also makes it trivial to just send a link to a candidate via Zoom chat and ask them to share their screen, and it "just works".

Happy to answer questions folks may have about how we use it.

zenoprax · 12 days ago
This is heartening - I'm about to start with the daily challenges today and document my experience and that sort of thing.

Any other suggestions? I have sysadmin experience as a homelabber and at work with a small company as a "tech lead" but have not yet had the chance to do it full time in a larger company. Currently focused on back-filling knowledge gaps and adding certs to support my existing experience.

alexpotato · 12 days ago
Sad Servers is great for trying out how to fix scenarios that you would probably run into while working in the real world.

If you are looking into more of the "people" side of things, I would HIGHLY recommend Never Split the Difference by Chris Voss [0]. A big part of being a team lead and/or working at a larger firm is understanding where people are coming from and then convincing them that your solution is "win/win". The book is great at highlighting multiple different tactics to do that.

Turn the Ship Around [1] is also great at giving examples of how to "change organizations in place". If you end up at larger firms, there will be a LOT of legacy infra and processes that you may want to improve. Marquet gives excellent examples of how to change things WHILE ALSO getting buy in from the team.

0 - https://amzn.to/48dBSn2

1 - https://amzn.to/4pfL2Wb

kralos · 13 days ago

    imagine typing in a terminal...
    you want to delete the previous word so press ctrl+w...
    actually you're in a browser; the window closes...
:sadness:

melvinodsa · 13 days ago
We used to run a terminal in the browser using https://github.com/yudai/gotty and the entire dev team remapped their Ctrl+w to Ctrl+`. We did frontend and backend development with this setup for almost 1.5 years. It's muscle memory now, and to this day I always fear my actual terminal will get closed if I use Ctrl+w :P
tambourine_man · 12 days ago
Which is why macOS command key is such an undervalued nicety. One key for GUI stuff, one for command-line stuff.
protomikron · 13 days ago
You can use ctrl+shift+t to open the recently closed tab again.
fduran · 13 days ago
hello, creator here, sorry about that. In this case you can click again on the "Open the Server Terminal in a New Window" button
kralos · 13 days ago
It would be cool if we could SSH into the temporary host (I'm guessing these hosts currently aren't internet connected to avoid abuse so might not be possible or require some super careful firewalling)
CoolCold · 13 days ago
I feel your pain - bites me from time to time, especially in KVM ;)
scubbo · 11 days ago
Maybe I'm just extremely dumb, but I can't find how to edit files? Neither `vi` nor `nano` are installed, I don't have internet access to `apt-get update`, and I'm not about to learn `emacs` for this...

EDIT: Ah, ok, `vi` is installed on the server _itself_, just not in the Docker containers. So I guess I'm going to have to `docker cp` them in. Can do o7

Erwyn · 12 days ago
Cool, might try it out! Are there any solutions repositories for them? I’d love to get an explanation for the ones I’m about to fail.
gautamsomani · 10 days ago
Personal advice: don't use a solutions repo. Googling the problem and then digging deep into the solutions will teach you a hell of a lot more. Read the man pages of the commands that turn up on Google, try them with different options, try to find different commands which can do almost the same thing, maybe a bit differently... all of this will help you learn a lot more.
irusensei · 13 days ago
It seems it's called SRE nowadays right? I hate how things keep being renamed for no reason other than making more buzzwords for suits.
phrotoma · 12 days ago
The definition I liked best, which I _think_ came from one of the Google SRE books though I'm not certain, was: "SRE is what happens when you consider operations to be a software problem".
oarmstrong · 12 days ago
I share your disdain for buzzwords but SRE is definitely a different role.
kortilla · 12 days ago
Nope, SREs keep applications running on a platform. Lots of metrics, tools to deploy apps in whatever rollout process the company has, etc.

In small companies, sysadmin might be a duty of the SRE team, but they definitely diverge if you have a large on-prem deployment or work with bespoke VMs in the cloud.

teddyh · 13 days ago
[flagged]
thatxliner · 13 days ago
well advent of code also needs an account
npinsker · 13 days ago
It’s not necessary to see the problems though
stonecharioteer · 13 days ago
This also has a paid account and a business account.
fduran · 13 days ago
Checking out how the platform works was two clicks away: home -> give me a server.

I don't know of any other SaaS which gives you a VM with one click without any registration but we do it.

In any case thanks for the feedback, I've put a button on this /advent page for clarity, cheers

teddyh · 9 days ago
This text:

> Sign up for a free account (needed to keep track of your progress)

is a complete lie. Tracking a person’s progress is what cookies are for. You don’t need us to create an account for that.

What you do need users to create accounts for, is for you to track every user and their progress.


fragmede · 13 days ago
how do you want it to work? do you even sysadmin?
jbmsf · 13 days ago
I see: a page offering something interesting but vague.

If you tell me more, I might sign up. If I have to create an account first, I'm walking away.

teddyh · 13 days ago
> how do you want it to work?

I would like to see and try to solve the scenarios for myself, not to get meaningless internet points. If you look at their front page, you can do that right now. So why do I have to create an account to even see these special advent scenarios?

> do you even sysadmin?

Yes.

mekoka · 13 days ago
I think the point is "ok, account is free, then what?"

At $5/month I might give the paid subscription a try.