1EB with only 30k users, that's a wild TB-per-user ratio. My frame of reference: the largest storage platform I've ever worked on was a combined ~60PB (give or take), and that had hundreds of millions of users.
When experiments are running, the sensors generate about 1PB of data per second. They have to do multiple (I think four?) layers of filtering, including at the hardware level, to get down to actually manageable numbers.
It depends on which experiment. We call it the trigger system, and it varies according to each experiment's requirements and physics of interest. For example, LHCb now does its full trigger system on the software side (no hardware FPGA triggering), mainly utilizing GPUs for that. That would be hard to achieve under the harsher conditions and requirements of CMS and ATLAS.
But yes, at LHCb we discard about 97% of the data generated during collisions.
Disclaimer: I work on the LHCb trigger system.
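To make the reduction concrete, here's a back-of-the-envelope sketch of a cascaded filter. The stage count and per-stage fractions are invented for illustration; only the ~1PB/s input mentioned upthread and the ~97% overall discard figure come from this thread:

    # Toy cascaded filter: each stage keeps a fraction of what reaches it.
    # Stage names and fractions are hypothetical.
    SENSOR_RATE_PB_PER_S = 1.0  # ~1 PB/s off the detectors (from the thread)

    stages = [
        ("stage 1", 0.10),  # hypothetical: keep 10%
        ("stage 2", 0.50),  # hypothetical: keep 50%
        ("stage 3", 0.60),  # hypothetical: keep 60%
    ]

    rate = SENSOR_RATE_PB_PER_S
    for name, keep in stages:
        rate *= keep
        print(f"after {name}: {rate * 1000:.0f} TB/s")

    kept = rate / SENSOR_RATE_PB_PER_S
    print(f"kept {kept:.0%}, discarded {1 - kept:.0%}")  # kept 3%, discarded 97%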
> 1EB with only 30k users, that's a wild TB-per-user ratio.
33TB per user is a lot, but is it really "wild"? I can fill up well over 33TB of pro-camera photos in less than a year if I shoot every day. I'm sure scientists can generate quite a bit more data if they're doing big things like CERN does.
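For what it's worth, the arithmetic behind that figure (decimal prefixes, so approximate):

    # Rough per-user share: ~1 EB spread across ~30k users.
    total_bytes = 1e18  # 1 EB
    users = 30_000
    print(f"{total_bytes / users / 1e12:.0f} TB per user")  # ~33 TB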
Rucio enables centralized management of large volumes of data backed by many heterogeneous storage backends.
Data is physically distributed over a large number of storage servers, potentially each relying on different storage technologies (SSD/Disk/Tape/Object storage) and, frequently, managed by different teams of system administrators.
Rucio builds on top of this heterogeneous infrastructure and provides an interface which allows users to interact with the storage backends in a unified way. The smallest operational unit in Rucio is a file. Rucio enables users to upload, download, and declaratively manage groups of such files.
https://rucio.cern.ch/documentation/started/what_is_rucio
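For a taste of what "declaratively manage" means, here's a sketch using the Rucio Python client. The scope, dataset name, and RSE expression are made-up examples, and the method names are from memory, so check the docs linked above for the exact API:

    # Sketch: declarative data management with the Rucio Python client.
    # Assumes a configured Rucio environment (rucio.cfg plus auth); the
    # scope, name, and RSE expression below are illustrative only.
    from rucio.client import Client

    client = Client()

    # A DID (data identifier) is scope:name; it can be a single file,
    # a dataset of files, or a container of datasets.
    did = {"scope": "user.jdoe", "name": "analysis_output.root"}  # hypothetical

    # Declare "keep 2 copies on storage matching this expression";
    # Rucio picks the sites and schedules the transfers itself.
    client.add_replication_rule(
        dids=[did],
        copies=2,
        rse_expression="tier=2&type=DATADISK",  # hypothetical RSE expression
    )

    # See where replicas of the file currently live.
    for replica in client.list_replicas([did]):
        print(replica["rses"])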
Tape and off-site replicas at globally distributed data centres for science.
Of the 1EB, a huge amount is probably in automated recall and replication, with "users" running staged processing of the data at different sites, ultimately reducing it to a "manageable" GB-TB level for scientists to do science with.
Yup, lots of tape for stuff in cold storage, and then some subset of that on disk spread out over several sites.
It's kinda interesting to watch anything by Alberto Pace, the head of storage at CERN, to get an understanding of the challenges and constraints: https://www.youtube.com/watch?v=ym2am-FumXQ
I was basically on the helpdesk for the system for a few years so had to spend a fair amount of time helping people replicate data from one place to another, or from tape onto disk.
For experiment data, there is a layer on top of all of this that distributes datasets across the computing grid. That system has a way to handle replication at the dataset level.
> over the years what discoveries have been made at CERN that have had practical social and economic benefits to humanity as a whole?
Some responders to the question believe I was criticizing a supposed wastefulness of the research. Not knowing the benefits of the discoveries in high-energy physics, i.e. the stuff the accelerators are actually built to discover, doesn't mean I was criticizing it.
Responses referenced the contributions made by developing the infrastructure that supports the basic research, which is fine, but not the benefits of high-energy physics discoveries.
So, to rephrase the question: what are the practical social and economic benefits to society of the discoveries made in high-energy particle physics at institutions like CERN over the years?
This is not just in relation to CERN, but worldwide, such as those experiments that use pools of water deep underground to study cosmic rays, etc.
So, a large chunk of the benefits come more in the form of 'side effects' than things directly fueled by particle-physics discoveries. This is kind of by definition, since the point of particle accelerators as powerful as the LHC is to replicate conditions that cause subatomic particles to fall apart. The same goes for things like neutrino detectors or gravitational-wave detectors: they're all looking for things that are barely observable even with those engineering marvels, and we're a long way away from being able to exploit their discoveries economically.
One of the biggest and most 'direct' social and economic benefits (in the sense of being directly associated with high-energy particle physics) would be the development of synchrotron light sources, which contribute to society much more directly. In typical particle accelerators, the emission of synchrotron light is an unwanted side effect, but it turns out to be pretty valuable for materials science. These facilities are usually so busy that they have to pick and choose the study proposals they accept. As an example, some of the early understanding of Covid-19's structure came from studies at synchrotrons. More recently, there are startups attempting to use synchrotron tech to sterilize things.
Besides that, it's mainly indirect effects. A lot of the cost of building and updating these sorts of machines goes towards developing improved sensors, cryogenics, magnets, optics, control systems, networking systems, etc. These all feed into other fields and other emerging technologies.
> Responses referenced the contributions made by developing the infrastructure that supports the basic research, which is fine, but not the benefits of high-energy physics discoveries.
I was one of those responders.
There were two very deliberate reasons I specifically avoided talking about particle physics:
1. I interpreted the tone of the original question as highly cynical of any scientific contribution particle physics has made, so I instead went for 'consequential' things: excitement around education, outreach, and other adjacent aspects that are beneficial to humans. I did this to try to avoid "How has discovering a new boson made my rent cheaper?" types of arguments, which are only ever made in bad faith, but have been made to me a disheartening number of times in my career; and
2. I am a scientist and I have collaborators and colleagues at CERN, but I'm not a particle physicist, so I didn't feel adequately qualified to highlight them. I was expecting someone with more expertise to jump in and simply do a better job than I ever could.
If I interpreted the tone of your question incorrectly, please understand that it wasn't an intentional slight on you, simply an artefact of a) plain text being an incredibly poor medium for communicating nuance; and b) a defensive measure born of arguments I have had the displeasure of dealing with in the past. And if you were genuinely curious, that's wonderful, and I'm sorry that I didn't offer you more grace in my response.
You're probably getting replies like that because it's a bit of an odd question. Academic research isn't really done to achieve a particular purpose or goal. The practical benefit literally is academic.
It's also one of the first questions asked by people who very much are criticizing, so even if it was a sincere question, it will get lumped in with those. Not recognizing/addressing this when posing the question does nothing to prevent the lumping.
IIRC I had issues with inotify when I was editing files on a remote machine via SSHFS, while those files were being used inside a Docker container. inotify inside the container did not trigger the notifications, whereas it did when editing a file with an editor directly on that host.
I think this was related to FUSE: Docker just didn't get notified.
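That would fit how inotify works: events are raised by the kernel for changes made through the local mount, so writes happening on the far side of a FUSE/network filesystem like SSHFS never generate them locally. A common fallback is mtime polling; a minimal sketch (the watched path is hypothetical):

    # Minimal mtime/size polling fallback for mounts where inotify
    # doesn't fire (e.g. files changed on the far side of SSHFS).
    import os
    import time

    def watch(path, interval=1.0):
        """Yield whenever the file's mtime or size changes."""
        last = None
        while True:
            st = os.stat(path)
            sig = (st.st_mtime_ns, st.st_size)
            if last is not None and sig != last:
                yield sig
            last = sig
            time.sleep(interval)

    for _ in watch("/mnt/remote/config.yaml"):  # hypothetical SSHFS path
        print("file changed, reloading")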
FUSE Passthrough landed in kernel 6.9, which also reduces context switching in some cases: https://www.phoronix.com/news/Linux-6.9-FUSE-Passthrough . The benchmarks in this article are pretty damning for regular FUSE.
FUSE Passthrough is only useful for filesystems that wrap an existing filesystem, such as union mounts. Otherwise, you don't have an open file to hand over.
yeah but still not great for metadata operations, no?
i remember it really wasn't great for large sets of search paths, because it defeated the kernel's built-in metadata caches with excessive context switching?
Somewhat off topic, but CERN has a fantastic science museum attached to it that I had the privilege of visiting last summer. There is of course Tim Berners-Lee's NeXT workstation, but also so much more. It is also the only science museum I've visited that addresses topics in cyberinfrastructure such as serving out massive amounts of data. (I personally get interested when I see old StorageTek tapes lol.) The more traditional science displays are also great. Check it out if you are ever in the Geneva area. It is an easy bus ride to get out there.
Don’t forget to visit the gift shop too.
They don’t have an online store, so it’s the only place to get CERN ‘gear’.
You can easily overspend there on gifts your friends and family will appreciate (if they know and like CERN and its missions).
What's funny is that I just visited the museum a few months ago, and am coincidentally wearing a CERN hat I got there while reading the post and comments. I also highly recommend checking out the museum!
There are also free tours basically every day, without pre-booking. The itineraries vary, but usually one of the old accelerators (synchrocyclotron) and the ATLAS visitor centre are shown.
Dumb question: is it right to think that the experiments' results are reproducible? If so, what's the value in keeping results from the distant past, given the data is generated at this enormous rate?
Well, generally yes, but that isn't how it works there.
Since the things they want to measure in their experiments are so atomically small, sensor noise becomes a huge problem.
So it's not enough to find the sensor readings for NewMysteriousParticleX to be sure it actually exists; it could just have been noise.
You have to run the experiment again and again until your datapoint is statistically significant enough that you are sure it wasn't just noise.
A couple of years ago there was this case where they almost found a new particle; the significance was pretty close to the threshold. The problem was that this particle was not expected and would have shaken the foundations of particle physics. Some weeks later the particle had vanished back into the abyss of noise.
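As a toy illustration of the statistics involved (the counts here are invented, and real analyses are far more careful than this textbook approximation):

    # Toy counting experiment, using the simple approximation
    # Z = (observed - background) / sqrt(background). Counts are invented.
    from math import sqrt

    background = 1000.0  # events expected from known physics
    observed = 1160      # events actually recorded

    z = (observed - background) / sqrt(background)
    print(f"local significance: {z:.1f} sigma")  # ~5.1 sigma

    # Convention: ~3 sigma counts as "evidence", 5 sigma is needed to
    # claim a discovery. More runs add statistics: a real particle's
    # significance grows, while a statistical fluke's fades away.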