Interestingly, a lot of the “pain points” highlighted here between in-person and Zoom-based communication align with my experiences as a “neurodivergent” person. Some examples:
(1) low audio quality making it difficult to understand the speaker, or being easily distracted by background noise; (2) lack of access to non-verbal communication channels like body language, requiring dedicated brainpower to understand the speaker; (3) dysmorphia brought on by hyper-awareness of one’s appearance; (4) lack of adherence to societal norms in conversation.
I feel like remote work has leveled the playing field in a way that requires everyone to be explicit in their communication.
A couple of jobs ago (pre-pandemic) I was at a remote company that put some effort into (1) and (2), and I've become a big proponent of its approach.
Allocate $300 per person to get everyone a podcaster mic and a pair of open-backed headphones to plug into it as a monitor, so you hear your own voice through your headphones. It removes the feeling that you need to speak loudly to be heard, which is a lot of the fatigue.
And if both parties are using this setup, it greatly reduces the "only one person can speak at a time" feeling, allowing you to use more natural vocal interactions without feeling like you're interrupting.
Together it makes a huge difference. I have audio processing and sensory problems and used to be so drained after a 40 minute zoom meeting. With this setup I can spend 4 hours a day pair programming and be appropriately tired but not exhausted.
For your (3) and (4), and for large meetings with executives screaming into their laptops from coworking spaces, I have no solutions, however.
I optimized for this style of setup about a year into the pandemic, but using a Beyerdynamic MMX 300 wired gaming headset. The headset is closed-back, so to get voice feedback ("sidetone") I use an inexpensive Behringer XENYX 302USB mixer. It lets me adjust the mic/audio mix going to my headphones, and it also serves as a USB audio interface.
This setup is as minimal as I was able to get it: no big booms on my desk to move around, microphones blocking my video, etc. Yeah, I look like a pilot on calls, but the audio quality is amazing. Also, the mic being close to me blocks background noise, and isolates it from my desk. Sometimes my colleagues with podcaster-style mics have issues with mechanical transmission from their desk setups, while I have none.
People often comment — even to this day — about my audio quality. I talk to people all day, being in sales, and it makes a huge difference in my presence and professionalism. Absolutely worth every penny I spent.
As a side note, I also use my iPhone as my webcam (Continuity Camera) along with a couple of Logitech lights on my monitor, and the overall quality of my digital presence often blows people away.
> And if both parties are using this setup, it greatly reduces the "only one person can speak at a time" feeling, allowing you to use more natural vocal interactions without feeling like you're interrupting.
How does it work? Asking because I always have this issue in 3+ person calls, where I'm frequently starting to talk just about when someone else does, ending up in those awkward "oh sorry, you go, please" moments. That makes me prefer not to say anything at all unless explicitly addressed.
I always assumed that my issue is that I'm just a slow-thinking dumbass who can't get the cues in time (which is true, because it happens in face-to-face in-person conversations as well, just less frequently), but maybe it's a technical issue contributing to it?
I personally use a single over-the-ear headphone with a directional mic because I want to hear everything around me normally (and myself). They are super cheap, and for jitsi/zoom/teams I don't normally need directional sound. It's also easier on my ears than earbuds, especially if someone is too loud. I just tip it back a bit.
I’ve done a deep dive into this and have a setup I’m fairly happy with. It doesn’t sound as good as a “podcast” mic, but it’s also not obnoxious and in your face, while sounding worlds better than Apple or gaming headsets. I detail it in this thread: https://www.audiosciencereview.com/forum/index.php?threads/w...
It’s a wired headset with a condenser mic that I feed into a DAW with a real-time compressor and EQ. No software is required and no processing is done on the computer. The only thing I’m missing is a wireless headset with audio in and out that can be routed to the DAW; wireless headsets connect to the computer via USB, which is a shame. All of my friends use SteelSeries and it sucks, because I constantly have to ride their volume levels: the boom arm moves out of the ideal position and they don’t have a leveler to deal with it.
I know it's not exactly the point, but would you mind sharing what specific mic and headphones you went with for this? It can help to get a sense of what to look for to create a similar setup.
This is tricky to do correctly if you don't want to look like one of three things:
1. Podcast bro / Twitch streamer (giant mic inches from face)
2. Gamer (headphones with or without a headset mic)
3. Call center employee (open back mono headset with mic)
Turning off video is obviously the fastest fix to this. But assuming you don't want to do that...
You'll find a lot of people hate wearing headphones: it messes up their hairstyle, looks distracting, or triggers some other sensory issue with having something clamped to their head. Earbuds work better for them; as long as the mic can't hear your speakers, the annoyance of half-duplex audio is eliminated.
The mic problem is harder if you don't want something obvious in frame. Lav mics work if you know how/where to attach one and to minimize clothing movement noise. Other options will require some level of room treatment if the mic isn't close to the speaker's mouth.
>Allocate $300 per person to get everyone a podcaster mic and a pair of open backed headphones to plug into it as a monitor
How's your experience with this regarding soundproofing the room? Is normal noise filtering fine? I assume you have the mic on a boom arm with a pop filter.
Re #1, many video conferencing apps have in-app captions and I really appreciate turning these on. Sometimes I can't understand the speaker and captions save me. Sometimes I space out and captions allow me to "rewind" conversations so I can answer an unexpected question.
Even if your video conferencing app doesn't have captions, your OS might (macOS does, for example). It's not as good as captions in the conferencing software (which can label speakers), but it's better than nothing.
This feature also helps if you join a meeting and don't have your audio set up right.
In my experience, these captions only work for native speakers (of a language that happens to be supported) with "standard" accents. With other accents and/or non-native speakers, they break down pretty badly. Combine that with some amount of bilingual dialog and some technical terms and internal tech names, and it can get quite mystifying.
Interesting to me are the inverse issues my hard of hearing wife has with in-person work. She rarely gets anything out of meetings because the relative chaos and people speaking over each other means she can't hear any of it. No technology yet provides real-time transcription for reality. The politics of interpersonal relations makes everyone think you're stuck up, shy, unconfident, or all of these if you don't often speak up yourself. The common practice of looking over someone's shoulder for co-working doesn't work well when the other person relies upon reading your lips to understand you. All of the mythical serendipitous chance interactions that happen around a water cooler don't involve you if you can't hear anyone and don't take part in water cooler chat.
> No technology yet provides real-time transcription for reality.
A deaf dancer friend uses an app, I think from Google. We’re often in discussion/teaching circles in dance classes. Seems to work well.
“Real time” is relative, though. Conversational latency matters on the order of tens of milliseconds, and it’s certainly nowhere near that fast.
The group has to be attuned to having a turn for her to speak. She’ll pretty often come in at the same time as someone else, missing those subtle sounds of someone else about to talk. The emergent rule is “tie goes to her”.
This all works well in the mindful scene of Berkeley experimental dance workshops. Maybe she’d get steamrolled in a competitive finance office or something.
I miss remote work when we didn't have to go on video all the time. Seeing myself/other people in a video chat is exhausting, and the non-verbal cues are often delayed so they lose meaning entirely. Just give me low latency voice again, please.
Our team culture defaults to video off for most recurring calls, it's more relaxing and less exhausting. When there's something really important to discuss, or an exec/client meeting we'll turn on video.
At my previous job, during the pandemic, we usually communicated audio-only. At my current job we use video as well most of the time. It is a big, big improvement. Just a few days ago we had to have an audio-only call with a supplier, who said that was their policy. It was super annoying not being able to see them. It is much harder to get cues about, for example, how certain they are when saying something.
The first thing to do in a conference is to disable your own self-view stream. It doesn't help in any way after the first 10 seconds. I can't understand why any vendor added it, except to fulfill narcissistic tendencies.
Video calls are exhausting because without eye contact you can never tell who's looking at you at any given moment.
You have to maintain perfect composure because there's a camera pointed right at your face transmitting your minutest reactions to the rest of the group.
It's especially difficult when you're in a meeting that just won't end, or somebody keeps going back to a tiresome subject that's already been beaten to death. You can't even allow yourself a look down or a raised eyebrow without rudely signaling your boredom to everyone else on the call.
Oh boy do I have good news for you - nobody is looking at you if you're not currently talking, and nobody is going to notice or care if you look down or have the "minutest reaction".
People are primarily looking at the speaker, secondarily at themselves, third is probably checking slack/email/whatever and only in a deep 4th or 5th place is anyone paying any attention to the expression on other people's faces.
Realizing that nobody is paying that much attention, and that nobody cares, can hopefully make the calls less stressful and exhausting for you.
Exactly, whenever I'm on a video meeting I'm just flinging myself around on my chair. I literally can't hold myself still. But it's even easier for me than an in person meeting since I get automatic captions, which make it so easy to keep track of the conversation.
Move your camera. I have mine above and to the right, so it is always on an oblique angle to my face. For me it relieves the tension of the close-up on your face as well as relieving the worry about looking at the speaker as the camera position wouldn't translate that anyways.
> You can't even allow yourself a look down or a raised eyebrow without rudely signaling your boredom to everyone else on the call.
You realize that by hiding your boredom from everyone on the call, you are signaling to them that the discussion is interesting and thus you encourage them to continue that discussion.
Not to mention that your meeting is recorded, so any misstep will be seen during the replay. When I have to go back and watch an old meeting for details, I cringe at some of my facial expressions.
> You can't even allow yourself a look down or a raised eyebrow without rudely signaling your boredom to everyone else on the call.
How is that any different than an in person meeting?
There is this assumed necessity that a video call is so much better for these meetings. I never enable the camera during meetings. Someone shares their screen, and we simply discuss it. As a sibling comment points out, nobody looks at the thumbnails of the nonspeaking participants. If presenters are trying to look at the thumbnails of the viewers, they are probably not presenting very well. The thumbnails are just distractions.
Maybe this is a generational thing, as I also never use FaceTime or whatever other video calling, and I'm a gray beard without the beard. Typically, the only ones I see doing that are those that are younger.
> How is that any different than an in person meeting?
Because you can see yourself. Whereas people IRL conduct themselves in ways that are unoptimized - because they can't see themselves, and since everybody is unoptimized it doesn't really matter.
On a video call, everybody is watching how they look; if you're the one who isn't paying attention to your own appearance, you're the odd one out.
Moreover the geometry of a group video call can be very unphysical if everybody can see everybody else including themselves.
This is already a thing among some minoritised groups.
Specifically they feel a need to project a high standard of presentation at work, which is labour-intensive to accomplish, and - quite rightly - they don't want to feel forced to do the same thing to their home environment, or have relative strangers able to see into their homes.
Hide your own video from the meeting view and you're halfway to some sort of Zoom sanity. No human can focus properly with a laggy mirror of themselves in their face.
> Video conferencing platforms have opted to deliver audio that arrives quickly but is low in quality. Platforms aim for a lag time of less than 150 milliseconds. Yet that is long enough to violate the no-overlap/no-gap convention to which speakers are accustomed.
EXACTLY. Latency is a killer. Even on cell phones.
Latency does not get enough attention. It's not as in-your-face as dropouts, but the levels of latency which we experience in modern communication channels are crazy bad.
And even if platforms are aiming for 150 ms, in practice the latency is often much higher than that.
Furthermore, a lot of users have substantial latency introduced by their setups. For example, earlier Bluetooth headphones like the first-generation AirPods add 300 ms of latency (126 ms for the AirPods Pro 2): https://stephencoyle.net/airpods-pro-2 . So even if you have a great network backbone and a great protocol, someone's crappy home setup might make it all for naught.
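Those component delays stack up fast. A toy one-way budget to illustrate — every number below is an illustrative assumption except the Bluetooth figure from the link above:

```python
# Illustrative one-way mouth-to-ear latency budget for a video call.
# Component values are rough examples, not measurements.
budget_ms = {
    "mic capture buffer": 10,
    "encoder frame + lookahead": 25,
    "network + jitter buffer": 80,
    "decoder + playout buffer": 20,
    "Bluetooth headphones": 126,  # AirPods Pro 2 figure cited above
}
total_ms = sum(budget_ms.values())
print(f"total one-way latency: {total_ms} ms")  # 261 ms, well past the ~150 ms target
```

Even generous assumptions about the network leave the endpoint hardware dominating the budget.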
I've long held the opinion that many audio compression codecs have outlived their usefulness, and now simply introduce latency and reduce audio fidelity to preserve bandwidth that is no longer in short supply.
Every internet connection I've had over the past 10 years in the US has been fast enough to send and receive at least a dozen uncompressed CD-quality PCM streams simultaneously. On paper, my current fiber connection should be able to handle 1,400 of them. This insistence on audibly mangling the audio to keep it at ~96 kbps makes less and less sense every day -- especially considering how the audio is usually carrying all the important information in the call anyway!
(I find the same thing happens on streaming services too. 98% of the bits are for video, 2% for sound.)
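The arithmetic behind that claim is easy to check. A quick back-of-the-envelope sketch (the link speeds are round example numbers):

```python
# Uncompressed CD-quality PCM: 44.1 kHz sample rate, 16-bit samples, 2 channels.
pcm_bps = 44_100 * 16 * 2  # 1,411,200 bits/s
print(f"one stream: {pcm_bps / 1e6:.2f} Mbps")  # 1.41 Mbps

# How many such streams fit on a given link:
for link_mbps in (25, 100, 1000):
    print(f"{link_mbps:>5} Mbps link: {int(link_mbps * 1e6 // pcm_bps)} streams")

# Versus a typical ~96 kbps conferencing codec:
print(f"compression ratio: {pcm_bps / 96_000:.0f}x")  # ~15x
```

Even a modest 25 Mbps connection carries over a dozen uncompressed streams; the ~15x compression buys surprisingly little on modern links.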
While less compression is necessary than before, I'd argue that some ultra-fast compression would still be useful, both for bandwidth savings and to leave room for error-correction codes.
Honestly, my Bluetooth headset probably adds almost as much latency as my fiber connection and wifi combined. That's another issue most people don't even consider: Bluetooth is orders of magnitude slower than a wire, even in the best-case scenario.
You've also got to consider that the higher the bandwidth of a stream, the harder it is to deliver with low latency.
If every packet has a risk of being dropped or of arriving out of order then increasing either the size or the sheer quantity of those packets will also increase how often they get mishandled or buffered.
Plus you run into compatibility problems between different kinds of devices such as mobile users trying to connect over spotty 3G connections.
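The packet-count effect is easy to quantify. A minimal model, assuming independent per-packet loss (real loss is bursty, so this understates the clustering):

```python
def p_any_lost(n_packets: int, p_loss: float) -> float:
    """Chance that at least one of n independent packets is dropped."""
    return 1.0 - (1.0 - p_loss) ** n_packets

p = 0.001  # 0.1% per-packet loss rate
for n in (1, 12, 100):  # one compressed stream vs. many or larger ones
    print(f"{n:>3} packets: {p_any_lost(n, p):.2%} chance of a glitch")
```

Sending ~100x more packets turns a negligible 0.1% loss rate into a glitch on nearly every tenth frame, which is why raw bitrate alone doesn't settle the codec question.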
It's more likely that these companies would rather save a few pennies by introducing a new 64 kbps codec that reduces quality by only 10%, while offering the old codec's bitrate as a “fidelity mode” for premium+ subscribers.
Yes, in practice latency is often higher than a second. Even in person I must be careful not to barge into conversations and cut people off, but latency makes it much worse. In a larger group, one tends not to say anything spontaneously, because it always feels like cutting in.
I’ve been working remotely for nearly 20 years and will never return to a cubicle.
The secret to making online meetings bearable is simple: use audio only - just like talking on the phone, which people can do happily for hours. I don’t need to see you, and you don’t need to see me, but we can always share a screen if necessary.
Some people _really_ need in-person interaction; the problem is that their requested accommodation is that the rest of us have to be in that environment as well. You go into the office, then, and be there with people who want to be there; don't keep trying to force the rest of us to be in that environment.
What are really exhausting are meetings that shouldn't have happened, and people trying to force everybody to be on camera. My solution to unnecessary meetings I'm not willing to complain about is doing the dishes, folding the laundry, or otherwise paying only the required attention while occupying myself with something else.
At least in my experience, doing meetings by video allows people to schedule them back-to-back-to-back, which gets tiring whether in person or not. In the days of having a walk or a drive between meetings, I think that also served as a good pause to reset. Limiting meeting load and taking a very brief walk or doing some garden chore in between is just as good, even if the meetings are all video.
> Recall that compared to electronic devices, the human brain operates at ridiculously slow speeds of about 120 bits (approximately 15 bytes) per second. Listening to one person takes about 60 bits per second of brainpower, or half our available bandwidth.
How are they getting these numbers? Is it that a word is typically a few bytes worth of characters and we can process a couple words per second? Or are they really saying that all the information we perceive during a conversation can be reduced to less than 15 bytes per second? The former seems like a flawed comparison, and the latter seems ludicrous.
I was able to find this MIT Technology Review article[1], which explains that one measure of a specific lexical task gives a processing rate of 60 bits per second. However, the author of the scientific paper in question objects to this summary:
> "I have a small scientific comment on your post. Although I think it represents my results very well, I find the opening sentence: “A new way to analyze human reaction times shows that the brain processes data no faster than 60 bits per second.” a bit misleading. I don’t think I have shown anything about the upper bounds of the processing speed, in principle the curve I show in Figure 4 of the manuscript could extend far beyond this, but I have no information to make this extrapolation, so I would not claim (for the moment) any upper limit."
Britannica[2] has an explanation of historical estimates, which are in the same ballpark: "For example, a typical reading rate of 300 words per minute works out to about 5 words per second. Assuming an average of 5 characters per word and roughly 2 bits per character yields the aforementioned rate of 50 bits per second." However, 2 bits per character is a 4-letter alphabet, so they must already be using some information-theoretic sense in which the information density of an English word is much lower than what individual letters can encode (which makes sense: the bigram qz has zero occurrences while th is frequent).
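For what it's worth, Britannica's arithmetic itself checks out, whatever one thinks of the premises. Reproducing it (the 2 bits/character entropy figure is their assumption):

```python
import math

# Britannica's reading-throughput estimate.
words_per_minute = 300
chars_per_word = 5
bits_per_char = 2  # their assumed per-character entropy, not log2(26)

bps = (words_per_minute / 60) * chars_per_word * bits_per_char
print(f"{bps:.0f} bits/s")  # 50 bits/s

# The naive per-letter figure for a 26-letter alphabet is much higher:
print(f"log2(26) = {math.log2(26):.2f} bits/char")  # ~4.70
```

So the 50 bps number embeds an assumed ~2.3x redundancy over raw letter entropy before any of the deeper objections even apply.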
It goes on to explain that "in other words, the human body sends 11 million bits per second to the brain for processing, yet the conscious mind seems to be able to process only 50 bits per second" — except that this is ludicrous on its face, at least by some measures, as the bps from the eyes alone in the table just below this paragraph is 10 million bps. Clearly, the "lexical task" of reading words involves processing much of that visual input (even 0.1% would still be 10 kbps), which is handwaved away in the lexical-stream example to pretend that the brain receives a direct serial input stream of 2-bit characters.
Furthermore, it's easy to find other estimates of information processing that suggest the input from a single eye is more like 1.6 gigabits per second[3], which is 320x higher than the 10-megabit total given by Britannica. The article explains that there's already compression before the signal hits the brain, though, as the optic nerve is limited to around 100 megabits per second.
The 120-bit upper limit seems to be an invention of psychologist Mihály Csíkszentmihályi (and purportedly, independently, of Bell Labs engineer Robert Lucky, though I can find no primary source that supports that claim), and is mentioned in that context in the Wikipedia article for Flow[4].
> People are primarily looking at the speaker, secondarily at themselves, third is probably checking slack/email/whatever

1. Primarily at oneself. 2. Anything else on the computer. 3. The speaker.

> You can't even allow yourself a look down or a raised eyebrow without rudely signaling your boredom to everyone else on the call.

Oh, one must. It is an important signalling mechanism. How else will people know that the horse they are beating is dead?

> Video calls are exhausting because without eye contact you can never tell who's looking at you at any given moment.

Yeah, Eye Contact Correction is a thing now.

> they don't want to feel forced to do the same thing to their home environment, or have relative strangers able to see into their homes.

Cameras on certainly shouldn't be mandatory.
1: https://www.technologyreview.com/2009/08/25/210267/new-measu...
2: https://www.britannica.com/science/information-theory/Physio...
3: https://www.discovermagazine.com/mind/the-information-enteri...
4: https://en.wikipedia.org/wiki/Flow_(psychology)#Mechanism