But, he says, instead of being limited by how quickly we can process information by listening, we’re likely limited by how quickly we can gather our thoughts. That’s because, he says, the average person can listen to audio recordings sped up to about 120%—and still have no problems with comprehension.
Some years ago I worked on an accessibility project for an app and website designed for people with disabilities. One of the team members had low vision, and used a screen reader that must have been set to 3x or even higher. I usually listen to YouTube and podcasts at 1.5-2x and I could barely understand the audio. He seemed surprised, which indicated to me that 3x+ was the norm for people in his circle.
I wonder whether his ability was trained through years of using fast screen readers, whether a lower visual processing load leads to better audio processing, or whether there's some other explanation.
I'm the blind dev who refactored a huge chunk of the Rust compiler [0]. I'm at roughly 800 words a minute with a synth, with the proven ability to top out at 1219. 800 or so is the norm among programmers. In order to get it we normally end up using older synths, which sound way less natural, because modern synthesis techniques can't go that fast. There's a trade-off between natural-sounding and 500+ words a minute, and the market now strongly prefers the former because hardware can now support e.g. concatenative synthesis.
1219 is a record as far as I know. We measured it explicitly by getting the screen reader to read a passage and dividing. I spent months working up from 800 to do it and lost the skill once I stopped (there was a marked level of decreased comprehension past 1000, but I was able to program there; still, in the end, not worth it). When I try to describe the required mental state it comes out very much like I'm on drugs. Most of us who reach 800 or so stay there, though not always that fast for e.g. pleasure reading (I do novels at about 400). It's built up slowly over time, either more or less explicitly. I did it because I was in high school playing MUDs and got tired of not being able to keep up; it took about 6-8 months of committing to turning the synth faster once a week no matter what, keeping it there, and dealing with a day or two of mild headaches. Note that for most blind people these days, total synthesis time per day is around 10+ hours; this stuff replaces the pencil, the novel, etc. Others just seem to naturally do it. You have little choice; it's effectively a one-dimensional interface, so from time to time you find a reason to bump the knob. And that's enough.
Whether and how much the skill transfers to normal human speech, or even between synths, is person-specific. I can't do Youtube at much beyond 2x. Others can. It's definitely a learned skill.
0: https://ahicks.io/posts/April%202017/rust-struct-field-reord...
And as a followup to that--because really this is the weird part--some circles of blind people (including mine) talk faster between ourselves. That's not common, but it happens. I still sometimes have to remember that other people can't digest technical content at the rate I say it and remember to slow down. A good way to bring it out is to have me try to explain a technical concept that I understand really well. I have the common problem in that situation of not being able to talk as fast as I think, but I also seem to have the ability to assemble words faster in a sort of tokenize/send to vocal cords sense once I know what I want to say.
To me, the fact that this does in fact seem to be bidirectional, at least somewhat, is more interesting than the fact that I can listen fast.
Has anyone tried overlapping words instead of speeding them up? Like so:
How
  are
    you
      doing?
I've often wondered if this, or at least sped-up speech, should be the default robotic interface... it would make sense to optimize for efficiency/speed (while maintaining intelligibility) if we can do so.
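To make the idea concrete, here's a minimal sketch of that staggered playback, assuming you already have one clip per word. The file names, the 40% overlap figure, and the pydub dependency are illustrative choices, not anything from this thread:

    from pydub import AudioSegment

    # Hypothetical per-word clips; plain .wav works without ffmpeg.
    words = [AudioSegment.from_file(name)
             for name in ["how.wav", "are.wav", "you.wav", "doing.wav"]]

    overlap = 0.4  # start each word when the previous one is 60% done
    positions = []
    pos = 0
    for w in words:
        positions.append(pos)
        pos += int(len(w) * (1 - overlap))   # len() is in milliseconds

    total = max(p + len(w) for p, w in zip(positions, words))
    out = AudioSegment.silent(duration=total)
    for p, w in zip(positions, words):
        out = out.overlay(w, position=p)     # mix the words, don't concatenate

    out.export("overlapped.wav", format="wav")

The open empirical question is how much overlap stays intelligible before it turns into the cocktail-party problem.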
> Whether and how much the skill transfers to normal human speech, or even between synths, is person-specific. I can't do Youtube at much beyond 2x. Others can. It's definitely a learned skill.
I find that the maximum understandable rate varies a lot between speakers. For some speakers 2.5x is possible, but just 1.5x for others.
One advantage synths have is that they can more easily control the speed at which words are spoken and the pauses between words independently. When watching/listening to pre-recorded content I often find that I'd want to speed up the pauses more than the words (because speeding everything up until the pauses are sufficiently short makes the words unintelligible).
If someone knows of a program or algorithm that can play back audio/video using different rates for speech and silence, please share.
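I don't know of an off-the-shelf tool, but a crude version is easy to sketch: classify short frames as speech or silence by energy, then drop most of the silent frames while leaving the speech itself untouched. The threshold, frame size, keep ratio, and file names below are guesses to tune per recording:

    import numpy as np
    import soundfile as sf

    audio, sr = sf.read("input.wav")
    if audio.ndim > 1:
        audio = audio.mean(axis=1)          # mix down to mono

    frame = int(0.02 * sr)                  # 20 ms frames
    n = len(audio) // frame
    rms = np.sqrt((audio[:n*frame].reshape(n, frame) ** 2).mean(axis=1))
    silent = rms < 0.01                     # crude energy gate

    keep = []
    for i in range(n):
        # keep every speech frame, but only 1 in 4 silence frames;
        # a real implementation would crossfade at the cuts to avoid clicks
        if not silent[i] or i % 4 == 0:
            keep.append(audio[i*frame:(i+1)*frame])
    sf.write("output.wav", np.concatenate(keep), sr)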
Are old speech synths not harsh on the ears to listen to for longer periods? Or maybe I'm just familiar with the super robotic ones (I like them for music production).
If so, have you considered using an EQ plugin to maybe turn down the harsher high frequencies a few notches? Just a thought.
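For what it's worth, a rough sketch of that kind of high-cut in code, approximating a high-shelf by blending a low-passed copy back with the original. The 4 kHz corner and -6 dB depth are arbitrary starting points, and the file name is hypothetical; needs scipy and soundfile:

    import soundfile as sf
    from scipy.signal import butter, lfilter

    audio, sr = sf.read("synth.wav")
    b, a = butter(2, 4000 / (sr / 2), btype="low")   # 4 kHz low-pass
    lows = lfilter(b, a, audio, axis=0)

    cut_db = -6                              # how far to drop the highs
    g = 10 ** (cut_db / 20)                  # ~0.5 linear gain
    out = lows + g * (audio - lows)          # highs attenuated, lows untouched
    sf.write("softened.wav", out, sr)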
I've known a lot of people that push podcasts, videos, and audiobooks to extreme speed. I knew a guy who'd turn video speed up to 8x so he could binge watch a season of generic anime in an hour flat. I knew a girl who'd get through paperback romance novels by scanning each page diagonally, in 10 seconds each. And here in this thread we have a lot of people bragging along the same lines.
I just don't get the point. If you can process content much faster than it was meant to be played, it doesn't mean you're learning much faster than you otherwise could; it means the novel information density is low. Any content that can be sped up that much without loss is not worth listening to in the first place. You're just skipping the trite cliches, filler, and obvious facts.
I can read fast, and I typically go through fluffy NYT bestseller nonfiction at 600 WPM. But when I do this I constantly have a sneaking suspicion that I'm just wasting my time. When I read a good book full of new ideas, I barely go at 150 WPM, but the time always feels well-spent.
Exceedingly slow narration, particularly what's normal for audiobooks, is annoying to me because it's slower than I process words. It's like walking with someone whose pace is far slower than your natural gait -- it takes more energy and concentration to slow down. It's why slow-talkers are so annoying.
This isn't "how fast can I go through this" but "what is a comfortable pace"?
So I bump the speed up, though usually fairly modestly: 1.25x - 1.5x is generally enough.
I've noticed that preferred speeds vary tremendously with the quality of the work and speaker -- high-density information and an exceedingly good speaker, and I'll slow down. Slapdash redundant content and poor speaker, I'll speed up.
The degree of polish in the production matters tremendously. I've listened to CPG Grey's YouTube videos (highly polished) and podcasts (a lot of chit-chat with his co-host). The videos work well at normal speed, or perhaps slightly sped up. The podcasts I find nearly unlistenable, though they improve at much higher speeds (1.75x - 2x).
It is like the information doesn't have the time to settle in my memory, despite me understanding it.
Maybe it's because when things are slow, I can use the dead time to think about the implications/corner cases of what's being said.
Just spending time in the moment with an enjoyable story is not wasting it.
After doing speed reading exercises a few years ago I initially also started to use the techniques for novels, but that was a bad mistake in my opinion.
While I could get my comprehension percentage quite high with a bit of training, I lost all connection to the characters and story, stopped imagining the scenes, and felt like reading the book was a waste of time.
Novels should be read at a natural pace to give room to your imagination and dive into the story. You can still quickly scan over boring/repetitive filler text, but I did that without caring about WPM already.
With other things like textbooks / articles / reports cranking up your WPM and applying your attention more selectively by focusing on or re-reading critical parts is a very helpful skill though.
> If you can process content much faster than it was meant to be played, it doesn't mean you're learning much faster than you otherwise could; it means the novel information density is low.
I 'read' your comment using TTS at 3x. What does that say about the information density of your comment?
(Little to nothing. TTS at that speed is still marginally slower than I normally read with my eyes. Human speech is generally much slower than is necessary to be understood.)
I imagine it's a compromise between cutting through the fluff and using the primary source material.
You could just read the plot synopsis or watch the highlights, but sometimes those don't convey build-up, suspense, or other data that are hard to losslessly compress.
Being comfortable with the "boilerplate" of a given medium or genre usually lets you skim or skip it to jump right into the good stuff.
I listen to a lot of podcasts and audiobooks while doing other things; walking, cleaning, cooking, traveling, playing games, etc. Every time I try speeding up, even just to 1.25x, I don't enjoy it as much, as it feels rushed and stressful. I think it could be interesting to learn to listen and read at extremely high speeds, but nothing more than interesting, and I'm even doubting the usefulness of it.
> I can read fast, and I typically go through fluffy NYT bestseller nonfiction at 600 WPM. But when I do this I constantly have a sneaking suspicion that I'm just wasting my time. When I read a good book full of new ideas, I barely go at 150 WPM, but the time always feels well-spent.
In my experience the best books routinely stop me dead in my tracks. I just started Invisible Man and every paragraph is littered with really deep themes. I can normally finish a 500-page book in a few days (a majority of my reading is done during my commute), but this will definitely be a slow burn.
For me, when I'm reading something dense, my WPM fluctuates. It could be 300 WPM in the easy areas, and down to 30 WPM in the conceptually challenging areas.
I'm not blind, but I've tested some of my apps for VoiceOver and it's just utterly unusable with a "reasonable" speed. You have to pretty much set it to your reading speed for it to be useful, and that happens to be significantly faster than most people are comfortable speaking.
Yes. Took me a while, but I can comfortably understand 2x speed, and now 1x podcasts seem weird, like they're talking super slow. I would imagine it's something you just train even more out of necessity.
I typically use TTS near 2.5x (I turn it up when I'm alert and down when I'm tired.) It's definitely a learned skill; a few years ago I started at 1x and struggled even with that.
Every couple of months, take a moment to reflect on your comprehension. Is it currently easy for you to understand the audio? If yes, then crank it up a little bit until it's noticeably more difficult. Repeat this process periodically over a year or so and before you know it, it'll be set pretty damn quick.
I can watch YouTube content at 3x without issue. I did this without much intentional effort - I simply downloaded an extension which allows me to speed up the video in increments of 0.1x using a keyboard shortcut. Whenever it felt slow I would speed it up, and whenever it felt too fast I would slow it down. Without paying much attention to the actual numbers I had reached over 3x within a month or so.
I tried this right now with the trick below. 2x was no issue but 3x... that was a big step. It sounded like word salad. As if my brain was decoding the words out of order and was unable to assemble them into sentences.
It depends on the speaker. Some speakers are particularly slow with lots of long pauses and others are faster.
I think anyone can get to 3x but it takes some time to adjust to faster and faster speeds. It also depends on what you are doing while listening. Distractions or listening while doing something else (driving for example) lowers my ability to comprehend. For example on the interstate without much traffic I'll listen to audiobooks at 3x, but in a city or a crowded highway I have to slow it down.
If you close your eyes, listening comprehension goes way up. I top out around 2x if I have to look, but with my eyes closed I can get full comprehension at 3x+.
If it's a technical talk or something I'll still pause often to reflect on what was said, but I can hear full sentences just fine at >3x with my eyes closed.
Well I'm sure the ability requires training, but I wonder if it is specific to screen readers.
Consider what you quote:
> we’re likely limited by how quickly we can gather our thoughts
Now the amount of relevant info on a screen is typically small enough that a sighted person can zero in on it at a glance and perhaps just click a button without thinking.
I.e. the amount of info that deserves "gathering our thoughts" is typically very small. So if that is the bottleneck, your colleague can keep cranking up the audio speed until low-level audio processing becomes the bottleneck, which is a regime that sighted people never deal with, not even the nerds who speed up their Joe Rogan podcasts.
It's easy to train yourself to do that though. Just find your favorite audiobook and listen to it daily. First listen to it at 1.5x, then adjust to 2x after a few days, then 2.5x after a few more days etc. You'd be surprised how fast your brain can actually process the information.
Personally, after doing this I feel irritated when I speak, because my sped-up audiobooks have conditioned me into thinking I should be speaking at that rate, but it's just not physically possible for my mouth and tongue to move that fast.
>But, he says, instead of being limited by how quickly we can process information by listening, we’re likely limited by how quickly we can gather our thoughts. That’s because, he says, the average person can listen to audio recordings sped up to about 120%—and still have no problems with comprehension.
The deduction that is quoted does not follow: speeding up audio recordings to 120% requires both the auditory system and the language and thought systems (or any other potential bottleneck) to speed up proportionally, since it's a pipeline.
Similarly, the posted article (I have yet to read the original one) states in the title that "human speech" has a universal transmission rate, but the research tested reading aloud, not spontaneous speech, so this may or may not be true.
Perhaps the bottleneck is human speech, with the side effect that listening is never trained beyond the typical speech rate limit. (In that case, the higher-speed syllable languages would be easier to pronounce fast, and the lower-speed ones harder to pronounce fast.)
Perhaps the bottleneck was in the visual burden of reading: a language that encodes more bits per syllable implies more types of syllables, which, irrespective of the size or number of characters, puts a classification demand on the visual system (classifying a symbol drawn from a set of only 2 symbols will be easier, but will require more classification instances than classifying from a large set of characters with fewer classification instances).
Perhaps the bottleneck was again in speech during reading, via subconscious vocalizing of the text.
Perhaps the bottleneck was in the auditory "speech to syllable" classification.
Perhaps the bottleneck was in parsing text.
Perhaps the bottleneck was in "accessing thoughts" etc.
So it is rather hard to identify where the bottleneck is located without having a means of detecting where in the brain the "incoming queue is full" vs "incoming queue is waiting" during speaking, listening, reading. And which of these 3 causes this universal bottleneck (since I gave 2 examples of how an apparent bottleneck in reading could stem from not being trained beyond a possible universal bottleneck in speaking rate...)
That quote seems to imply that they have not measured the maximum speed to receive information but the speed at which we are comfortable outputting it.
There is no shortage of people training to receive a lot of information at once, and 39 bits per second seems to me on the lower end of what some video games require but in terms of constructed, linguistic output? They may be on to something there.
Fast chatters are not faster thinkers. I have yet to see people exchanging thought at a higher rate than usual.
> I wonder whether his ability was trained through years of using fast screen readers, whether a lower visual processing load leads to better audio processing, or whether there's some other explanation.
While I'm sure his visual cortex picked up some slack, I'm willing to bet it's mostly just through training. We just aren't trained for faster communication. I've known blind people and they are the same way with their readers.
For me I imagine the bottleneck would rather be how quickly I can translate my thoughts into speech. Oftentimes I will start out talking to myself to explain a topic, only to eventually digress into my "mental monologue" because I start to process thoughts faster than I can say them.
I'm fully sighted, but I use espeak TTS at around 1000 wpm for fluff text like Economist articles and 300 wpm for heavy-going text like the text sections of math books.
I also watch most video at 2-3x speed, since the skills seem transferable.
I am blind, but I am not a primary speech synthesis user; I prefer tactile braille. However, I know a number of people who use their speech synthesizers at rates similar to what you described above.
My theory/experience with this phenomenon is that a speech synthesizer never makes any errors. When it pronounces a word, it will do so exactly the same way every time the same word comes up. So the learning effect after a while is a bit higher than when you listen to a human. Humans will always have slight variation in how they pronounce the same word. So, as I understand it, you can "learn" to listen to your speech synthesizer at a fast rate more effectively than you would be able to listen to a fast human speaker.
And yes, I also listen to YouTube talks and audiobooks at about a 1.5-2x rate. So I guess 80 bits per second is relatively easily doable for the receiver.
Depends on the natural speed of the speaker, but I listen to most podcasts and YouTube commentary/narrative at 2x. Podcasts sometimes in the 3x range.
Sometimes it's worth slowing down to 1.5x to give myself a bit of time to process the ideas, though slowing below that sometimes hurts comprehension.
Side note: I find that YouTube in Chrome has the best pitch-preserving time stretching filter, and I've neglected all this time to figure out what exactly they use to accomplish that. I'd love to add that to mpv, if it's not already there.
Probably this is a healthy reminder of how the brain optimizes and uses sections of itself. Without the need for vision, those cranial areas can be better used for other things.
Apologies for being harsh, but this kind of thing is the phrenology of our time. I know it's utterly conventional to think this way about language in some circles that present themselves as doing legitimate science, but the view that you can calculate the amount of information in human speech, except in a super-technical sense that doesn't match any of the reporting on this study or the way people are interpreting it, has to be called out for the total nonsense that it is. It doesn't bear a moment's honest reflection.
And yes, I know information theory. It's language that these folks - many of them prominent and celebrated within their utterly normalized professions, just like in the days of phrenology - are fundamentally mistaken about. What quantity of information do you think there is in the word "trump," for instance? Is it the same over time, to bring up just one feature of how this funny thing called context informs human speech?
Wittgenstein's Philosophical Investigations is a good place to start if anyone's interested in understanding this issue.
They aren't talking about the semantic information of the word "trump". They explain the methodology for calculating information, and it's per syllable (based on the number of distinct syllables that are part of the language's phonetics). So, for English speakers, 'trump' has exactly 7 bits in it. That exact syllable may or may not exist in another language, but if so, the same single-syllable word "trump" would have a different number of bits to a speaker of that language. Maybe next time RTA?
I think it's you that has missed the point. Syllables have a very loose correlation to information. So great; we can stream out 39 bits worth of syllables / second. In what way does that describe how information-dense those syllables are? Context matters here.
Jokes aside, I agree that estimating the average absolute information content of a syllable seems pretty absurd.
However, if the primary goal here was to determine whether some languages convey more information per unit time than other languages, I think the authors did fine. To this end, they needn't define information per syllable in anything other than p.d.u. - procedurally defined units. If average Vietnamese speech has 2x the number of syllables/min as German, but it takes the same amount of time to recite War and Peace in both Vietnamese and German, it suggests that both languages convey the same high-level information 'per unit time', but not 'per syllable'.
And basically that's all they did... "We computed the ratio between the number of syllables [in the text passage] and the duration [it took to recite the passage]"
You clearly don't know linguistics though, because the idea that a word conveys a constant quantity of information is hilarious.
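Setting aside the dispute over what the bits mean, the mechanics described in that quote reduce to two numbers multiplied together. A toy version with placeholder figures, not the paper's actual estimates:

    # A passage of 1008 syllables recited in 3 minutes, in a language whose
    # corpus-estimated density is 7 bits/syllable (roughly English-like).
    syllables, duration_s = 1008, 180.0
    speech_rate = syllables / duration_s      # 5.6 syllables/s
    bits_per_syllable = 7.0                   # from syllable-bigram conditional entropy
    print(bits_per_syllable * speech_rate)    # 39.2 bits/s: the information rate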
Early on when Information Theory was emerging, there were attempts to measure the bandwidth of consciousness. They reckoned about 18 bits per second or less, which sounds very low.
Tor Norretranders' book, The User Illusion, mentions some of the research:
W R Garner and Harold W Hake, "The Amount of Information in Absolute Judgments" - Psychological Review 58 (1951) - they attempted to measure people's ability to distinguish stimuli (such as light and sound) in bits. Result: 2.2 to 3.2 bits per second.
W E Hick "On the Rate of Gain of Information" - Quarterly Journal of Experimental Psychology 4 (1952) - this experiment measured how much information a person could pass on if they acted as a link in a communication channel. That is, faced with a series of flashing lights, subjects had to press the right keys. Result: 5.5 bits per second.
Henry Quastler "Studies of Human Channel Capacity" - Information Theory, Proceedings of the Third London Symposium (1956). Measured how many bits of information are expressed by a pianist while pressing keys on a piano. Result: 25 bits per second.
J R Pierce "Symbols, Signals and Noise" (Harper 1961) - used experiments involving letters and symbols. Result: 44 bits per second.
Discussion of the research, Tor Norretranders' book, and what the research may have missed here:
http://memebake.blogspot.com/2008/08/straw-dogs-and-bandwidt...
> instead of being limited by how quickly we can process information by listening, we’re likely limited by how quickly we can gather our thoughts. That’s because, he says, the average person can listen to audio recordings sped up to about 120%—and still have no problems with comprehension. “It really seems that the bottleneck is in putting the ideas together.”
Glad this paragraph was in the article, clears up their methodology. I wonder if it applies to writing too, or if skilled writers work faster.
The same text, at that. So the text has N bits of information and it was, according to the article, spoken at different speeds per language. So N bits at different speeds per language, exactly the opposite of their claim.
> if you're writing java, you'll be putting out a lot more than that due to how stupidly verbose it is
Being "verbose" means that each letter you type communicates fewer bits of information. If the bottleneck is putting ideas together then you would expect someone writing in a more verbose language to type more letters per minute but still take a similar amount of time to communicate the idea.
In practice most Java programmers are using IDEs with good auto-completion, though, so they aren't actually needing to type as many letters as you'd think.
http://ws.apache.org/xmlrpc/apidocs/org/apache/xmlrpc/server...
I also think that in writing the bottleneck is how fast and accurately your hand can move. I would agree that English is faster than Spanish, because Spanish is more verbose.
This is really cool. I am working in a related area and I think most of us have assumed that on average, the information rate is 'about the same' for the languages across the world. So it's exciting to see that their results confirm this assumption.
Two qualifying remarks.
1) The 'about the same' is important. Even in their data, there is still quite some variance. They found an average of 39 bits, with a stdev of 5. Assuming a roughly normal spread, that means that about 1/3 of the data falls outside of the range of 34-44 bits (see the quick check after these remarks).
2) Which brings me to the uniform information density (UID) hypothesis. According to the UID, the language signal should be pretty smooth wrt how information is spread across it. For many years, the UID was thought to be pretty absolute: even across a unit like a sentence, it was thought that information would spread pretty evenly. Now, there is an increasing amount of research showing that, esp. in spontaneous spoken language, there is a lot more variance within the signal, with considerable peaks and troughs spread across longer sequences.
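A quick sanity check of remark 1), under the assumption that the cross-language spread is roughly normal:

    from statistics import NormalDist

    nd = NormalDist(mu=39, sigma=5)           # the reported mean and stdev
    outside = nd.cdf(34) + (1 - nd.cdf(44))   # mass outside +/- one stdev
    print(round(outside, 3))                  # ~0.317, i.e. about one third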
Why did everyone assume it would be the same on average? This seems weird to me.
Also, can you explain more about how the information density was calculated? Anything at the bit level seems crazy small to me. Words convey a lot of information. They cause your brain to create images, sounds, emotions, smells, etc. I guess we're calling language a compression of that? But even still, bits seems small.
> Why did everyone assume it would be the same on average? This seems weird to me.
(see edit below; but i leave this up; it might be interesting, also)
you mean that even for smaller sequences, the UID holds, right? the assumption was that even for a single sentence, there are a lot of ways to reduce or increase information density so that you get a smoother signal. e.g.: "It is clear that we have to help them to move on.", you could contract it to "it's clear we gotta help them move on" and contract it even further in the actual speech signal ('help'em'). or you could stretch it: "it is clear to us that we definitely have to help them in some way to move on", or the like. the assumption was that such increases / decreases would even be done to 'iron out' the very local peaks and troughs, particularly in speech.
bits: yeah, that took me a while to get used to, as well. the authors used (conditional) entropy as a way to measure information density (which is a good measure in this instance imv). and bits is just per definition the unit that comes out of information-theoretic entropy: https://en.wikipedia.org/wiki/Entropy_(information_theory) . btw: while technically possible, i don't think that the comparison in the summary article between 39 bits in language and an xy-bit modem is a helpful comparison. bits in the context of entropy are all about occurrence and expectation in a given context. bits in a modem/in CS represent a low-level information content for which we do not check context and expectation.
edit: ah, i realise you are asking why most in our community assumed that this universal rate applied across languages, right?
i guess the intuition was that all of us humans, no matter what language we speak, use the speech signal to transmit and receive information and that all of us have the same cognitive abilities. so the rate at which we convey information should be about the same. sure, there are probably differences according to some factors (spoken vs written language, differences in knowledge between speakers, etc.). but when the only factor that differs is English vs Hausa, esp. in spontaneous spoken language, then the information rate should be about the same.
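For reference, the standard information-theoretic definitions behind the entropy discussion above (notation mine, not the paper's): information density is the conditional entropy of a syllable given its predecessor, and the information rate is that density times the speech rate.

    H(X) = -\sum_{x} p(x)\,\log_2 p(x)                      % Shannon entropy
    H(Y \mid X) = -\sum_{x,y} p(x,y)\,\log_2 p(y \mid x)    % conditional entropy (their ID)
    \mathrm{IR} = \mathrm{ID} \times \mathrm{SR}            % (bits/syllable) x (syllables/s) = bits/s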
After a few cocktails, once or twice, I've wondered with friends whether some "fuzzy" information rate constant might be a reference by which our brain understands the passage of time. In other words: if there is a fundamental processing rate of x/time, then theoretically, wouldn't our brains subconsciously use that for all kinds of neat reasons?
And the rate wouldn't have to be the exact same value for each individual, so long as the brain can attune its specific value to other reference points to time in nature.
So here is my own experience. I've been an avid audiobook fan for the last 3 years, and a while ago some guy on Reddit told me about how he listens to books on Audible using the high-speed option, like 2.x. I had never tried that before last summer, since at higher speeds the speech became incomprehensible to me.
What this guy told me is that it just takes time to adjust to it. So I basically started to listen to books at a slightly higher speed. Then I gradually increased it, and in a few days I could handle 2.0x speed no problem while listening to really complex fantasy (Malazan Book of the Fallen [1]). After two weeks I could handle 2.5x without a problem.
In the beginning it was harder to comprehend at high speed while walking or crossing the street, since I'd lose attention, but in a few months I could do anything while listening without missing any information or emotions of the narrator.
To give an example of how far this can go: this spring I was listening to The Expanse audiobooks [2] at 4.0x speed. With some effort I could go even faster, like 5.x, in the case of these particular books, but obviously I can not keep up for long.
I still usually listen to books at 2.0-3.0x depending on the narrator and the quality of the audio, and this skill doesn't go away even if I have an extended break between books, like a month or so.
[1] https://www.audible.com/pd/Reapers-Gale-Audiobook/B00M4LRBY6
[2] https://www.audible.co.uk/pd/Abaddons-Gate-Audiobook/B00T6NZ...
One thing I'd also like to develop / wish was integrated into audible and the like is silence trimming. Some speakers leave outsized pauses in their narration which can be significantly shortened effectively increasing speed with less distortion.
I have the opposite problem, where I have trouble paying attention to an audiobook at 1x. I get bored in between words and my mind wanders, making it very difficult to keep track of what is being said (as in, I hear individual words but have trouble keeping sentences in memory when everything comes too slowly).
I wish I had realized this in university and had been able to somehow record and play back lectures at 2x. I always got so little out of lectures because the information wasn't coming in fast enough for me to process correctly.
I don't really use Audible, but if you're looking for a good audio player on Android, here is one that can do this:
https://play.google.com/store/apps/details?id=de.ph1b.audiob...
https://github.com/PaulWoitaschek/Voice
Overcast (a podcasting app) has great features to optimize the high speed listening experience. They have variable speed, a great silence trimmer, and a voice boost that makes speech clearer.
Blind people use screenreader software sped up so fast as to be indecipherable to untrained ears. The screenreader can give them near instantaneous feedback about where on the screen they are and what's there when it's so sped up, and with a bit of practice perceiving the sped-up speech imposes no burden at all.
I can tell from experience that when I lie down in my bed with my eyes closed, I can comprehend speech at a much higher speed than I can while walking on the street. No surprise blind people can handle it better, even though I have no clue how exactly it works in relation to the brain.
I was always curious to do actual research / a paper on this kind of thing, but as a non-scientist I simply have no time to do so. So I'm happy someone is actually doing it.
Side question: I wonder if anyone has actually finished the entire Malazan series. It takes some serious dedication. I would be curious if the story still makes sense to you by the end when listening at that speed.
> Side question: I wonder if anyone has actually finished the entire Malazan series. It takes some serious dedication.
I only finished the Malazan Book of the Fallen, the first two Tales books, and all The Path to Ascendancy books. Also started Forge of Darkness, but was too preoccupied with my life to finish it.
Honestly, Esslemont's books are just weaker overall. The Path to Ascendancy was much better, but the 3rd book is just too rushed.
> I would be curious if the story still makes sense to you by the end when listening at that speed.
Speed has no effect on the story at all. Basically, after you practice it for a bit you even get every emotion the narrator is trying to put into his speech.
As for the story in general, it makes more and more sense the closer you get to the end. It's a masterfully crafted world with a great theme of compassion, and even though I finished it more than a year ago I still have a flashback or two from time to time, since I loved some of the characters. Malazan is certainly one of my favorite book series.
Yet keep in mind there is an abundance of information and events, as well as unreliable narrators, which can confuse your view of the story lines.
Malazan quickly became my favourite book series (and I am not even a fan of fantasy). It was hard initially. But it gets better.
However, I think that a re-read is a must if you want to fully grasp the whole thing.
Why would you want to do that though? Isn't the experience of listening to it the point? If not, why listen to it at all instead of reading a detailed summary?
Because I don't just listen to books for enjoyment of the process itself. I love complex stories with hundreds of characters and plot lines across many books. Reading something like Malazan or the Wheel of Time is like a journey into another world for me, and I get deeply immersed in these worlds while exploring them. Yet the amount of free time I have is limited, so getting more information in a short period of time is very convenient.
I totally get it when some people just love to read books slowly while enjoying their coffee or looking at nature, but I'm into books for the stories, and the format of fast-paced audio is fine for me.
> Isn't the experience of listening to it the point? If not, why listen to it at all instead of reading a detailed summary?
I feel like you imply that by listening at high speed I miss some part of the experience. Yet other than the voices being just slightly distorted (after some practice it's the same voices, but faster), I get exactly the same experience as any person who listens to or reads the unabridged book.
On the other side, detailed summaries are not the same thing that the author designed, but someone else's retelling, which is usually far from perfect.
I'm a bit confused here. (I went and looked at the original paper.) They estimated information density for each of the subject languages as a whole, on average:
> In parallel, from independently available written corpora in these languages, we estimated each language’s information density (ID) as the syllable conditional entropy to take word-internal syllable-bigram dependencies into account.
But the experiment uses the same text translated into each language! Why introduce this extra variable (and source of error) of estimated language-wide information density, if you are controlling your experiment such that you have the exact same information encoded in each language? That is to say, why use an _estimated_ information density when you could measure it exactly for the texts that are being spoken? Or, conversely, why go to all the trouble of having the speakers read the same text translated into each language, if you aren't going to make use of that symmetry?
Information depends on probability. If something is very probable then it doesn’t have much information (because you already saw it coming). If something is improbable then it has a lot of information.
In the paper they want to know how much information is in a syllable in context. To do that they need to know the probability of each syllable given the previous syllable. To estimate that probability distribution, you need to look at a lot of text, much more than just the passages that the authors used to measure speech rate.
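A toy version of that estimate, assuming the corpus has already been split into a syllable sequence. The function and the "corpus" below are illustrative only; a real estimate needs orders of magnitude more data:

    from collections import Counter
    from math import log2

    def conditional_entropy(syllables):
        """Estimate H(next syllable | previous syllable) in bits per syllable."""
        pair_counts = Counter(zip(syllables, syllables[1:]))
        prev_counts = Counter(syllables[:-1])
        total = sum(pair_counts.values())
        h = 0.0
        for (prev, nxt), count in pair_counts.items():
            p_pair = count / total              # p(prev, next)
            p_cond = count / prev_counts[prev]  # p(next | prev)
            h -= p_pair * log2(p_cond)
        return h

    # With only the short recited passages, most bigrams would never be
    # observed and the estimate would collapse -- hence the large corpora.
    corpus = "the cat sat on the mat the cat ran".split()  # words standing in for syllables
    print(conditional_entropy(corpus))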
I suppose that the experiment wants to capture the actual 'information density' of the language, and hence looks at the full language. Then, they want to avoid any modification in speech rate due to the semantics of the spoken text.
This does not make sense for a hypothesis where the actual bit-rate of speech tends towards 39 b/s. That is, when your text happens to convey more bits, you slow down.
However, for an alternative hypothesis, this design does make sense. The idea here is that a language naturally converges to a speech rate that gives 39 b/s: the actual speech rate is much more constant, and speakers only slow down when the rate would otherwise become too fast. For that, I'd argue you don't want the mean bit-rate but something like the 90th-percentile bit-rate, because it seems to me that a speech rate that is 'too fast' more than 10% of the time would not really be natural.
The researchers obviously have to keep the scope narrow in order to get numbers at all.
That said, we should be aware that a tech nerd audience will find simple answers to complex non-tech questions appealing, and we should not over-estimate our understanding here just because we have a number.
There is a large amount of data transmitted through sub-communication and context, particularly during an in-person interaction, which is what people are wired for.
Overall tone, body language, eye contact, and various social cues make up the bulk of data being transferred in many interactions. There's a reason why talking to some people feels exhausting and others invigorating, and it's not just the transcript.
We can avoid reading too much into the study by just remembering the error bars. It's not like 39 is a universal constant. It's more like 39 with a standard deviation of 6. That's a wide spread, but it's less wide than the spread you get from syllable rate alone, and that's all the study quantitatively tells us.