Another practical issue is the "exception" raised if nothing is removed on line 6 of the original algorithm. It also seems to be needed for the proof, but you would not want it in production, though the chance of hitting it should be vanishingly small, so maybe it is worth the gamble?
Here is my faithful interpretation of the algorithm, followed by a re-interpretation with some "practical" improvements that almost certainly make it impossible to prove correctness.

    import (
        "bufio"
        "math"
        "math/rand"
    )

    func CountUnique(scanner *bufio.Scanner, epsilon float64, delta float64, m int) int {
        X := make(map[string]bool)
        p := 1.0
        thresh := int(math.Ceil((12 / (epsilon * epsilon)) * math.Log(8*float64(m)/delta)))
        for scanner.Scan() {
            a := scanner.Text()
            // Forget any earlier decision about a, then re-add it with probability p.
            delete(X, a)
            if rand.Float64() < p {
                X[a] = true
            }
            if len(X) == thresh {
                // Buffer is full: drop each element with probability 1/2.
                for key := range X {
                    if rand.Float64() < 0.5 {
                        delete(X, key)
                    }
                }
                p /= 2
                // The "exception": nothing was removed.
                if len(X) == thresh {
                    panic("Error")
                }
            }
        }
        return int(float64(len(X)) / p)
    }

    func CountUnique2(scanner *bufio.Scanner, thresh int) int {
        // threshold passed in, based on system memory / estimates
        X := make(map[string]bool)
        p := 1.0
        for scanner.Scan() {
            a := scanner.Text()
            delete(X, a)
            if rand.Float64() < p {
                X[a] = true
            }
            if len(X) >= thresh { // >= instead of == and remove the panic below
                for key := range X {
                    if rand.Float64() < 0.5 {
                        delete(X, key)
                    }
                }
                p /= 2
            }
        }
        return int(float64(len(X)) / p)
    }

I tested it with Shakespeare's works. The actual unique word count is 71,595. With the second algorithm it is interesting to play with the threshold. Here are some examples.

    threshold   Mean Absolute Error   Root Mean Squared Error   Standard Deviation
    1000        2150.44               2758.33                   2732.61
    2000        1723.72               2212.74                   2199.39
    10000       442.76                556.74                    555.53
    50000       217.28                267.39                    262.84

Does it still work if you update m as you go?
In this example, they have the length of the list and choose the threshold to give them a desired margin of error.
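To get a feel for how the threshold depends on the stream length, here is a small sketch that pulls the `thresh` expression out of `CountUnique` above into its own function. `Threshold` is a hypothetical name, the values of epsilon, delta and m are picked arbitrarily, and it keeps the natural log that the code above uses.

```go
package main

import (
	"fmt"
	"math"
)

// Threshold mirrors the thresh expression in CountUnique:
// ceil((12 / epsilon^2) * ln(8m / delta)).
// (Hypothetical helper, extracted for illustration.)
func Threshold(epsilon, delta float64, m int) int {
	return int(math.Ceil((12 / (epsilon * epsilon)) * math.Log(8*float64(m)/delta)))
}

func main() {
	// Arbitrary example parameters: 10% error, 1% failure probability.
	epsilon, delta := 0.1, 0.01
	for _, m := range []int{1_000, 1_000_000, 1_000_000_000} {
		fmt.Printf("m=%d thresh=%d\n", m, Threshold(epsilon, delta, m))
	}
}
```

The threshold grows only with the logarithm of m, so overestimating the stream length by a large factor costs relatively little extra memory. That does not by itself answer whether updating m on the fly keeps the guarantee, but it suggests a generous upper bound on m may be a cheap alternative when the length is unknown.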
An n-dimensional space is just a collection of points, each defined uniquely by a set of n numbers. The semantic meaning of those numbers doesn't really matter. It might be like actual physical space, but it could just as well be something like "time" and "the price of Big Macs". We have a bunch of mathematical operations that work well on 2- or 3-dimensional space and correlate nicely with our physical intuitions of 'curvature' and 'holes', and they still work perfectly well in more generalized forms in higher dimensions.
I'm not really sure it's that useful to try and visualize what it means in higher dimensions, to be honest.
This is chiefly to acquaint you, that I have visited the Negro School here in Company with the Revd. Mr. Sturgeon and some others; and had the Children thoroughly examin’d. They appear’d all to have made considerable Progress in Reading for the Time they had respectively been in the School, and most of them answer’d readily and well the Questions of the Catechism; they behav’d very orderly, showd a proper Respect and ready Obedience to the Mistress, and seem’d very attentive to, and a good deal affected by, a serious Exhortation with which Mr. Sturgeon concluded our Visit. I was on the whole much pleas’d, and from what I then saw, have conceiv’d a higher Opinion of the natural Capacities of the black Race, than I had ever before entertained. Their Apprehension seems as quick, their Memory as strong, and their Docility in every Respect equal to that of white Children. You will wonder perhaps that I should ever doubt it, and I will not undertake to justify all my Prejudices, nor to account for them.
---
I'm not sure that really excuses being so racist that, in his 50s, he thought _Germans_ weren't sufficiently white for America, but he did change his views over time.
During the inflationary period, people were flush with cash, demand increased, there were shortages, everyone raised prices, profits surged, companies hired workers for higher wages, people got more money, etc. Everyone was mad because, while they got big raises (obviously the result of all their hard work, which corporations suddenly noticed and appreciated for a large percentage of people all at once), "greedy corporations" _also_ suddenly decided en masse that they no longer wanted to be charitable enterprises and raised prices to steal money from the pockets of hard-working Americans.
Or, you know, there was a bunch of inflation and wages and prices went up in parallel.
And now money is no longer flowing into the economy, some companies went too far raising prices anticipating more inflation, and now they're losing sales and that's hurting profits, and they're going to end up cutting prices to increase sales and maximize profits again.
Nature is healing.
I think a lot of people have the impression that inflation reduces how much stuff people can afford, but generally it's fairly neutral in that respect. There's a certain amount of production and a certain amount of demand, and in general they will balance out, so no matter what's going on with inflation, people are going to be able to afford the same amount of stuff. I think people had this idea that if we got inflation under control, suddenly everyone would be able to afford all the stuff they wanted to buy, and they just can't.
The main reason inflation is bad for most people is instability: you have to keep getting raises to keep up with prices. You get a raise, you can suddenly buy some stuff you couldn't before -- prices go up and now you can't again. Then you get a raise, can afford to buy a bunch of stuff, and prices go up and now you can't again. (Not to mention the toll it takes on savings, but even that isn't that bad if you own stock instead of holding cash, because asset prices inflate too.)