I'm surprised there was no mention in the blog post or the comments so far about the homework factor. It isn't just personal side projects that people are working on over the weekend. I am betting the relative percentage of CS students on the site is also much higher on the weekend. Tags like assembly, pointers, algorithm, recursion, class, and math are all rather vague. Those topics are all discussed at length in CS classes, but if you are working on a real world project in those fields, odds are you will tag it with a more specific technology you are using rather than the abstract theory behind it.
EDIT: On second look, Python, C, and C++ are also the go-to languages for CS classes (along with Java, but that is also a big enterprise language, unlike the other three). Almost this whole list seems to be schoolwork related.
Yup, seeing the top 10 list made me immediately think of homework assignments.
In the real world, programmers don't tag their questions with "recursion" or "pointers" very often. Nor is Assembly a common language for side projects.
This definitely seems like a case of students simply being relatively overrepresented on weekends.
Funny, when I saw the top 10, Edward Kmett came to my mind. He is likely the most prolific Haskell hacker, but he also recently started a side project writing some graphics shaders for VR in C (when he was younger, he was in the demoscene community).
What proportion of people asking questions do you think are students vs professionals/other nonstudents?
Seeing as most people are only students for 4 years of their life, I would be surprised if students contributed much to this change. I also can't think of a reason why students would be asking more questions on weekends vs weekdays.
Especially for the big difference in Haskell, I just can't imagine it's largely caused by students, since most students will probably only get to use it for one or two courses.
Students are much more likely to ask lots of questions. Far more difficult projects are often due on a Monday or Tuesday to give students the weekend to finish up, the exact kind of projects likely to elicit questions at the last minute.
I would have expected HW to be only a small proportion, but I'd also expect terms like "pointers, algorithm, recursion" to be much more common for HW.
If it is HW, it would correlate annually with semesters (noting that SO is internationally popular, and different countries have different schedules).
Also, I'd have thought that with weekend projects you're more likely to find things out by experimentation and reading the documentation than by asking questions. With work or classroom projects, you have to work with a fixed spec, a prescribed technology, all sorts of Best Practices, and deadlines. Weekend projects give you much more freedom in all four respects.
Exactly, and a plausible explanation for the Haskell weekend shift may be that it's just a difficult language and/or not many people are using it in real projects (during the work week).
The funnel shape of the scatter plot immediately reminded me of an article on the insensitivity to sample size pitfall [0], which points out that you'll expect entities with smaller sample sizes to show up more often in the extremes because of the higher variance.
Looks like the tags with the biggest differences exemplify this pretty well.
I also saw that triangle shaped plot and had the same thought. I read a great paper about this recently [0] with some of the same examples as the link in the parent, but going a little further in depth.
I originally got on this topic when reading Bayesian Methods for Hackers [1]. I am still hunting for a good method to correct/compensate for this when I am doing these types of comparisons in my own work.
When I was writing my thesis I wanted to correct for that as well, and weighted my data by the log of the sample size. This made intuitive sense to me, and both my advisors seemed to agree, though none of us found compelling papers for this.
It really doesn't matter - at least, not for the statistical error the parent is talking about. The effect isn't related to whether we are sampling from a larger population of programmers.
Suppose there were no difference between the usage of each language, and people just program on the weekends vs weekdays with some probability independent of language. Then, if a language has lots of users, it will likely have close to the average weekend/weekday proportion. The fewer users the language has, the more likely that it has an uneven weekend/weekday proportion just by chance. And if you plot the weekend/weekday proportions vs. the number of users, you expect a funnel shape just like the one in the article.
Therefore, the plot in the article - by itself - provides no evidence that there is any difference between the usage of different programming languages.
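The argument above can be checked with a small simulation. This is only a sketch under stated assumptions: every tag gets the same weekend probability (2/7 here, a made-up value), and the function name and tag sizes are hypothetical, not taken from the article.

```python
import random
import statistics


def weekend_shares(n_questions, n_tags, p_weekend=2 / 7, seed=0):
    """Simulate the weekend share for n_tags tags of equal size.

    Every question lands on a weekend with the same probability,
    so any difference between tags is pure sampling noise.
    """
    rng = random.Random(seed)
    shares = []
    for _ in range(n_tags):
        weekend = sum(rng.random() < p_weekend for _ in range(n_questions))
        shares.append(weekend / n_questions)
    return shares


# Hypothetical sizes: "small" tags with 50 questions, "large" with 5000.
small = weekend_shares(n_questions=50, n_tags=200)
large = weekend_shares(n_questions=5000, n_tags=200)

# Spread of the weekend share across tags of each size.
spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)
print(f"spread for small tags: {spread_small:.4f}")
print(f"spread for large tags: {spread_large:.4f}")
```

The small tags scatter far more widely around the common value than the large ones, which is the wide end of the funnel, even though no tag is actually different.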
I think it's just that they are plotting sum vs ratio of two random variables. Try this (in R):
a <- runif(1000); b <- runif(1000); plot(a + b, log(a/b))
Also, they likely have a low-end cutoff (notice their x axis starts at 10^4). If you do the same to the above plot, you get even closer to that exact shape. Try:
It could also be that the most popular languages in the corporate world are a compromise somewhere in between enjoyable/exciting and horrible/boring. (My assumption is that a large weekend ratio correlates with enjoyable/exciting.)
The ratio of sample sizes in the OP also isn't that bad, and none of them are very small.
Can you normalise data like that based on a confidence interval? Just rescaling the graph to unify them seems wrong, (it would answer something like "what do we think the distribution would look like if we distrusted the low end?") but maybe there's a better way?
A confidence interval won't adjust the points (point estimates) but will give those points with a lower sample size wide confidence intervals (often covering zero).
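As a rough illustration of that point, here is a sketch using the Wilson score interval for a proportion. The choice of Wilson, the counts, and the 0.25 baseline are all assumptions made for the example, not anything from the article.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


BASELINE = 0.25  # hypothetical overall weekend share

# Two made-up tags with the same observed weekend share (35%).
lo_small, hi_small = wilson_interval(14, 40)
lo_large, hi_large = wilson_interval(14000, 40000)

print(f"small tag: [{lo_small:.3f}, {hi_small:.3f}]")
print(f"large tag: [{lo_large:.3f}, {hi_large:.3f}]")
```

The point estimate is identical for both tags, but the small tag's interval is wide enough to include the baseline, while the large tag's is not.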
Using an (empirical) Bayesian multilevel model can both attach uncertainty intervals to the point estimates and appropriately "shrink" the estimates towards zero at the low-sample-size end.
The latter is more directly interpretable, at the cost of slightly more complex modelling (/assumptions).
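A very simplified stand-in for the shrinkage idea, not a full multilevel model: treat the pooled weekend share as the prior mean and add a fixed number of pseudo-questions to every tag. The prior strength of 500 and the tag counts below are assumptions chosen for illustration; a real model would estimate the prior strength from the data.

```python
def pooled_share(counts):
    """Overall weekend share across all tags."""
    total_weekend = sum(weekend for weekend, _ in counts.values())
    total = sum(n for _, n in counts.values())
    return total_weekend / total


def shrink_shares(counts, prior_strength=500.0):
    """Pull each tag's weekend share toward the pooled share.

    counts maps tag -> (weekend_questions, total_questions).
    prior_strength is the number of pseudo-questions at the pooled
    share added to every tag (an assumed value, not estimated).
    """
    prior_mean = pooled_share(counts)
    return {
        tag: (weekend + prior_strength * prior_mean) / (n + prior_strength)
        for tag, (weekend, n) in counts.items()
    }


# Made-up counts: tag -> (weekend questions, all questions).
counts = {
    "bigtag": (25000, 100000),
    "midtag": (3000, 10000),
    "smalltag": (80, 200),
}
pooled = pooled_share(counts)
shrunk = shrink_shares(counts)
for tag, (weekend, n) in counts.items():
    print(f"{tag}: raw={weekend / n:.3f} shrunk={shrunk[tag]:.3f}")
```

The small tag's extreme raw share moves most of the way back toward the pooled value, while the large tag barely changes. Shrinking the share toward the pooled mean is the same as shrinking its difference from the mean toward zero.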
One way I use Stackoverflow’s dev stats is to make educated guesses about the easiness of finding developers in 2-3 years time to maintain now-greenfield projects. Does Ruby seem to go down while Python is in steady growth? Let's move away from Rails. Swift is picking up steam? It's safe to switch from Objective-C. This dataset seems to be just fantastic for that.
In my opinion, that's what's wrong with the world: chasing tech. Pick something solid and stick with it. I have seen developers barely get decent in one language only to drop it and learn a new language. Picking up the basics of a language is easy, but it's knowing the nuances that separates the professional from the amateur.
> Picking up the basics of a language is easy, but it's knowing the nuances that separates the professional from the amateur.
I disagree.
I went from C# to Node.js and back to C#. I learned the hell out of JavaScript, and yet I feel like nobody will ever care that I can explain some of its nuances and gotchas. Language nuances are constantly evolving anyway.
Learning Node.js unlocked whole paradigms I didn't understand in C#. I never really "got" functional programming. Switching between the two helped me understand the pros and cons of different approaches. I discovered a lot of the things I miss from JavaScript are actually there in C#, just a little off the beaten path. On the other hand, I can appreciate "the C# way" and breathe a sigh of relief that certain gotchas do not exist in that environment.
It's much easier to be a rockstar developer when the pool of developers is so small. The game of chasing the "best" technology is pretty much a ponzi scheme.
"Picking up the basics of a language is easy, but it's knowing the nuances that separates the professional from the amateur."
From a learning perspective, correct. I don't think developers chase new languages; I think it's more a matter of playing 'catch up to the latest language' to be more employable. You would hope companies, CTOs, and lead techs choose languages to solve their unique business problems. Language choice looks more, to me, like taste. Like choosing something ^nice^ from a menu at a restaurant depending on what is palatable.
The company I work at still uses Delphi 7. Professional environments don't want language hipsters, they want developers. Developers can and do adapt to the environment they happen to end up in, be it Haskell, C or Delphi. And they can find pleasure at work once they are used to it.
Somewhat related, if you're looking to compare tags from StackOverflow, I made this site[1] a couple years ago to quickly visualize how many questions and answers are out there for given tags.
I use StackOverflow tag count as well as Google Trends and GitHub star count to get a rough feel for how much people are using certain things, such as version control software[2], databases, or view engines in Express[3].
The answer might be as simple as "people tend to work on games on the weekend", either as hobby projects or that professional game developers work weekends more often, skewing the weekend results away from serious enterprise apps. This would explain both the rise in low level languages but also things like OpenGL, Unity3D and Actionscript 3. It doesn't explain Haskell, of course, but I think the Haskell explanation in the article is accurate.
I don't think the number of questions asked correlates with which languages are used the most. My weekends are mainly Java, but I don't need to post on Stack Overflow because all of my questions have already been addressed.
Yeah, this is basically what I came to say. Haskell's search volume could be because it's more difficult to use than other languages, or is more poorly documented. It would certainly explain Sharepoint being so high during the work week, having struggled with Sharepoint Configuration Mountain before.
To be honest, I find Haskell to be one of the better documented languages. Hoogle + Hackage module documentation and being able to look straight at the source code is great.
I guess newbies will find some of the compile messages hard to understand compared to more standard languages, however.
I don't see how this would matter. The datapoint is tag frequency on weekends for a given language compared to tag frequency on weekdays for the same language. It isn't using absolute numbers nor is it comparing cross-language.
> I don't think the number of questions asked correlate with which languages are used the most.
That's why the post works in ratios of weekend to weekday. Regardless of how much more or less likely someone is to ask about one language than another language (e.g. because they're already familiar with it), the numbers in the OP should still indicate relative usage amounts at different times.
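A tiny worked example of why the ratio cancels out a language's overall "asking rate". The usage hours and asking rates are invented, and the sketch assumes the asking rate for a given language is the same on weekends and weekdays, which is exactly the assumption other comments in this thread question.

```python
import math


def weekend_weekday_ratio(weekend_usage, weekday_usage, ask_rate):
    """Ratio of weekend to weekday questions for one language.

    ask_rate is questions per unit of usage, assumed identical on
    weekends and weekdays, so it cancels in the ratio.
    """
    weekend_questions = weekend_usage * ask_rate
    weekday_questions = weekday_usage * ask_rate
    return weekend_questions / weekday_questions


# Same usage pattern, very different likelihood of asking (made-up numbers).
ratio_chatty = weekend_weekday_ratio(200, 1000, ask_rate=0.5)
ratio_quiet = weekend_weekday_ratio(200, 1000, ask_rate=0.01)
print(ratio_chatty, ratio_quiet)
```

Both ratios come out the same, so the ratio reflects the usage split rather than how often users of that language need to ask. If the asking rate itself changes on weekends (homework, less access to colleagues), the cancellation no longer holds.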
...which means quite a few Saturday mornings in Asia have been counted as weekdays and many late Friday nights in the Americas have been counted as weekends.
It would be great if StackOverflow had information on the local timezone that the question was asked in. Seeing Mon-Fri 9-5 vs other times would be interesting.
If Stack Overflow doesn't retain the local time zone for posts, they could just use the subset of UTC weekend hours that don't overlap business hours in any time zone.
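That subset can be worked out directly. A sketch, assuming business hours are Mon-Fri 09:00-17:00 local time and that only whole-hour offsets from UTC-12 to UTC+14 matter (half-hour zones and daylight saving are ignored); the function names are made up for the example.

```python
HOURS_PER_WEEK = 7 * 24
DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def is_business_hour(week_hour):
    """True if the hour (0 = Monday 00:00) falls in Mon-Fri 09:00-17:00."""
    day, hour = divmod(week_hour % HOURS_PER_WEEK, 24)
    return day < 5 and 9 <= hour < 17


def safe_utc_hours(offsets=range(-12, 15)):
    """UTC hours of the week that are outside business hours in every offset."""
    return [
        t
        for t in range(HOURS_PER_WEEK)
        if not any(is_business_hour(t + offset) for offset in offsets)
    ]


safe = safe_utc_hours()
first_day, first_hour = divmod(safe[0], 24)
last_day, last_hour = divmod(safe[-1], 24)
print(f"{len(safe)} safe hours, from {DAY_NAMES[first_day]} {first_hour:02d}:00 "
      f"through the hour starting {DAY_NAMES[last_day]} {last_hour:02d}:00 UTC")
```

Under these assumptions the window runs from Saturday 05:00 UTC to Sunday 19:00 UTC, so a little over a day and a half of each week could be counted as weekend everywhere.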
I can see room for lots of false assumptions when reading this data.
What if Haskell never changes the rate at which it is discussed - but all the entry-level programmers doing the 9-5 job go away on the weekends, helping Haskell to be "louder"? What if the people with homework ask more on the weekend than during the week?
What if certain developers don't post questions tagging a language - but rather tagging an algorithm knowing they can implement it in whatever language they need?
So yeah, some people are weird that way..
And so is Haskell.
But I doubt Unity 3D is schoolwork related.
So yes, GitHub would be much more accurate.
I know I'm writing C++ every weekend (and day, night, morning, afternoon.....)
[0] - http://dataremixed.com/2015/01/avoiding-data-pitfalls-part-2...
[0] - http://faculty.cord.edu/andersod/MostDangerousEquation.pdf
[1] - https://github.com/CamDavidsonPilon/Probabilistic-Programmin...
Is the question interpreted as extending to those not on stackoverflow, or is it a complete census of the 'population' of their data?
a <- runif(1000); b <- runif(1000); plot(a + b, log(a/b))
Also, they likely have a low-end cutoff (notice their x axis starts at 10^4). If you do the same to the above plot, you get even closer to that exact shape. Try:
plot(a + b, log(a/b), xlim=quantile(a + b, probs=c(0.2, 1)))
plot(a + b, log(a/b), xlim=c(1, 2), log="x")
Instead you'd want to use a CDF that bins the values.
[1] - http://www.arepeopletalkingaboutit.com/
[2] - http://www.arepeopletalkingaboutit.com/tags/cvs,svn,git,perf...
[3] - http://www.arepeopletalkingaboutit.com/tags/ejs,pug
That really makes sense. The SO format does not suit Haskell very well.
What if Haskell only works on the weekend?
A few years ago the answer would have been, "the rest of the week it compiles"