Complexity is a source of income in open source ecosystems (2019)

I don't have much to say about the R ecosystem, as I am not actively using it or even doing anything using R.

> Matloff is also upset about a commercial company swooping in to steal their precious, a common academic complaint (academics swooping in to steal ideas from commercially developed software is, of course, perfectly respectable).

This reads like some kind of sarcastic statement. In fact it is indeed perfectly respectable that academics do this, because it lets our society advance and research progress outside of proprietary software gardens. In my opinion any such kind of action is perfectly respectable, if it shares knowledge with the world. The other way around it is not OK, because from proprietarizing ideas, there is no benefit for society. If a company is worth its salt, it will offer services on top of existing public knowledge, without creating walled gardens. If it cannot do so, then it might be time to realize that the company should not exist.

What is not OK is if academics take that knowledge and make it seem like they were the ones who invented it. One should always give credit where credit is due. Never forget academic honesty. Usually, however, a lot of foundational research is done by academics, only to be picked up years later, often by commercial entities dealing in proprietary software.

tfehring · 5 years ago

> If a company is worth its salt, it will offer services on top of existing public knowledge, without creating walled gardens.

I contend that every profitable company leveraging public knowledge does offer services (broadly construed) on top of that knowledge, since otherwise no one would pay them. Maybe the minimal example is that in which the only service provided is marketing, but the range of service levels provided in practice runs the gamut. RStudio clearly goes well beyond that minimum.

In some cases, the services offered are so valuable that the company can create a walled garden - which reduces the net value of those services to the customer but provides the company with monopolistic pricing power. (It’s not clear to me that RStudio’s products have significant lock-in, for what it’s worth. R has significant lock-in, like any programming language, and RStudio benefits from the lack of competitors creating good R tooling, but that’s not really a walled garden of RStudio’s making.)

dls2016 · 5 years ago

Having recently started a Python project, the walled-garden approach seems to be the Anaconda choice.

unishark · 5 years ago

Academics might have their own grand plans for society, but funding agencies generally want industry to take over at some point and run with it. I expect they'd be happy to move on and fund something new rather than trying to compete with products that industry can provide.

Personally I think RStudio is far less clunky than, say, Jupyter. I still think Matlab is by far the best tool for engineering. I think having a company supporting the technology like that has generally been very beneficial. It's certainly possible they would just cause it to stagnate and milk it for cash as the technology dies out, but their business would die with it. Hence these examples have never stopped advancing.

zozbot234 · 5 years ago

> Academics might have their own grand plans for society, but funding agencies generally want industry to take over at some point and run with it.

Sure, but people can have different views about where that point should be. Documenting industry-relevant know-how makes it even easier for folks to take the result and run with it (providing the 'impact' that most funders will ultimately care about), so I agree with GP that it's very much a proper field for academic research.

admax88q · 5 years ago

You can't steal the ideas from academia or commercial. You can implement the same idea, but the previous one still exists.

A company commercializing an idea from academia does not make that idea proprietary.

TeMPOraL · 5 years ago

> You can't steal the ideas from academia or commercial. You can implement the same idea, but the previous one still exists.

It's an important point, but only about abuse of the word "steal".

> A company commercializing an idea from academia does not make that idea proprietary.

It doesn't, up until they start armoring it with their own IP until nobody else can reasonably extend the original idea anymore...

kybernetyk · 5 years ago

>because from proprietarizing ideas, there is no benefit for society.

Except for a profit motive for companies which then benefits society by lowering prices of products. If it weren't for private companies none of us could afford a computer today.

TeMPOraL · 5 years ago

It's competition that keeps prices low - if private companies were to properly proprietarize core ideas around computing, we'd still be stuck with room-sized computers.

Maintaining a market that's beneficial to society is an attempt to carefully tune interplay between many public and private concerns. From the point of view of society, the market is just a roundabout way of putting people on a treadmill and hanging carrots in front of them. We need companies to produce goods, provide services, and innovate better solutions, so we allow a degree of privileges and exclusivity for things that otherwise shouldn't be owned - like ideas. But we cannot allow companies to actually get the carrot and stop running - because it's the production and innovation treadmill that's the important thing (from society's POV).

This model has been so successful that it blinded a generation or two of politicians - they let the donkeys get their carrots, and now we have fat donkeys saying to us, "if you want us to run the treadmill some more, give us the carrots".

rjzzleep · 5 years ago

OpenStack was definitely unnecessarily complex. And it led to many different distributions that were only manageable by hiring a lot of consulting manhours.

touisteur · 5 years ago

Ha ha memories of talking to RH salespeople and every time OpenStack was mentioned they were like "but you don't want that!" errr... OK I guess?

themacguffinman · 5 years ago

> The other way around it is not OK, because from proprietarizing ideas, there is no benefit for society

You say this as if proprietary rules (i.e. copyright and patents) weren't intentionally designed to incentivize knowledge creation & sharing by ensuring compensation for innovation. Creators can't pay bills with societal advancements and research progress. This is a really shallow "proprietary = bad" ideological take that I expect to see on the black & white pages of Stallman's website, not an (at least partly) entrepreneurial forum.

TeMPOraL · 5 years ago

(I apologize in advance, but I have to)

tired: IP = bad

wired: IP rules are "intentionally designed to incentivize knowledge creation & sharing by ensuring compensation for innovation"

inspired: above isn't true in practice; IP rules have been thoroughly gamed, and their primary job is protecting monopolistic behavior, even if it means keeping knowledge suppressed past the point it becomes irrelevant or useless

Copyright and patents, as practiced today, have mutated into a malignant form. That doesn't mean the idea is bad, just that it was taken too far - instead of creating an idea-generating pump, we've ended up with toolkits for extracting rent from society and creating wasteful artificial scarcities.

zelphirkalt · 5 years ago

I don't think it is as shallow as you make it seem at all.

Whatever technological / knowledge advancement any entity holds proprietary could probably be of more use when being shared with the world. Our technology enables us to have this kind of knowledge society, where knowledge is accessible at all times almost anywhere. I don't think that stopping knowledge from spreading can be a net gain, just because it enables a few selected individuals at some entity to gain from keeping it a secret. Such a thing should not be the basis of a business.

The popularity of R amazes me. I took a one-week class in R and left with a vow to avoid it at all costs. I have never seen a more confusing, hard to understand, inconsistent software product in my 30 years as an IT professional. It's apparently targeted at scientists and sociologists who are non-programmers. I have no idea how they manage to use it.

colechristensen · 5 years ago

Scientific programming is just different. Much of its culture is different, for good reasons that are not easy to understand.

It's something like how baking cookies at home and running a cookie factory are very different. To a person doing each, the behaviors and priorities of the other seem strange and it’s easy for one to think “we are both just making cookies, why don’t they do what I do, which is obviously superior?”

Scientific programmers are solving problems first, not writing programs. They are solving problems in a way that is useful only to them or peers who know a whole lot about the problem being solved. The problem-first tools look strange because they deemphasize the programming niceness in favor of problem niceness.

You find the same sort of confusion when programmers are facing business types and Excel usage.

There are certainly times when a piece of code starts needing the programming touch, but the right tool for the job depends on the job.

The_Amp_Walrus · 5 years ago

As someone currently writing scientific programs for scientists, much of the culture of scientific programming is bad and they are having rings run around them by kids who build websites for a living and it's embarrassing.

You think web developers are just writing programs for the hell of it? They have operational constraints around their work as well, it's just that there's a rad open online culture of continuous process improvement that's leveling up the tooling and practices that makes what was once hard look easy.

Things scientists using computers can learn from people working in the software industry:

- use a version control oriented workflow

- write extensive tests

- data management, automated data processing/cleaning pipelines, backups

- build generic frameworks

- use continuous integration

- logging and error reporting for long running tasks

- use modern tooling for automating infrastructure and big jobs, rather than manually submitting Slurm jobs via SSH
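One item from the list above, logging and error reporting for long-running tasks, can be sketched in a few lines of Python. This is only an illustration under assumptions: `run_batch`, the `"pipeline"` logger name and the `process` callback are hypothetical names, not anything from a specific scientific codebase.

```python
import logging

# Hypothetical logger name; a real pipeline would configure handlers elsewhere.
logger = logging.getLogger("pipeline")


def run_batch(items, process):
    """Apply `process` to each item, logging progress and recording failures.

    One bad input is logged with its traceback and recorded, instead of
    killing a job that may already have been running for hours.
    """
    results, failures = {}, {}
    total = len(items)
    for index, item in enumerate(items, start=1):
        try:
            results[item] = process(item)
            logger.info("processed %r (%d/%d)", item, index, total)
        except Exception as exc:
            # logger.exception records the message plus the full traceback.
            logger.exception("failed on %r (%d/%d)", item, index, total)
            failures[item] = repr(exc)
    return results, failures
```

Calling `run_batch([1, 0, 2], lambda x: 10 / x)` returns the two successful results and records the division by zero for `0` rather than stopping at the second item.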

#notallscientists but the average level of competence is not good and I think the idea that writing software for scientific research is somehow special is backwards and counterproductive.

lwhi · 5 years ago

There's no good reason for something being difficult to understand. Especially a product that's been designed to be general purpose.

You give good reasons for why a certain situation exists in scientific communities, but I see no reason why it has to be that way.

droopyEyelids · 5 years ago

Having worked a bit with candy making and baking from home sized to restaurant sized to a regional factory, I'll say this analogy doesn't make sense.

Almost all the progression in that industry is relatively straightforward and would make sense to the lay person.

908B64B197 · 5 years ago

> Scientific programmers are solving problems first, not writing programs.

And they don't value writing programs. Or Software Engineers.

That makes productizing some research interesting. That also makes trying to get the same result as something published by your own lab a year ago an uphill battle: "it worked on John's old laptop, the huge Alienware. Never worked on any of our machines".

blacktriangle · 5 years ago

My description of R: It makes hard things easy and easy things hard. Ergonomically it is the absolute worst "programming language" I've touched in my life. However somehow it managed to become the official language of statistics research and has packages to do any type of analysis you can dream of.

I think the reason you and I dislike R is because we just work differently than non-programmers. Non-programmers think in purely imperative, straight-forward semantics. They write one-off unmaintainable code tying together libraries that solves their immediate problem. Programmers try to write R code as if it was a proper programming language and immediately run into walls. Non-programmers never see the walls because they don't even know there's another way.

dragontamer · 5 years ago

R > Matlab. But both suck as programming languages.

But that's because neither R nor Matlab are primarily programming languages. They're primarily mathematical exploration tools.

bachmeier · 5 years ago

> somehow it managed to become the official language of statistics research

There's zero mystery to this. The intended audience is people that want to get stuff done. Professional software developers commenting on R are like this: "He's such a good salesman. He does everything the right way. He dresses right. He talks right. He has the best smile of any salesman I've ever seen. He fills out his reports on time. Granted, he doesn't make many sales, but why would you hire one of those other guys over someone that's perfect?"

sedeki · 5 years ago

FWIW, R started to make sense to me (as a programmer) once I read Advanced R by Wickham.

mbreese · 5 years ago

I have a colleague that uses R as a general purpose language (I'm in science), and it's horrible. He runs into problems all the time and usually the answer is "more RAM".

For the things it's good at, it's great. For everything else I avoid it like the plague. More often I find it easier to use a quick Python script to generate a data table that I can then read into R to perform whatever stats or plots I need. It's almost always faster than if I had just run everything in R to begin with.
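A rough sketch of the "quick Python script that generates a data table" step, assuming made-up column names (`sample`, `count`) and a hypothetical helper `write_table`. It only uses the standard `csv` module and writes a plain CSV, which R's `read.csv` should be able to load:

```python
import csv
import io


def write_table(rows, fieldnames, out):
    """Write a list of dict rows as CSV with a header line."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Illustrative data only; in practice these rows would come from
# whatever preprocessing the Python script does.
rows = [
    {"sample": "a", "count": 3},
    {"sample": "b", "count": 5},
]

# An in-memory buffer keeps the example self-contained; a real script
# would pass an open file instead, e.g. open("table.csv", "w", newline="").
buffer = io.StringIO()
write_table(rows, ["sample", "count"], buffer)
```

The heavy lifting stays in Python and R only sees a small, already-shaped table.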

But I think your description is spot-on. Non-programmers just want to get something done and if it works in R, then great. For those of us that think in terms of software engineering, good practices can be difficult in R.

The vanilla-R vs Tidyverse split has made this all worse too. These are two completely separate dialects of R that while still the same language, are completely different.

It's like the R folks took the "there's more than one way to do it" mentality from Perl and said "challenge accepted!".

crispyambulance · 5 years ago

Just like any other programming language, it DOES take a good programmer to make a decent library in R that people can use on their own problems.

R has had a very long evolution. It is a very different beast today in its most common usage than it was in the earlier days 10, 20+ years ago. Even the Tidyverse has some libraries that are very much crafted with a programmer mindset, like purrr and tidyr. These tools are decidedly non-imperative and not straight-forward in their semantics.

What makes R difficult for experienced programmers, I think, is the inconsistency of paradigms that are the result of its long history. This complicates how one writes library code.

There is, however, a "sweet-spot" for R and that would be as a "notebook" based programming language much like Mathematica, Matlab, and Julia. Which one you like, I guess, depends on your taste, your own history, and the killer libraries you want to use.

Whenever I have to describe what R is all about to Excel jockeys at work, I just say it's "Excel on steroids". I think that's a fair (albeit reductive) description. To be honest, I probably would have never learned R if Julia had existed when I started picking up R. I think I would have preferred a more ahistorical language with less "baggage" than R. But it's always worked out for me, so I am sticking to it at least for now.

andyonthewings · 5 years ago

> It makes hard things easy and easy things hard.

People also say it for k8s. And it kind of explains why k8s is creating so many jobs.

haihaibye · 5 years ago

> R: It makes hard things easy and easy things hard

Maybe you'd like this post: http://bioinfomofo.blogspot.com/2014/01/r-hard-things-are-ea...

CapmCrackaWaka · 5 years ago

I was a mathematics major in college, and didn't have much training in programming when I graduated. R was the first language I learned when I started my career as an actuary, and it was a breeze. Things “just work”. Want to add 2 vectors of different dimensions together? R knows what you’re getting at, and makes it work. Comparatively, learning Python was harder.

Now that I’m used to both languages, I find it funny how much R is hated by “true” programmers.

vharuck · 5 years ago

This is the key. R shouldn't be seen as a general programming language, but a domain specific language that's still open-ended. I started with SAS in my job, which was fine for statistics and handling tables. But anything beyond that, even supposedly simple things like reusing code or listing all files in a folder, was not simple. With R, it was.

R only had to be ergonomically better than the competition, and they weren't very good.

blt · 5 years ago

What are you "getting at" by adding two vectors of different dimensions? It's not obvious to me.

Off-by-one dimensionality errors are so common in programming. If the language does something like zero-extending instead of raising an error, it will lead to an "it runs but gives the wrong answer" bug. These are much more painful in numerical code than in logic-based code.
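For readers who don't use R: as far as I know, R does not zero-extend in this case but recycles the shorter vector, and it warns when the longer length is not a multiple of the shorter one. The sketch below emulates that rule in plain Python next to a strict version that raises instead; `recycle_add` and `strict_add` are hypothetical names chosen for illustration, and the warning is not reproduced.

```python
from itertools import cycle, islice


def recycle_add(a, b):
    """Element-wise sum where the shorter sequence is repeated to fit.

    This mimics R-style recycling; it silently produces a result even
    when the lengths differ because of an off-by-one mistake.
    """
    n = max(len(a), len(b))
    return [x + y for x, y in zip(islice(cycle(a), n), islice(cycle(b), n))]


def strict_add(a, b):
    """Element-wise sum that refuses mismatched lengths."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return [x + y for x, y in zip(a, b)]
```

`recycle_add([1, 2, 3, 4], [10, 20])` gives `[11, 22, 13, 24]`, which is convenient when intended. `recycle_add([1, 2, 3], [10, 20])` gives `[11, 22, 13]`, the "it runs but gives the wrong answer" case, whereas `strict_add` raises a `ValueError` for both.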

breck · 5 years ago

Have you spent time with the community? The community is fantastic.

Also, the cheat sheets put out by the RStudio team are the best programming language cheat sheets I've seen for any language: https://www.rstudio.com/resources/cheatsheets/

I don't do R much anymore (at the end of the day I personally have the freedom to start fresh so I choose that over dealing with technical debt in the R language itself), but the RStudio product, team, and R community I found fantastic.

froh · 5 years ago

R got a simplicity boost with the "tidyverse", RStudio and ggplot, all driven by Hadley Wickham. At its core, R is a very straightforward language. However, it never had a benevolent dictator who gave it consistency, elegance and style. Hadley is compensating for that a bit.

jklowden · 5 years ago

A lot can be said in favor of R, but "straightforward" is a debatable description. The semantics of R were never really designed, and were only recently "discovered", post hoc. See "Evaluating the Design of the R Language" (http://janvitek.org/pubs/ecoop12.pdf).

asdff · 5 years ago

I'd argue that tidyverse is entirely inconsistent with the rest of R, though. At least base R packages all operate in an "R-way," so learning this syntax helps you with other packages that others try to write in an "R-way," while tidyverse only operates in a tidyverse way, so you can't take your syntax knowledge with you to other packages.

I'd say the learning curve for making a sexy plot is a lot shorter with tidyverse, but overall, relying on it handicaps you versus spending the half hour longer to do the same thing with base graphics (or a base-like package).

Fomite · 5 years ago

I think it's somewhat more complex than that. I think tidyverse-R, which is a quasi-separate language, is only simple with complete buy-in, and involves a lot of magic, shorthand, and "These are the symbols I put into the machine to get X back out".

rossdavidh · 5 years ago

I am primarily a Python programmer, but I sometimes use R.

You are either going to use a programming language (or library, etc.) made by a programmer pretending they know about statistics, or a statistician pretending they know about programming. Oftentimes, as a programmer, the right choice is the former, but not uncommonly (because statistics is even less intuitive than programming), you really really need to know that the statistics have been done right. If someone has ported the relevant code from R to Python, great. If not, bite the bullet and use R, it's where the statisticians hang out.

You know, I bet statisticians don't think any more kindly of how programmers make stuff. Our use of the '=' sign, for example. We're just used to that kind of thing, so it doesn't look like a problem to us.

Communitivity · 5 years ago

R programming is fundamentally different at a conceptual level. You are operating on datasets rather than individual values. Also, the GUI mechanism uses reactive programming if you are using R Shiny. R is awesome for what it is designed for.

jayd16 · 5 years ago

Yeah, it's not so hard to grok. It's just a data driven style all the way down. You get pros and cons. It's great at working on data sets.

That said, I feel like correctness should be given a higher priority in scientific computing and yet a dynamically typed, lazily evaluated language is used.

asdff · 5 years ago

Other than using apply functions instead of loops, coding R is a lot like coding Python, only you get a lot more of the data science Python package functionality already baked into base R. The syntax differences are slight enough that it's pretty easy to move between the two (or find relevant Stack Overflow answers instantly to common annoyances). In my experience, R generally inputs your data and outputs your statistical test results in less code with less headscratching than doing the same in Python. I prefer plotting in R as well.

beforeolives · 5 years ago

I've written some R both for small interactive scripts and running in production, and it wasn't the first language I learned. R gets some things done very well; it also has some idiosyncrasies, there is stuff that is clearly patched up together and exists for backwards compatibility, and there are many ways to do the same thing in R. If you don't expect it to be perfect, it gets the job done - nothing to write home about and certainly not a language that you should avoid at all costs.

sharadov · 5 years ago

It does it well if there is a package out there for what you are trying to accomplish, and you hope that the package works for your use case.

Fomite · 5 years ago

There's a difference, in my mind, between "Programmers" and "Invokers of Code".

R is a terrible programming language.

It's not a bad language for invoking code, because for many of those people, they're not taught the concepts behind any language, so it's all semi-arbitrary symbols.

model <- lm(outcome ~ variable1 + variable2 + variable3, data=data)

summary(model)

Isn't any more complex than anything else. And what R does have is a network effect - at this point, for almost any statistical task I've ever encountered, there's R code for it.

ineedasername · 5 years ago

They can use it because they don't come with preconceived notions of how programming normally works. And it lets them do powerful analysis, with a library of packages for an enormous range of analytical methods and visualization, all without having to do much in the way of boilerplate coding.

Sure the syntax is going to seem alien to them, but so would any first encounter with a programming language.

The use case for R simply isn't a traditional programmer. That isn't the target user. Sure, if you need an application that might need significant scale you're not going to use R Shiny, but a lot of R work is one-off bespoke analysis projects. Models that do need to be deployed for use at scale in an application take their output parameters from R models and simply implement them in the app. I do this myself: taking coefficients etc., implementing a function call in a database, and then using the results on the front end.
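The "take the coefficients and re-implement them in the app" step might look roughly like this for a plain linear model. This is a sketch under assumptions: the coefficient values are invented, `predict_linear` is a hypothetical helper, and the original comment implements this as a database function rather than in Python.

```python
def predict_linear(intercept, coefficients, features):
    """Evaluate intercept + sum(coefficient * feature) for one observation.

    `coefficients` and `features` are dicts keyed by variable name, so a
    missing feature fails loudly with a KeyError instead of being skipped.
    """
    return intercept + sum(
        weight * features[name] for name, weight in coefficients.items()
    )


# Made-up numbers standing in for values copied out of a fitted model.
INTERCEPT = 0.5
COEFFICIENTS = {"variable1": 2.0, "variable2": -1.0}

prediction = predict_linear(
    INTERCEPT, COEFFICIENTS, {"variable1": 3.0, "variable2": 1.0}
)
```

The fitting stays in R; the application only carries a handful of numbers and one arithmetic expression.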

tarsinge · 5 years ago

I'm a professional programmer and I don't find R hard to understand. Have a look at R for Data Science[0], maybe you'll see why scientists and statisticians find it easy for their analysis and visualizations (and conversely find Pandas+Python very complex).

[0] https://r4ds.had.co.nz

samuel · 5 years ago

Those are my thoughts, mostly, but in the end libraries are the killer feature of successful programming languages. I have got used to it and I'm more proficient now at data analysis tasks using R than Python/pandas thanks to tidyverse+ggplot.

The object system mess though... I have no words.

agumonkey · 5 years ago

it seems to me that the value is in hard-compressed optimized functions and to this crowd this is the most important factor

software engineering practices don't exist unless you have a gigantic program, programming language theory is either not interesting or too foreign for them

about how they manage.. it's easy, they get used to it

jokethrowaway · 5 years ago

Machine learning is rife with similar examples (tensorflow in primis).

failwhaleshark · 5 years ago

R was supposedly the "FOSS" replacement for SAS, and Matlab to a degree. I had to support it and Bioconductor.

Now, most people just use Python.

bigbillheck · 5 years ago

Why the scare "quotes"?

gravypod · 5 years ago

Its selling point is it's not Matlab. Its downfall is it's not Python. The Python scientific compute libraries have their problems but people are using them and math people "get them". I think they have an awful design/api from a programming perspective but most people don't care. Matlab is similarly a strange language but all of its libraries/tools make what scientists do easy. "Click a button and your code can now run on a compute cluster".

R is very popular with stats people but new PhD candidates are beginning to write python implementations of R things (sort of how like DataFrames/pandas happened).

Fomite · 5 years ago

"R is very popular with stats people but new PhD candidates are beginning to write python implementations of R things (sort of how like DataFrames/pandas happened)."

People have been saying this since I was an undergraduate.

I'm submitting my tenure packet this year.