Readit News
zelphirkalt · 5 years ago
I don't have much to say about the R ecosystem, as I am not actively using it or doing anything with R.

> Matloff is also upset about a commercial company swooping in to steal their precious, a common academic complaint (academics swooping in to steal ideas from commercially developed software is, of course, perfectly respectable).

This reads like a sarcastic statement. In fact, it is indeed perfectly respectable that academics do this, because it lets our society advance and lets research progress outside of proprietary software gardens. In my opinion, any such action is perfectly respectable if it shares knowledge with the world. The other way around it is not OK, because from proprietarizing ideas, there is no benefit for society. If a company is worth its salt, it will offer services on top of existing public knowledge, without creating walled gardens. If it cannot do so, then it might be time to realize that the company should not exist.

What is not OK is when academics take that knowledge and make it seem as if they were the ones who invented it. One should always give credit where credit is due. Never forget academic honesty. Usually, however, a lot of foundational research is done by academics, only to be picked up years later, often by commercial entities dealing in proprietary software.

tfehring · 5 years ago
> If a company is worth its salt, it will offer services on top of existing public knowledge, without creating walled gardens.

I contend that every profitable company leveraging public knowledge does offer services (broadly construed) on top of that knowledge, since otherwise no one would pay them. Maybe the minimal example is that in which the only service provided is marketing, but the range of service levels provided in practice runs the gamut. RStudio clearly goes well beyond that minimum.

In some cases, the services offered are so valuable that the company can create a walled garden - which reduces the net value of those services to the customer but provides the company with monopolistic pricing power. (It’s not clear to me that RStudio’s products have significant lock-in, for what it’s worth. R has significant lock-in, like any programming language, and RStudio benefits from the lack of competitors creating good R tooling, but that’s not really a walled garden of RStudio’s making.)

dls2016 · 5 years ago
Having recently started a Python project, I'd say the walled-garden approach seems to be Anaconda's choice.
unishark · 5 years ago
Academics might have their own grand plans for society, but funding agencies generally want industry to take over at some point and run with it. I expect they'd be happy to move on and fund something new rather than trying to compete with products that industry can provide.

Personally, I think RStudio is far less clunky than, say, Jupyter. I still think Matlab is by far the best tool for engineering. I think having a company supporting the technology like that has generally been very beneficial. It's certainly possible they would just cause it to stagnate and milk it for cash as the technology dies out, but their business would die with it. Hence these examples have never stopped advancing.

zozbot234 · 5 years ago
> Academics might have their own grand plans for society, but funding agencies generally want industry to take over at some point and run with it.

Sure, but people can have different views about where that point should be. Documenting industry-relevant know-how makes it even easier for folks to take the result and run with it (providing the 'impact' that most funders will ultimately care about), so I agree with GP that it's very much a proper field for academic research.

admax88q · 5 years ago
You can't steal the ideas from academia or commercial. You can implement the same idea, but the previous one still exists.

A company commercializing an idea from academia does not make that idea proprietary.

TeMPOraL · 5 years ago
> You can't steal the ideas from academia or commercial. You can implement the same idea, but the previous one still exists.

It's an important point, but only about abuse of the word "steal".

> A company commercializing an idea from academia does not make that idea proprietary.

It doesn't, up until they start armoring it with their own IP until nobody else can reasonably extend the original idea anymore...

kybernetyk · 5 years ago
>because from proprietarizing ideas, there is no benefit for society.

Except for a profit motive for companies, which then benefits society by lowering prices of products. If it weren't for private companies, none of us could afford a computer today.

TeMPOraL · 5 years ago
It's competition that keeps prices low - if private companies were to properly proprietarize core ideas around computing, we'd still be stuck with room-sized computers.

Maintaining a market that's beneficial to society is an attempt to carefully tune interplay between many public and private concerns. From the point of view of society, the market is just a roundabout way of putting people on a treadmill and hanging carrots in front of them. We need companies to produce goods, provide services, and innovate better solutions, so we allow a degree of privileges and exclusivity for things that otherwise shouldn't be owned - like ideas. But we cannot allow companies to actually get the carrot and stop running - because it's the production and innovation treadmill that's the important thing (from society's POV).

This model has been so successful that it blinded a generation or two of politicians - they let the donkeys get their carrots, and now we have fat donkeys saying to us, "if you want us to run the treadmill some more, give us the carrots".

rjzzleep · 5 years ago
OpenStack was definitely unnecessarily complex. And it led to many different distributions that were only manageable by hiring a lot of consulting man-hours.
touisteur · 5 years ago
Ha ha memories of talking to RH salespeople and every time OpenStack was mentioned they were like "but you don't want that!" errr... OK I guess?
themacguffinman · 5 years ago
> The other way around it is not OK, because from proprietarizing ideas, there is no benefit for society

You say this as if proprietary rules (i.e., copyright and patents) weren't intentionally designed to incentivize knowledge creation & sharing by ensuring compensation for innovation. Creators can't pay bills with societal advancements and research progress. This is a really shallow "proprietary = bad" ideological take that I expect to see on the black & white pages of Stallman's website, not on an (at least partly) entrepreneurial forum.

TeMPOraL · 5 years ago
(I apologize in advance, but I have to)

tired: IP = bad

wired: IP rules are "intentionally designed to incentivize knowledge creation & sharing by ensuring compensation for innovation"

inspired: above isn't true in practice; IP rules have been thoroughly gamed, and their primary job is protecting monopolistic behavior, even if it means keeping knowledge suppressed past the point it becomes irrelevant or useless

Copyright and patents, as practiced today, have mutated into a malignant form. That doesn't mean the idea is bad, just that it was taken too far - instead of creating an idea-generating pump, we've ended up with toolkits for extracting rent from society and creating wasteful artificial scarcities.

zelphirkalt · 5 years ago
I don't think it is as shallow as you make it seem at all.

Whatever technological / knowledge advancement any entity holds proprietary could probably be of more use if shared with the world. Our technology enables us to have this kind of knowledge society, where knowledge is accessible at all times almost anywhere. I don't think that stopping knowledge from spreading can be a net gain, just because it enables a few select individuals at some entity to gain from keeping it a secret. Such a thing should not be the basis of a business.

jmcphers · 5 years ago
> The wave is subsiding and they now need to appear to have a viable business (so they can be sold to a bigger fish), which means there has to be a visible market they can sell into.

Might need a (2019) tag. As of 2020, RStudio is a Public Benefit Corporation and has a corporate structure that is designed for long-term investment in the scientific community, and isn't susceptible to buyout or IPO.

https://blog.rstudio.com/2020/01/29/rstudio-pbc/

(disclaimer: I work at RStudio)

shkkmo · 5 years ago
It's great that RStudio is a B-corp, but that, by itself, isn't a magic bullet.

Is there any information you can share on what steps RStudio has taken to reduce susceptibility to buyout and IPO pressure from VC?

jmcphers · 5 years ago
Absolutely. The B-corp by itself is indeed not a magic bullet, but the PBC transition (which is separate) does make a big difference. In a typical C corp the corporation is ultimately answerable to shareholders and it is shareholder pressure that causes buyouts and IPOs as shareholders need a return on their investment. A PBC is able to prioritize its mission (in our case, enabling access to scientific tools to everyone, regardless of means) over these demands.
throwawayboise · 5 years ago
The popularity of R amazes me. I took a one-week class in R and left with a vow to avoid it at all costs. I have never seen a more confusing, hard to understand, inconsistent software product in my 30 years as an IT professional. It's apparently targeted at scientists and sociologists who are non-programmers. I have no idea how they manage to use it.
colechristensen · 5 years ago
Scientific programming is just different. Much of the culture of scientific programming is different for good reasons, and those reasons are not easy to understand from the outside.

It's something like how baking cookies at home and running a cookie factory are very different. To a person doing each, the behaviors and priorities of the other seem strange, and it’s easy for one to think, “We are both just making cookies, why don’t they do what I do, which is obviously superior?”

Scientific programmers are solving problems first, not writing programs. They are solving problems in a way that is useful only to them or peers who know a whole lot about the problem being solved. The problem-first tools look strange because they deemphasize the programming niceness in favor of problem niceness.

You find the same sort of confusion when programmers are facing business types and their Excel usage.

There are certainly times when a piece of code starts needing the programming touch, but the right tool for the job depends on the job.

The_Amp_Walrus · 5 years ago
As someone currently writing scientific programs for scientists, much of the culture of scientific programming is bad, and they are having rings run around them by kids who build websites for a living, and it's embarrassing.

You think web developers are just writing programs for the hell of it? They have operational constraints around their work as well, it's just that there's a rad open online culture of continuous process improvement that's leveling up the tooling and practices that makes what was once hard look easy.

Things scientists using computers can learn from people working in the software industry:

- use a version control oriented workflow

- write extensive tests

- data management, automated data processing/cleaning pipelines, backups

- build generic frameworks

- use continuous integration

- logging and error reporting for long running tasks

- use modern tooling for automating infrastructure and big jobs, rather than manually submitting Slurm jobs via SSH

#notallscientists but the average level of competence is not good and I think the idea that writing software for scientific research is somehow special is backwards and counterproductive.

lwhi · 5 years ago
There's no good reason for something being difficult to understand. Especially a product that's been designed to be general purpose.

You give good reasons for why a certain situation exists in scientific communities, but I see no reason why it has to be that way.

droopyEyelids · 5 years ago
Having worked a bit with candy making and baking, from home-sized to restaurant-sized to a regional factory, I'll say this analogy doesn't make sense.

Almost all the progression in that industry is relatively straightforward and would make sense to a layperson.

908B64B197 · 5 years ago
> Scientific programmers are solving problems first, not writing programs.

And they don't value writing programs. Or Software Engineers.

That makes productizing some research interesting. It also makes reproducing a result published by your own lab a year ago an uphill battle: "it worked on John's old laptop, the huge Alienware. Never worked on any of our machines".

blacktriangle · 5 years ago
My description of R: It makes hard things easy and easy things hard. Ergonomically it is the absolute worst "programming language" I've touched in my life. However somehow it managed to become the official language of statistics research and has packages to do any type of analysis you can dream of.

I think the reason you and I dislike R is because we just work differently than non-programmers. Non-programmers think in purely imperative, straightforward semantics. They write one-off, unmaintainable code tying together libraries that solve their immediate problem. Programmers try to write R code as if it were a proper programming language and immediately run into walls. Non-programmers never see the walls because they don't even know there's another way.

dragontamer · 5 years ago
R > Matlab. But both suck as programming languages.

But that's because neither R nor Matlab are primarily programming languages. They're primarily mathematical exploration tools.

bachmeier · 5 years ago
> somehow it managed to become the official language of statistics research

There's zero mystery to this. The intended audience is people that want to get stuff done. Professional software developers commenting on R are like this: "He's such a good salesman. He does everything the right way. He dresses right. He talks right. He has the best smile of any salesman I've ever seen. He fills out his reports on time. Granted, he doesn't make many sales, but why would you hire one of those other guys over someone that's perfect?"

sedeki · 5 years ago
FWIW, R started to make sense to me (as a programmer) once I read Advanced R by Wickham.
mbreese · 5 years ago
I have a colleague who uses R as a general-purpose language (I'm in science), and it's horrible. He runs into problems all the time and usually the answer is "more RAM".

For the things it's good at, it's great. For everything else I avoid it like the plague. More often I find it easier to use a quick Python script to generate a data table that I can then read into R to perform whatever stats or plots I need. It's almost always faster than if I had just run everything in R to begin with.

But I think your description is spot-on. Non programmers just want to get something done and if it works in R, then great. For those of us that think in terms of software engineering, good practices can be difficult in R.

The vanilla-R vs Tidyverse split has made this all worse too. These are two separate dialects of R that, while still the same language, are completely different.

It's like the R folks took the "there's more than one way to do it" mentality from perl and said "challenge accepted!".
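For readers who haven't seen both dialects, here is one small task, the mean `mpg` per cylinder count in R's built-in `mtcars` data, written in base R and in tidyverse style. It is a minimal sketch assuming the dplyr package is installed; the object names are illustrative, not taken from the thread.

```r
library(dplyr)

# Base R dialect: formula interface plus aggregate()
base_res <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Tidyverse dialect: pipe, group_by() and summarise(), which refer to
# columns by bare name via non-standard evaluation
tidy_res <- mtcars %>%
  group_by(cyl) %>%
  summarise(mpg = mean(mpg))

print(base_res)
print(tidy_res)

# Same numbers, different object classes (data.frame vs tibble)
print(all.equal(base_res$mpg, tidy_res$mpg))
```

Both versions return one row per cylinder count; the practical difference is that the second returns a tibble and relies on conventions (bare column names, the pipe) that don't carry over to most base R functions.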

crispyambulance · 5 years ago
Just like any other programming language, it DOES take a good programmer to make a decent library in R that people can use on their own problems.

R has had a very long evolution. It is a very different beast today in its most common usage than it was in its earlier days, 10 or 20+ years ago. Even the Tidyverse has some libraries that are very much crafted with a programmer mindset, like purrr and tidyr. These tools are decidedly non-imperative and not straightforward in their semantics.

What makes R difficult for experienced programmers, I think, is the inconsistency of paradigms that are the result of its long history. This complicates how one writes library code.

There is, however, a "sweet-spot" for R and that would be as a "notebook" based programming language much like Mathematica, Matlab, and Julia. Which one you like, I guess, depends on your taste, your own history, and the killer libraries you want to use.

Whenever I have to describe what R is all about to Excel jockeys at work, I just say it's "Excel on steroids". I think that's a fair (albeit reductive) description. To be honest, I probably would never have learned R if Julia had existed when I started picking up R. I think I would have preferred a more ahistorical language with less "baggage" than R. But it's always worked out for me, so I am sticking to it, at least for now.

andyonthewings · 5 years ago
> It makes hard things easy and easy things hard.

People also say it for k8s. And it kind of explains why k8s is creating so many jobs.

haihaibye · 5 years ago
> R: It makes hard things easy and easy things hard

Maybe you'd like this post: http://bioinfomofo.blogspot.com/2014/01/r-hard-things-are-ea...

CapmCrackaWaka · 5 years ago
I was a mathematics major in college, and didn't have much training in programming when I graduated. R was the first language I learned when I started my career as an actuary, and it was a breeze. Things “just work”. Want to add 2 vectors of different dimensions together? R knows what you’re getting at, and makes it work. Comparatively, learning Python was harder.

Now that I’m used to both languages, I find it funny how much R is hated by “true” programmers.

vharuck · 5 years ago
This is the key. R shouldn't be seen as a general programming language, but a domain specific language that's still open-ended. I started with SAS in my job, which was fine for statistics and handling tables. But anything beyond that, even supposedly simple things like reusing code or listing all files in a folder, was not simple. With R, it was.

R only had to be ergonomically better than the competition, and they weren't very good.

blt · 5 years ago
What are you "getting at" by adding two vectors of different dimensions? It's not obvious to me.

Off-by-one dimensionality errors are so common in programming. If the language does something like zero-extending instead of raising an error, it will lead to an "it runs but gives the wrong answer" bug. These are much more painful in numerical code than in logic-based code.
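To make this exchange concrete: what CapmCrackaWaka describes is R's vector recycling rule, where the shorter operand is repeated to the length of the longer one. The base-R sketch below shows that a length-4 plus length-2 addition succeeds silently, that a length-3 plus length-2 addition still returns a result with only a warning, and a hypothetical `strict_add()` helper (not part of R) that raises an error instead, which is closer to the behavior blt is asking for.

```r
# Recycling: the shorter vector is repeated to match the longer one.
a <- c(1, 2, 3, 4)
b <- c(10, 20)
print(a + b)   # b is used as c(10, 20, 10, 20); no warning at all
#> [1] 11 22 13 24

# If the longer length is not a multiple of the shorter one, R still
# computes a result and only signals a warning, captured here as text.
msg <- tryCatch(c(1, 2, 3) + c(10, 20),
                warning = function(w) conditionMessage(w))
print(msg)

# Hypothetical strict alternative: refuse mismatched lengths outright.
strict_add <- function(x, y) {
  if (length(x) != length(y)) {
    stop(sprintf("length mismatch: %d vs %d", length(x), length(y)))
  }
  x + y
}
print(strict_add(c(1, 2), c(3, 4)))
#> [1] 4 6
```

If you would rather have every warning stop execution, base R's `options(warn = 2)` turns warnings into errors, though it does nothing about the silent case where one length divides the other.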

breck · 5 years ago
Have you spent time with the community? The community is fantastic.

Also, the cheat sheets put out by the RStudio team are the best programming language cheat sheets I've seen for any language: https://www.rstudio.com/resources/cheatsheets/

I don't do R much anymore (at the end of the day I personally have the freedom to start fresh so I choose that over dealing with technical debt in the R language itself), but the R Studio product, team, and R community I found fantastic.

froh · 5 years ago
R got a simplicity boost with the "tidyverse", RStudio and ggplot, all driven by Hadley Wickham. At its core, R is a very straightforward language. However, it never had a benevolent dictator who gave it consistency, elegance and style. Hadley is compensating for that a bit.
jklowden · 5 years ago
A lot can be said in favor of R, but "straightforward" is a debatable description. The semantics of R were never really designed, and were only recently "discovered", post hoc. See "Evaluating the Design of the R Language" (http://janvitek.org/pubs/ecoop12.pdf).
asdff · 5 years ago
I'd argue that tidyverse is entirely inconsistent with the rest of R, though. At least base R packages all operate in an "R-way," so learning this syntax helps you with other packages that others try to write in an "R-way," while tidyverse only operates in a tidyverse way, so you can't take your syntax knowledge with you to other packages.

I'd say the learning curve for making a sexy plot is a lot shorter with tidyverse, but overall, relying on it handicaps you versus spending the half hour longer to do the same thing with base graphics (or a base-like package).

Fomite · 5 years ago
I think it's somewhat more complex than that. I think tidyverse-R, which is a quasiseparate language, is only simple with complete buy in, and involves a lot of magic, shorthand, and "These are the symbols I put into the machine to get X back out".
rossdavidh · 5 years ago
I am primarily a python programmer, but I sometimes use R.

You are either going to use a programming language (or library, etc.) made by a programmer pretending they know about statistics, or one made by a statistician pretending they know about programming. Oftentimes, as a programmer, the right choice is the former, but not uncommonly (because statistics is even less intuitive than programming), you really, really need to know that the statistics have been done right. If someone has ported the relevant code from R to Python, great. If not, bite the bullet and use R; it's where the statisticians hang out.

You know, I bet statisticians don't think any more kindly of how programmers make stuff. Our use of the '=' sign, for example. We're just used to that kind of thing, so it doesn't look like a problem to us.

Communitivity · 5 years ago
R programming is fundamentally different at a conceptual level. You are operating on datasets rather than individual values. Also, the GUI mechanism uses reactive programming if you are using R Shiny. R is awesome for what it is designed for.
jayd16 · 5 years ago
Yeah, it's not so hard to grok. It's just a data-driven style all the way down. You get pros and cons. It's great at working on data sets.

That said, I feel like correctness should be given a higher priority in scientific computing and yet a dynamically typed, lazily evaluated language is used.

asdff · 5 years ago
Other than using apply functions instead of loops, coding R is a lot like coding python only you get a lot more of the data science python package functionality already baked into base R. The syntax differences are slight enough where it's pretty easy to move between the two (or find relevant stackoverflow answers instantly to common annoyances). R generally inputs your data and outputs your statistical test results in less code with less headscratching than doing the same in python in my experience. I prefer plotting in R as well.
beforeolives · 5 years ago
I've written some R, both for small interactive scripts and for running in production, and it wasn't the first language I learned. R gets some things done very well; it also has some idiosyncrasies: there is stuff that is clearly patched together and exists for backwards compatibility, and there are many ways to do the same thing in R. If you don't expect it to be perfect, it gets the job done. Nothing to write home about, and certainly not a language that you should avoid at all costs.
sharadov · 5 years ago
It does it well if there is a package out there for what you are trying to accomplish, and you can only hope that the package works for your use case.
Fomite · 5 years ago
There's a difference, in my mind, between "Programmers" and "Invokers of Code".

R is a terrible programming language.

It's not a bad language for invoking code, because for many of those people, they're not taught the concepts behind any language, so it's all semi-arbitrary symbols.

    model <- lm(outcome ~ variable1 + variable2 + variable3, data = data)
    summary(model)

Isn't any more complex than anything else. And what R does have is a network effect - at this point, for almost any statistical task I've ever encountered, there's R code for it.

ineedasername · 5 years ago
They can use it because they don't come with preconceived notions of how programming normally works. And it lets them do powerful analysis with little boilerplate code, backed by a library of packages for an enormous range of analytical methods and visualization.

Sure, the syntax is going to seem alien to them, but so would any first encounter with a programming language.

The use case for R simply isn't a traditional programmer. That isn't the target user. Sure, if you need an application that might need significant scale, you're not going to use R Shiny, but a lot of R work is one-off, bespoke analysis projects. Models that do need to be deployed for use at scale in an application take their output parameters from R models and simply implement them in the app. I do this myself: I take the coefficients etc., implement the model as a function call in a database, and then use the results on the front end.
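A minimal sketch of the workflow described above, in base R: fit a model, pull the coefficients out with `coef()`, and re-implement the prediction as a plain formula that could be ported to a database or front end. The `mtcars` data, the choice of predictors, and the `predict_mpg()` helper are illustrative assumptions, not the commenter's actual setup; the final check compares the hand-written formula against R's own `predict()`.

```r
# Fit a linear model on R's built-in mtcars data set.
model <- lm(mpg ~ wt + hp, data = mtcars)

# Export the fitted parameters as a plain named numeric vector.
b <- coef(model)
print(b)

# Hypothetical hand-rolled scorer: mpg_hat = b0 + b1 * wt + b2 * hp.
# This is the piece you would re-implement outside R.
predict_mpg <- function(wt, hp) {
  unname(b["(Intercept)"] + b["wt"] * wt + b["hp"] * hp)
}

# The hand-rolled formula should agree with R's own predict().
new_car <- data.frame(wt = 3.0, hp = 150)
manual  <- predict_mpg(new_car$wt, new_car$hp)
builtin <- unname(predict(model, newdata = new_car))
print(all.equal(manual, builtin))
#> [1] TRUE
```

This direct translation is easy for plain linear terms; factors, interactions, or transformed predictors would need their encoding reproduced on the other side too.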

tarsinge · 5 years ago
I'm a professional programmer and I don't find R hard to understand. Have a look at R for Data Science[0], maybe you'll see why scientists and statisticians find it easy for their analysis and visualizations (and conversely find Pandas+Python very complex).

[0] https://r4ds.had.co.nz

samuel · 5 years ago
Those are my thoughts, mostly, but in the end libraries are the killer feature of successful programming languages. I have gotten used to it, and I'm more proficient now at data analysis tasks using R than Python/pandas thanks to tidyverse+ggplot.

The object system mess though... I have no words.

agumonkey · 5 years ago
it seems to me that the value is in hard-compressed optimized functions and to this crowd this is the most important factor

software engineering practices don't exist unless you have a gigantic program, programming language theory is either not interesting or too foreign for them

about how they manage.. it's easy, they get used to it

jokethrowaway · 5 years ago
Machine learning is rife with similar examples (TensorFlow in primis).
failwhaleshark · 5 years ago
R was supposedly the "FOSS" replacement for SAS, and for Matlab to a degree. I had to support it and Bioconductor.

Now, most people just use Python.

bigbillheck · 5 years ago
Why the scare "quotes"?
gravypod · 5 years ago
Its selling point is that it's not Matlab. Its downfall is that it's not Python. The Python scientific computing libraries have their problems, but people are using them and math people "get them". I think they have an awful design/API from a programming perspective, but most people don't care. Matlab is similarly a strange language, but all of its libraries/tools make what scientists do easy. "Click a button and your code can now run on a compute cluster".

R is very popular with stats people but new PhD candidates are beginning to write python implementations of R things (sort of how like DataFrames/pandas happened).

Fomite · 5 years ago
"R is very popular with stats people but new PhD candidates are beginning to write python implementations of R things (sort of how like DataFrames/pandas happened)."

People have been saying this since I was an undergraduate.

I'm submitting my tenure packet this year.

archibaldJ · 5 years ago
In mathematics we roughly have two ways of discovering new things: one is the theoretical approach (e.g., Galois and his theory, which proves that general fifth-degree polynomial equations have no solution in radicals) and the other is the more ad hoc, technique-driven approach (e.g., you can see the ingenuity of that in many of Erdős's proofs). Of course the categorization is not always so clear-cut; you often get a mix of both worlds.

I think it’s very similar when it comes to writing code. The more theory-driven the approach, the more structures you have at your disposal to reason about things (e.g., see Mochizuki’s proof of the abc conjecture).

For example, should you use monads in a small project? Maybe not. But if you are dealing with thousands of lines of code every week, you may find that if you reimplement some components as monads, refactoring suddenly gets easier and you can extend things more easily, without having to do a global search and modify all the occurrences of a certain thing every time you change a type, etc.

So ultimately you are offloading cognitive effort to structures, which are really just constructs to optimize cost-to-transform (though they may increase the cost-to-execute, both computationally and in terms of mental visualization/simulation).

So is it worth the effort to work with structures? It really depends on the project you are working on, or what you see yourself building in the next 10 years.

Depending on how you see your career, it’s always good to be a bit more ambitious.

sebastialonso · 5 years ago
you know... "it depends" is a famous (non-)answer among (good) software architects
hnrj95 · 5 years ago
Has Mochizuki’s proof been formally recognized? I thought it wasn’t yet agreed upon.
jacobolus · 5 years ago
No, Mochizuki’s “proof” is likely to be fatally flawed. He has almost entirely refused to engage with careful and targeted criticism by the experts who found what they claim is a serious flaw.

Mochizuki finally went ahead and published his work in a journal where he is the editor in chief. So much for peer review.

dash2 · 5 years ago
In general this idea may be true, but for the tidyverse it's just a load of horse.

Hadley Wickham is a very talented developer, and what he's particularly talented at is writing interfaces that are easy for people to use. dplyr, a key part of the tidyverse, is a great example of that. It breaks data tidying up into a few simple steps that can be chained together. It's the descendant of an earlier iteration (plyr) and Hadley learned from that and just kept polishing the interface.

There's a similar story with tidyr. Reshaping data from wide to long is a complex operation. Base R has a reshape() function and using it has given me permanent PTSD. It was impossible to get right until you read the documentation. After you read the docs - still impossible. Hadley wrote the reshape2 package, which improved things a bit. Then there was tidyr::gather() and spread(). Finally, we got tidyr::pivot_wider() and pivot_longer(), and at last I can be reasonably confident of getting the results I need without too many tries.

The tidyverse is hugely popular for just this reason. Without it, I'd probably have abandoned R. I certainly wouldn't dream of teaching it. Ditto Rstudio. Calling these people parasites is absurd.
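For comparison, here is the same wide-to-long reshape written with tidyr's `pivot_longer()` and with base R's `stats::reshape()`, plus the `pivot_wider()` round trip. It is a small sketch assuming tidyr is installed; the toy `wide` table (two subjects measured at two time points) is made up for illustration.

```r
library(tidyr)

# Wide layout: one row per subject, one column per time point
wide <- data.frame(id = c(1, 2), t1 = c(5, 7), t2 = c(6, 9))

# tidyr: wide -> long, naming the new key and value columns explicitly
long <- pivot_longer(wide, cols = c(t1, t2),
                     names_to = "time", values_to = "score")

# tidyr: long -> wide again
back <- pivot_wider(long, names_from = time, values_from = score)

# Base R equivalent of the wide -> long step with stats::reshape();
# note how many arguments must line up with each other
long_base <- reshape(wide, direction = "long",
                     varying = c("t1", "t2"), v.names = "score",
                     timevar = "time", times = c("t1", "t2"),
                     idvar = "id")

print(long)
print(back)
print(long_base)
```

Both produce four rows of `id`/`time`/`score`; the tidyr version reads closer to the intent, which is the point dash2 is making.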

macleginn · 5 years ago
I agree with your assessment of the claims made in this post, but I want to note that for me the tidyverse culture became the reason to abandon R for all use cases where it has decent alternatives. R is not pretty, but it is also not super hard, and it has some internal logic. tidyverse's logic is orthogonal to that of R; it is a different data-manipulation paradigm, which I now have to know in addition to knowing R itself---not that you can really skip that part. Also, as has already been mentioned here, this layering doesn't help with debugging.

ggplot2 did a similar thing with plotting. Together these two projects made R seem more accessible and, IMO, seriously damaged the culture around it.

asdff · 5 years ago
tidyverse has ruined Stack Overflow. So many questions could be answered with base R, and sometimes you might need to use base R if you want your code widely compatible, but people insist on submitting some arcane ggplot2 or dplyr code instead, and nothing is learned about R.
mbauman · 5 years ago
It's interesting, though, that those "clean" interfaces come with the horrors of non-standard evaluation and have caused a rather large divide between the tidyverse and base R.

I fully disagree with the disparaging aspersions or even motives that TFA leans towards, though. I can see how this would arise out of a local minimum.

The "beauty" of R has always been its core DataFrame abstraction and the fact that a table is a language primitive — and that's where 90% of the consistency came from (from my outsiders' vantage point).
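For context on the non-standard evaluation point: base R functions such as `subset()` capture an argument as an unevaluated expression and evaluate it with the data frame's columns in scope. The sketch below uses a hypothetical `my_filter()` helper built on base R's `substitute()` and `eval()` to show both the convenience and the classic pitfall, where wrapping such a function in another function breaks the lookup. It assumes no global variable named `x` exists.

```r
# Hypothetical filter helper in the style of base R's subset():
# it captures `cond` unevaluated and evaluates it against df's columns.
my_filter <- function(df, cond) {
  expr <- substitute(cond)                # the expression, not its value
  keep <- eval(expr, df, parent.frame())  # columns first, then caller's env
  df[keep, , drop = FALSE]
}

d <- data.frame(x = 1:5, y = letters[1:5])
print(my_filter(d, x > 3))  # `x` resolves to the column d$x: rows 4 and 5

# The pitfall: pass the condition through another function and
# substitute() now only sees the symbol `cond`. Forcing it evaluates
# `x > 3` in the global environment, where there is no `x`.
wrapper <- function(df, cond) my_filter(df, cond)
err <- tryCatch(wrapper(d, x > 3), error = function(e) conditionMessage(e))
print(err)
```

dplyr's documented answer to the wrapping problem is the `{{ }}` (embrace) operator, as in `function(df, cond) dplyr::filter(df, {{ cond }})`, which is exactly the kind of extra concept that widens the gap between the two dialects.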

dash2 · 5 years ago
If you think data.frame is beautiful or consistent, try and guess what these do:

    d <- data.frame(a = 1:3, b = 1:3)
    d[4,] <- 1:3
    # or
    d$c <- NULL
    # or
    d[c(T,F,T,F), 1] <- 1
Error handling? Who needs it?

rossdavidh · 5 years ago
Just to add an anecdatum: as a Python programmer who has started using R in production, I find dplyr to be less clear than base R. Not that either one is clear.

harikb · 5 years ago
I think the author ignores the fact that creating simple, consistent systems is a very hard problem, and one that is rarely rewarded in the short term. So most apps, APIs, and platforms will eventually become complicated and messy, until there is significant pressure from competition to force people to do the dirty work of cleaning up the architecture.

So if you want a clean and simple R ecosystem, go support Julia or some alternative like that.

rossdavidh · 5 years ago
"Never ascribe to conspiracy, that which is adequately explained by incompetence."

Not that complexity means the programmer was "incompetent" per se, just not sufficiently competent to keep things simple, because, as you point out, keeping things simple is really hard.

pdfernhout · 5 years ago
"Simple Made Easy" by Rich Hickey: https://www.infoq.com/presentations/Simple-Made-Easy/ "We should aim for simplicity because simplicity is a prerequisite for reliability. Simple is often erroneously mistaken for easy. "Easy" means "to be at hand", "to be approachable". "Simple" is the opposite of "complex" which means "being intertwined", "being tied together". Simple != easy. ..."
gwern · 5 years ago
> Once a collection of complicated packages exist, it is in RStudio’s interest to get as many other packages using them, as quickly as possible. Infect the host quickly, before anybody notices; all the while telling people how much the company is investing in the community that it cares about (making lots of money from).

More than RStudio, it's in Tidyverse users' interests to extend and embrace; even without any plan or malice aforethought, it is simply natural that as Tidyverse users do their own thing, their packages and scripts will rely on the Tidyverse ever more, and their downstream users will thenceforth rely on it whether those users like it or not. Thus the situation where you call install.packages() on some harmless-looking package and now R is installing 20 or 40 packages from the Tidyverse.
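One way to see this fan-out for yourself is to ask R for a package's recursive hard dependencies. The sketch below uses tools::package_dependencies() against the locally installed package database, so it needs no network access; the helper name `recursive_deps()` is hypothetical. Passing `db = available.packages()` instead would query the repository, which is closer to what install.packages() actually resolves.

```r
# Hypothetical helper: list the packages `pkg` pulls in through
# Depends/Imports/LinkingTo, following those dependencies recursively.
recursive_deps <- function(pkg, db = installed.packages()) {
  deps <- tools::package_dependencies(
    pkg,
    db        = db,
    which     = c("Depends", "Imports", "LinkingTo"),
    recursive = TRUE
  )[[pkg]]
  sort(unique(deps))
}

# A base package keeps the list short; try a tidyverse-heavy package
# (if it is installed) to watch the count grow.
deps <- recursive_deps("stats")
cat(length(deps), "dependencies:", deps, "\n")
```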

I dub this general phenomenon 'bitcreep': https://www.gwern.net/Holy-wars Somewhat ironically, Hadley Wickham himself has been a bit annoyed by bitcreep caused by Mac/Linux users working against Windows users: https://twitter.com/hadleywickham/status/1280340931657564160

cwyers · 5 years ago
I missed your article when it first came out, and I'm glad to have encountered it here; thank you, I think it helps explain some things here and more generally. I do think that this explains why the "Base R" people feel so threatened by the Tidyverse.

What I find interesting about the quoted excerpt, and the article more broadly, is how little time it spends discussing what RStudio's economic interest in propagating the Tidyverse actually is. Notably, RStudio doesn't charge for the Tidyverse packages. There's no Tidyverse Enterprise Edition, and there are no Tidyverse support plans. They mostly make money off their IDE and server products, and those products don't have a lot of meaningful synergies with the Tidyverse packages; there are few IDE features that integrate directly with the Tidyverse, and those that exist aren't very significant. Meanwhile, nothing about the Tidyverse is written to work better with RStudio than with VS Code using the R language server, or with Emacs, or what have you.

This isn't to say that RStudio is doing this for any reason other than economic interest, but the economic interest here is not in convincing R users to use Tidyverse packages; RStudio makes the same amount of money whether or not you do your plots in ggplot, and if you switch to ggplot there's nothing about it pushing you to change your IDE/editor.

So what is the business model for the Tidyverse, then? It's pretty straightforward: the goal of the Tidyverse from the perspective of RStudio is to drive R adoption, mostly at places that are using commercial closed-source alternatives to R, like Matlab, SAS, or SPSS. (You could argue that they're competing for mindshare with Python, too, but RStudio has moved to treating Python the way that post-Azure Microsoft treats Linux, as a part of its product portfolio. RStudio's flagship products, the IDE and the Connect server, both advertise first-class Python support. Whether or not they have achieved this is another question, but they certainly are trying.) Once you have converted people to using R, you can sell them IDEs and servers.

I suspect that the "Base R" people have a hard time grappling with this aspect of the Tidyverse business model, because it implies that Base R is _inherently less popular_ than the Tidyverse, which makes their losing less about the whims of a corporation that they can argue is acting in bad faith and more about the preferences of the R community, which they have no power to gatekeep.

Because the point of the Tidyverse is driving adoption among people who were not previously R users, the Tidyverse is able to win over the majority of the community not by persuading holdouts but simply by growing the community with new Tidyverse supporters. I understand how that can feel threatening to someone who was an R user pre-Tidyverse, but you can understand why they don't want to focus their message around shrinking the R community. And their efforts to persuade these new people to switch from the Tidyverse to Base R are undermined by the fact that the best argument for Base R over Tidyverse, the familiarity of Base R to someone who learned Base R to begin with, doesn't apply to them at all. They're fighting a losing war with bad weapons.

the_optimist · 5 years ago
A very unusual claim we have here, on the link-through to Matloff: “The Tidyverse also borrows from other "purist" computer science (CS) philosophies, notably functional programming (FP). The latter is abstract and theoretical, difficult even for CS students, and thus it is clear Tidy is an unwise approach for nonprogrammer students of R.”

This seems to be the root of the argument, and it is a completely bizarre statement.
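For context on the functional-programming point: higher-order functions have long been part of base R itself. Here is a small sketch comparing base R's sapply(), Reduce() and Filter() with the purrr call the tidyverse would typically reach for. The purrr line is commented out so the snippet runs without purrr installed.

```r
# Base R higher-order functions: each takes a function as an argument.
squares <- sapply(1:5, function(x) x^2)           # map over a vector
total   <- Reduce(`+`, 1:5)                       # fold a vector into one value
evens   <- Filter(function(x) x %% 2 == 0, 1:10)  # keep elements passing a test

print(squares)
print(total)
print(evens)

# Typical purrr (tidyverse) spelling of the first line; needs purrr, not run here:
# purrr::map_dbl(1:5, function(x) x^2)
```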

vharuck · 5 years ago
I agree with Matloff's overall point, though. "Tidy programming" (which came to mean non-standard evaluation) is very hard to understand, even for R professionals. It relies on directly handling symbols, and encourages using new notations to do it. Debugging is even more complex with NSE code, and people learning the language will be doing a lot of debugging. I can't imagine a good way to introduce functions to newer users when using NSE. You'd have to first mention environments, scoping, and symbols.
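To make the debugging complaint concrete, here is a minimal base R sketch of the kind of non-standard evaluation dplyr::filter() relies on. `filter_rows()` is a hypothetical toy, not dplyr's implementation. dplyr builds on rlang's tidy evaluation, whose quosures and `{{ }}` operator exist largely to address wrapper problems like the one shown at the end.

```r
# Hypothetical toy version of filter(): `condition` is captured unevaluated
# and then evaluated with the data frame's columns in scope.
filter_rows <- function(df, condition) {
  expr <- substitute(condition)             # the unevaluated expression
  keep <- eval(expr, df, parent.frame())    # look in columns first, then the caller
  df[!is.na(keep) & keep, , drop = FALSE]   # treat NA as "drop the row"
}

d  <- data.frame(year = 1948:1952, flow = c(10, 20, 30, 40, 50))
ok <- filter_rows(d, year >= 1950)          # works at top level
print(ok)

# Wrapping it in another function breaks: substitute() now captures the
# symbol `cond`, whose promise is evaluated in the global environment,
# where no `year` exists.
wrapper <- function(df, cond) filter_rows(df, cond)
msg <- tryCatch(wrapper(d, year >= 1950),
                error = function(e) conditionMessage(e))
print(msg)
```

The failure surfaces as a missing-object error far from the line that caused it, which is the kind of thing a newcomer can't diagnose without first learning about environments and promises.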

My rant aside, I did find this next quote from Matloff a better argument for using dplyr:

    > mean(Nile[80:100])
"printing the mean Nile River flow during a certain range of years. Incredibly, not only would this NOT be in a first lesson with Tidy, the students in a Tidy course may actually never learn how to do this. Typical Tidiers don't consider vectors very important for learners, let alone vector subscipts."

With dplyr, you'd subset based on a `filter()`, likely specifying the years to keep. It encourages self-explanatory code. Matloff's vector subscript tells me nothing about why certain elements are kept.
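Here is a small sketch of that contrast using the built-in Nile series (annual flow, 1871 to 1970), in which positions 80:100 correspond to the years 1950 to 1970. The year-based versions state which years are kept. The `nile` data frame is constructed here for illustration, and the dplyr pipeline is commented out because it needs the package.

```r
# Position-based: correct, but the reader must know the series starts in 1871.
by_position <- mean(Nile[80:100])

# Time-based on the ts object itself: the years are explicit.
by_window <- mean(window(Nile, start = 1950, end = 1970))

# Data-frame version, closest in spirit to dplyr::filter().
nile <- data.frame(year = as.numeric(time(Nile)), flow = as.numeric(Nile))
by_subset <- mean(nile$flow[nile$year >= 1950 & nile$year <= 1970])

# dplyr equivalent (needs dplyr; not run here):
# nile |>
#   dplyr::filter(year >= 1950, year <= 1970) |>
#   dplyr::summarise(mean_flow = mean(flow))

print(c(by_position, by_window, by_subset))
```

All three compute the same mean. The difference is only in how much of the intent the code itself communicates.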