SQL is Agile - Readit News

btilly · 13 years ago

At a past job we had a really fun interview question. What we did was laid out a concrete problem, and asked the interviewee to lay out a schema. Since we know the problem well, for virtually any schema we'd be able to invent reasonable requirements that would force them to change their schema. At the end we'd verify a basic knowledge of SQL itself by asking them to answer a couple of basic questions.

(The actual problem we had them tackle was reports to use in making picks in a fantasy sports game. But it could be anything simple to explain, fairly straightforward to solve.)

The reason why the question worked so well is that this was, in fact, pretty much how we build CRUD applications. We'd decide the schema, and once the schema was known, the CRUD application was fairly obvious and straightforward. And a large part of the programming model would be immediately provided by your ORM once you had done so.

thwarted · 13 years ago

We'd decide the schema, and once the schema was known, the CRUD application was fairly obvious and straightforward.

"Show me your flowcharts [code], and conceal your tables [schema], and I shall continue to be mystified; show me your tables [schema] and I won't usually need your flowcharts [code]: they'll be obvious." ~ Fred Brooks

RowanH · 13 years ago

A long standing interview question I've used is "draw us an ERD for this domain" (orders/customer/products).

It's what at first is a seemingly simple question to start drilling into understanding of systems "so if I was going to do X, what would you change in your ERD?" "how about if scenario Z was going to happen, would you change anything?"

Strange how every now and then you get a "system architect" that can't answer that one.

meaty · 13 years ago

That's usually because the architects don't give a shit about the storage model and recognise that the external interface/contract is far more important and that can be mapped to whatever schema supports the model. That's their domain. All they see is: "how can we ensure that this whole thing still works when someone does scenario X".

The schema is almost 100% irrelevant to us and we have nearly 2000 tables in our system. We have two hibernate implementations which put that all behind a testable interface: something SQL can't give you on it's own.

jzwinck · 13 years ago

Since your coworker liked that problem domain I'm sure you thought it was fun. But what about candidates? Ever have one like the poster in this same thread who doesn't like sports? What about female candidates? Would you agree there are likely to be gender and ethnicity biases in an interview question that discusses American sports?

This of course assumes the actual job is not about sports.

yummyfajitas · 13 years ago

Programming is about learning enough about a problem domain to model it with code. My last job involved women's fashion, which I (and my unfashionable male coworkers) don't give a crap about. My current job involves delivering local news to middle aged female suburbanites, which I (and most of my brooklyn hipster colleagues) are similarly uninterested in.

My lack of interest in school closings in long island doesn't prevent me from making postgres reject articles without a title.

If a developer can't ask questions and figure out enough about football to model the data reasonably well, he/she is a solid NO HIRE.

(I'm also curious what you mean by "gender and ethnicity biases". Are hispanics/women incapable of understanding sports or something?)

hugh4life · 13 years ago

"What about female candidates? Would you agree there are likely to be gender and ethnicity biases in an interview question that discusses American sports?"

Don't you agree there are gender and ethnicity biases in posing these questions?

btilly · 13 years ago

For the record I don't like sports.

In the time we gave that question, being a fan of sports did not generally help on that question. The necessary domain information is small, the relational design skills tested are not.

In fact we had one candidate who said up front that he didn't know anything about sports. He wound up doing well on that question and being hired.

Q6T46nT668w6i3m · 13 years ago

I like it. Would you mind sharing a “concrete problem” and subsequent “reasonable requirements?"

btilly · 13 years ago

Concrete problem, reporting system to help our boss make fantasy baseball (or other random sport) choices. Your responsibility will be the schema and reports, I'll be responsible for getting data into your schema.

Reasonable requirements that often require changes to an initial schema:

- What happens if I'm trying to update data but I only get partial data, how am I supposed to fix that? (A surprising fraction of people denormalize data too early.)

- Can a player play two different positions?

- Can a player transfer teams from one season to another?

- Can a player transfer teams in a season.

- Can a player change names? (If we threw this one at you, then you probably had a very good schema!)

And then reasonable questions to test their SQL would be things like:

- Who played in game X?

- How many home runs did person Foo have?

- How many home runs did team Bar have?

- Sort all players by their hitting average.

mattmanser · 13 years ago

There's no difference between this and designing the classes first.

Designing the classes first in an explicitly typed language actually works a lot better than designing a schema in a DB as you're not constrained by relational requirements and it's easier to change imo. Make a change to your classes and you can usually just right-click refactor, try and make a change to your SQL schema and it won't accept it at all until you unhook all the dependant tables.

Basically, I don't really get what point you're trying to make.

You never know your domain well enough to make a decent schema unless you've already written the program once before. Which is a pointless thought exercise where you get to point at flaws in someone else's nascent design. Like it sounds like you do in an interview, which is great for seeing their thought processes, but don't fool yourself that it's because of that design method.

In other words, the reason it worked so well is because you had already built it once, nothing at all to do with your choice of schema first.

In the end do what works for you, but schema first vs classes first is a trivial and pointless argument as both work fine.

Xurinos · 13 years ago

Having done this for a while, I can solidly say that the relationships and structure of data has been consistently better when designed as a set of schema tables than as a set of classes because they are especially constrained. The constraints forced people to actually think about the relationships between facets of their data, and the resulting classes were much cleaner and more maintainable. Should we ever decide to move aspects of the classes into a database, the transition requires little code refactoring; this comes up fairly often for us, since we are trying to give users more customization powers.

People should be doing that when they design classes, but the flexibility tends to work against them. It could just be a mindset that makes it work well, and this might be a more appropriate approach for larger projects than for smaller types.

btilly · 13 years ago

If you think that you're not constrained by relational requirements, when your data is backed by a relational database, then you're living in a fantasy. The constraints exist, the only question is whether you recognize that fact.

When you're answering an interview question on a whiteboard, "right-click refactor" usually fails hard.

The whole point of the exercise is to test someone's design skills, and to find out what interactively designing with them would be like. We didn't care that they came up with a good design - we wanted to see the thought process and the interaction. Because in that organization, on that team, that was actually how we did design - laying out the database schema on the white board. Therefore we were testing design sensibility and a concrete skill that people needed to have.

codewright · 13 years ago

Not all of us use a language that has classes or where it's considered good taste to use classes to structure everything.

jeffdavis · 13 years ago

"There's no difference between this and designing the classes first."

Databases are designed to be shared structures accessible from many applications, so it's not a 100% direct comparison. When focusing on a single process of a single application, of course that leads to some simplifications (no IPC, all datatypes match up perfectly, no concurrency problems).

"you're not constrained by relational requirements"

I don't see relational as more constrained or less constrained than OO. It's different. For instance, in OOP I often feel like it's forcing a hierarchical structure upon the design. Inheritance feels very constraining to me in comparison to the free joining of one table to another based on the values inside (not necessarily based on explicit connections in the design).

lucian303 · 13 years ago

It's always about the data, never about the code.

outside1234 · 13 years ago

There are no schema-less databases. There are databases where the schema is applied at write and there are applications that apply a schema at read.

PommeDeTerre · 13 years ago

Very true. I always chuckle when some NoSQLer tells me how much time he's saved by not having to define and update a schema. Then he tells me how all he has to do is check whether fields exist or not in his code, and enforce constraints, and provide default values, and so forth. He has to do this in the Ruby code that he's writing, and his colleague has to do the same in the PHP code she's writing, and then there's the analysis team who has to write the very same code in their Python scripts...

wpietri · 13 years ago

I'm not a big fan of either of those approaches.

The cross-implementation schema constraints in SQL DBs are pretty weak. I can only express a small fraction of my application logic there unless I want some serious vendor lock-in.

And then regardless, once I'm referencing the same schema from multiple code bases, schema improvements are basically impossible.

If I need to get at the same data from multiple code bases, I think an API is a lot cleaner. And then I'd rather put the constraints in my API code. The tools are so much better, and programming languages are much more flexible.

meaty · 13 years ago

Actually I think the motivation is that the schema exists in code rather than in the storage engine. That gives the amazing productivity boost of not having to load the schema into the database engine - something you appreciate when you hit a big 2000 table / 45000 stored proc sized mess. It's another step towards statelessness.

Basically schema in code scales development-wise better than schema in storage engine.

raverbashing · 13 years ago

If you're having 3 different codes reading the same DB you're doing something wrong

"check whether fields exist or not in his code, and enforce constraints, and provide default values,"

Well, you need to do that regardless of the DB type. What can be a non existing value in Mongo may be a NULL in SQL, or a many-to-many relationship that's currently empty. So you will get the data in a format or another and your code still has to work it out.

And I chuckle when I see SQLer saying 'how easy it is to add a column to Postgresql on runtime' that's like saying a car is faster than a plane because you can't go to the grocery store by plane.

FuzzyDunlop · 13 years ago

The thing I noticed about NoSQL is that, at least with Mongo, you just end up putting your schema in your app. You probably do it by saying a particular class has a bunch of fields or instance variables that you expect to be a certain type.

I wrote a portion of an app that relies on Redis, and despite the fact that it's really nice to work with, especially in Ruby (that practically mimics Redis' native API), it was incredibly difficult to remember the structure. What keys do I need? What type are they? Oh I'll have to do `KEYS *` and look I guess. Then I'll remember to document it (and thus create the schema).

Of course, the problem on my part was, in part, that I probably didn't need to use Redis. And when I needed to use Mongo, well, I wasn't storing things that look like documents. I think this is an easy mistake to make, but one you learn from quite easily once you've experienced it yourself.

Skywing · 13 years ago

I recently switched from Mongo over to CouchDB. The reason that I favor Couch over Mongo is primarily because I felt like Mongo's querying capabilities suck when dealing with embedded documents and that Mongo's MapReduce is not their first-class focus. Second, though, was because with CouchDB, I'm forced to basically put all of my query logic on the database server, which keeps it out of my client code. This cleans up my client code a ton, and querying from views gives context to what the data that you will be getting in return is. I don't think I'll ever look back to Mongo as long as I can choose CouchDB.

sprobertson · 13 years ago

> at least with Mongo, you just end up putting your schema in your app

I do this all the time, but I think it's a good thing. When rapidly prototyping it's nice to be able to just write up a few classes to define the necessary data types -- which is exactly how most SQL ORMs work, but Mongo's big advantage is that those classes can be totally rearranged without having to do migrations or go in and mess with underlying table structures. When SQL is really necessary, those classes are usually easy to translate to use something like SQLAlchemy.

marcosdumay · 13 years ago

> The thing I noticed about NoSQL is that, at least with Mongo, you just end up putting your schema in your app.

And, in my point of view, that is the main advange of NoSQL. You can distribute all the schema related functionality while keeping the data centralized in a server. Of course, you'll only want to do that when the performance requirements are hight enough that the centralized schema becomes a botleneck...

What of course isn't a problem when you are prototyping and have no user yet. Or, in other words, the article is right on the mark.

d--b · 13 years ago

This post is not addressing the right question. Think of databases as storage space, like wardrobes. The sql database will allow to store your clothes in spare parts each with its own dedicated space, allowing you to quickly calculate the sum of all stripes on all your socks. That's great but on the other hand, if you want an outfit you have to put it together, starting by sewing the sleeves on the shirt. On the other the NoSQL wardrobe will have various boxes where you can put any cloth you want. Now it's up to you to tidy the whole thing. If you want you can put full outfits in the box so that you don't have to think much about what to wear. However to count the number of stripes on your socks, you'll have to go through the whole wardrobe. The point is: It's not the fault of the wardrobe if your room is a mess

lnanek2 · 13 years ago

Mmm, there's so much less code when everything is in JSON though. You can often ditch half the classes in the system. Sometimes I just don't need a client side class, a server side class, a schema, etc. just to store a few fields. It violates Do Not Repeat Yourself, especially when client and server are running different technologies like NodeJS server vs. Objective-C client. It is so much simpler to have just JSON in node and a dict on the client than to have a set class JS file with schema in node and a class on the client, and have to change them all whenever the data format changes. Also violates You Ain't Gonna Need It, from the agile rule house. A lot of times you build all that boilerplate, maintain all that bolierplate, and you never end up needing it anyway. Or have to change it all before you would have needed it.

matthavener · 13 years ago

What's the difference between a server side class and a schema? And isn't the necessity of creating a class just a complexity required by some statically typed OO languages. Some data defined by a schema is still just data -- it can be used similarly to JSON.

mistercow · 13 years ago

It seems to me that of all the advantages of NoSQL, schemalessness is probably the most minor and dubious. Most of the pain of dealing with SQL comes from the relational part, not from needing a schema.

The problem with SQL is that it's clever. To do things efficiently and reliably, you often must do very clever combinatorial operations on different tables, producing elaborate structures which are then winnowed down to a simple list of information. So right off the bat, you encourage programming logic that is difficult for future maintainers to wrap their heads around. Of course, most programmers aren't clever, so their queries end up being both difficult to understand and subtly wrong.

Then, of course, your data comes out as a list of rows. Well that's just great. If your needs involve a more complex data structure than that, well, you're going to need code to rebuild that outside the database.

loxs · 13 years ago

Yes, and not even only this. The really big difference is the eventual consistency, used as a core concept in most horizontally scalable data stores. If you once design your application with heavy reliance on hard consistency, you will have a huge problem scaling it later on.

jeltz · 13 years ago

Very few applications need to scale past that point. I know plenty of business where if they would scale past the point of being able to run on a single SQL database instance they would make billions and be larger than their theoretical target market. Not everyone is building the next Facebook (which coincidentally uses MySQL for a lot of things).

ucee054 · 13 years ago

As opposed to if you don't, in which case it will be wrong straight away

ucee054 · 13 years ago

Instead of turning your infrastructure to dumb crap, you instead have the option of firing your not clever programmers.

I mean, they need to be able to deal with database logic just to graduate from a good CS bachelor's program, so that's a pretty low bar.

marcosdumay · 13 years ago

There aren't enough competent programers to automatize the entire world. Of course, if I was hiring for an IT company, I'd look for the best... But most of the world will have to content themselves with poor programers.

Anyway, poor programers work their best (less worst) in strong typed environments. That includes a very restricted schema on the database. It's much better having them write queries that nobody could read than having them insert data that nobody could read.

mistercow · 13 years ago

There's a difference between "clever" and "smart". Clever is rarely a good thing in programming. Code should be comprehensible, and clever code rarely is. The more proud you feel after writing a line of code, the more likely it is that you should rewrite it. The problem isn't that programmers aren't clever enough to write SQL queries. The problem is that programmers aren't smart enough to later read those SQL queries.

>I mean, they need to be able to deal with database logic just to graduate from a good CS bachelor's program, so that's a pretty low bar.

People manage to get through college without absorbing the basic requirements all the time, and being "able to deal" with database logic is not the same as being proficient with it. The problem with a system that demands cleverness is that maintaining clever code requires a high level of proficiency.

freework · 13 years ago

Mongo is like the PHP of the database world. Many important features are dropped in the name of "getting started quickly".

What you gain in short term wins, are nullified by all the long term headaches you suffer.

tracker1 · 13 years ago

I don't think I would quite agree with that.. I would say that mongo is very accessible in terms of getting started... also, you can index nested properties via dot notation, unless you are really deeply denormalized, collections of objects with collections, I haven't been aware of any issues there.

A few things that surprised me about mongo... first that if you do a sort, you either need an index on your sorting fields or your sort info needs to fit in 32mb. I'm using skip/limit for paging, and it really became an issue once I was running against the full datastore. Second, only one single index is used in a query, unike most sql databases that will collate results of mutiple indexes under the covers... this caused me to rethink my index strategy, as well as add both asc/desc indexes for sorting results.

For me it was a matter of getting away from an over normalized data set (30+ queries for a single vehicle classified ad display, 23-joins if you wanted to reduce 15 of those queries (my main VIEW created for migration)... denormalized data makes more sense for this use case. For others it may or may not make sense.

aidos · 13 years ago

Relational systems use multiple indexes for queries, though I'm not aware of an implementation that uses multiple indexes for a single table.

Remembering that a query in Mongo is basically a query on a single table it's acting in the same way a relational db does - ie, check statistics, choose best index, perform lookups.

styluss · 13 years ago

Could you give examples? I'm working at a company where Mongo is being used because.

jaequery · 13 years ago

once you get into creating administrative/reports sections i think you'll quickly find out.

camus · 13 years ago

what does PHP "drops" in the name of getting started quickly ? PHP is now a robust language with robusts frameworks like SF or ZF.

jeltz · 13 years ago

Two examples: including loads of functionality into the default namespace instead of developing it as external libraries, and confusing the template language and the programming language which encourages XSS vulnerabilities in PHP written by beginners. When using PHP you normally want to use a different template language to avoid having to audit the code and hunt unescaped output.

vyrotek · 13 years ago

This matches my experience from the last few years as well. I was burned by denormalizing too early and not fully understanding the trade-offs I was committing to for the sake of not dealing with 'schemas' (which I actually have no problems with). And just like the OP, I have found far better success sticking with SQL early on for many of the same reasons mentioned. Later, as products and data schemas matured I found very specific uses for data stores like MongoDB. But, they are never the core data store.