For all the benefits, there is a large problem with this approach that often goes unacknowledged. It is fundamentally a business problem rather than a technical problem, but it has an impact on development speed, so it's secondarily a technical problem.
The business contract with a consolidated data definition is that everyone in the business, no matter which domain, can rely on it. But think about the red tape that introduces. Whenever you need to define or update a data definition, you now have to think not just about your own use case, but about all of the potential use cases throughout the organization, and you likely need sign-off from a wide variety of stakeholders, because any change, however small, is by definition an org-wide change.
It's the data form of the classic big-org problem, "Why does it take two months to change the color of a button?"
Granted, in most cases, having data definitions duplicated, with the potential for drift, is going to be the more insidious problem. But sometimes you just want to get a small, isolated change out the door without having to go through several levels of cross-domain approval committees.
I tried, for some time, to develop a product designed to solve this. It would have made it easier to specialize models locally while complying with the corporate one. (Basically, beefing up the data definition language to something like Prolog, and putting real thought into making the corporate model reality-based rather than just what suits your current requirements.)
Unfortunately it came about at exactly the same time as NoSQL and Big Data, which are basically the opposite. They let you be really loose with your model, and if some data gets lost or misunderstood, hey, no biggie. It's easier to patch it later than to develop a strong model to start with.
But am I bitter about it? No, why do you ask? Twitch, twitch.
UDA embraces the duplication of models: it's a fact of life in the enterprise. That is why "domains" are first-class citizens. We believe that good discovery capabilities will increase the reusability of domain models. Our next article will dive more into the extensibility capabilities of the Upper metamodel.
> It is fundamentally a business problem, rather than a technical problem, but it has impact on development speed, so it's secondarily a technical problem.
Yes, it is fundamentally "a business problem," but we believe it can be solved with technology. We think we have a more systematic way to adopt and deploy model-first knowledge graphs in the enterprise.
> But think about the red tape that introduces.
We are very intentional about UDA not becoming more red tape. UDA lives alongside all the other systems. There will never be a mandate for everything to be in UDA.
But we sure want to make it easy for those teams who want their business models to exist everywhere, to be connected to the business, and to be easy to discover, extend, and link to.
(I'm one of UDA's architects.)
IME it often comes down to "big men" issues, where someone important wants the data in a certain way that is not logical or consistent, so they won't let the "tech people" simply take the data and present it in a way that is logically consistent and follows best practices. They want to sit in meetings and create their own mental-model monstrosity and force the devs to build it. Once that happens even once, there is zero chance of the company ever having a consistent data model again.
Not really a problem that can be overcome in probably 99% of companies. Lots of consultancy money to be made for the sake of ego and inflexibility though.
Reminds me of my experience trying to understand what SAP actually is. For decades I wondered what sort of magic tech must be in there that allowed their software to be used by thousands of different businesses. Then someone who knew about SAP told me: "oh, no that's not how it works -- what they do is have a fixed schema and tell the customer that they must adopt it".
> It is fundamentally a business problem, rather than a technical problem, but it has impact on development speed, so it's secondarily a technical problem.
The article doesn't read as though they're denying it's a business problem. The models they're defining seem to span all roles, engineering being only one.
Data drift is real! I’ve recently restored sanity in a medium-sized enterprise that had three concurrent financial data flows, complete with people not understanding each other, projects to establish ground truth, and triple the workload in maintaining the data flows. I’ve quipped to the team that the endless summer is near: what if we only worked on business-relevant development? I would dream that the bigcorp we are part of would do the same. They are more of a tack-on-another-Excel-based-solution kind of firm.
Data drift is real, and the yoke of governance chafes enough that new people insist on redoing your work in Excel until the problem gets bad enough that a new data-governance push is needed.
Spitballing: another way to approach the problem is to ask what you would do if you had billions of pieces of unstructured data (except for maybe the data being somewhat XML-like), you didn't control any of it, but making sense of it was (ignoring rounding errors) your only business concern. That company, of course, is Google.
Maybe let the business units be loose but make the sense-making central. Any individual unit can eventually tidy things up (SEO!), but everything will work regardless. The UX effect might be that you can't find something decent to watch, but that is an entirely different problem, solved by not using Netflix and going to the theatre!
The alternative is the same barriers, except with a parallel phone-a-friend governance model when you have to share data between verticals or programs.
It’s a classic pattern in public sector applications, where it’s partially deliberate.
This doesn't sound significantly different from any other large tech org.
If your data/service/api is used by a lot of other people in the org, you have to work with them to make sure your change doesn't break them. That's true regardless of the architecture.
You could store the info as a common definition and then just apply transformations on retrieval or storage if there's an exception for that system/business group.
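Something like the following toy sketch (all names invented here): the canonical record is stored once, and each group's exception lives in an adapter rather than in the stored schema.

```python
# Toy sketch of the idea above (all names invented): persist one canonical
# definition, and handle per-group exceptions as transformations at the edges.

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CanonicalTitle:
    """The single shared definition everyone stores and reads."""
    title_id: int
    name: str
    launch_date: date


def to_finance_view(t: CanonicalTitle) -> dict:
    # Finance's exception: they report by fiscal quarter, not exact date.
    quarter = (t.launch_date.month - 1) // 3 + 1
    return {"id": t.title_id, "launch_quarter": f"{t.launch_date.year}Q{quarter}"}


def to_search_view(t: CanonicalTitle) -> dict:
    # Search's exception: lowercase display name only.
    return {"id": t.title_id, "name": t.name.lower()}


title = CanonicalTitle(42, "Stranger Things", date(2016, 7, 15))
print(to_finance_view(title))  # {'id': 42, 'launch_quarter': '2016Q3'}
print(to_search_view(title))   # {'id': 42, 'name': 'stranger things'}
```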
At a place like Netflix where the product has been fundamentally the same for almost a decade, installing this kind of red tape is great for job security
> installing this kind of red tape is great for job security
It really doesn't, and that's not the point. This is for business entities that are larger than teams.
It's way worse to have a million different schemas with no way to share information. And then you have people everywhere banging on your door asking for your representation, you have to help them, you have to update it in their systems. God forbid you've got to migrate things...
If your entity type happens to be one that is core to the business, it's an almost never-ending struggle. And when you find different teams took your definition and twisted it, when you're supposed to be the source of truth, and teams downstream of them consume it in the bastardized way...
This project sounds like a dream. I hope it goes well for Netflix and that they can evangelize it more.
sometimes grug go too early and get abstractions wrong, so grug bias towards waiting
big brain developers often not like this at all and invent many abstractions start of project
grug tempted to reach for club and yell "big brain no maintain code! big brain move on next architecture committee leave code for grug deal with!"
but grug learn control passions, major difference between grug and animal
instead grug try to limit damage of big brain developer early in project by giving them thing like UML diagram (not hurt code, probably throw away anyway) or by demanding working demo tomorrow
working demo especially good trick: force big brain make something to actually work to talk about and code to look at that do thing, will help big brain see reality on ground more quickly
remember! big brain have big brain! need only be harness for good and not in service of spirit complexity demon on accident, many times seen
https://grugbrain.dev/#grug-on-complexity
It's been so long since the Semantic Web and RDF and OWL and SKOS. I'm so glad they stuck with W3C and didn't reinvent those wheels. Will this UDA approach catch on? I don't know, but I hope so. It seems to be trying to push the frontier on the difficulties of applying Domain-Driven Design and semantic concepts to an enterprise company of significant scale.
If we can get compound interest across development teams by giving them a common toolset and skillset that covers different applications but the same data semantics, maybe not every data contract will have to be reduced to DTOs that can be POSTed, or otherwise forced to a least common denominator just so they can fit past a network or other IPC barrier.
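To make "same data semantics" concrete, here is a tiny illustration (not UDA's actual format; the `ex:` namespace and `Movie` class are invented) of declaring a concept once in RDF and querying it with SPARQL, using Python's rdflib:

```python
# Illustrative only: one shared concept declared in RDF (the W3C stack the
# article leans on) and queried with SPARQL. Requires `pip install rdflib`;
# the ex: namespace and the Movie class are invented for this sketch.

from rdflib import RDF, RDFS, Graph, Literal, Namespace

EX = Namespace("http://example.org/model/")

g = Graph()
g.bind("ex", EX)

# The concept is defined once, with machine-readable semantics...
g.add((EX.Movie, RDF.type, RDFS.Class))
g.add((EX.Movie, RDFS.label, Literal("Movie")))
g.add((EX.title, RDFS.domain, EX.Movie))

# ...so a consumer can discover instances by meaning, not by DTO shape.
g.add((EX.m1, RDF.type, EX.Movie))
g.add((EX.m1, EX.title, Literal("Stranger Things")))

for row in g.query(
    "SELECT ?m ?t WHERE { ?m a ex:Movie ; ex:title ?t }",
    initNs={"ex": EX},
):
    print(row.m, row.t)
```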
For that, I'm grateful Netflix is working on this and publicizing the interesting work.
https://www.uber.com/blog/dragon-schema-integration-at-uber-...
Unfortunately it never got open-sourced, but Joshua left for LinkedIn and started working on the LambdaGraph project and the Hydra language, which are open source.
You can find more information on this fascinating work here:
https://github.com/CategoricalData/hydra
I think these approaches, including all the semantic web stuff from 10+ years ago, suffered from the added overhead of agreeing on and formalising semantics, and then of course maintaining them.
I wonder if LLMs can help with that part today.
A bit unfortunate they used the term domain model here. Domain models here are purely data-centric, whereas domain modeling focuses mainly on behavior, not underlying data structures. The data used in domain models is there to facilitate the behavior, but the behavior is the core focus.
From a modeling perspective, there is certainly inherent complexity in representing data from domain models in different ways. One can argue, though, that this is a feature and not a bug: the same level of nuance and complexity is not needed in all use cases, and representational models are usually optimized for particular read scenarios. This approach seems to argue against that, favoring uniformity over contextual handling of information. It will most likely scale better in places where the level of understanding needed from the domain model is fairly uniform, though I have most often seen use cases get complicated when they fail to simplify concepts that, in the core domain model, are very complex and nuanced.
The "Domain" in `upper:DomainModel` is the same D as in DDD (Domain-Driven Design) and as the D in DGS (Domain Graph Service).
How does this relate to domain-driven design? It seems to be at odds with it, because in DDD it's kind of expected that the same concept will be represented in a different way by each system? But to be honest, I didn't read the whole blog post because of the UML vibes.
It doesn't. It's a blessing that they avoided the term "ubiquitous language" because that's almost exactly the dual of this concept, although people who have only ever heard the words and not dug any deeper won't know what the difference is.
Seems to be enforcing ‘ubiquitous language’ at the machine level - not some kind of mathematical dual where one is invertible to the other - but enforcing soft skills as hard skills.
> in DDD it's kind of expected that the same concept will be represented in a different way by each system
In UDA, those concepts would explicitly co-exist in different domains. "Being the same" becomes a subjective thing.
‘protobuf specs don’t have enough information for us to codegen iceberg tables so we will write a new codegen spec language’
what makes a duck a duck?
when we know which tables we can find it in
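For what it's worth, a concrete illustration (invented fields) of the gap the protobuf quip above is pointing at: a .proto message pins down names and wire types, but says nothing about the table-level concerns an Iceberg definition needs.

```python
# Hypothetical illustration (invented fields) of why a protobuf message
# alone underdetermines an Iceberg table definition.

# Roughly what a .proto message gives you:
proto_fields = {
    "title_id": "int64",
    "country": "string",
    "launch_date": "string",  # proto has no date type; it's convention only
}

# What an Iceberg table additionally needs, none of it expressible in proto:
iceberg_extras = {
    "identifier_fields": ["title_id", "country"],  # row-identity semantics
    "partition_spec": ["days(launch_date)"],       # physical layout for scans
    "required": ["title_id"],                      # proto3 fields are all optional
    "doc": {"launch_date": "UTC date the title went live"},
}
```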
I'm curious if anyone has seen business improvements along the lines of "this let us discover something that led to 5%+ or >$5M improvements" (percent or absolute depending on how big the company is) from these kinds of efforts?
I've been in a couple of the "we need to unify the data tables to serve everyone" exercises before I decided to focus on other parts of the software stack, and a lot of it just seemed like "the video game people model it differently because they're doing different analysis, and if you unify the base layer to support everybody's type of analysis, it's not going to change that there's still a bunch of independent, not-talking-to-each-other analysis going on." (This is specifically different from the much LARGER sort of problem, which is more a copypasta one - Finance's accounting doesn't agree with Legal's accounting and nobody knows who's right - which is one dataset needed in multiple places, vs multiple datasets needed in different places.)
I think this mostly sidesteps that - they aren't forcing everyone to migrate to the same things, AFAICT - and is just about making it easy to access more broadly. Is that right?
And confusion-reducing definition things - "everyone uses the same official definitions for business concepts" - I'm all for. Seen a lot of that pain for sure.
> "the video game people model it differently because they're doing different analysis, and if you unify the base layer to support everybody's type of analysis, it's not going to change that there's still a bunch of independent, not-talking-to-each-other analysis going on"
This resonates. Moreover, it's very easy for architects to assume that because different areas of the business use data about the 'same' thing, the thing must be the same.
But often the analysis requires a slightly different thing. Like: we want a master list of prisons. But is a prison a building, a collection of prisoners (such that the male prison and the female prison on the same site are different prisons), or the institution with that name managed under a particular contract?
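A hypothetical sketch (invented names) of how differently those three readings of "prison" shape up as data:

```python
# Three teams say "prison" and mean three structurally different things.

from dataclasses import dataclass


@dataclass
class PrisonBuilding:
    """Estates team: a physical site."""
    site_id: str
    address: str


@dataclass
class PrisonPopulation:
    """Operations: a housed population; the male and female prisons
    on one site are distinct records here."""
    population_id: str
    site_id: str
    capacity: int


@dataclass
class PrisonContract:
    """Commercial: the institution managed under a particular contract."""
    contract_id: str
    operator: str
    site_ids: list[str]  # one contract may span several buildings
```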
Why would Netflix engineering host on Medium? Very odd. And you just lose readers to the popups but you don't benefit from their discovery much either.
https://scribe.rip/uda-unified-data-architecture-6a6aee261d8...
and SEO
Part of marketing is knowing your audience. And plenty of marketing people exist with deep tech experience.
Sure you do.
And the types of engineers writing on Medium are the ones they want to recruit, too.
I wonder how they deal with versioning or breaking changes to the model. One advantage of keeping things more segregated is that when you decide to change a model you can do it in much smaller pieces.
I guess in their world they’d add a new model for whatever they want to change and then phase out use of the old one before removing it.
> I wonder how they deal with versioning or breaking changes to the model.
Versioning is permission to break things.
Although it is not implemented in UDA yet, the plan is to embrace the same model as Federated GraphQL, which has proven to work very well for us (think 500+ federated GraphQL schemas). In a nutshell, UDA will actively manage deprecation cycles, since we have the ability to track the consumers of the projected models.
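To illustrate why consumer tracking is the enabling piece here, a toy sketch (not UDA's implementation; all names invented): a projected field can be deprecated, its known consumers notified, and removal allowed only once none remain.

```python
# Illustrative toy, not UDA's implementation: tracking which consumers read
# each projected field turns deprecation into a managed cycle.

from dataclasses import dataclass, field


@dataclass
class ProjectedField:
    name: str
    deprecated: bool = False
    consumers: set[str] = field(default_factory=set)


class ModelRegistry:
    def __init__(self) -> None:
        self.fields: dict[str, ProjectedField] = {}

    def register_consumer(self, field_name: str, consumer: str) -> None:
        self.fields[field_name].consumers.add(consumer)

    def deprecate(self, field_name: str) -> set[str]:
        """Mark a field deprecated and return the consumers to migrate."""
        f = self.fields[field_name]
        f.deprecated = True
        return f.consumers

    def remove(self, field_name: str) -> None:
        f = self.fields[field_name]
        if f.consumers:
            raise RuntimeError(f"{field_name} still consumed by {f.consumers}")
        del self.fields[field_name]


reg = ModelRegistry()
reg.fields["title.genre"] = ProjectedField("title.genre")
reg.register_consumer("title.genre", "ads-team")
print(reg.deprecate("title.genre"))  # {'ads-team'} -> migrate them first
```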
That is a lot of subgraphs. Am I understanding correctly then that under UDA developers fulfill the UDA spec in whatever language they’re using, and then there’s some kind of middleware that will handle serving GraphQL queries? How are mutations represented? And how are other GraphQL-specific idioms expressed (like input parameters, nodes/edges/connections/etc.)? Is it just a subset of GraphQL that is supported?
I manage a much smaller federation where I work, and we have a lot of the same ideals I think in terms of having some centralized types that the rest of the business recognizes across the board. Right now we accomplish that within a set of “core” subgraphs that define these types, while our more product-focused ones implement their own sets of types, queries and mutations and can extend the core ones as it makes sense to.