For all the benefits, there is a large problem with this approach that often goes unacknowledged. It is fundamentally a business problem rather than a technical problem, but it has an impact on development speed, so it's secondarily a technical problem.
The business contract with a consolidated data definition is that everyone in the business, no matter which domain, can rely on it. But think about the red tape that introduces. Whenever you need to define or update a data definition, you now have to think not just about your own use case, but about all of the potential use cases throughout the organization, and you likely need sign-off from a wide variety of stakeholders, because any change, however small, is by definition an org-wide change.
It's the data form of the classic big-org problem, "Why does it take two months to change the color of a button?"
Granted, in most cases, having data definitions duplicated, with the potential for drift, is going to be the more insidious problem. But sometimes you just want to get a small, isolated change out the door without having to go through several levels of cross-domain approval committees.
I tried, for some time, to develop a product designed to solve this. It would have made it easier to specialize models locally while complying with the corporate one. (Basically, beefing up the data definition language to something like Prolog, and putting real thought into making the corporate model reality-based rather than just what suits your current requirements.)
Unfortunately it came about at exactly the same time as NoSQL and Big Data, which are basically the opposite. They let you be really loose with your model, and if some data gets lost or misunderstood, hey, no biggie. It's easier to patch it later than to develop a strong model to start with.
But am I bitter about it? No, why do you ask? Twitch, twitch.
UDA embraces the duplication of models: it's a fact of life in the enterprise. That is why "domains" are first-class citizens. We believe that good discovery capabilities will increase the reusability of domain models. Our next article will dive more into the extensibility capabilities of the Upper metamodel.
> It is fundamentally a business problem, rather than a technical problem, but it has impact on development speed, so it's secondarily a technical problem.
Yes, it is fundamentally "a business problem," but we believe it can be solved with technology. We think we have a more systematic way to adopt and deploy model-first knowledge graphs in the enterprise.
> But think about the red tape that introduces.
We are very intentional about UDA not becoming more red tape. UDA lives alongside all the other systems. There will never be a mandate for everything to be in UDA.
But we sure want to make it easy for those teams who want their business models to exist everywhere, to be connected to the business, and to be easy to discover, extend, and link to.
(I'm one of UDA's architects.)
IME it often comes down to "big men" issues, where someone important wants the data in a certain way that is not logical or consistent, so they won't let the "tech people" simply take the data and present it in a way that is logically consistent and follows best practices. They want to sit in meetings and create their own mental-model monstrosity and force the devs to build it. Once that happens even once, there is zero chance of the company ever having a consistent data model again.
Not really a problem that can be overcome in probably 99% of companies. Lots of consultancy money to be made for the sake of ego and inflexibility though.
Reminds me of my experience trying to understand what SAP actually is. For decades I wondered what sort of magic tech must be in there that allowed their software to be used by thousands of different businesses. Then someone who knew about SAP told me: "oh, no that's not how it works -- what they do is have a fixed schema and tell the customer that they must adopt it".
> It is fundamentally a business problem, rather than a technical problem, but it has impact on development speed, so it's secondarily a technical problem.
The article doesn't read as though they're denying it's a business problem. The models they're defining seem to span all roles, engineering being only one.
Data drift is real! I’ve recently restored sanity in a medium-sized enterprise that had three concurrent financial data flows, complete with people not understanding each other, projects to establish ground truth, and triple the workload in maintaining the data flows. I’ve quipped to the team that the endless summer is near: what if we only worked on business-relevant development? I would dream that the bigcorp we are part of would do the same. They are more of a tack-on-another-Excel-based-solution kind of firm.
Data drift is real, and the yoke of governance chafes enough that new people insist on redoing your work in Excel until the problem gets bad enough that a new data-governance push is needed.
Spitballing: another way to approach the problem is to ask what you would do if you had billions of pieces of unstructured data (except for maybe the data being somewhat XML-like), you didn't control any of it, but making sense of it was (ignoring rounding errors) your only business concern. That company, of course, is Google.
Maybe let the business units be loose but make the sense-making central. Any individual unit can eventually tidy things up (SEO!), but everything will work regardless. The UX effect might be that you can't find something decent to watch, but that is an entirely different problem, solved by not using Netflix and going to the theatre!
The alternative is the same barriers, except with a parallel phone-a-friend governance model when you have to share data between verticals or programs.
It’s a classic pattern in public sector applications, where it’s partially deliberate.
This doesn't sound significantly different from any other large tech org.
If your data/service/api is used by a lot of other people in the org, you have to work with them to make sure your change doesn't break them. That's true regardless of the architecture.
You could store the info as a common definition and then just apply transformations on retrieval or storage if there's an exception for that system/business group.
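Something like the following toy sketch (all names invented here): the canonical record is stored once, and each group's exception lives in an adapter rather than in the stored schema.

```python
# Toy sketch of the idea above (all names invented): persist one canonical
# definition, and handle per-group exceptions as transformations at the edges.

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CanonicalTitle:
    """The single shared definition everyone stores and reads."""
    title_id: int
    name: str
    launch_date: date


def to_finance_view(t: CanonicalTitle) -> dict:
    # Finance's exception: they report by fiscal quarter, not exact date.
    quarter = (t.launch_date.month - 1) // 3 + 1
    return {"id": t.title_id, "launch_quarter": f"{t.launch_date.year}Q{quarter}"}


def to_search_view(t: CanonicalTitle) -> dict:
    # Search's exception: lowercase display name only.
    return {"id": t.title_id, "name": t.name.lower()}


title = CanonicalTitle(42, "Stranger Things", date(2016, 7, 15))
print(to_finance_view(title))  # {'id': 42, 'launch_quarter': '2016Q3'}
print(to_search_view(title))   # {'id': 42, 'name': 'stranger things'}
```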
At a place like Netflix where the product has been fundamentally the same for almost a decade, installing this kind of red tape is great for job security
> installing this kind of red tape is great for job security
It really doesn't, and that's not the point. This is for business entities that are larger than teams.
It's way worse to have a million different schemas with no way to share information. And then you have people everywhere banging on your door asking for your representation, you have to help them, you have to update it in their systems. God forbid you've got to migrate things...
If your entity type happens to be one that is core to the business, it's an almost never-ending struggle. And when you find different teams took your definition and twisted it, when you're supposed to be the source of truth, and teams downstream of them consume it in the bastardized way...
This project sounds like a dream. I hope it goes well for Netflix and that they can evangelize it more.
sometimes grug go too early and get abstractions wrong, so grug bias towards waiting
big brain developers often not like this at all and invent many abstractions start of project
grug tempted to reach for club and yell "big brain no maintain code! big brain move on next architecture committee leave code for grug deal with!"
but grug learn control passions, major difference between grug and animal
instead grug try to limit damage of big brain developer early in project by giving them thing like UML diagram (not hurt code, probably throw away anyway) or by demanding working demo tomorrow
working demo especially good trick: force big brain make something to actually work to talk about and code to look at that do thing, will help big brain see reality on ground more quickly
remember! big brain have big brain! need only be harness for good and not in service of spirit complexity demon on accident, many times seen
https://grugbrain.dev/#grug-on-complexity
It's been so long since the Semantic Web and RDF and OWL and SKOS. I'm so glad they stuck with W3C and didn't reinvent those wheels. Will this UDA approach catch on? I don't know, but I hope so. It seems to be trying to push the frontier on the difficulties of applying Domain-Driven Design and semantic concepts to an enterprise company of significant scale.
If we can get compound interest across development teams by giving them a common toolset and skillset that covers different applications but the same data semantics, maybe not every data contract will have to be reduced to DTOs that can be POSTed, or otherwise forced to a least common denominator just so they can fit past a network or other IPC barrier.
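To make "same data semantics" concrete, here is a tiny illustration (not UDA's actual format; the `ex:` namespace and `Movie` class are invented) of declaring a concept once in RDF and querying it with SPARQL, using Python's rdflib:

```python
# Illustrative only: one shared concept declared in RDF (the W3C stack the
# article leans on) and queried with SPARQL. Requires `pip install rdflib`;
# the ex: namespace and the Movie class are invented for this sketch.

from rdflib import RDF, RDFS, Graph, Literal, Namespace

EX = Namespace("http://example.org/model/")

g = Graph()
g.bind("ex", EX)

# The concept is defined once, with machine-readable semantics...
g.add((EX.Movie, RDF.type, RDFS.Class))
g.add((EX.Movie, RDFS.label, Literal("Movie")))
g.add((EX.title, RDFS.domain, EX.Movie))

# ...so a consumer can discover instances by meaning, not by DTO shape.
g.add((EX.m1, RDF.type, EX.Movie))
g.add((EX.m1, EX.title, Literal("Stranger Things")))

for row in g.query(
    "SELECT ?m ?t WHERE { ?m a ex:Movie ; ex:title ?t }",
    initNs={"ex": EX},
):
    print(row.m, row.t)
```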
For that, I'm grateful Netflix is working on this and publicizing the interesting work.
https://www.uber.com/blog/dragon-schema-integration-at-uber-...
Unfortunately it never got open-sourced, but Joshua left for LinkedIn and started working on the LambdaGraph project and the Hydra language, which are open source.
You can find more information on this fascinating work here:
https://github.com/CategoricalData/hydra
I think these approaches, including all the semantic web stuff from 10+ years ago, suffered from the added overhead of agreeing on and formalising semantics, and then of course maintaining them.
I wonder if LLMs can help with that part today.
A bit unfortunate they used the term domain model here. Domain models here are purely data-centric, whereas domain modeling focuses mainly on behavior, not underlying data structures. The data used in domain models is there to facilitate the behavior, but the behavior is the core focus.
From a modeling perspective, there is certainly inherent complexity in representing data from domain models in different ways. One can argue, though, that this is a feature and not a bug: the same level of nuance and complexity is not needed in all use cases, and representational models are usually optimized for particular read scenarios. This approach seems to argue against that, favoring uniformity over contextual handling of information. It will most likely scale better in places where the level of understanding needed from the domain model is fairly uniform, though I have most often seen use cases get complicated when they fail to simplify concepts that, in the core domain model, are very complex and nuanced.
The "Domain" in `upper:DomainModel` is the same D as in DDD (Domain-Driven Design) and as the D in DGS (Domain Graph Service).
How does this relate to domain-driven design? It seems to be at odds with it, because in DDD it's kind of expected that the same concept will be represented in a different way by each system? But to be honest, I didn't read the whole blog post because of the UML vibes.
It doesn't. It's a blessing that they avoided the term "ubiquitous language" because that's almost exactly the dual of this concept, although people who have only ever heard the words and not dug any deeper won't know what the difference is.
Seems to be enforcing ‘ubiquitous language’ at the machine level - not some kind of mathematical dual where one is invertible to the other - but enforcing soft skills as hard skills.
> in DDD it's kind of expected that the same concept will be represented in a different way by each system
In UDA, those concepts would explicitly co-exist in different domains. "Being the same" becomes a subjective thing.
‘protobuf specs don’t have enough information for us to codegen iceberg tables so we will write a new codegen spec language’
what makes a duck a duck?
when we know which tables we can find it in
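For what it's worth, a concrete illustration (invented fields) of the gap the protobuf quip above is pointing at: a .proto message pins down names and wire types, but says nothing about the table-level concerns an Iceberg definition needs.

```python
# Hypothetical illustration (invented fields) of why a protobuf message
# alone underdetermines an Iceberg table definition.

# Roughly what a .proto message gives you:
proto_fields = {
    "title_id": "int64",
    "country": "string",
    "launch_date": "string",  # proto has no date type; it's convention only
}

# What an Iceberg table additionally needs, none of it expressible in proto:
iceberg_extras = {
    "identifier_fields": ["title_id", "country"],  # row-identity semantics
    "partition_spec": ["days(launch_date)"],       # physical layout for scans
    "required": ["title_id"],                      # proto3 fields are all optional
    "doc": {"launch_date": "UTC date the title went live"},
}
```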
I'm curious if anyone has seen business improvements along the lines of "this let us discover something that led to 5%+ or >$5M improvements" (percent or absolute depending on how big the company is) from these kinds of efforts?
I've been in a couple of the "we need to unify the data tables to serve everyone" exercises before I decided to focus on other parts of the software stack, and a lot of it just seemed like "the video game people model it differently because they're doing different analysis, and if you unify the base layer to support everybody's type of analysis, it's not going to change that there's still a bunch of independent, not-talking-to-each-other analysis going on." (This is specifically different from the much LARGER sort of problem, which is more a copypasta one - Finance's accounting doesn't agree with Legal's accounting and nobody knows who's right - which is one dataset needed in multiple places, vs multiple datasets needed in different places.)
I think this mostly sidesteps that - they aren't forcing everyone to migrate to the same things, AFAICT - and is just about making it easy to access more broadly. Is that right?
And confusion-reducing definition things - "everyone uses the same official definitions for business concepts" - I'm all for. Seen a lot of that pain for sure.
> "the video game people model it differently because they're doing different analysis, and if you unify the base layer to support everybody's type of analysis, it's not going to change that there's still a bunch of independent, not-talking-to-each-other analysis going on"
This resonates. Moreover, it's very easy for architects to assume that because different areas of the business use data about the 'same' thing, the thing must be the same.
But often the analysis requires a slightly different thing. Like: we want a master list of prisons. But is a prison a building, a collection of prisoners (such that the male prison and the female prison on the same site are different prisons), or the institution with that name managed under a particular contract?
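A hypothetical sketch (invented names) of how differently those three readings of "prison" shape up as data:

```python
# Three teams say "prison" and mean three structurally different things.

from dataclasses import dataclass


@dataclass
class PrisonBuilding:
    """Estates team: a physical site."""
    site_id: str
    address: str


@dataclass
class PrisonPopulation:
    """Operations: a housed population; the male and female prisons
    on one site are distinct records here."""
    population_id: str
    site_id: str
    capacity: int


@dataclass
class PrisonContract:
    """Commercial: the institution managed under a particular contract."""
    contract_id: str
    operator: str
    site_ids: list[str]  # one contract may span several buildings
```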
Why would Netflix engineering host on Medium? Very odd. And you just lose readers to the popups but you don't benefit from their discovery much either.
https://scribe.rip/uda-unified-data-architecture-6a6aee261d8...
and SEO
Part of marketing is knowing your audience. And plenty of marketing people exist with deep tech experience.
Sure you do.
And the types of engineers writing on Medium are the ones they want to recruit, too.
I wonder how they deal with versioning or breaking changes to the model. One advantage of keeping things more segregated is that when you decide to change a model you can do it in much smaller pieces.
I guess in their world they’d add a new model for whatever they want to change and then phase out use of the old one before removing it.
> I wonder how they deal with versioning or breaking changes to the model.
Versioning is permission to break things.
Although it is not implemented in UDA yet, the plan is to embrace the same model as Federated GraphQL, which has proven to work very well for us (think 500+ federated GraphQL schemas). In a nutshell, UDA will actively manage deprecation cycles, since we have the ability to track the consumers of the projected models.
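To illustrate why consumer tracking is the enabling piece here, a toy sketch (not UDA's implementation; all names invented): a projected field can be deprecated, its known consumers notified, and removal allowed only once none remain.

```python
# Illustrative toy, not UDA's implementation: tracking which consumers read
# each projected field turns deprecation into a managed cycle.

from dataclasses import dataclass, field


@dataclass
class ProjectedField:
    name: str
    deprecated: bool = False
    consumers: set[str] = field(default_factory=set)


class ModelRegistry:
    def __init__(self) -> None:
        self.fields: dict[str, ProjectedField] = {}

    def register_consumer(self, field_name: str, consumer: str) -> None:
        self.fields[field_name].consumers.add(consumer)

    def deprecate(self, field_name: str) -> set[str]:
        """Mark a field deprecated and return the consumers to migrate."""
        f = self.fields[field_name]
        f.deprecated = True
        return f.consumers

    def remove(self, field_name: str) -> None:
        f = self.fields[field_name]
        if f.consumers:
            raise RuntimeError(f"{field_name} still consumed by {f.consumers}")
        del self.fields[field_name]


reg = ModelRegistry()
reg.fields["title.genre"] = ProjectedField("title.genre")
reg.register_consumer("title.genre", "ads-team")
print(reg.deprecate("title.genre"))  # {'ads-team'} -> migrate them first
```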
That is a lot of subgraphs. Am I understanding correctly then that under UDA developers fulfill the UDA spec in whatever language they’re using, and then there’s some kind of middleware that will handle serving GraphQL queries? How are mutations represented? And how are other GraphQL-specific idioms expressed (like input parameters, nodes/edges/connections/etc.)? Is it just a subset of GraphQL that is supported?
I manage a much smaller federation where I work, and we have a lot of the same ideals I think in terms of having some centralized types that the rest of the business recognizes across the board. Right now we accomplish that within a set of “core” subgraphs that define these types, while our more product-focused ones implement their own sets of types, queries and mutations and can extend the core ones as it makes sense to.