Hi, I released Shale, a Ruby gem that allows you to parse JSON, YAML and XML and convert it into Ruby data structures, as well as serialize your Ruby data model to JSON, YAML or XML.
Features:
- convert JSON, XML or YAML into Ruby data model
- serialize data model to JSON, XML or YAML
- generate JSON and XML Schema from Ruby models
- compile JSON Schema into Ruby models (compiling XML Schema is a work in progress)
A quick example so you can get a feel of it:
require 'shale'

class Address < Shale::Mapper
  attribute :street, Shale::Type::String
  attribute :city, Shale::Type::String
end

class Person < Shale::Mapper
  attribute :first_name, Shale::Type::String
  attribute :last_name, Shale::Type::String
  attribute :address, Address
end

# parse data and convert it into Ruby data model
person = Person.from_json(<<~JSON) # or .from_xml / .from_yaml
  {
    "first_name": "John",
    "last_name": "Doe",
    "address": {
      "street": "Oxford Street",
      "city": "London"
    }
  }
JSON
# It will give you
# =>
# #<Person:0xa0a4
#  @address=#<Address:0xa0a6
#   @city="London",
#   @street="Oxford Street">,
#  @first_name="John",
#  @last_name="Doe">
# serialize Ruby data model to JSON
Person.new(
  first_name: 'John',
  last_name: 'Doe',
  address: Address.new(street: 'Oxford Street', city: 'London')
).to_json # or .to_xml / .to_yaml
Hey, this is a very cool project! I'm curious whether you took any special security precautions when designing it, seeing how XML/JSON/YAML serialization and deserialization are the subject of many high-profile CVEs, particularly in the Ruby community?
Shale uses Ruby's standard library parsers out of the box, so as long as you keep your Ruby up to date with security updates you should be good. Others in this thread also suggested setting minimum versions on dependencies, so I'll probably do that in a future version.
This seems like programmer error. Don't put restricted fields into types you're deserializing off the wire. It's like accepting user input and directly inserting it into a database without any validation.
If you don't define attributes explicitly on the model, Shale will ignore them.
Regarding attributes that you defined but still don't want to be assigned: you should probably filter them before passing them to Shale, or alternatively filter them with Shale before passing them further down the stack (e.g. to ActiveRecord).
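For defined-but-restricted attributes, the filter can be as simple as an allowlist on the raw hash before it ever reaches the mapper. A plain-Ruby sketch (no Shale required; the key names are illustrative):

```ruby
# Allowlist of attributes that may come off the wire (illustrative names).
ALLOWED_KEYS = %w[first_name last_name].freeze

# A payload as it might arrive from an untrusted client.
payload = { "first_name" => "John", "last_name" => "Doe", "admin" => true }

# Hash#slice keeps only the allowed keys; anything else ("admin") is dropped
# before the hash is handed to the mapper or to ActiveRecord.
safe = payload.slice(*ALLOWED_KEYS)
# safe no longer contains "admin"
```

The same idea is what Rails strong parameters do for controller input; doing it at the deserialization boundary keeps the restriction in one place.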
The documentation site was based on https://vuepress.vuejs.org/ but it evolved so much that I dropped Vue altogether and went with plain HTML instead. I must have left that meta tag in from the early days.
Regarding Vue I use it daily at my job, great library :)
Serialization/deserialization is such an important part of web development; I have no idea why Rails includes the ancient JBuilder library (which is also very slow, since it goes through templating) instead of investing in a proper one. To say nothing of deserialization, which is equally important.
I think the API Shale provides is pretty sane. I would probably use it in my next Ruby/Rails project. I don't like the fact that Nokogiri is included by default, it would be nice to declare a core type, and then bring in what you need (JSON, XML, YAML) as a different gem. But that's not a deal breaker for me.
I have created my own serializers in the past (SimpleAMS[1]) because I really detested AMS; no offence to AMS contributors, but the AMS library should just die. Rails, and way more importantly Ruby, should come up with an "official" serializer/deserializer library that is flexible enough, rock solid, and fast. For instance, I did some benchmarking of common serializer libraries [2] and AMS was crazy slow without providing much flexibility (meaning the slowness is not justified). Others were faster, but supported only one JSON spec format (like jsonapi-rb). I am wondering where Shale stands.
Another thing is that most serialization libraries seem to have ActiveSupport as a main dependency (not Shale, though), which I think is a bit too much, and it actually incurs a performance hit on the methods it provides.
I really think the Ruby community can do better here.
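For anyone repeating such benchmarks, a minimal harness using only the standard library looks like the sketch below. It measures raw JSON.generate as a baseline floor, not any serializer gem; the payload shape and sizes are made up for illustration:

```ruby
require 'benchmark'
require 'json'

# Build a payload roughly like a serialized collection endpoint.
records = Array.new(1_000) { |i| { "id" => i, "name" => "user-#{i}" } }

# Benchmark.realtime returns wall-clock seconds for the block.
result = nil
time = Benchmark.realtime do
  result = JSON.generate(records)
end

puts format("JSON.generate: %.4fs for %d records", time, records.size)
```

Any serializer gem under test would replace the JSON.generate call; whatever it adds on top of this baseline is its real overhead.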
I'm glad you like it. One clarification - Nokogiri is not required by default; you have to explicitly require "shale/adapter/nokogiri" to use it. If you don't, Shale will use REXML, which comes from Ruby's standard library.
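For reference, switching adapters is an explicit opt-in. A sketch based on Shale's documented adapter API (require path and constant as per its README at the time; verify against your installed version):

```ruby
require 'shale'

# Default: REXML from the standard library, no extra native dependency.

# Opt in to Nokogiri explicitly (requires the nokogiri gem):
require 'shale/adapter/nokogiri'
Shale.xml_adapter = Shale::Adapter::Nokogiri
```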
REXML has been gemified. Shale's gemspec doesn't require a specific version of rexml, and rexml < 3.2.5 is vulnerable to CVE-2021-28965. I just checked Ubuntu 20.04 LTS and got Ruby 2.7 with rexml 3.2.3 by default, so this seems like a realistic concern; it would be safer if Shale required a minimum rexml version.
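Concretely, that would be a one-line constraint in the gemspec. A sketch of the fragment (field names per the standard RubyGems specification format; the surrounding spec fields are elided):

```ruby
# shale.gemspec (fragment) -- pin the gemified rexml to the patched release
Gem::Specification.new do |spec|
  # ...existing name/version/summary fields...
  spec.add_runtime_dependency "rexml", ">= 3.2.5"
end
```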
If I get a Dependabot alert for my Rails project, it's a safe bet that it's a Nokogiri vulnerability. I haven't looked into the "why" or what's really going on, but it does feel like there's a lot of room to look at the attack surface or any core design issues.
Nokogiri is one of the most security-sensitive parts of any Rails codebase, since it's used for parsing and sanitizing untrusted HTML and XML documents. Accordingly, there's a lot of scrutiny on it (and its upstream dependency, libxml2). That said, as far as I'm aware, almost all of the recent vulnerabilities I've noticed have been related to XSLT and other obscure XML features that most people probably don't use (and that aren't enabled by default). So it's a combination of two things: 1) lots of scrutiny on the library itself leads to high security standards, and 2) the goal of fully-featured XML processing adds a large attack surface that may not be relevant to most people, which leads to a lot of vulnerability alerts.
Personally though, I've been seeing almost 10x the number of alerts for useless "vulnerabilities" like ReDoS in Node.js projects. Either way, alert fatigue is real.
XML is chock-full of misfeatures ripe for creating security vulnerabilities. It's not just nokogiri – XML parsing libs are one of the hottest sources of vulnerability notifications in many ecosystems (a large number of those CVE alerts come by way of using libxml2 under the hood, which nokogiri also depends on).
Safely parsing untrusted XML is an extremely hairy task.
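As a concrete example of one such misfeature, here is a "billion laughs"-style entity bomb. Ruby's stdlib REXML aborts expansion once its entity-expansion limits are hit (10,000 expansions by default); this is a sketch against current stdlib behavior, and the exact error message may vary by version:

```ruby
require 'rexml/document'

# Each entity expands to ten copies of the previous one, so a single &e;
# reference would blow up into 10,000 copies of &a; (11,110 expansions total).
xml = <<~XML
  <?xml version="1.0"?>
  <!DOCTYPE bomb [
    <!ENTITY a "ha">
    <!ENTITY b "&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;">
    <!ENTITY c "&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;">
    <!ENTITY d "&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;">
    <!ENTITY e "&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;">
  ]>
  <bomb>&e;</bomb>
XML

blocked = false
begin
  doc = REXML::Document.new(xml)
  doc.root.text # entity expansion happens on access
rescue RuntimeError
  blocked = true # e.g. "number of entity expansions exceeded, processing aborted"
end

puts blocked ? "expansion blocked" : "expansion allowed"
```

Libraries without such limits (or with them disabled) will happily chew through memory on input like this, which is exactly why XML parsers attract so many CVEs.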
This library looks great for those using it, but I wish the situation for "ActiveRecord model -> JSON representation" in open-source libraries was better. This library seems to be overkill for that, since you'll almost always want completely separate code for "deserializing" attribute updates from a request, and it requires you to specify the type of every single property. ActiveModel::Serializer was great while it lasted, but it's unmaintained and missing a lot of features. Blueprinter seems a lot less battle-tested and may have performance problems. Last I looked, almost no library easily supports eager-loading. Is this right? I feel like I must be missing something. How do people render their models in modern Rails apps?
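For what it's worth, the fallback many of us land on is a hand-rolled presenter layer: one plain object per payload shape, with an explicit copy step from the model. A sketch (all names are made up; eager-loading stays the caller's responsibility):

```ruby
require 'json'

# Hypothetical presenter: a plain object owning exactly the fields the
# payload exposes, built from the model (here simulated by a row hash).
PersonPresenter = Struct.new(:first_name, :last_name, keyword_init: true) do
  def as_json
    { "first_name" => first_name, "last_name" => last_name }
  end
end

record = { first_name: "John", last_name: "Doe", password_digest: "..." }

# The explicit copy step is what keeps restricted fields (password_digest)
# out of the payload by construction.
presenter = PersonPresenter.new(first_name: record[:first_name],
                                last_name:  record[:last_name])

json = JSON.generate(presenter.as_json)
puts json # => {"first_name":"John","last_name":"Doe"}
```

It is boring and repetitive, but the repetition is exactly the allowlist, which is why it keeps getting reinvented.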
Glad to see folks actively pushing things in the Ruby space further. I've said it before, but I recently returned to Ruby and Rails after many years away, and my productivity has reached levels I couldn't imagine. Subjective for sure, but ruby is a beautiful fun language, and rails has everything (especially now with https://hotwired.dev) that a single founder needs.
I'm currently doing both Rails and Go, and they're just different worlds. I'm a Go noob so it's not a fair comparison, but still - I've done Django, Node, etc., and Go is just miles behind on productivity.
You are comparing a language to two complete frameworks and a runtime.
Go can be extremely productive but it's definitely not a great choice if you need to create a web app over a weekend.
RoR, Django, etc. have ready-made solutions for things like authorization/authentication, administration tools, OAuth... Not to mention that a 'framework' assumes some sort of contracts, so that all things built for the framework in question can talk to each other.
Go is a good choice if you need to build a custom solution for your needs. Not if you are looking for a set of building blocks you have to configure for your task.
I did Go for around a year and found I'm not its target audience. I need to build database-backed web apps quickly and, while possible in Go, it wasn't easy. Rails is a dream in comparison for that purpose. I experimented with many of the Go web frameworks, but it felt very much like a square peg in a round hole.
I really liked Go for what it was, but it wasn't the right fit for my set of problems.
Exactly my experience. Rails allows me to build quickly and iterate even faster. Highly recommended. Hotwire is pretty cool, too. I've built an action-palette type of dialog with keyboard navigation without any stateful JavaScript (except for the cursor).
I haven't used Turbo yet, but Stimulus is a great little framework for slapping a bit of JS onto an existing, fairly vanilla server-side app to add some nice interactive experiences.
I have really enjoyed using it recently in my Rails apps.
One of the things that keeps being repeated in Ruby land is that domain objects are usually married to their storage/serialisation method. At some point in an application's maturity you'll need some other method of serialisation, some other type casting, or conversion logic for your forms or something else, but by that time a lot of surrounding code will depend on the implicit logic of the original base library.
ActiveRecord does this, and your library does it too. Object mappers that can initialize or serialize instances of other classes, including POROs, are much more versatile and future-proof. And the API for doing that could look almost the same as yours.
Great point. I feel like this is an often-ignored advantage of JS/TS projects. Most often, data is passed around as POJOs. It's dead simple and easy to duplicate, serialize, and mutate.
You don't have to sacrifice that simplicity, actually. (And I insist that giving it up is a mistake; it'll bite users of your library basically right away, when they try to use it for anything apart from storage/serialisation.)
But you can just provide an upgrade path!
Consider something like this:
class Address
  attr_accessor :street, :city
end

class Person
  attr_accessor :address
end

class AddressMapper < Shale::Mapper
  mapped_class Address

  attribute :street, Shale::Type::String
  attribute :city, Shale::Type::String
end

class PersonMapper < Shale::Mapper
  mapped_class Person

  attribute :address, AddressMapper
end

# use like this
PersonMapper.from_xml("...."); PersonMapper.to_xml(person)
And then, for _dead_ simplicity, you can add another method, generate_mapped_class "Person", which will define that PORO class for the user, for extra DRYness. The API is basically the same, with no repetition, but the amount of rewriting needed when requirements change is drastically smaller.
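A plain-Ruby sketch of what such a generate_mapped_class helper could look like (the helper is the commenter's hypothetical, not part of Shale):

```ruby
# Define a PORO with plain accessors from a list of attribute names,
# registered under the given constant name.
def generate_mapped_class(name, *attrs)
  Object.const_set(name, Class.new { attr_accessor(*attrs) })
end

generate_mapped_class("Person", :first_name, :last_name)

person = Person.new
person.first_name = "John"
puts person.first_name # => John
```

A real implementation would presumably derive the attribute list from the mapper's own `attribute` declarations, so the PORO and the mapper never drift apart.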
I'm not asking you to rewrite your library, and I probably won't write and release mine; I'm just saying that considering your future self isn't that hard. And yeah, it's a bit of a rant about ActiveRecord from a user of Rails since 2006.
Agreed that this is a big advantage. I've switched to having a separate set of serialization objects with straightforward copy constructors or mapping functions and letting the serialization library do its job against those. I used to hand roll the serialization, but this is admittedly easier.
I like this idea, I remember seeing something similar in Trailblazer. But basically you just define your models once, and then you can transform them into different formats, and have them play nicely with ActiveRecord as well. Pretty cool :)
Source code is available on GitHub: https://github.com/kgiszczak/shale
CWE-915: Improperly Controlled Modification of Dynamically-Determined Object Attributes <https://cwe.mitre.org/data/definitions/915.html> (the Ruby on Rails mass assignment bug)
<meta name="description" content="Vue-powered Static Site Generator">
Kudos for choosing Vue tho =)
[1] https://github.com/vasilakisfil/SimpleAMS
[2] https://vasilakisfil.social/blog/2020/01/20/modern-ruby-seri... (scroll towards the end for benchmarks)
See http://www.ruby-lang.org/en/news/2021/04/05/xml-round-trip-v...
It's a shame Ruby and Rails are not getting all the recognition they deserve.
I'll probably give it a go to replace my current implementation using nokogiri-happymapper (https://github.com/mvz/happymapper)