The thing I like about Duckling is that it is a rule-based system, which can easily be interrogated. Model-based text extraction is much harder to fix when there is a bug. I use Duckling as a service for value extraction from queries and content, alongside a model-based system for NER (such as spaCy). Using both together makes for more accurate enrichment in general (by cross-referencing values between the two and adding exception rules).
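A minimal sketch of what cross-referencing the two extractors could look like, not the commenter's actual pipeline. It assumes both systems have already run and returned character spans. The dict fields (`body`, `start`, `end`, `dim` for the rule-based side, `label` for the NER side) and the `compatible` mapping are illustrative assumptions, loosely modeled on the kind of output these tools produce.

```python
def overlaps(a, b):
    """True when two character spans share at least one character."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def cross_reference(rule_spans, ner_spans, compatible):
    """Mark each rule-based span as confirmed or not by the NER output.

    `compatible` maps a rule-based dimension to the NER labels that are
    considered to agree with it (a hypothetical mapping, tune per project).
    """
    merged = []
    for r in rule_spans:
        agrees = any(
            overlaps(r, n) and n["label"] in compatible.get(r["dim"], set())
            for n in ner_spans
        )
        merged.append({**r, "confirmed": agrees})
    return merged


text = "Book 2 tickets for May 5"

# Hand-written stand-ins for the two extractors' results (assumed shapes).
rule_spans = [
    {"body": "2", "start": 5, "end": 6, "dim": "number"},
    {"body": "May 5", "start": 19, "end": 24, "dim": "time"},
]
ner_spans = [
    {"text": "May 5", "start": 19, "end": 24, "label": "DATE"},
]
compatible = {"time": {"DATE", "TIME"}, "number": {"CARDINAL", "QUANTITY"}}

merged = cross_reference(rule_spans, ner_spans, compatible)
for m in merged:
    print(m["body"], m["dim"], "confirmed" if m["confirmed"] else "unconfirmed")
```

Unconfirmed spans are the natural place to hang exception rules: they can be kept, dropped, or sent for review depending on the dimension.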
(Very small correction: Duckling is rule-based but uses a super simple Naive Bayes classifier to prioritize between the many potential parses produced by the rules -- we see it as a hybrid approach)
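To make the hybrid idea concrete, here is a toy sketch of ranking candidate parses with a Naive Bayes classifier whose features are the names of the rules that fired. It is an illustration under stated assumptions, not Duckling's actual code: the rule names, the training examples, and the use of add-one smoothing are all made up for the example.

```python
import math
from collections import Counter


def train(examples):
    """examples: list of (rule_names, is_correct) pairs for candidate parses."""
    class_counts = Counter()
    feature_counts = {True: Counter(), False: Counter()}
    vocab = set()
    for rules, label in examples:
        class_counts[label] += 1
        feature_counts[label].update(rules)
        vocab.update(rules)
    return class_counts, feature_counts, vocab


def log_score(model, rules, label):
    """Log of smoothed prior times smoothed per-rule likelihoods."""
    class_counts, feature_counts, vocab = model
    total = sum(class_counts.values())
    score = math.log((class_counts[label] + 1) / (total + 2))
    denom = sum(feature_counts[label].values()) + len(vocab)
    for r in rules:
        score += math.log((feature_counts[label][r] + 1) / denom)
    return score


def rank(model, candidates):
    """Sort candidates by log-odds of being the correct parse, best first."""
    def log_odds(c):
        return log_score(model, c["rules"], True) - log_score(model, c["rules"], False)
    return sorted(candidates, key=log_odds, reverse=True)


# Hypothetical labeled parses: which rules fired, and was the parse right?
training = [
    (["integer", "at_time_of_day"], True),
    (["integer", "at_time_of_day"], True),
    (["integer"], False),
    (["integer", "year_latent"], False),
]
model = train(training)

# Two competing readings of "at 5" produced by the rules.
candidates = [
    {"value": "number 5", "rules": ["integer"]},
    {"value": "time 05:00", "rules": ["integer", "at_time_of_day"]},
]
ranked = rank(model, candidates)
print(ranked[0]["value"])
```

The rules do the heavy lifting of generating well-formed candidates; the classifier only has to choose among them, which is why a small amount of labeled data can be enough.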
Interesting! When I worked at IBM, we evaluated Duckling (the Haskell version) for use in the Watson Assistant product but decided to write our own numerical quantity parser/interpreter. We used ANTLR and created context-free grammars as we found that we could improve both precision and recall substantially. Sadly not open source though.
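As a rough illustration of the grammar-driven approach (the IBM grammar is not public, so this is not it), here is a tiny context-free grammar for English number words below one thousand, implemented as a hand-written recursive-descent parser rather than ANTLR-generated code. The grammar and function names are assumptions made up for this sketch.

```python
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17,
         "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def parse_below100(tokens, i):
    """below100 := TENS UNIT? | TEEN | UNIT

    Returns (value, next_index), or None when no alternative matches at i.
    """
    if i < len(tokens):
        t = tokens[i]
        if t in TENS:
            if i + 1 < len(tokens) and tokens[i + 1] in UNITS:
                return TENS[t] + UNITS[tokens[i + 1]], i + 2
            return TENS[t], i + 1
        if t in TEENS:
            return TEENS[t], i + 1
        if t in UNITS:
            return UNITS[t], i + 1
    return None


def parse_number(text):
    """number := UNIT "hundred" ("and"? below100)? | below100

    Returns the integer value, or None unless the whole input is consumed.
    """
    tokens = text.lower().replace("-", " ").split()
    if len(tokens) >= 2 and tokens[0] in UNITS and tokens[1] == "hundred":
        value, i = UNITS[tokens[0]] * 100, 2
        # Skip an optional "and", but only commit if a below100 follows.
        j = i + 1 if i < len(tokens) and tokens[i] == "and" else i
        rest = parse_below100(tokens, j)
        if rest is not None:
            value, i = value + rest[0], rest[1]
    else:
        first = parse_below100(tokens, 0)
        if first is None:
            return None
        value, i = first
    return value if i == len(tokens) else None


print(parse_number("three hundred and four"))
print(parse_number("twenty-five"))
print(parse_number("three hundred and"))
```

Because the grammar is explicit, a wrong or missing parse can be traced to a specific production, which is the kind of control over precision and recall that a grammar gives you.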
I must say it looks very neat from the point of view of usability. Are the training data sets open? Do you see it as feasible for small app coders (who don’t have thousands of examples to train on) to use Duckling as more or less an NLP parser without getting too deep into NLP and AI theory?
Are the trained sets meant to be used by different client code or languages?
Duckling is suited to parsing very structured language, typically temporal expressions (dates and times...). It relies on a mix of rules and machine learning. Rules and datasets for many (human) languages are available in the repo. You don't need a lot of data to add support for what you need, owing to this hybrid rules+ML approach (as opposed to just ML).
Hi, thanks for dropping in. What's the status of the Clojure implementation? Would you recommend that new projects use it? Is anyone looking at new/old issues? Are there potential new maintainers for the Clojure version?
The current Clojure version is quite stable; we used it at wit.ai/Facebook for several years before moving to Haskell.
I'd love to see somebody take it over and resuscitate it! One interesting direction could be to remove the Java dependencies (mostly on Date) so that it's usable in ClojureScript. It would make a great JS library.
TL;DR Haskell made more sense for us to scale with the number of requests (existing FB infra) as well as the number of engineers working on the project (type checking, etc).
I came here to mention the same thing. I experimented with the Clojure version a long while ago, and evaluated the Haskell version about a year ago for a project at work. Good stuff.