substructure (u/substructure)

substructure commented on $4.6M Series Seed to defend open source from supply chain attacks socket.dev/blog/series-se... · Posted by u/feross

substructure · 3 years ago

Nice. This blog post is less focused on npm/node than previous ones. Does that imply a broadening of ecosystem support is intended? As I'm sure you're aware, competing companies support more language ecosystems. Hopefully this seed funding will allow for some degree of scope creep.

Were you able to resolve the potential issue with npm's fourth open source condition[1]? The addition of this condition seems to align with their acquisition of ^Lift Security[2], judging by archive.org snapshots from before[3] and after[4]. Shifting away from npm exclusively seems like a reasonable way to hedge against this.

[1] https://docs.npmjs.com/policies/open-source-terms#conditions

[2] https://blog.npmjs.org/post/172793182214/npm-acquires-lift-s...

[3] https://web.archive.org/web/20170926030855/https://docs.npmj...

[4] https://web.archive.org/web/20190207170526/https://www.npmjs...

substructure commented on Reading academic computer science papers stackoverflow.blog/2022/0... · Posted by u/happy-go-lucky

musicale · 3 years ago

I like it when the authors provide digital artifacts including code and data to make it easier to analyze and reproduce the work.

substructure · 3 years ago

There seem to be some improvements in this area. Machine learning papers sometimes provide their data sets and other artifacts on websites like paperswithcode.com.

Have you ever contacted the authors to request their data? I personally have not.

substructure commented on Reading academic computer science papers stackoverflow.blog/2022/0... · Posted by u/happy-go-lucky

substructure · 3 years ago

Solving problems by studying tomes of knowledge is the job description of wizards/witches. Large improvements towards optimality, for some problems, are effectively locked away in some of these papers. As the article points out, there generally isn't much benefit in the context of building CRUD apps.

Some contexts have larger research communities. For example, there aren't nearly as many papers on real-time path planning for agent-mutable environments as for static environments. I assume this is because we still don't have Boston Dynamics robots in people's homes. If we could get the cost low enough it may be more profitable to send mining robots to Mars than people, but I guess there are other applications as well.

I spent some months trying to find, understand, and implement the state-of-the-art algorithms for real-time path planning within mutable environments (Minecraft). I started with graph algorithms like A*[0] and their extensions. For my problem this was very slow. D* Lite[1] seemed like an improvement, but it has issues with updates near its root. Sampling-based planners came next, such as RRT[2], RRT*, and many others.
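For anyone who hasn't seen the baseline, here is a minimal sketch of A* on a small 2-D grid. It is not the Minecraft planner described above: the grid, the 4-connected moves, the unit step cost, and the Manhattan heuristic are all assumptions chosen to keep the example short.

```python
import heapq


def astar(grid, start, goal):
    """Shortest 4-connected path on a grid of 0 (free) / 1 (blocked) cells.

    Returns a list of (row, col) cells from start to goal, or None if the
    goal cannot be reached. Every step costs 1 (an assumption of this sketch).
    """
    def heuristic(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]  # entries are (f, g, cell)
    came_from = {start: None}
    best_g = {start: 0}

    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if g > best_g[cell]:
            continue  # stale heap entry; a cheaper route was found later
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        row, col = cell
        for nxt in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            r, c = nxt
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0:
                new_g = g + 1
                if new_g < best_g.get(nxt, float("inf")):
                    best_g[nxt] = new_g
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_g + heuristic(nxt), new_g, nxt))
    return None
```

In a mutable world, every change to `grid` invalidates the stored costs, so this plain version has to be rerun from scratch after each edit. That is one plausible source of the slowness, and it is the cost that incremental planners such as D* Lite try to avoid.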

I built a virtual reality website to visualize and interact with the RRT* algorithm. I can release this if anyone is interested. I've found that many papers do a poor job describing when their algorithms perform poorly. The best way I've found to understand an algorithm's behavior is to implement it, apply it to different problems, and visualize the execution over many problem instances. This is time-consuming, but yields the best understanding in my experience.

Sampling-based planners have issues with formations like bug traps. For my use case this was a large issue. Moving over to Monte Carlo Tree Search (MCTS)[3] worked very well, given the nature of decision making when moving through an environment the agent can change. The way it builds great plans from random path planning attempts is still shocking.
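To make the "plans from random attempts" point concrete, here is a rough sketch of the four MCTS phases (selection with UCB1, expansion, random rollout, backpropagation). The world is a hypothetical toy: a short 1-D corridor with a fixed goal and no environment mutation, so it only illustrates the search loop, not the planner I actually built. All names and constants are made up for the example.

```python
import math
import random

# Hypothetical toy world: cells 0..GOAL in a corridor, agent starts at 0.
GOAL = 3
HORIZON = 3          # maximum number of moves per episode
ACTIONS = (-1, +1)   # step left or step right


def step(pos, action):
    """Move along the corridor, clamping at the left wall."""
    return max(0, pos + action)


class Node:
    def __init__(self, pos, depth, parent=None):
        self.pos, self.depth, self.parent = pos, depth, parent
        self.children = {}   # action -> Node
        self.visits = 0
        self.total = 0.0     # sum of rollout rewards backed up through this node

    def terminal(self):
        return self.pos == GOAL or self.depth == HORIZON


def rollout(pos, depth, rng):
    """Play random moves until the episode ends; reward 1 if the goal was reached."""
    while pos != GOAL and depth < HORIZON:
        pos = step(pos, rng.choice(ACTIONS))
        depth += 1
    return 1.0 if pos == GOAL else 0.0


def ucb1(parent, child, c=1.4):
    """Mean reward plus an exploration bonus for rarely visited children."""
    return child.total / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)


def mcts(root_pos, iterations, rng):
    """Return the most-visited first action after the given number of iterations."""
    root = Node(root_pos, 0)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while every action at this node has been tried.
        while not node.terminal() and len(node.children) == len(ACTIONS):
            node = max(node.children.values(), key=lambda ch: ucb1(node, ch))
        # 2. Expansion: add one untried action.
        if not node.terminal():
            action = next(a for a in ACTIONS if a not in node.children)
            child = Node(step(node.pos, action), node.depth + 1, node)
            node.children[action] = child
            node = child
        # 3. Simulation: estimate the new node's value with a random playout.
        reward = rollout(node.pos, node.depth, rng)
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    return max(root.children, key=lambda a: root.children[a].visits)


best_first_move = mcts(0, 2000, random.Random(0))
```

In this toy only three consecutive right steps reach the goal, so the visit counts should concentrate on `+1`. Handling a world the agent can modify would mean folding the world state into each node, which this sketch leaves out.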

Someone must incorporate these papers' best aspects into novel solutions. There exists an opportunity to extract value from the information differential between research and industry. For some reason many papers do not provide source code. A good open source implementation brings these improvements to a larger audience.

Some good resources I've found are websites like Semantic Scholar[4] and arXiv[5], along with survey papers such as one for MCTS[3]. The latter half of this article is what gets me excited to build new things. I would encourage people to explore the vast landscape of problems to find one that interests them, then look into the research.

[0] https://en.wikipedia.org/wiki/A*_search_algorithm

[1] https://en.wikipedia.org/wiki/D*

[2] https://en.wikipedia.org/wiki/Rapidly-exploring_random_tree

[3] https://www.semanticscholar.org/paper/A-Survey-of-Monte-Carl... /c37f1baac3c8ba30250084f067167ac3837cf6fd

[4] https://www.semanticscholar.org/

[5] https://arxiv.org/

substructure commented on Strategy and criteria decision matrix: A framework for decision making shuby.de/blog/post/decisi... · Posted by u/bhaprayan

substructure · 3 years ago

This framework is very similar to an a priori, multi-objective optimization using linearly scalarized weights[0]. It is a priori because the weights are chosen before scoring and kept constant.

I've found this approach generally works well for humans; however, results may not be Pareto-optimal.
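As a minimal sketch of what linear scalarization looks like in a decision matrix: each option's criterion scores are multiplied by fixed weights and summed into one number. The criteria, weights, and options below are hypothetical and only there for illustration.

```python
def scalarize(options, weights):
    """Collapse per-criterion scores into one weighted sum per option.

    The weights are fixed before any option is scored, which is what makes
    the method "a priori".
    """
    return {
        name: sum(weights[criterion] * scores[criterion] for criterion in weights)
        for name, scores in options.items()
    }


# Hypothetical weights and 1-5 scores (higher is better on every criterion).
weights = {"cost": 3, "speed": 2, "risk": 1}
options = {
    "A": {"cost": 4, "speed": 2, "risk": 5},
    "B": {"cost": 3, "speed": 5, "risk": 3},
    "C": {"cost": 5, "speed": 1, "risk": 1},
}

totals = scalarize(options, weights)
best = max(totals, key=totals.get)
```

Here `totals` comes out as 21, 22, and 18 for A, B, and C, so `best` is `"B"`. Changing the weights changes the ranking, which is why picking them up front matters so much.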

[0] https://en.wikipedia.org/wiki/Multi-objective_optimization#S...

substructure commented on Show HN: Socket – Secure your JavaScript supply chain socket.dev... · Posted by u/feross

substructure · 4 years ago

Congrats on the launch.

There is both a large need for improvements in npm supply chain security and a market willing to pay for them.

Concerns:

1) npm Open-Source Terms condition four[1] states:

You may access and use data about the security of Packages, such as vulnerability reports, audit status reports, and supplementary security documentation, only for your own personal or internal business purposes. You may not provide others access to, copies of, or use of npm data about the security of Packages, directly or as part of other products or services.

This statement seems vague enough to potentially include your use case. It also seems to include what Snyk, JFrog Xray, Sonatype, and WhiteSource do, so maybe this is not an issue.

2) It appears that this will be an open-core business. What capabilities are you willing to provide in the free/community edition and under which licenses?

3) The website doesn't show pricing. Can you provide details on this?

Questions:

1) What are your thoughts on using reproducible builds[2] plus Diverse Double-Compiling (DDC)[3] on the dependency graph to ensure build artifacts originate from known git repositories? Disclosure, I've been working on this for a few months now.

2) Where do you run your analysis? AWS and DigitalOcean have terms that prevent running high risk code.

3) Do you have examples of previous attacks and how your tooling would handle them?
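For context on question 1, the basic check behind reproducible builds is that two independent builds of the same source produce bit-identical artifacts. A minimal sketch of that comparison follows. The function names are hypothetical, the artifacts are passed in as raw bytes, and this is not DDC itself, which additionally rebuilds the compiler with a second, trusted compiler.

```python
import hashlib


def artifact_digest(artifact: bytes) -> str:
    """SHA-256 digest of a build artifact's raw bytes."""
    return hashlib.sha256(artifact).hexdigest()


def builds_match(artifact_a: bytes, artifact_b: bytes) -> bool:
    """True when two independently produced artifacts are bit-identical."""
    return artifact_digest(artifact_a) == artifact_digest(artifact_b)
```

A mismatch only says the two builds differ. Finding out why (timestamps, file ordering, or an injected payload) is a separate step.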

Best of luck.

[1] https://docs.npmjs.com/policies/open-source-terms#conditions

[2] https://reproducible-builds.org/

[3] https://dwheeler.com/trusting-trust/

substructure commented on Number of people on government websites now analytics.usa.gov... · Posted by u/wylie39

georgyo · 4 years ago

This is heavily skewed by iOS being used for USPS shipping.

The following is for people logging into government services; it is a better source for metrics on browser/OS usage.

https://analytics.usa.gov/general-services-administration/

substructure · 4 years ago

That site shows 113.1 million visits in the last 90 days, with 47.2% from Windows and 1.2% from Linux. That is (47.2/1.2) = 39.3 Windows users for every Linux user.

I agree that it is skewed in a number of ways. I just wanted to estimate a lower bound.

Thank you for pointing out a better source of data for my use case.

substructure commented on Number of people on government websites now analytics.usa.gov... · Posted by u/wylie39

substructure · 4 years ago

Correct me if I'm wrong, but this shows that of the 5.06 billion visits in the last 90 days, 31.3% came from Windows while 1.1% came from GNU/Linux. If we assume that both groups visit government websites equally often, then for every GNU/Linux user there exist (31.3/1.1) = 28.5 Windows users. Scary stuff.
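The estimate spelled out as a quick sketch. The function name is made up, and it bakes in the same assumption as above: both groups visit equally often, so the share of visits stands in for the share of users.

```python
def users_per_user(share_a: float, share_b: float) -> float:
    """Ratio of group A users to group B users, assuming equal visit frequency."""
    return share_a / share_b


# Visit shares in percent, taken from the figures quoted above.
windows_per_linux = round(users_per_user(31.3, 1.1), 1)  # 28.5
```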

substructure commented on 3 lines of code shouldn't take all day devtails.xyz/3-lines-of-c... · Posted by u/devtailz

substructure · 4 years ago

AWS CloudFormation had one of the worst user stories related to iteration time a few years ago.

The feedback loop between writing the YAML file and validating its structure/type/format was frustratingly slow. I would pay good money for better tooling in the Infrastructure as Code (IaC) space.

I would wait for the resources to update, watch them fail with cryptic error messages, then watch the stack slowly roll back, only for the rollback itself to fail. With the stack now in an invalid state, manual resource creation was required before the rollback would succeed.

The AWS CDK has improved this significantly. As a result the sun shines just a little bit brighter.