Readit News
thangalin commented on The Joy of Mixing Custom Elements, Web Components, and Markdown   deanebarker.net/tech/blog... · Posted by u/deanebarker
superkuh · 13 days ago
>In the end, your document is now fully an HTML document, not a Markdown document that becomes an HTML document. It’s a minor perspective shift, but might have some cascading effects on things I’ve written above.

But this style of custom elements requires successful JavaScript execution to produce that "HTML" document, just as Markdown requires a parser program to turn it into HTML. It's not really fully an HTML document.

It's a good idea. It would be a better one to write the custom elements as wrappers for actual HTML elements, as https://blog.jim-nielsen.com/2023/html-web-components-an-exa... shows, instead of doing it SPA-style and requiring perfect JS execution for anything to show properly.

HTML markup really isn't that heavy. The avoidance of it seems mostly to be because it's considered "old", and "old" is bad, or at least not useful on a resume. But it's old because it's so good it's stuck around for a long time. Only machine-generated HTML is bulky. Hand-written HTML can be just as neat and readable as any Markdown.

thangalin · 13 days ago
> It would be a better one to write the custom elements as wrappers for actual HTML elements.

pandoc has an extension for this:

https://pandoc.org/demo/example33/8.18-divs-and-spans.html

KeenWrite, my (R) Markdown editor, supports pandoc annotations:

https://youtu.be/7icc4oZB2I4?list=PLB-WIt1cZYLm1MMx2FBG9KWzP...

> Just like markdown requires some parser program to turn it in to HTML.

Or XHTML, which is XML, which can then be transformed into TeX macros, and then typeset into a PDF file with a theme (much like CSS stylizes HTML).

https://youtu.be/3QpX70O5S30?list=PLB-WIt1cZYLm1MMx2FBG9KWzP...

This separates content from presentation, allowing them to vary independently.

thangalin commented on Writing a good design document   grantslatton.com/how-to-d... · Posted by u/kiyanwang
nrvn · 20 days ago
I used the following sources to create an RFC template (and to promote a documentation culture across engineering):

- https://www.industrialempathy.com/posts/design-docs-at-googl...

- https://github.com/rust-lang/rfcs

- https://github.com/kubernetes/enhancements/blob/master/keps/...

- https://blog.pragmaticengineer.com/rfcs-and-design-docs/

Hint: tailor the process and template structure to your org's size, maturity, and needs. Don't blindly imitate.

thangalin · 20 days ago
Re: https://www.industrialempathy.com/posts/design-docs-at-googl...

> ... sketching out that API is usually a good idea. In most cases, however, one should withstand the temptation to copy-paste formal interface or data definitions into the doc as these are often verbose, contain unnecessary detail and quickly get out of date.

Using R Markdown (or any Turing-complete documentation system), it's possible to introduce demarcations that let source code snippets be the literal source of truth:

    // DOCGEN-BEGIN:API_CLASS_NAME
    /**
     * <description>
     *
     * @param arg <description>
     * @return <description>
     */
    uint8_t method( type arg );
    // DOCGEN-ENDED:API_CLASS_NAME
Use a GPT to implement a parser for the snippets in a few minutes. Then invoke the function from the living document for a given source file, such as:

    `r#
      snippets <- parse.snippets( "relative/path/to/ClassName.hpp" );
      docs <- parse.api( snippets[[ "API_CLASS_NAME" ]] );
      export.api( docs );
    `
The documentation now cannot ever go stale with respect to the source code. If the comments are too verbose, simplify and capture implementation details elsewhere (e.g., as inline comments).
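For illustration, a minimal snippet parser along these lines might look like the following Python sketch (the function name, return shape, and marker semantics are assumptions based on the example above, not KeenWrite's actual API):

```python
import re

# Scan source text for paired markers of the form
#   // DOCGEN-BEGIN:<NAME> ... // DOCGEN-ENDED:<NAME>
# and capture everything between them, keyed by <NAME>.
SNIPPET_RE = re.compile(
    r"//\s*DOCGEN-BEGIN:(\w+)\n(.*?)//\s*DOCGEN-ENDED:\1",
    re.DOTALL,
)

def parse_snippets(text: str) -> dict:
    """Return {snippet_name: snippet_body} for every DOCGEN block."""
    return {name: body.rstrip() for name, body in SNIPPET_RE.findall(text)}
```

A documentation build can then read each source file once and pull any named snippet into the rendered output.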

In one system I helped develop, we were asked to document which messages of a standard protocol were supported. The only place that knowledge exists is in a map in the code base. So instead of copying and pasting it, we have:

    MessageMap MESSAGE_MAP = {
    // DOCGEN-BEGIN:SUPPORTED_MESSAGES
    { MessageType1, create<MessageClassName1>() },
    { MessageType2, create<MessageClassName2>() },
    ...
    // DOCGEN-ENDED:SUPPORTED_MESSAGES
    };
And something like:

    `r#
      snippets <- parse.snippets( "relative/path/to/MessageMap.hpp" );
      df <- parse.messages( snippets[[ "SUPPORTED_MESSAGES" ]] );
      export.table( df );
    `
This snippet is parsed into an R dataframe. Another function converts dataframes into Markdown tables. Changing the map starts a pipeline that rebuilds the documentation, ensuring that the documentation is always correct with respect to the code.
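The snippet-to-table step might be sketched as follows, in Python rather than R (the entry pattern mirrors the map above; the function and names are hypothetical, not the real pipeline):

```python
import re

# Pull { MessageTypeN, create<ClassN>() } entries out of the
# SUPPORTED_MESSAGES snippet and emit a Markdown table.
ENTRY_RE = re.compile(r"\{\s*(\w+)\s*,\s*create<(\w+)>\(\)\s*\}")

def messages_to_table(snippet: str) -> str:
    rows = ENTRY_RE.findall(snippet)
    lines = ["| Message Type | Handler Class |", "|---|---|"]
    lines += [f"| {mtype} | {cls} |" for mtype, cls in rows]
    return "\n".join(lines)
```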

If a future developer introduces an unparseable change, or files are moved, or R code breaks, the documentation build pipeline fails and someone must investigate before the change goes onto main.

Shameless self-plug: The R Markdown documentation system we use is my FOSS application, KeenWrite; however, pandoc and knitr are equally capable.

https://keenwrite.com/

thangalin commented on Return of wolves to Yellowstone has led to a surge in aspen trees   livescience.com/animals/l... · Posted by u/geox
thangalin · a month ago
https://www.youtube.com/watch?v=W88Sact1kws

"Embark on a journey to Yellowstone, where a few wolves did not just roam, but rewrote the rules of an entire ecosystem. Discover how these majestic predators triggered a cascade of life, transforming not only the park's wildlife but its very rivers and landscapes. It's a story of how nature's architects can reshape our world in ways we never imagined."

thangalin commented on Never write your own date parsing library   zachleat.com/web/adventur... · Posted by u/ulrischa
thangalin · a month ago
On a slightly related note, here's an algorithm for parsing natural-language time inputs into a normalized time:

https://stackoverflow.com/a/49185071/59087
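The gist can be sketched like this (an illustrative reimplementation, not the linked answer's code): accept loose inputs such as "945", "9:45", or "11:45 pm" and normalize them to 24-hour HH:MM.

```python
import re

def normalize_time(text: str) -> str:
    """Normalize loose time inputs ("945", "9:45", "11:45 pm") to HH:MM."""
    m = re.fullmatch(r"\s*(\d{1,2})[:.]?(\d{2})?\s*([ap])?\.?m?\.?\s*",
                     text, re.IGNORECASE)
    if not m:
        raise ValueError(f"unparseable time: {text!r}")
    hour = int(m.group(1))
    minute = int(m.group(2) or 0)
    meridiem = (m.group(3) or "").lower()
    if meridiem == "p" and hour < 12:
        hour += 12          # 1 pm .. 11 pm
    elif meridiem == "a" and hour == 12:
        hour = 0            # 12 am is midnight
    if hour > 23 or minute > 59:
        raise ValueError(f"out of range: {text!r}")
    return f"{hour:02d}:{minute:02d}"
```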

thangalin commented on Yt-transcriber – Give a YouTube URL and get a transcription   github.com/pmarreck/yt-tr... · Posted by u/Bluestein
adamgordonbell · a month ago
Recently, I was working on a similar project and found that grabbing transcripts quickly leads to your IP being blocked from fetching them.

I ended up doing the same as this person: downloading the MP4s and then transcribing them myself. I assumed it was some sort of anti-LLM-scraper measure they put in place.

Has anyone used this --write-auto-subs flag and not been flagged after doing 20 or so videos?

thangalin · a month ago

    systemctl start tor
    yt-dlp --proxy socks5://127.0.0.1:9050 --write-subs --write-auto-subs --skip-download [URL]
See: https://github.com/noobpk/auto-change-tor-ip

thangalin commented on LibreOffice slams Microsoft for locking in Office users w/ complex file formats   neowin.net/news/libreoffi... · Posted by u/bundie
Pooge · a month ago
I know about the lack of tech-savviness of most humans, but wouldn't Markdown and Pandoc, if you slam a GUI in front of them, cover the needs of 99% of users?

Granted, when you need formatting, like for a formal letter, you use a template someone made but this is not what most people use Word for.

And don't get me started on "people wouldn't understand how to put things in bold or italics"; they can barely use Word anyway. Might as well use something much simpler. Office "productivity" suites are overkill, to me.

thangalin commented on AI coding tools can reduce productivity   secondthoughts.ai/p/ai-co... · Posted by u/gk1
latexr · a month ago
> Writing both the client- and server-side for a PDF annotation editor would have taken 60 hours, maybe more.

How do you know? Seems to me you’re making the exact same estimation mistake of the people in the study.

> Instead, a combination of Copilot, DeepSeek, Claude, and Gemini yielded a working prototype in under 6 hours

Six hours for a prototype using four LLMs? That is not impressive, it sounds insane and a tremendous mess that will take so long to dig out of the prototype stage it’ll effectively require a rewrite.

And why are you comparing an LLM prototype to a finished product “by hand” (I surely hope you’re not suggesting such a prototype would take sixty hours)? That is disingenuous and skewing the numbers.

thangalin · a month ago
> How do you know? Seems to me you’re making the exact same estimation mistake of the people in the study.

I have over 20 years of web development experience and 40 years of general experience writing software. I wrote the authors and they confirmed my thoughts:

"I totally believe it! Per the paper abstract, we find many factors driving results - and one of the factors is how experienced the developers are on the codebase, and how big/complex the codebases are.

"Given that this was a new and unfamiliar domain and new codebase, I would expect there to be much more speedup than the domain we studied!"

> Six hours for a prototype using four LLMs?

They have limits on the number of queries, so I used four different LLMs in tandem to circumvent query limits. I didn't write it four times using four different LLMs.

> it sounds insane and a tremendous mess

I posted the code. It's well organized, has few (if any) encapsulation violations, sticks to OOP quite well, works, and---if I knew the PDF.js API---would be fairly easy to maintain.

Yes, I stand by my claim that writing this annotation editor (PHP, HTML, CSS, and JS) would take me about 60 hours by hand and about 6 hours using the LLMs.

thangalin commented on James Webb, Hubble space telescopes face reduction in operations   astronomy.com/science/jam... · Posted by u/geox
stogot · a month ago
Why are the budgets for operating these so expensive? The numbers in the article are staggering
thangalin · a month ago
In March 2024, the US Department of Defense's fiscal year 2025 budget request was $849.8 billion. The 2024 JWST budget was 0.022% of that.

https://nasawatch.com/exploration/ernst-stuhlinger

thangalin commented on AI coding tools can reduce productivity   secondthoughts.ai/p/ai-co... · Posted by u/gk1
DeepYogurt · a month ago
Have any open source work you can show off?
thangalin · a month ago
Not the OP, but:

https://repo.autonoma.ca/notanexus.git

I don't know the PDF.js library. Writing both the client- and server-side for a PDF annotation editor would have taken 60 hours, maybe more. Instead, a combination of Copilot, DeepSeek, Claude, and Gemini yielded a working prototype in under 6 hours:

https://repo.autonoma.ca/notanexus.git/tree/HEAD/src/js

I wrote maybe 3 lines of JavaScript; the rest was all prompted.

u/thangalin

Karma: 2587 · Cake day: October 1, 2009
About
Seeking alpha readers for a near-future hard sci-fi story that combines elements of generally intelligent machines, hunger, agrotech, surveillance, militarized police, virtual reality, hacking, climate change, and anti-establishment.

username @ gmail
