periheli0n · 3 years ago
The real shocker is that it’s 2022 and LaTeX is still the best writing environment for a PhD thesis. It has so many downsides: the markup syntax is ugly, it really works well only with paginated output such as PDF, there is a zoo of partly incompatible packages, documents need compilation, the figure-placement algorithms are obscure and difficult to control, and so on.

It still beats the competition because of rock-solid referencing, both for in-text elements like equations and chapters and for citing literature with bibtex.

Plus, it’s extremely stable, so someone who learnt LaTeX 20 years ago, like yours truly, can download the newest TeX distribution and feel at home immediately.

Nevertheless, I would prefer a Markdown-based system that can use CSS and MathML, and has a 100% bibtex clone for references.

Yes, pandoc goes quite a long way along this route, but setting up such a pipeline is still too complicated for many.

analog31 · 3 years ago
It must depend on the field. A close relative of mine is a PhD advisor in a science field. He's hands-off about it, but is also aware of what his students are doing. If asked, he recommends MS Word, which is also what he uses for his manuscripts.

My own experience was as a physics student, 30 years ago. Students paid a heavy price for being able to print and submit the entire thesis with no manual intervention. The students who chose LaTeX took the longest at it. I didn't have access to a Unix terminal anyway, and banged out my thesis on an MS-DOS machine. Whatever my word processor couldn't support, I added by hand. The readers were OK with this.

My solution to all typographic problems was "take care of it after defense." I spent a few days after my defense getting my copy to be ready for duplication, including sticking all of the page numbers on with glue because I couldn't make inline figures work.

periheli0n · 3 years ago
Sure, one can write a thesis in MS Word. It has come a long way with support for large documents. But I still find its referencing clumsy, opaque and unstable.

For example, automatic updates of figure numbers in captions and references: countless times it failed on me and I had to manually recreate the fields, bookmarks, cross-references, and whatever else is needed.

Bibliographies are hardly doable without an external tool that comes with its own headaches.

Typography in MS Word is quite decent these days, though. Anyway, the content of a PhD thesis shouldn't be judged by its typography (as long as it maintains a readable standard).

godelski · 3 years ago
I think things have changed a bit since you were a physics student. Conferences hand out latex templates and expect you to use them (wish they would also hand out an overleaf template. If any conference organizers are reading this...). Universities also do this with their undergrad/masters/thesis templates. Arxiv expects you to upload tex source code (it'll reject a PDF if you wrote that PDF with latex. It also is terrible at error messaging which is a huge pain since submission timing is for some stupid reason important). I'm sure latex is also easier than back then, but there's a lot of momentum in the latex direction that I think would be really difficult to undo. Even paper acceptance is highly influenced by formatting and figure design. I think it is just a different world as we have a lot more researchers now than even 30 years ago.
KronisLV · 3 years ago
> If asked, he recommends MS Word, which is also what he uses for his manuscripts.

My university actually required that people use MS Word for their thesis, which seemed to work out okay for many, despite such a top down approach not seeming like the best option.

Personally, I used LibreOffice anyways and while it was certainly as clunky as Word (especially once images, diagrams and formulas got involved), it was also passable.

Except that things like bibliography refused to work correctly and completely broke, about which I wrote a bit of a rant: https://blog.kronis.dev/everything%20is%20broken/libreoffice...

nextos · 3 years ago
LaTeX has, like Org Mode, this mythical aura of being super hard. However, replicating the functionality of Word is trivial and takes an hour or two for a savvy computer user to grasp.

There's always Overleaf, Pandoc or LyX to make things even simpler. LyX in particular deserves to be better known.

Complex things, like TikZ, are of course difficult and time consuming. But those are impossible using Word.

IMHO, the biggest advantages of LaTeX are reproducibility and reference management. Big Word documents are quite fragile. And reference management is a mess.

godelski · 3 years ago
Honestly, it isn't the writing part that annoys me the most. It is tikz and the fact that I can't make animations in beamer. Just resolving these issues would go a long way for me. Tikz could be fixed simply if there was a GUI that could allow for sliders or moving specific objects. Or at least a better way to make a good grid (tip: draw a grid on your canvas, draw whatever you want, remove grid). Things are so difficult to properly line up, even if we have mathematical representations. It shouldn't be that hard...
abdullahkhalids · 3 years ago
I recently discovered this python interface for tikz https://github.com/allefeld/pytikz

While it does not directly address the issues you point at, it does alleviate some issues.

* The syntax is somewhat easier to parse.

* It is a lot easier to write functions to redraw the same components over and over again.

* Doing math calculations to systematically place objects in relation to each other is a lot easier because python's arithmetic syntax is a lot more intuitive than TeX's.

Of course, this does mean that you have to fire up python to draw figures.
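
To illustrate the last two points, here is a minimal sketch that uses plain Python string formatting to emit TikZ source. It deliberately does not use pytikz's own API, which isn't shown in this thread; the helper name `ring_of_nodes` and its parameters are made up for the example.

```python
import math


def ring_of_nodes(n, radius=2.0, label="v"):
    """Return TikZ source placing n labelled nodes evenly on a circle.

    Hypothetical helper: it only builds a string of TikZ commands and
    does not call pytikz or run LaTeX.
    """
    lines = [r"\begin{tikzpicture}"]
    for i in range(n):
        # Ordinary Python arithmetic does the placement work.
        angle = 2 * math.pi * i / n
        x = radius * math.cos(angle)
        y = radius * math.sin(angle)
        lines.append(
            rf"  \node[draw, circle] ({label}{i}) at ({x:.3f}, {y:.3f}) {{${label}_{{{i}}}$}};"
        )
    for i in range(n):
        # Connect each node to its successor, wrapping around at the end.
        j = (i + 1) % n
        lines.append(rf"  \draw ({label}{i}) -- ({label}{j});")
    lines.append(r"\end{tikzpicture}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(ring_of_nodes(5))
```

The printed text can be pasted into a document that loads TikZ; changing `n` or `radius` redraws the whole figure without touching any coordinates by hand.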

enriquto · 3 years ago
> I can't make animations in beamer

Not yet. But you can easily learn to do them:

    \usepackage{animate}
    ...
    \animategraphics[width=10em,loop,autoplay]{4}{a_}{0}{10}
This will animate the sequence of images a_0.png through a_10.png at 4 fps.
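
As a rough aid, the Python sketch below lists the files that call would look for and how long one pass of the loop lasts. It assumes PNG frames named with the base name followed by the frame number, as described above; `expected_frames` and `loop_duration` are hypothetical helpers, not part of the animate package.

```python
def expected_frames(basename, first, last, ext="png"):
    """List frame file names for a base name and an inclusive frame range."""
    return [f"{basename}{i}.{ext}" for i in range(first, last + 1)]


def loop_duration(first, last, fps):
    """Seconds for one pass through the frames at the given frame rate."""
    return (last - first + 1) / fps


if __name__ == "__main__":
    frames = expected_frames("a_", 0, 10)
    print(len(frames), "frames:", frames[0], "...", frames[-1])
    print("one loop takes", loop_duration(0, 10, 4), "seconds")
```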

nicodjimenez · 3 years ago
Mathpix Markdown is an attempt at bringing together the best of both worlds (Markdown and LaTeX) while providing excellent interoperability with LaTeX, meaning you can easily export your Mathpix Markdown documents to LaTeX, including equation references, tabular environments, images, etc.:

https://github.com/Mathpix/mathpix-markdown-it

Disclaimer: I'm the founder of Mathpix.

runningmike · 3 years ago
I would strongly recommend MyST. MyST extends Markdown for technical and scientific communication. See https://www.myst.tools/
abdullahkhalids · 3 years ago
I tried MyST recently. All I see is a markup language that slowly becomes more and more complex over time to support more and more features that LaTeX already supports, while at the same time acquiring the same syntax complexity as LaTeX.

What people don't acknowledge is that there is a base level of syntax complexity needed to produce fully general documents. If you do, the natural conclusion is that to fix latex, you need a full rewrite of latex with minor changes to fix all the inconsistencies that have crept into it.

rowanc1 · 3 years ago
My PhD written in MyST is here:

https://phd.row1.ca/phd/introduction

Allows web-first, as well as a PDF output (via LaTeX).

V1ndaar · 3 years ago
Going by the documentation it does it by... drumroll... converting to LaTeX!

(edit: generating PDFs that is)

chaoxu · 3 years ago
Have you tried Quarto? It should tick every box for you (except MathML, but hey, that might work too since Quarto is built on pandoc).
countrymile · 3 years ago
+1 for Quarto. I wrote my thesis in rmarkdown, which flipped easily between LaTeX and HTML output, with a bibtex referencing system. It also allowed you to inline LaTeX for more complex outputs. And inlining calculated tables and charts meant I could keep my writing and code together. Quarto is the successor.
periheli0n · 3 years ago
Thanks for the pointer, that looks interesting. Especially because it is open source!

I see it supports Jupyter notebook. Math support in those isn’t too bad at all, so it might just work for many cases.

vlmutolo · 3 years ago
I'm looking forward to trying Typst when it's available. It's the first LaTeX alternative that's ever interested me.

http://typst.app/

mcbuilder · 3 years ago
I think the most solid way, and the way I'd do it now if starting my PhD all over, would be a bunch of org docs with LaTeX woven in.

Pandoc, I think, really serves a different niche; it's hard to imagine a complicated document benefiting from even more pipelining. You're editing a lot...

thangalin · 3 years ago
> Nevertheless, I would prefer a Markdown-based system

My free, cross-platform desktop Markdown editor, KeenWrite[1], integrates with the ConTeXt typesetting software[2]. I'm working on a branch to make integration containerized[3] because its installation is painful. KeenWrite limits math to plain TeX[4] so that the output can be rendered using any TeX-based typesetter (ConTeXt, LaTeX, MathJax, εχTEX, etc.).

Here's a sample document typeset using ConTeXt (skip to page 40 for the math):

https://pdfhost.io/v/4FeAGGasj_SepiSolar_Highlevel_Software_...

That document theme is called Solare[8].

> that can use CSS and MathML

Adding CSS mixes presentation logic with content, which is something KeenWrite strives to avoid. Instead, KeenWrite implements Pandoc's annotation syntax to keep presentation logic out of the content. I've written about this extensively in my Typesetting Markdown series[5].

You can produce some pretty amazing documents just with annotations, such as the following that I wrote in Markdown and typeset using ConTeXt:

https://impacts.to/downloads/lowres/impacts.pdf

> has a 100% bibtex clone for references.

Markdown fails at references. At some point, I'd like to implement cross-references in KeenWrite. Except there are at least six competing standards for the syntax, which I've also remarked upon[6], making the choice of syntax difficult[7].

> setting up such a pipeline is still too complicated for many

FWIW, my Typesetting Markdown series, which explains how to set up a typesetting pipeline using Pandoc, is one of the reasons I developed KeenWrite: to replace that entire pipeline (R, Markdown, externalized variable interpolation, math, and typesetting) with a single tool.

[1]: https://github.com/DaveJarvis/keenwrite

[2]: https://wiki.contextgarden.net/Installation

[3]: https://github.com/DaveJarvis/keenwrite/blob/1_typeset_using...

[4]: https://github.com/DaveJarvis/keenwrite/blob/main/docs/scree...

[5]: https://dave.autonoma.ca/blog/2020/04/28/typesetting-markdow...

[6]: https://talk.commonmark.org/t/cross-references-and-citations...

[7]: https://xkcd.com/927/

[8]: https://github.com/DaveJarvis/keenwrite-themes/tree/main/sol...

vouaobrasil · 3 years ago
I disagree with the author that PDFs are a terrible format. They guarantee layout, which is very important for complex scientific presentations. Even slight differences in layout can make a complex set of equations difficult to parse. LaTeX also has a much superior word-break/hyphening algorithm to the HTML engines of browsers.

I find PDF math papers easy to browse, unlike the author. They're much easier and more organized than a website, can be easily searched and have a *proper table of contents* compared to websites. As for poorly browsable on a phone -- well I think that is irrelevant because nobody is going to read a complex technical paper in practise on a phone. They do look decent in tablets, and as for screen readers...well that's a valid point but screen readers don't work well for material with lots of equations anyway.

I applaud the author for the effort but looking at the result, I would not want to read math that way.

periheli0n · 3 years ago
> nobody is going to read a complex technical paper in practise on a phone

I do, in fact. Or rather, I often would like to, but with PDF? No chance. IEEE Xplore's online reading sometimes works, but it would work better if they cleaned up their UI to be compatible with phones.

I have read thousands of pages of fiction on a phone and quite enjoyed it. Phones are great for reading if the content reflows properly.

Now publishers and content creators would need to embrace non-paginated, reflowing output. This would not only facilitate reading on phones, but also on tablets and laptop screens.

O'Reilly's online platform does a good job with their app.

There is zero reason why paginated output should be the default in 2022.

oplaadpunt · 3 years ago
Yes, fiction works because the layout is simple, consisting of text, and maybe images?

Research papers are far more complex, and have established standards that aid quick reading and parsing. I absolutely don't want to deal with reflowing equations, reflowing figures, or whatever when publishing papers. I want precise margins and column widths.

auggierose · 3 years ago
O'Reilly doesn't publish math books. All math books in epub/mobi format look like garbage. There isn't a single exception. If you know of one, please tell me. It seems currently too hard to get layout, resolution and inline formulas right in a portable format.
jmhammond · 3 years ago
> and as for screen readers...well that's a valid point but screen readers don't work well for material with lots of equations anyway.

This is something that we’d like to change. There are many visually impaired students who need to learn mathematics the same as you and I.

My “eyes were opened” when I was working with a blind student in my class. The textbook I’d written in PreTeXt (transpiled to PDF and HTML) could be read on his BrailleNote, but some of the equations were wonky, so I rewrote them to work for everyone.

It would be better if we developed tools to make them work for everyone straight away, instead of relying on authors. That’s one of my career goals.

felixfbecker · 3 years ago
I applaud you for this.

I think MathML (which has gotten much better in browsers, thanks to Igalia[1]) is a much better bet for making this possible than LaTeX compiled to PDF.

[1] https://mathml.igalia.com/

godelski · 3 years ago
You can't have animations with PDFs. Anyone using beamer is familiar with this frustration. But animations are incredibly helpful in explaining many works. 3Blue1Brown became so popular in major part due to his use of (fantastic) animations that more easily explain the material than any static image could.
jimhefferon · 3 years ago
The animate package from CTAN draws animations in PDFs. It has limitations (most pdf readers won't show them because they rely on some JavaScript), but it does work.
jech · 3 years ago
> I find PDF math papers easy to browse

So do I. Still, I wish LaTeX produced easily reflowable PDFs, especially when a document is formatted in two columns.

enriquto · 3 years ago
But it does, doesn't it? You add the "twocolumn" option and recompile. Unless your LaTeX is too fancy this will typically give a very good result (at worst, some figures with hardcoded sizing will be awkwardly placed).
mistrial9 · 3 years ago
What you are asking for is called a "round-trip" by some printers. This was requested the week after PDF was invented! It does work, unless it does not. The company that invented this technology is apparently infested by MBAs and charismatic nobodies, since they announced they are exiting the type "business"? Our house of cards is showing.
baby · 3 years ago
Check Zotero. It has that feature.
gnull · 3 years ago
If your equations are in MathML, the browsers should be able to screen read them at some point.

> Even slight differences in layout can make a complex set of equations difficult to parse.

Such a set of equations should normally be represented by a single block; I can't imagine a reason why the layout should change inside that block.

The layout of pdf is unnecessarily rigid. When I'm reading it on my screen, there's no reason the text should be split into A4 pages with very specific margin values. Latex also often moves your figures a few pages ahead because they didn't fit on the specific page. There's absolutely no reason for that when you have access to the big continuous canvas of an html page. This works for equations too; if you have a long equation block that happens to be right between two pages, you either have to let one page have a gap, or reorder/rewrite your paragraphs to make the equations fit. None of this has a good excuse when it's read on a screen.

I don't think we need a website, but a js-free webpage with hyperlinks would be a lot better than pdf. Pdfs I find imperfect but ok.

periheli0n · 3 years ago
> I don't think we need a website, but a js-free webpage with hyperlinks

Wasn't this precisely the use case for HTML and the WWW as originally conceived by Berners-Lee and his fellow internet pioneers?

TacticalCoder · 3 years ago
> LaTeX also has a much superior word-break/hyphening algorithm to the HTML engines of browsers.

And because the PDF has a fixed layout it's also much easier to prevent "rivers" in paragraphs, which makes it a no-brainer to use justification. To me, many print publications using justified text (including LaTeX documents) are a thing of beauty, and I do hate how "left align" breaks the flow of reading. I'll take slightly different spacing between words due to justification any day over horizontal lines of different length, which I find fugly and confusing beyond repair.

More hyphenation controls are coming to CSS and, one can dream, it may be possible one day to programmatically detect rivers?

Meanwhile rivers be damned, I override anyway many sites and add "text-align: justify". The nice thing is: because "text-align: left" is the default many sites and minifiers do not bother with text-align at all, so adding "text-align: justify" works for many, many, many sites.

And I only half-buy anyway the justifications (ah!) for left alignment on the Web.

It's basically saying: "We know better than people who've been working in print for decades (or more), left align is easier to read". I don't buy it. Left align breaks my reading flow. And I cannot be the only one.

To me left align is trading potentially ugly looking paragraphs (due to rivers) for certainly ugly looking paragraphs (due to left justification: just look at the right of each paragraph... Such lack of clarity, such chaos cannot be unseen. It's pure fail).

P.S.: I've actually typeset books both in LaTeX and QuarkXPress, and they were justified, not left-aligned.

extra88 · 3 years ago
> I override anyway many sites and add "text-align: justify".

I think you're an outlier in your strong preference for justified text but this serves as an example in favor of using HTML to present content. Well made web content is much more malleable by users to make it meet their needs and preferences.

dan-robertson · 3 years ago
I think you give latex more credit than it deserves. It gives little straightforward control over layout and the only reason documents are manageable is that pages are fixed size and layout changes are mostly local.

Its paragraph breaking was state of the art when it was new, but other systems break paragraphs now, and potentially better. I also think ragged margins aren’t really a problem.

I think if layout mattered as much as you imply, scientists would have to use a tool that offers more control like indesign.

None of this is to say that getting good layout in HTML is easy, of course.

periheli0n · 3 years ago
> I think if layout mattered as much as you imply, scientists would have to use a tool that offers more control like indesign.

Yes, precisely that. As a scientist I don't even want to have to deal with layout. That's what publishers are paid extremely well for. When I self-publish content I want the process to be as simple as possible. If this means ragged margins, browser-default styles for headings etc., default colors and fonts — so be it.

(but to be fair, optimising the layout is an excellent way to procrastinate on doing hard research)

ta123456789 · 3 years ago
PDF papers are also much easier to save/archive and use offline. And great for printing
hgsgm · 3 years ago
We need either an app that can compile LaTeX source (+all included libs, which sounds like a lot, but it's equivalent to a JS-heavy web page) on all the clients (preferably as a browser plugin or integrated feature!)

or authors should distribute their PDFs as bundles that include formatted versions for all of paper, large screen, and small screen.

patrick451 · 3 years ago
The standard single column of content layout of nearly every webpage is a bad fit for scientific content because the information density is way too low. A pdf, where I can display multiple pages, each with two columns, side-by-side is much better. This is really handy when you need to do something like refer to a figure/equation/table on the last page. I have yet to see any website solve this lack of density problem in any meaningful way. Of course, paper is still better, but I'll take a pdf over a web site any day.

A pdf is also much easier to archive. The job of sci-hub would be a lot more difficult if every paper came with separate html, css, javascript, and images.

CJefferson · 3 years ago
Screen readers work perfectly fine with mathml. At worst one can just get the screen reader to read the latex for maths and browse the rest in nice HTML.

On the other hand, PDFs generated from Latex are completely useless for screen readers.

mavhc · 3 years ago
Get rid of the 2 column thing and most people would be happy.

What guarantees of layout do you require?

In related news, MathML is back in Chrome v109

michaelt · 3 years ago
> What guarantees of layout do you require?

Some people write documents that can only be clearly presented on a 15" or larger display. Maybe a comparison table with a bunch of columns, maybe a detailed chart, maybe a PCB schematic, whatever.

These people, being considerate of their readers, want to ensure if someone with a 13" screen comes along, they'll get scrollbars or small text, rather than a badly reflowed table where the word 'Yes' gets split over 3 lines.

Other people want to read those documents on 5" phone displays.

dwheeler · 3 years ago
The problem here is specific to LaTeX. I wrote my PhD dissertation using OpenOffice.org (now use LibreOffice), and generating HTML was easy (I posted the HTML).

But the author is right: LaTeX is widely used, translating it to HTML is hard, and there are no incentives to make or improve tools. Even if you don't want HTML, it'd be good for the LaTeX tools to automatically generate reflowable PDF for accessibility. There should be a process for funding infrastructure to accelerate science, and this would be a good example.

There's an interesting trick you could try. PDF supports embedding other contents. LibreOffice, for example, can slip its original edited file into a generated PDF, producing a perfectly editable PDF. Maybe a variant of this idea could be used, e.g., store the LaTeX source or HTML in the PDF, so people can "get the PDF" yet still have options. But that's just a side idea; the real issue is funding infrastructure for science.

DominikPeters · 3 years ago
I would recommend using the lwarp package for turning large latex documents into HTML. Pretty much all other converters attempt to parse the tex files, which is an almost hopeless task. Lwarp has a different strategy: it redefines all macros to produce HTML (e.g. \textbf{example} writes "<strong>example</strong>" into the output pdf) within latex, thereby producing a PDF containing HTML code. It then uses a pdf2txt extractor to get the finished HTML file. Thus, it uses latex to parse the latex.

Lwarp worked for me to produce an HTML version of the TikZ documentation (https://tikz.dev), and that's probably one of the more complicated tex documents that exists. (Though granted, this was still a major effort.)

gdprrrr · 3 years ago
Yeah, it's well known that only LaTeX can parse LaTeX because you can redefine all syntax (catcodes) in the middle of the document.
V1ndaar · 3 years ago
Currently finishing up my own PhD thesis. My approach to the same problem is quite different. I write my thesis in Org mode. Exporting to HTML is pretty painless. Been doing the same for years for my notes. PDF export via LaTeX & HTML export. LaTeX and PDFs fail pretty hard when including source code (some literate programming in Org). That was my initial motivation behind also producing HTML.

The final thesis that I will hand in is of course a regular PDF (well, a print based on that). But the HTML version can contain lots more stuff that doesn't fit (and belong) into the actual paper thesis, e.g. code snippets to generate plots etc. (optional export of Org subsections). By publishing the git repository of the thesis, linking all code and data + a bit of work -> full reproducible thesis.

hoosieree · 3 years ago
Heh, I also write papers in org and am currently writing my dissertation in org.

Source code is always a pain to export for PDF, especially when switching from 1 to 2 column layout depending on the publication.

My blog is written in org too, but I post-process to make it fit in with the rest of my static site. At some point maybe I'll get enough free time to swap out my makefile setup for org-publish, but if it ain't broke...

To anyone who'll listen I advocate for org-mode as a better alternative to Jupyter notebooks, Markdown, and LaTeX. It's in some ways the antithesis to "do one thing well". If you try to do N things well while adhering to the unix philosophy you end up learning N different tools. But org-mode is one tool that does N things well, and some of the things you learn doing thing N transfer to thing N+1, so you get economies of scale.

taink · 3 years ago
How do you plot graphs with org? I've been trying to use it for that purpose but I can't wrap my head around how to do it without some tikz incantation I don't really understand. I've seen gnuplot mentioned here and there but the setup seems pretty involved.

I'm looking for a way to plot simple numeric data signals in time series, which are pretty trivial in jupyter notebooks.

janeway · 3 years ago
IMO, the pdf version should contain the exact information that any other format might contain. Otherwise, which version is the thesis, and is everything contained in it? Anything else is just for your own interest.

But your approach sounds great. Good luck!

aitchnyu · 3 years ago
In The Art of Unix Programming (2003) the authors assert simple text formats can be grepped, awked and are easy to compose in a text editor. Hand typing xml is cruel. But now text editors have perfect syntax highlighting and squiggles to pinpoint errors, autocomplete, automatic formatting and toolchains to eliminate errors and extract info.

Isn't html or a high-level abstraction (like Spectacle) the best tool for the job today?

https://formidable.com/open-source/spectacle/docs/#one-html-...

pjmlp · 3 years ago
Hence why one should use stuff like https://www.oxygenxml.com/

Similar products have been in business since the early 2000's.

bravura · 3 years ago
I think the simplest solution is uploading your thesis to arxiv.org, then using arxiv-vanity (based upon LaTeXML) to render your arxiv link as a responsive web page.
jimhefferon · 3 years ago
LaTeXML does great things, but it also has limitations. A doc that is in generic LaTeX is going to process much better than one with significant customizations. But it is a good tool for sure.
BrandoElFollito · 3 years ago
It was very cool for OP to do a writeup of the effort it took to convert a thesis.

I wanted to do the same with mine but I lost the sources (it was 20+ years ago and did not survive some upgrade/technology change). I was mostly concerned with the .eps files that are hardly portable to .png or similar.

This made me think a bit about preservation of ~recent data (1990-2010). A lot falls in the category of "not natively on the web yet" and "stored on stuff that does not work anymore".