I've experienced B2 throwing a wrench into the dream of low latency, but some object stores are very fast. And more importantly, you only need the first couple of megabytes of each video to be on fast storage.
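Roughly what I mean, as a sketch (the paths, the 4 MB cutoff, and the bucket URL are all made up; this isn't how PeerTube actually does it):

```ts
// Sketch: keep only the head of each video on fast local storage, since the
// player's first range requests hit the start of the file; later requests can
// go to the slower, cheaper object store. Ranges straddling the boundary are
// ignored here for brevity.
const HOT_PREFIX_BYTES = 4 * 1024 * 1024; // hypothetical "first couple of MB"

type Source =
  | { kind: 'local-cache'; path: string }   // fast local disk / SSD
  | { kind: 'object-store'; url: string };  // e.g. B2, with higher first-byte latency

function pickSource(videoId: string, rangeStart: number): Source {
  return rangeStart < HOT_PREFIX_BYTES
    ? { kind: 'local-cache', path: `/var/cache/hot/${videoId}.mp4` }
    : { kind: 'object-store', url: `https://bucket.example.com/${videoId}.mp4` };
}
```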
You can, of course, do this anyway: PeerTube allows you to completely disable transcoding. But again, that means you're streaming the full resolution. Your client may not like this.
If realtime performance is your concern, I think PeerTube allows you to pre-transcode to disk. If there is a transcoded copy matching the client's request, the server streams it directly with no extra transcoding.
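Something like this, conceptually (illustrative only, not PeerTube's actual code; the directory layout is invented):

```ts
// If a rendition matching the client's request already exists on disk, stream
// that file as-is; otherwise the server would have to transcode on the fly.
import { existsSync } from 'node:fs';

function resolvePlaybackFile(videoId: string, resolution: '1080p' | '720p' | '480p'): string | null {
  const pretranscoded = `/var/lib/videos/${videoId}/${resolution}.mp4`; // hypothetical layout
  return existsSync(pretranscoded)
    ? pretranscoded   // serve directly, no extra CPU at request time
    : null;           // fall back to a live transcode (expensive)
}
```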
To answer your question: shifting transcode onto the client won't improve performance and will greatly increase bandwidth requirements in exchange for less compute on the server. You almost certainly do not want this.
Is it inconvenient to transcode before/during upload?
5 seconds is somewhat of an exaggeration; I clicked through 10 or so videos on my instance to check, and it's 2-3 seconds most of the time.
Would it meaningfully change the equation if you didn't offer any transcoding on the server and required users to run any transcoding they needed on their own hardware? I'm thinking of, for instance, a wasm build of ffmpeg running on the instance's website (roughly the sketch below) rather than requiring users to use a separate application.
Do you think a general user couldn't handle the workload (mobile processing, battery, etc.), or would that be fairly reasonable for a modern device and only onerous in the high-traffic server environment?
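Roughly what I'm imagining, as a sketch (assumes the @ffmpeg/ffmpeg 0.12-style browser API; the wasm core loading/bundler setup is omitted and the encoder settings are made up):

```ts
// Transcode in the user's browser with ffmpeg.wasm before uploading.
import { FFmpeg } from '@ffmpeg/ffmpeg';
import { fetchFile } from '@ffmpeg/util';

async function transcodeTo720p(file: File): Promise<Blob> {
  const ffmpeg = new FFmpeg();
  await ffmpeg.load();                                      // fetches the wasm core
  await ffmpeg.writeFile('in.mp4', await fetchFile(file));
  await ffmpeg.exec([
    '-i', 'in.mp4',
    '-vf', 'scale=-2:720',                                  // downscale to 720p
    '-c:v', 'libx264', '-preset', 'veryfast', '-crf', '23',
    '-c:a', 'aac',
    'out-720p.mp4',
  ]);
  const data = await ffmpeg.readFile('out-720p.mp4');
  return new Blob([data], { type: 'video/mp4' });           // ready to upload
}
```

Each additional rendition (1080p, 480p, ...) would be another full pass like this on the user's device.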
I think the user experience would be quite poor, enough that nobody would use the instance. As an example, a 4k video will be transcoded at least two times, to 1080p and 720p, and depending on server config often several more. Each transcode job takes a long time, even with substantial hwaccel on a desktop.
Very high bitrate video is quite common now since most phones, action cameras etc are capable of 4k30 and often 4k60.
> Do you think a general user couldn't handle the workload (mobile processing, battery, etc), or would that be fairly reasonable for a modern device and only onerous.
If I had to guess, I would expect it to be a poor experience. Say I take a 5 minute video; that's probably around 3-5 GB. I upload it, then need to wait, in the foreground, for this video to be transcoded and then uploaded to object storage 3 times on a phone chip. People won't do it.
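Back-of-envelope, to put numbers on it (the bitrates are rough assumptions, not PeerTube settings):

```ts
// A phone's 4k60 recording at ~100 Mbps, plus a typical 1080p/720p ladder.
const seconds = 5 * 60; // a 5 minute clip

const gigabytes = (mbps: number) => (mbps * seconds) / 8 / 1000; // Mbit/s -> GB

console.log('source 4k60 @ ~100 Mbps:', gigabytes(100).toFixed(2), 'GB'); // ~3.75 GB
console.log('1080p rendition @ ~6 Mbps:', gigabytes(6).toFixed(2), 'GB'); // ~0.23 GB
console.log('720p rendition  @ ~3 Mbps:', gigabytes(3).toFixed(2), 'GB'); // ~0.11 GB
```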
I do like the idea of offloading transcoding to users. I wonder if it might be suited to something like https://rendernetwork.com/, where users contribute idle compute to a transcode pool in exchange for upload & storage rights, and still get fire-and-forget uploads?
I guess it is more an alternative to Microsoft Stream than to YouTube, really, as it is more likely to be used as an internal video communication platform for a company than as a public video streaming platform.
[1] If the audience is small, you are just fine sharing vids using the HTML video tag.
And then, people watching videos are used to the YouTube experience, with its world-class CDN infra enabling subsecond first-frame latencies even for 4k videos. They go on PeerTube and the first frame takes something like 5 seconds for a 1080p video... realistically, with today's attention spans, most of them are going to bounce before it ever plays.
I really appreciate you walking through that; it's an eye-opener! It seems like you not only deal with a considerable number of five-minute-or-greater videos, but at much higher quality than I was expecting, too.
I also like the idea of user-side transcoding because, honestly, I think it's better for everyone? I would love it if every place I uploaded video or audio content offered an option to "include lower-quality variants" or something. Broadly, it's my product; I should have the final say on (and take responsibility for) the end result.

And for high-quality stuff, the people who make it tend to have systems equipped to do it better anyway, so they could probably get faster transcoding times by using their own systems rather than letting the server do it. Seems like a win-win, even outside of the obvious benefit of making a whole lot of computers do only the work they each need done, instead of making a few computers do the work that everyone needs done.

The only slight downside is the "average user" having some extra options they don't understand, which they then use wrong, and everyone hates your product. Yay, app development.