geuis · a year ago
Question related to 3D mesh models in general: has any significant work been done on models oriented towards photogrammetry?

Case in point, I have a series of photos (48) that capture a small statue. The photos are high quality, the object was on a rotating platform. Lighting is consistent. The background is solid black.

These are normally ideal conditions for photogrammetry, but none of the various common applications and websites do a very good job of creating a mesh that isn't super low poly and/or full of holes.

I've been casually scanning huggingface for relevant models to try out but haven't really found anything.

troymc · a year ago
Check out RealityCapture [1]. I think it's what's used to create the Quixel Megascans [2]. (They're both under the Epic corporate umbrella now.)

[1] https://www.capturingreality.com/realitycapture

[2] https://quixel.com/megascans/

Joel_Mckay · a year ago
COLMAP + CloudCompare with a good CUDA GPU (more VRAM is better) will give reasonable results for large textured objects like buildings. Glass/water/mirror/gloss surfaces will need to be coated to scan; dry spray-on Dr. Scholl's foot deodorant seems to work fine for our object scans.
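
For anyone who wants a concrete starting point, a minimal sketch of that pipeline driven from Python (folder names are placeholders, the colmap binary is assumed to be on PATH, and the fused point cloud is then cleaned up and meshed interactively in CloudCompare):

  import subprocess
  from pathlib import Path

  def colmap(*args):
      # Invoke the COLMAP command-line tool; assumes colmap is on PATH.
      subprocess.run(["colmap", *args], check=True)

  Path("sparse").mkdir(exist_ok=True)

  # Sparse reconstruction: feature extraction, matching, incremental SfM.
  colmap("feature_extractor", "--database_path", "db.db", "--image_path", "images")
  colmap("exhaustive_matcher", "--database_path", "db.db")
  colmap("mapper", "--database_path", "db.db", "--image_path", "images",
         "--output_path", "sparse")

  # Dense reconstruction (patch_match_stereo needs the CUDA build), then meshing.
  colmap("image_undistorter", "--image_path", "images", "--input_path", "sparse/0",
         "--output_path", "dense")
  colmap("patch_match_stereo", "--workspace_path", "dense")
  colmap("stereo_fusion", "--workspace_path", "dense",
         "--output_path", "dense/fused.ply")
  colmap("poisson_mesher", "--input_path", "dense/fused.ply",
         "--output_path", "dense/mesh.ply")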

There are now more advanced options than Gaussian splatting, and these can achieve normal playback speeds rather than hours of filtering. I'll drop a citation if I recall the recent paper and example code. However, note this style of 3D scene recovery tends to be heavily 3D location dependent.

Best of luck, =3

jocaal · a year ago
Recently, a lot of development in this area has been in gaussian splatting and from what I have seen, the new methods are super effective.

https://en.wikipedia.org/wiki/Gaussian_splatting

https://www.youtube.com/watch?v=6dPBaV6M9u4

meindnoch · a year ago
The parent explicitly asked for a mesh.
geuis · a year ago
Yeah some very impressive stuff with splats going on. But I haven't seen much about going from splats to high quality 3D meshes. I've tried one or two with pretty poor results.
Broussebar · a year ago
For this exact use case I used instant-ngp[0] recently and was really pleased with the results. There's an article[1] explaining how to prepare your data.

[0] https://github.com/NVlabs/instant-ngp

[1] https://github.com/NVlabs/instant-ngp/blob/master/docs/nerf_...

GistNoesis · a year ago
>full of holes

On the geometry side, from a theoretical point of view you can repair meshes [1] by inferring a signed or unsigned distance field from your existing mesh and then contouring that distance field.

If you like the distance field approach, there is also research [2] on estimating neural unsigned distance fields directly (in a roughly similar spirit to Gaussian splats).

[1] https://github.com/nzfeng/signed-heat-3d [It works, but it's research code: buggy, not user friendly, and mostly demonstrated on toy problems, because complexity explodes very quickly. With a grid the number of cells grows as n^3, and a sparse linear system is solved on top of that (so total complexity is bounded by roughly n^6), but tolerating approximations and implementing things carefully, the practical complexity should be on par with methods like the finite element method in computational fluid dynamics.]

[2] https://virtualhumans.mpi-inf.mpg.de/ndf/
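
As a rough illustration of that grid-and-contour idea (not the actual method from [1] or [2]), here is a minimal Python sketch using trimesh and scikit-image; the file names and grid resolution are placeholders, and since the field is unsigned it only yields a crude closed shell around the scan:

  import numpy as np
  import trimesh
  from skimage import measure

  # Load a scan with holes, sample an unsigned distance field on a grid,
  # then contour that field to recover a closed surface.
  mesh = trimesh.load("scan_with_holes.ply")   # placeholder input file

  n = 64                                       # grid resolution; cost grows as n^3
  lo, hi = mesh.bounds
  axes = [np.linspace(lo[i] - 0.05, hi[i] + 0.05, n) for i in range(3)]
  pts = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)

  # Unsigned distance from every grid point to the existing surface.
  _, dist, _ = trimesh.proximity.closest_point(mesh, pts)
  field = dist.reshape(n, n, n)

  # Contour slightly off zero: with an unsigned field this produces a thin
  # closed shell around the scan rather than a true signed reconstruction.
  spacing = tuple(a[1] - a[0] for a in axes)
  verts, faces, _, _ = measure.marching_cubes(field, level=spacing[0], spacing=spacing)
  repaired = trimesh.Trimesh(vertices=verts + [a[0] for a in axes], faces=faces)
  repaired.export("repaired_shell.ply")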

MrSkelter · a year ago
48 images is an incredibly small number for high quality photogrammetry. 480 wouldn’t be overkill. A couple of hundred would be considered normal.
Elucalidavah · a year ago
> the object was on a rotating platform

Isn't a static-object-rotating-camera basically a requirement for photogrammetry?

jdietrich · a year ago
No. For small objects, it is typical to use a turntable to rotate the object; there are a number of commercial and DIY turntables with an automated motion system that can trigger the shutter after a specified degree of rotation.
Mashimo · a year ago
Why would that make a difference?
falloon · a year ago
Kiri engine is pretty easy to use and just released a good update for their 3DGS pipeline, and they have one of the better 3DGS to mesh options. https://kiri-innovation.github.io/3DGStoMesh2/
archerx · a year ago
>The background is solid black.

>These normally are ideal variables for photogrammetry

Actually no. My friend learned this the hard way during a photogrammetry project: he rented a photo studio, made sure the backgrounds were perfectly black, and took the photos, but the photogrammetry program (Meshroom, I think) struggled to reconstruct the mesh. I did some research and learned that it uses features in the background to help position the camera when building the mesh. So he redid his tests outside with "messy" backgrounds and it worked much, much better.

This was a few years ago so I don't know if things are different now.

tzumby · a year ago
I’m not an expert, only dabbled in photogrammetry, but it seems to me that the crux of that problem is identifying common pixels across images in order to sort of triangulate a point in the 3D space. It doesn’t sound like something an LLM would be good at.
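
For reference, the classical version of that step is feature matching plus two-view triangulation. A minimal sketch with OpenCV, where the image file names and the intrinsics matrix K are placeholder assumptions (real pipelines do this across many views and add bundle adjustment):

  import cv2
  import numpy as np

  # Two views of the object and an assumed pinhole intrinsics matrix K.
  img1 = cv2.imread("view_01.png", cv2.IMREAD_GRAYSCALE)
  img2 = cv2.imread("view_02.png", cv2.IMREAD_GRAYSCALE)
  K = np.array([[1200.0, 0.0, 960.0], [0.0, 1200.0, 540.0], [0.0, 0.0, 1.0]])

  # Find corresponding pixels across the two images.
  orb = cv2.ORB_create(5000)
  kp1, des1 = orb.detectAndCompute(img1, None)
  kp2, des2 = orb.detectAndCompute(img2, None)
  matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
  pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
  pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

  # Recover the relative camera pose, then triangulate each correspondence.
  E, _ = cv2.findEssentialMat(pts1, pts2, K)
  _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
  P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
  P2 = K @ np.hstack([R, t])
  pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
  points3d = (pts4d[:3] / pts4d[3]).T   # one 3D point per matched feature
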
MikeTheRocker · a year ago
Generative AI is going to drive the marginal cost of building 3D interactive content to zero. Unironically this will unlock the metaverse, cringe as that may sound. I'm more bullish than ever on AR/VR.
jsheard · a year ago
I can only speak for myself, but a Metaverse consisting of infinite procedural slop sounds about as appealing as reading infinite LLM generated books, that is, not at all. "Cost to zero" implies drinking directly from the AI firehose with no human in the loop (those cost money) and entertainment produced in that manner is still dire, even in the relatively mature field of pure text generation.
torginus · a year ago
I think the biggest issue with stable diffusion based approaches has always been poor compositional ability (putting stuff where you want), and compounding anatomical/spatial errors that gave the images an off-putting vibe.

All these problems are trivially solvable (solved) using traditional 3D meshes and techniques.

MikeTheRocker · a year ago
IMO current generation models are capable of creating significantly better than "slop" quality content. You need only look at NotebookLM output. As models continue to improve, this will only get better. Look at the rate of improvement of video generation models in the last 12-24 months. It's obvious to me we're rapidly approaching acceptable or even excellent quality on-demand generated content.
jdietrich · a year ago
I can only speak for myself, but a large and growing proportion of the text I read every day is LLM output. If Claude and Deepseek produce slop, then it's a far higher calibre of slop than most human writers could aspire to.
echelon · a year ago
You're too old and jaded [1]. It's for kids inventing infinite worlds to role play and adventure. They're going to have a blast.

[1] Not meant as an insult. Working professionals don't have time for this stuff.

bufferoverflow · a year ago
Minecraft is procedurally generated slop, yet it's insanely popular.
TeMPOraL · a year ago
Screw Metaverse. Let's make a VR holodeck.

Star Trek's Holodeck is actually a good case study here (especially with the recent series Lower Decks, which went as far as making two episodes that are interactive movies on a holodeck and dug quite deep into how that could work in practice, both in terms of producing and experiencing them).

One observation derived here is that infinite procedural content at your fingertips doesn't necessarily kill all meaning, if you bring the meaning with you. The two major use cases[0] for the holodeck are:

- Multiplayer scenarios in which you and your friends enjoy some experience in a program. The meaning is sourced from your friendship and roleplay; the program may be arbitrary output of an RNG in the global sense, but it's the same for you and your friends, so shared experience (and its importance as a social object) in your group is retained.

- Single-player simulations that are highly specific. The meaning here comes from whatever is the reason you're simulating that particular experience, and its connection to the real world. Like idk., a flight simulator of a random space fighter flying over a random world shooting at random shit would quickly get boring, but if I can get the simulator to give me a highly accurate cockpit of an F/A-18 Hornet, flying over real terrain and shooting at realistic enemies in a realistic (even if fictional) storyline - now that would be deeply meaningful to me, because 1) the F/A-18 Hornet is a real plane that I would otherwise never experience flying, and 2) I have a crush on this particular fighter because F/A-18 Hornet 3.0 is one of the first videogames I ever played in my life as a kid.

Now, to make Metaverse less like bullshit and more like Star Trek, we'd need to make sure the world generation is actually available to the users. No asset stores, no app marketplace bullshit. We live in a multimodal LLM era - we already have all the components to do it like Star Trek did it: "Computer, create a medieval fantasy village, in style of England around year 1400, set next to a forest, with tall mountains visible in the distance", then walk around that world and tweak the defaults from there.

--

[0] - Ignoring the third use case that's occasionally implied on the show, and that's really obvious given it's the same one the Internet is for - and I'm not talking about cat pictures.

deadbabe · a year ago
I think you’re being short sighted. Imagine feeding in your favorite TV shows to a generative AI and being able to walk around in the world and talk to characters or explore it with other people.
NBJack · a year ago
It worked for Minecraft.

It was rough at first, and needed plenty of tuning, but the terrain and environments it's capable of certainly have a wide audience.

But as far as pure, unbridled generation goes, yeah; I'm sure there will be plenty of slop made in the coming decade.

hex4def6 · a year ago
I think it has its place. For 'background filler' I think it makes a lot of sense; stuff which you don't need to care about, but whose absence can make something feel less real.

To me, this takes the place of / augments procedural generation. NPC crowds in which none of the participants are needed for the plot, but in which each one can have unique clothing / appearance / lines, aren't "needed" for a game, but can flesh it out when done thoughtfully.

Recall the lambasting Cyberpunk 2077 got for its NPCs that cycled through a seemingly very limited number of appearances, to the point that you'd see clones right next to each other. This would solve that sort of problem, for example.

noch · a year ago
> a Metaverse consisting of infinite procedural slop sounds about as appealing as reading infinite LLM generated books

Take a look at the ImgnAI gallery (https://app.imgnai.com/) and tell me: can you paint better and more imaginatively than that? Do you know anyone in your immediate vicinity who can?

Read this satirical speech by Claude, in French (https://x.com/pmarca/status/1881869448275177764) and in English (https://x.com/pmarca/status/1881869651329913047), and tell me: can you write fiction more entertaining or imaginative than that? Is there someone in your vicinity who can?

Perhaps that's mundane, so is there someone in your vicinity who can reason about a topic in mathematics/physics as well as this: https://x.com/hsu_steve/status/1881696226669916408 ?

Probably your answer is "yes, obviously!" to all the above.

My point: deep learning works and the era of slop ended ages ago except that some people are still living in the past or with some cartoon image of the state of the art.

> "Cost to zero" implies drinking directly from the AI firehose with no human in the loop

No. It means the marginal cost of production tends towards 0. If you can think it, then you can make it instantly and iterate a billion times to refine your idea with as much effort as it took to generate a single concept.

Your fixation on "content without a human directing them" is bizarre and counterproductive. Why is "no human in the loop" a prerequisite for productivity? Your fixation on that is confounding your reasoning.

Deutschland314 · a year ago
AR/VR doesn't have a 3D model problem.

It has a 'why would I strap on a headset for stuff I can do without one' problem.

I will not start meeting friends just because of the metaverse. I have everything I need already.

And even video calls on WhatsApp are awkward as f.

taejavu · a year ago
Jeez I'd love to know what Apple's R&D debt on Vision Pro is, based on current sales to date. I really really hope they continue to push for a headset that's within reach of average people but the hole must be so deep at this point I wouldn't be surprised if they cut their losses.
EncomLab · a year ago
As Carmack pointed out, the problem with AR/VR right now is not the hardware, it's the software. Until the "VisiCalc" must-have killer app shows up to move the hardware, there is little incentive for general users to make the investment.
InDubioProRubio · a year ago
AR needs a bragging app... something like the dharma/content you create in the virtual world growing out of your footsteps in the real one - viewable on a cellphone, but feeling more native with AR goggles.

PittleyDunkin · a year ago
Maybe eventually. Based on this quality I don't see this happening any time in the near future.

pella · a year ago
Ouch; the license excludes the European Union, United Kingdom and South Korea:

  TENCENT HUNYUAN 3D 2.0 COMMUNITY LICENSE AGREEMENT
  Tencent Hunyuan 3D 2.0 Release Date: January 21, 2025
  THIS LICENSE AGREEMENT DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA AND IS EXPRESSLY LIMITED TO THE TERRITORY, AS DEFINED BELOW.
https://github.com/Tencent/Hunyuan3D-2?tab=License-1-ov-file

EMIRELADERO · a year ago
I assume it's safe to ignore as model weights aren't copyrightable, probably.
slt2021 · a year ago
You don't know what kind of backdoors are hidden in the model weights.
gruez · a year ago
Is this tied to EU regulations around AI models?

denkmoon · a year ago
For the AI uninitiated: is this something you could feasibly run at home, e.g. on a 4090? (How can I tell how "big" the model is from the GitHub or Hugging Face page?)
swframe2 · a year ago
I tried using Hunyuan3D-2 on a 4090 GPU. The Windows install encountered build errors, but it worked better on WSL Ubuntu. I first tried it with CUDA 11.3 but got a build error. Switching to CUDA 12.4 worked better. I ran it with their demo image but it reported that the mesh was too big. I removed the mesh size check and it ran fine on the 4090. It is a bit slow on my i9 14k with 128G of memory.

(I previously tried the stability 3d models: https://stability.ai/stable-3d and this seems similar in quality and speed)

denkmoon · a year ago
Cool, thanks. I'm kinda interested so hearing it at least runs on a 4090 means I might give it a go one weekend.
sorenjan · a year ago
The hunyuan3d-dit-v2-0 model is 4.93 GB. ComfyUI support is on their roadmap; it might be best to wait for that, although it doesn't look complicated to use via their example code.

https://huggingface.co/tencent/Hunyuan3D-2/tree/main/hunyuan...
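
If it helps, one way to check the download size without cloning anything is to list the file sizes through the Hugging Face Hub API; a small sketch (the repo id is the one linked above):

  from huggingface_hub import HfApi

  info = HfApi().model_info("tencent/Hunyuan3D-2", files_metadata=True)
  for f in info.siblings:
      if f.size:   # size in bytes; None when metadata is missing
          print(f"{f.rfilename}: {f.size / 1e9:.2f} GB")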

sebzim4500 · a year ago
Interesting. One of the diagrams suggests that the mesh is generated by the marching cubes algorithm, but the geometry of the meshes shown above is clearly not generated that way.
GrantMoyer · a year ago
To me, the bird mesh actually does look like marching cubes output. Note the abundance of almost square triangle pairs on the front and sides. Also note that marching cubes doesn't necessarily create stairstep-like artifacts; it can generate a smooth looking mesh given signed distance field input by slightly adjusting the locations of vertices based on the relative magnitude of the field at the surrounding lattice points.
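
To see that behavior, here is a tiny sketch with scikit-image (the sphere SDF, resolution and radius are just made up for illustration); the vertices land at interpolated positions along lattice edges rather than on the lattice points themselves:

  import numpy as np
  from skimage import measure

  # Signed distance field of a sphere of radius 0.8, sampled on a 64^3 lattice.
  n = 64
  ax = np.linspace(-1.0, 1.0, n)
  x, y, z = np.meshgrid(ax, ax, ax, indexing="ij")
  sdf = np.sqrt(x**2 + y**2 + z**2) - 0.8

  # Each vertex is placed along a lattice edge by interpolating the field values
  # at the two endpoints, so the output surface comes out smooth, not stairstepped.
  verts, faces, normals, values = measure.marching_cubes(
      sdf, level=0.0, spacing=(ax[1] - ax[0],) * 3)
  print(len(verts), "vertices,", len(faces), "triangles")
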
TinkersW · a year ago
If they are using MC, does that mean they are actually generating SDFs? If so it would be nice if you could output the SDF rather than the triangle mesh.
wumeow · a year ago
The meshes generated by the huggingface demo definitely look like the product of marching cubes.
godelski · a year ago
As with any generative model, trust but verify. Try it yourself. Frankly, as a generative researcher myself, there's a lot of reason to not trust what you see in papers and pages.

They link a Huggingface page (great sign!): https://huggingface.co/spaces/tencent/Hunyuan3D-2

I tried to replicate the objects they show on their project page (https://3d-models.hunyuan.tencent.com/). The full prompts are there but displayed truncated, so you can just inspect the element and grab the text.

  Here's what I got
  Leaf
     PNG: https://0x0.st/8HDL.png
     GLB: https://0x0.st/8HD9.glb
  Guitar
     PNG: https://0x0.st/8HDf.png  other view: https://0x0.st/8HDO.png
     GLB: https://0x0.st/8HDV.glb
  Google Translate of Guitar:
     Prompt: A brown guitar is centered against a white background, creating a realistic photography style. This photo captures the culture of the instrument and conveys a tranquil atmosphere.
     PNG: https://0x0.st/8HDt.png   and  https://0x0.st/8HDv.png
     Note: Weird thing on top of guitar. But at least this time the strings aren't fusing into sound hole. 
I haven't tested my own prompts or the google translation of the Chinese prompts because I'm getting an over usage error (I'll edit comment if I get them). That said, these look pretty good. The paper and page images definitely look better, but these aren't like Stable Diffusion 1 paper vs Stable Diffusion 1 reality.

But these are long and detailed prompts. Lots of prompt engineering. That should raise some suspicion. The real world has higher variance, so let's also get an idea of how hard it is to use. Let's try some simpler things :)

  Prompt: A guitar
    PNG: https://0x0.st/8HDg.png
    Note: Not bad! Definitely overfit but does that matter here? A bit too thick for an electric guitar but too thin for an acoustic.
  Prompt: A Monstera leaf
    PNG: https://0x0.st/8HD6.png  
         https://0x0.st/8HDl.png  
         https://0x0.st/8HDU.png
    Note: A bit wonkier. I picked this because it looked like the leaf in the example but this one is doing some odd things. 
          It's definitely a leaf and monstera like but a bit of a mutant. 
  Prompt: Mario from Super Mario Bros
    PNG: https://0x0.st/8Hkq.png
    Note: Now I'm VERY suspicious....
  Prompt: Luigi from Super Mario Bros
    PNG: https://0x0.st/8Hkc.png
         https://0x0.st/8HkT.png  
         https://0x0.st/8HkA.png
    Note: Highly overfit[0]. This is what I suspected. Luigi isn't just tall Mario. 
          Where is the tie coming from? The suspender buttons are all messed up. 
          Really went uncanny valley here. So this suggests we're really brittle. 
  Prompt: Peach from Super Mario Bros
    PNG: https://0x0.st/8Hku.png  
         https://0x0.st/8HkM.png
    Note: I'm fucking dying over here this is so funny. It's just a peach with a cute face hahahahaha
  Prompt: Toad from Super Mario Bros
    PNG: https://0x0.st/8Hke.png 
         https://0x0.st/8Hk_.png
         https://0x0.st/8HkL.png
    Note: Lord have mercy on this toad, I think it is a mutated Squirtle.  
The paper can be found here (the arXiv badge on the page leads to a PDF in the repo, which GitHub is slow to render): https://arxiv.org/abs/2411.02293

(If you want to share images like I did all I'm doing is `curl -F'file=@foobar.png' https://0x0.st`)

[0] Overfitting is a weird thing now. Maybe it doesn't generalize well, but sometimes that's not a problem. I think this is one of the bigger lessons we've learned with recent ML models. My viewpoint is "Sometimes you want a database with a human language interface. Sometimes you want to generalize". So we have to be more context-driven here. But certainly there are a lot of things we should be careful about when we're talking about generation. These things are trained on A LOT of data. If you're more "database-like" then certainly there are potential legal ramifications...

Edit: For context, by "look pretty good" I mean in comparison to other works I've seen. I think it is likely a ways from being useful in production. I'm not sure how much human labor would be required to fix the issues.

godelski · a year ago
Oops, ran out of edit time when I was posting my last two.

  Prompt: A hawk flying in the sky
    PNG: https://0x0.st/8Hkw.png
         https://0x0.st/8Hkx.png
         https://0x0.st/8Hk3.png
    Note: This looks like it would need more work. I tried a few birds and generic too. They all seem to have similar form. 
  Prompt: A hawk with the head of a dragon flying in the sky and holding a snake
    PNG: https://0x0.st/8HkE.png
         https://0x0.st/8Hk6.png
         https://0x0.st/8HkI.png
         https://0x0.st/8Hkl.png
    Note: This one really isn't great. Just a normal hawk head. Not how a bird holds a snake either...
This last one is really key for judging where the tech is at, btw. Most of the generations are assets you could download freely from the internet, and you could probably get better ones from some artist on Fiverr or something. But the last example is closer to a realistic use case: something that is relatively reasonable, probably not in the set of easy-to-download assets, and might be something someone actually wants. It isn't too crazy of an ask given the Chimera, and given how similar a dragon is to a bird in the first place this should be on the "easier" end. I'm sure you could prompt-engineer your way to it, but then we have to have the discussion of what costs more, a prompt engineer or an artist? And do you need a prompt engineer who can also repair models? Because these look like they need repairs.

This can make it hard to really tell if there's progress or not. It is really easy to make compelling images in a paper and beat benchmarks while not actually creating something that is __or will become__ a usable product. All the little details matter. Little errors quickly compound... That said, I do much more on generative imagery than generative 3D objects, so grain of salt here.

Keep in mind: generative models (of any kind) are incredibly difficult to evaluate. Always keep that in mind. You really only have a good idea after you've generated hundreds or thousands of samples yourself and are able to look at a lot with high scrutiny.

BigJono · a year ago
Yeah, this is absolutely light years off being useful in production.

People just see fancy demos and start crapping on about the future, but just look at stable diffusion. It's been around for how long, and what serious professional game developers are using it as a core part of their workflow? Maybe some concept artists? But consistent style is such an important thing for any half decent game and these generative tools shit the bed on consistency in a way that's difficult to paper over.

I've spent a lot of time thinking about game design and experimenting with SD/Flux, and the only thing I think I could even get close to production that I couldn't before is maybe an MTG style card game where gameplay is far more important than graphics, and flashy nice looking static artwork is far more important than consistency. That's a fucking small niche, and I don't see a lot of paths to generalisation.

Kelvin506 · a year ago
The first guitar has one of the strings end at the sound hole, and six tuning knobs for five strings.

The second has similar problems: it has tuning knobs with missing winding posts, then five strings becoming four at the bridge. It also has a pickup under the fretboard.

Are these considered good capability examples?

godelski · a year ago
I take back a fair amount of what I said.

It is pretty good with some easier assets that I suspect there's lots of samples of (and we're comparing to other generative models, not to what humans make. Humans probably still win by a good margin). But when moving out of obvious assets that we could easily find, I'm not seeing good performance at all. Probably a lot can be done with heavy prompt engineering but that just makes things more complicated to evaluate.

keyle · a year ago
Thanks for this. The results are quite impressive, after trying it myself.
xgkickt · a year ago
Any user-generated content system suffers from what we call “the penis problem”.
_s_a_m_ · a year ago
Has the word "advanced", gotta be good