geuis · a year ago
Question related to 3D mesh models in general: has any significant work been done on models oriented towards photogrammetry?

Case in point, I have a series of photos (48) that capture a small statue. The photos are high quality, the object was on a rotating platform. Lighting is consistent. The background is solid black.

These are normally ideal conditions for photogrammetry, but none of the various common applications and websites do a very good job of creating a mesh that isn't super low poly and/or full of holes.

I've been casually scanning huggingface for relevant models to try out but haven't really found anything.

troymc · a year ago
Check out RealityCapture [1]. I think it's what's used to create the Quixel Megascans [2]. (They're both under the Epic corporate umbrella now.)

[1] https://www.capturingreality.com/realitycapture

[2] https://quixel.com/megascans/

Joel_Mckay · a year ago
COLMAP + CloudCompare with a good CUDA GPU (more VRAM is better) will give reasonable results for large textured objects like buildings. Glass/water/mirror/gloss surfaces will need to be coated to scan; dry spray-on Dr. Scholl's foot deodorant seems to work fine for our object scans.
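
For anyone who wants a concrete starting point, a minimal sketch of that pipeline driven from Python (folder names are placeholders, the colmap binary is assumed to be on PATH, and the fused point cloud is then cleaned up and meshed interactively in CloudCompare):

  import subprocess
  from pathlib import Path

  def colmap(*args):
      # Invoke the COLMAP command-line tool; assumes colmap is on PATH.
      subprocess.run(["colmap", *args], check=True)

  Path("sparse").mkdir(exist_ok=True)

  # Sparse reconstruction: feature extraction, matching, incremental SfM.
  colmap("feature_extractor", "--database_path", "db.db", "--image_path", "images")
  colmap("exhaustive_matcher", "--database_path", "db.db")
  colmap("mapper", "--database_path", "db.db", "--image_path", "images",
         "--output_path", "sparse")

  # Dense reconstruction (patch_match_stereo needs the CUDA build), then meshing.
  colmap("image_undistorter", "--image_path", "images", "--input_path", "sparse/0",
         "--output_path", "dense")
  colmap("patch_match_stereo", "--workspace_path", "dense")
  colmap("stereo_fusion", "--workspace_path", "dense",
         "--output_path", "dense/fused.ply")
  colmap("poisson_mesher", "--input_path", "dense/fused.ply",
         "--output_path", "dense/mesh.ply")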

There are now more advanced options than Gaussian splatting, and these can achieve normal playback speeds rather than hours of filtering. I'll drop a citation if I recall the recent paper and example code. However, note this style of 3D scene recovery tends to be heavily 3D location dependent.

Best of luck, =3

jocaal · a year ago
Recently, a lot of development in this area has been in gaussian splatting and from what I have seen, the new methods are super effective.

https://en.wikipedia.org/wiki/Gaussian_splatting

https://www.youtube.com/watch?v=6dPBaV6M9u4

meindnoch · a year ago
The parent explicitly asked for a mesh.
geuis · a year ago
Yeah some very impressive stuff with splats going on. But I haven't seen much about going from splats to high quality 3D meshes. I've tried one or two with pretty poor results.
Broussebar · a year ago
For this exact use case I used instant-ngp[0] recently and was really pleased with the results. There's an article[1] explaining how to prepare your data.

[0] https://github.com/NVlabs/instant-ngp

[1] https://github.com/NVlabs/instant-ngp/blob/master/docs/nerf_...

GistNoesis · a year ago
>full of holes

On the geometry side, from a theoretical point of view you can repair meshes [1] by inferring a signed or unsigned distance field from your existing mesh and then contouring that distance field.

If you like the distance field approach, there is also research [2] on estimating neural unsigned distance fields directly (in a roughly similar spirit to Gaussian splats).

[1] https://github.com/nzfeng/signed-heat-3d [It works, but it's research code: buggy, not user friendly, and mostly demonstrated on toy problems, because complexity explodes very quickly. With a grid the number of cells grows as n^3, and a sparse linear system is solved on top of that (so total complexity is bounded by roughly n^6), but tolerating approximations and implementing things carefully, the practical complexity should be on par with methods like the finite element method in computational fluid dynamics.]

[2] https://virtualhumans.mpi-inf.mpg.de/ndf/
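
As a rough illustration of that grid-and-contour idea (not the actual method from [1] or [2]), here is a minimal Python sketch using trimesh and scikit-image; the file names and grid resolution are placeholders, and since the field is unsigned it only yields a crude closed shell around the scan:

  import numpy as np
  import trimesh
  from skimage import measure

  # Load a scan with holes, sample an unsigned distance field on a grid,
  # then contour that field to recover a closed surface.
  mesh = trimesh.load("scan_with_holes.ply")   # placeholder input file

  n = 64                                       # grid resolution; cost grows as n^3
  lo, hi = mesh.bounds
  axes = [np.linspace(lo[i] - 0.05, hi[i] + 0.05, n) for i in range(3)]
  pts = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)

  # Unsigned distance from every grid point to the existing surface.
  _, dist, _ = trimesh.proximity.closest_point(mesh, pts)
  field = dist.reshape(n, n, n)

  # Contour slightly off zero: with an unsigned field this produces a thin
  # closed shell around the scan rather than a true signed reconstruction.
  spacing = tuple(a[1] - a[0] for a in axes)
  verts, faces, _, _ = measure.marching_cubes(field, level=spacing[0], spacing=spacing)
  repaired = trimesh.Trimesh(vertices=verts + [a[0] for a in axes], faces=faces)
  repaired.export("repaired_shell.ply")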

MrSkelter · a year ago
48 images is an incredibly small number for high quality photogrammetry. 480 wouldn’t be overkill. A couple of hundred would be considered normal.
Elucalidavah · a year ago
> the object was on a rotating platform

Isn't a static-object-rotating-camera basically a requirement for photogrammetry?

jdietrich · a year ago
No. For small objects, it is typical to use a turntable to rotate the object; there are a number of commercial and DIY turntables with an automated motion system that can trigger the shutter after a specified degree of rotation.
Mashimo · a year ago
Why would that make a difference?
falloon · a year ago
Kiri engine is pretty easy to use and just released a good update for their 3DGS pipeline, and they have one of the better 3DGS to mesh options. https://kiri-innovation.github.io/3DGStoMesh2/
archerx · a year ago
>The background is solid black.

>These normally are ideal variables for photogrammetry

Actually no. My friend learned this the hard way during a photogrammetry project: he rented a photo studio, made sure the backgrounds were perfectly black, and took the photos, but the photogrammetry program (Meshroom, I think) struggled to reconstruct the mesh. I did some research and learned that it uses features in the background to help position the camera when building the mesh. So he redid his tests outside with "messy" backgrounds and it worked much, much better.

This was a few years ago so I don't know if things are different now.

tzumby · a year ago
I’m not an expert, only dabbled in photogrammetry, but it seems to me that the crux of that problem is identifying common pixels across images in order to sort of triangulate a point in the 3D space. It doesn’t sound like something an LLM would be good at.
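
For reference, the classical version of that step is feature matching plus two-view triangulation. A minimal sketch with OpenCV, where the image file names and the intrinsics matrix K are placeholder assumptions (real pipelines do this across many views and add bundle adjustment):

  import cv2
  import numpy as np

  # Two views of the object and an assumed pinhole intrinsics matrix K.
  img1 = cv2.imread("view_01.png", cv2.IMREAD_GRAYSCALE)
  img2 = cv2.imread("view_02.png", cv2.IMREAD_GRAYSCALE)
  K = np.array([[1200.0, 0.0, 960.0], [0.0, 1200.0, 540.0], [0.0, 0.0, 1.0]])

  # Find corresponding pixels across the two images.
  orb = cv2.ORB_create(5000)
  kp1, des1 = orb.detectAndCompute(img1, None)
  kp2, des2 = orb.detectAndCompute(img2, None)
  matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
  pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
  pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

  # Recover the relative camera pose, then triangulate each correspondence.
  E, _ = cv2.findEssentialMat(pts1, pts2, K)
  _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
  P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
  P2 = K @ np.hstack([R, t])
  pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
  points3d = (pts4d[:3] / pts4d[3]).T   # one 3D point per matched feature
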
MikeTheRocker · a year ago
Generative AI is going to drive the marginal cost of building 3D interactive content to zero. Unironically this will unlock the metaverse, cringe as that may sound. I'm more bullish than ever on AR/VR.
jsheard · a year ago
I can only speak for myself, but a Metaverse consisting of infinite procedural slop sounds about as appealing as reading infinite LLM generated books, that is, not at all. "Cost to zero" implies drinking directly from the AI firehose with no human in the loop (those cost money) and entertainment produced in that manner is still dire, even in the relatively mature field of pure text generation.
torginus · a year ago
I think the biggest issue with stable diffusion based approaches has always been poor compositional ability (putting stuff where you want), and compounding anatomical/spatial errors that gave the images an off-putting vibe.

All these problems are trivially solvable (solved) using traditional 3D meshes and techniques.

MikeTheRocker · a year ago
IMO current generation models are capable of creating significantly better than "slop" quality content. You need only look at NotebookLM output. As models continue to improve, this will only get better. Look at the rate of improvement of video generation models in the last 12-24 months. It's obvious to me we're rapidly approaching acceptable or even excellent quality on-demand generated content.
jdietrich · a year ago
I can only speak for myself, but a large and growing proportion of the text I read every day is LLM output. If Claude and Deepseek produce slop, then it's a far higher calibre of slop than most human writers could aspire to.
echelon · a year ago
You're too old and jaded [1]. It's for kids inventing infinite worlds to role play and adventure. They're going to have a blast.

[1] Not meant as an insult. Working professionals don't have time for this stuff.

bufferoverflow · a year ago
Minecraft is procedurally generated slop, yet it's insanely popular.
TeMPOraL · a year ago
Screw Metaverse. Let's make a VR holodeck.

Star Trek's Holodeck is actually a good case study here (especially with the recent series Lower Decks, which went as far as making two episodes that are interactive movies on a holodeck and dug quite deep into how that could work in practice, both in terms of producing and experiencing them).

One observation derived here is that infinite procedural content at your fingertips doesn't necessarily kill all meaning, if you bring the meaning with you. The two major use cases[0] for the holodeck are:

- Multiplayer scenarios in which you and your friends enjoy some experience in a program. The meaning is sourced from your friendship and roleplay; the program may be arbitrary output of an RNG in the global sense, but it's the same for you and your friends, so shared experience (and its importance as a social object) in your group is retained.

- Single-player simulations that are highly specific. The meaning here comes from whatever is the reason you're simulating that particular experience, and its connection to the real world. Like idk., a flight simulator of a random space fighter flying over a random world shooting at random shit would quickly get boring, but if I can get the simulator to give me a highly accurate cockpit of an F/A-18 Hornet, flying over real terrain and shooting at realistic enemies in a realistic (even if fictional) storyline - now that would be deeply meaningful to me, because 1) the F/A-18 Hornet is a real plane that I would otherwise never experience flying, and 2) I have a crush on this particular fighter because F/A-18 Hornet 3.0 is one of the first videogames I ever played in my life as a kid.

Now, to make Metaverse less like bullshit and more like Star Trek, we'd need to make sure the world generation is actually available to the users. No asset stores, no app marketplace bullshit. We live in a multimodal LLM era - we already have all the components to do it like Star Trek did it: "Computer, create a medieval fantasy village, in style of England around year 1400, set next to a forest, with tall mountains visible in the distance", then walk around that world and tweak the defaults from there.

--

[0] - Ignoring the third use case that's occasionally implied on the show, and that's really obvious given it's the same one the Internet is for - and I'm not talking about cat pictures.

deadbabe · a year ago
I think you’re being short sighted. Imagine feeding in your favorite TV shows to a generative AI and being able to walk around in the world and talk to characters or explore it with other people.
NBJack · a year ago
It worked for Minecraft.

It was rough at first, and needed plenty of tuning, but the terrain and environments it's capable of certainly have a wide audience.

But as far as pure, unbridled generation goes, yeah; I'm sure there will be plenty of slop made in the coming decade.

hex4def6 · a year ago
I think it has its place. For 'background filler' I think it makes a lot of sense; stuff which you don't need to care about, but whose absence can make something feel less real.

To me, this takes the place of / augments procedural generation. NPC crowds in which none of the participants are needed for the plot, but in which each one can have unique clothing / appearance / lines, aren't "needed" for a game, but can flesh it out when done thoughtfully.

Recall the lambasting Cyberpunk 2077 got for its NPCs that cycled through a seemingly very limited number of appearances, to the point that you'd see clones right next to each other. This would solve that sort of problem, for example.

noch · a year ago
> a Metaverse consisting of infinite procedural slop sounds about as appealing as reading infinite LLM generated books

Take a look at the ImgnAI gallery (https://app.imgnai.com/) and tell me: can you paint better and more imaginatively than that? Do you know anyone in your immediate vicinity who can?

Read this satirical speech by Claude, in French (https://x.com/pmarca/status/1881869448275177764) and in English (https://x.com/pmarca/status/1881869651329913047), and tell me: can you write fiction more entertaining or imaginative than that? Is there someone in your vicinity who can?

Perhaps that's mundane, so is there someone in your vicinity who can reason about a topic in mathematics/physics as well as this: https://x.com/hsu_steve/status/1881696226669916408 ?

Probably your answer is "yes, obviously!" to all the above.

My point: deep learning works and the era of slop ended ages ago except that some people are still living in the past or with some cartoon image of the state of the art.

> "Cost to zero" implies drinking directly from the AI firehose with no human in the loop

No. It means the marginal cost of production tends towards 0. If you can think it, then you can make it instantly and iterate a billion times to refine your idea with as much effort as it took to generate a single concept.

Your fixation on "content without a human directing them" is bizarre and counterproductive. Why is "no human in the loop" a prerequisite for productivity? Your fixation on that is confounding your reasoning.

Deutschland314 · a year ago
AR/VR doesn't have a 3D model problem.

It has a 'why would I strap on a headset for stuff I can do without one' problem.

I will not start meeting friends just because of the metaverse. I have everything I need already.

And even video calls on WhatsApp are awkward as f.

taejavu · a year ago
Jeez I'd love to know what Apple's R&D debt on Vision Pro is, based on current sales to date. I really really hope they continue to push for a headset that's within reach of average people but the hole must be so deep at this point I wouldn't be surprised if they cut their losses.
EncomLab · a year ago
As Carmack pointed out, the problem with AR/VR right now is not the hardware, it's the software. Until the "VisiCalc" must-have killer app shows up to move the hardware, there is little incentive for general users to make the investment.
InDubioProRubio · a year ago
AR needs a bragging app... something like the dharma/content you create in the virtual world growing out of your footsteps in the real one - viewable on a cellphone, but feeling more native with AR goggles.

PittleyDunkin · a year ago
Maybe eventually. Based on this quality I don't see this happening any time in the near future.

pella · a year ago
Ouch; the license excludes the European Union, United Kingdom and South Korea:

  TENCENT HUNYUAN 3D 2.0 COMMUNITY LICENSE AGREEMENT
  Tencent Hunyuan 3D 2.0 Release Date: January 21, 2025
  THIS LICENSE AGREEMENT DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA AND IS EXPRESSLY LIMITED TO THE TERRITORY, AS DEFINED BELOW.
https://github.com/Tencent/Hunyuan3D-2?tab=License-1-ov-file

EMIRELADERO · a year ago
I assume it's safe to ignore as model weights aren't copyrightable, probably.
slt2021 · a year ago
You don't know what kind of backdoors are hidden in the model weights.
gruez · a year ago
Is this tied to EU regulations around AI models?

denkmoon · a year ago
For the AI uninitiated: is this something you could feasibly run at home, e.g. on a 4090? (How can I tell how "big" the model is from the GitHub or Hugging Face page?)
swframe2 · a year ago
I tried using Hunyuan3D-2 on a 4090 GPU. The Windows install encountered build errors, but it worked better on WSL Ubuntu. I first tried it with CUDA 11.3 but got a build error. Switching to CUDA 12.4 worked better. I ran it with their demo image but it reported that the mesh was too big. I removed the mesh size check and it ran fine on the 4090. It is a bit slow on my i9 14k with 128G of memory.

(I previously tried the stability 3d models: https://stability.ai/stable-3d and this seems similar in quality and speed)

denkmoon · a year ago
Cool, thanks. I'm kinda interested so hearing it at least runs on a 4090 means I might give it a go one weekend.
sorenjan · a year ago
The hunyuan3d-dit-v2-0 model is 4.93 GB. ComfyUI support is on their roadmap; it might be best to wait for that, although it doesn't look complicated to use via their example code.

https://huggingface.co/tencent/Hunyuan3D-2/tree/main/hunyuan...
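
If it helps, one way to check the download size without cloning anything is to list the file sizes through the Hugging Face Hub API; a small sketch (the repo id is the one linked above):

  from huggingface_hub import HfApi

  info = HfApi().model_info("tencent/Hunyuan3D-2", files_metadata=True)
  for f in info.siblings:
      if f.size:   # size in bytes; None when metadata is missing
          print(f"{f.rfilename}: {f.size / 1e9:.2f} GB")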

sebzim4500 · a year ago
Interesting. One of the diagrams suggests that the mesh is generated by the marching cubes algorithm, but the geometry of the meshes shown above is clearly not generated that way.
GrantMoyer · a year ago
To me, the bird mesh actually does look like marching cubes output. Note the abundance of almost square triangle pairs on the front and sides. Also note that marching cubes doesn't necessarily create stairstep-like artifacts; it can generate a smooth looking mesh given signed distance field input by slightly adjusting the locations of vertices based on the relative magnitude of the field at the surrounding lattice points.
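
To see that behavior, here is a tiny sketch with scikit-image (the sphere SDF, resolution and radius are just made up for illustration); the vertices land at interpolated positions along lattice edges rather than on the lattice points themselves:

  import numpy as np
  from skimage import measure

  # Signed distance field of a sphere of radius 0.8, sampled on a 64^3 lattice.
  n = 64
  ax = np.linspace(-1.0, 1.0, n)
  x, y, z = np.meshgrid(ax, ax, ax, indexing="ij")
  sdf = np.sqrt(x**2 + y**2 + z**2) - 0.8

  # Each vertex is placed along a lattice edge by interpolating the field values
  # at the two endpoints, so the output surface comes out smooth, not stairstepped.
  verts, faces, normals, values = measure.marching_cubes(
      sdf, level=0.0, spacing=(ax[1] - ax[0],) * 3)
  print(len(verts), "vertices,", len(faces), "triangles")
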
TinkersW · a year ago
If they are using MC, does that mean they are actually generating SDFs? If so it would be nice if you could output the SDF rather than the triangle mesh.
wumeow · a year ago
The meshes generated by the huggingface demo definitely look like the product of marching cubes.
godelski · a year ago
As with any generative model, trust but verify. Try it yourself. Frankly, as a generative researcher myself, there's a lot of reason to not trust what you see in papers and pages.

They link a Huggingface page (great sign!): https://huggingface.co/spaces/tencent/Hunyuan3D-2

I tried to replicate the objects they show on their project page (https://3d-models.hunyuan.tencent.com/). The full prompts are there but displayed truncated, so you can just inspect the element and grab the text.

  Here's what I got
  Leaf
     PNG: https://0x0.st/8HDL.png
     GLB: https://0x0.st/8HD9.glb
  Guitar
     PNG: https://0x0.st/8HDf.png  other view: https://0x0.st/8HDO.png
     GLB: https://0x0.st/8HDV.glb
  Google Translate of Guitar:
     Prompt: A brown guitar is centered against a white background, creating a realistic photography style. This photo captures the culture of the instrument and conveys a tranquil atmosphere.
     PNG: https://0x0.st/8HDt.png   and  https://0x0.st/8HDv.png
     Note: Weird thing on top of guitar. But at least this time the strings aren't fusing into sound hole. 
I haven't tested my own prompts or the google translation of the Chinese prompts because I'm getting an over usage error (I'll edit comment if I get them). That said, these look pretty good. The paper and page images definitely look better, but these aren't like Stable Diffusion 1 paper vs Stable Diffusion 1 reality.

But these are long and detailed prompts. Lots of prompt engineering. That should raise some suspicion. The real world has higher variance, so let's also get an idea of how hard it is to use. Let's try some simpler things :)

  Prompt: A guitar
    PNG: https://0x0.st/8HDg.png
    Note: Not bad! Definitely overfit but does that matter here? A bit too thick for an electric guitar but too thin for an acoustic.
  Prompt: A Monstera leaf
    PNG: https://0x0.st/8HD6.png  
         https://0x0.st/8HDl.png  
         https://0x0.st/8HDU.png
    Note: A bit wonkier. I picked this because it looked like the leaf in the example but this one is doing some odd things. 
          It's definitely a leaf and monstera like but a bit of a mutant. 
  Prompt: Mario from Super Mario Bros
    PNG: https://0x0.st/8Hkq.png
    Note: Now I'm VERY suspicious....
  Prompt: Luigi from Super Mario Bros
    PNG: https://0x0.st/8Hkc.png
         https://0x0.st/8HkT.png  
         https://0x0.st/8HkA.png
    Note: Highly overfit[0]. This is what I suspected. Luigi isn't just tall Mario. 
          Where is the tie coming from? The suspender buttons are all messed up. 
          Really went uncanny valley here. So this suggests we're really brittle. 
  Prompt: Peach from Super Mario Bros
    PNG: https://0x0.st/8Hku.png  
         https://0x0.st/8HkM.png
    Note: I'm fucking dying over here this is so funny. It's just a peach with a cute face hahahahaha
  Prompt: Toad from Super Mario Bros
    PNG: https://0x0.st/8Hke.png 
         https://0x0.st/8Hk_.png
         https://0x0.st/8HkL.png
    Note: Lord have mercy on this toad, I think it is a mutated Squirtle.  
The paper can be found here (the arXiv badge on the page leads to a PDF in the repo, which GitHub is slow to render): https://arxiv.org/abs/2411.02293

(If you want to share images like I did all I'm doing is `curl -F'file=@foobar.png' https://0x0.st`)

[0] Overfitting is a weird thing now. Maybe it doesn't generalize well, but sometimes that's not a problem. I think this is one of the bigger lessons we've learned with recent ML models. My viewpoint is "Sometimes you want a database with a human language interface. Sometimes you want to generalize". So we have to be more context-driven here. But certainly there are a lot of things we should be careful about when we're talking about generation. These things are trained on A LOT of data. If you're more "database-like" then certainly there are potential legal ramifications...

Edit: For context, by "look pretty good" I mean in comparison to other works I've seen. I think it is likely a ways from being useful in production. I'm not sure how much human labor would be required to fix the issues.

godelski · a year ago
Oops, ran out of edit time when I was posting my last two.

  Prompt: A hawk flying in the sky
    PNG: https://0x0.st/8Hkw.png
         https://0x0.st/8Hkx.png
         https://0x0.st/8Hk3.png
    Note: This looks like it would need more work. I tried a few birds and generic too. They all seem to have similar form. 
  Prompt: A hawk with the head of a dragon flying in the sky and holding a snake
    PNG: https://0x0.st/8HkE.png
         https://0x0.st/8Hk6.png
         https://0x0.st/8HkI.png
         https://0x0.st/8Hkl.png
    Note: This one really isn't great. Just a normal hawk head. Not how a bird holds a snake either...
This last one is really key for judging where the tech is at, btw. Most of the generations are assets you could download freely from the internet, and you could probably get better ones from some artist on Fiverr or something. But the last example is closer to a realistic use case: something that is relatively reasonable, probably not in the set of easy-to-download assets, and might be something someone actually wants. It isn't too crazy of an ask given the Chimera, and given how similar a dragon is to a bird in the first place this should be on the "easier" end. I'm sure you could prompt-engineer your way to it, but then we have to have the discussion of what costs more, a prompt engineer or an artist? And do you need a prompt engineer who can also repair models? Because these look like they need repairs.

This can make it hard to really tell if there's progress or not. It is really easy to make compelling images in a paper and beat benchmarks while not actually creating something that is __or will become__ a usable product. All the little details matter. Little errors quickly compound... That said, I do much more on generative imagery than generative 3D objects, so grain of salt here.

Keep in mind: generative models (of any kind) are incredibly difficult to evaluate. Always keep that in mind. You really only have a good idea after you've generated hundreds or thousands of samples yourself and are able to look at a lot with high scrutiny.

BigJono · a year ago
Yeah, this is absolutely light years off being useful in production.

People just see fancy demos and start crapping on about the future, but just look at stable diffusion. It's been around for how long, and what serious professional game developers are using it as a core part of their workflow? Maybe some concept artists? But consistent style is such an important thing for any half decent game and these generative tools shit the bed on consistency in a way that's difficult to paper over.

I've spent a lot of time thinking about game design and experimenting with SD/Flux, and the only thing I think I could even get close to production that I couldn't before is maybe an MTG style card game where gameplay is far more important than graphics, and flashy nice looking static artwork is far more important than consistency. That's a fucking small niche, and I don't see a lot of paths to generalisation.

Kelvin506 · a year ago
The first guitar has one of the strings end at the sound hole, and six tuning knobs for five strings.

The second has similar problems: it has tuning knobs with missing winding posts, then five strings becoming four at the bridge. It also has a pickup under the fretboard.

Are these considered good capability examples?

godelski · a year ago
I take back a fair amount of what I said.

It is pretty good with some easier assets that I suspect there's lots of samples of (and we're comparing to other generative models, not to what humans make. Humans probably still win by a good margin). But when moving out of obvious assets that we could easily find, I'm not seeing good performance at all. Probably a lot can be done with heavy prompt engineering but that just makes things more complicated to evaluate.

keyle · a year ago
Thanks for this. The results are quite impressive, after trying it myself.
xgkickt · a year ago
Any user-generated content system suffers from what we call “the penis problem”.
_s_a_m_ · a year ago
Has the word "advanced", gotta be good