Since neural networks are universal function approximators, we can often use them to solve problems we don't know exactly how to solve ourselves. NeRF is a great example. But once the problem is solved, we should try to reverse-engineer the solution and optimize it.
The Plenoxels paper (https://arxiv.org/abs/2112.05131) did such work and found you can get NeRF quality without any neural network at all, and as a result run roughly 100x faster.
Except the plenoxel representation is also 2-3 orders of magnitude larger than an MLP-based NeRF. It's not very surprising that a sparse voxel representation can capture a plenoptic function. Representing volumetric video further exacerbates the size disadvantage of voxel-based techniques.
The reasons for using deep-learning function approximators are manifold; in RL, for instance, state and state-action spaces quickly become too large for tabular methods. Using grids or tables also basically closes off the opportunities for exploiting meta-learning and analysis-by-synthesis.
Plenoxels also rely on explicit specification of a known grid structure, whereas the HyperNeRF method can learn latent parametric manifolds and handle dynamic objects with changing topologies.
An MLP-based NeRF actually has a comparable number of parameters to plenoxels (it's not 2-3 orders of magnitude smaller). The original NeRF is 8 dense layers with 256 channels, and this one adds another network with 6 dense layers with 64 channels, so roughly speaking 7x256x256 + 5x64x64 weights. And remember that the voxel grid is sparse, though we don't get exact numbers here. We shouldn't be miserly with our megabytes in 2021. What concerns me is that HyperNeRF requires 64 hours of training time on 4 TPU v4s; if you want to use this for communication or entertainment, it's light-years away from interactive.
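To put numbers on that estimate (a quick back-of-the-envelope sketch; the dense-grid figure is a hypothetical round number for scale, not a measurement from the plenoxels paper):

```python
# Rough parameter count for the MLPs described above (hidden-layer
# weights only; biases and input/output layers ignored, as in the comment).
trunk = 7 * 256 * 256   # original NeRF: 8 dense layers, 256 channels
extra = 5 * 64 * 64     # added network: 6 dense layers, 64 channels
params = trunk + extra
print(f"~{params:,} weights, ~{params * 4 / 1e6:.1f} MB at float32")
# -> ~479,232 weights, ~1.9 MB

# For scale only: a hypothetical DENSE 512^3 grid with 4 values per voxel.
# (Plenoxels are sparse and store density plus spherical-harmonic
# coefficients per voxel, so this is not their actual footprint.)
voxels = 512 ** 3 * 4
print(f"~{voxels * 4 / 1e9:.1f} GB at float32")
# -> ~2.1 GB
```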
Extending plenoxels to support dynamic objects would be great future work.

This is a great point. Does anybody know of any papers that explore non-neural text to speech / voice synthesis that achieve better than parametric quality?
NeRF triggered the development of a number of improved methods, though. There's DONeRF [1], which is built upon NNs and is currently faster than comparable solutions (the field evolves fast, so I may be wrong).

[1] https://github.com/facebookresearch/DONERF
By the way, in the plenoxel video they say "the key component is the differentiable volumetric rendering, not the neural network".
After watching the fractal community struggle to come up with good distance estimators for years, this makes so much sense. In the end, automatic differentiation turned out to be one of the most solid methods for coming up with distance estimators for an arbitrary fractal formula.
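For reference, the differentiable volumetric rendering they're crediting is the standard quadrature of the volume-rendering integral along each camera ray (the usual NeRF formulation, quoted here for context):

\[ \hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left( 1 - e^{-\sigma_i \delta_i} \right) \mathbf{c}_i, \qquad T_i = \exp\Big( -\sum_{j<i} \sigma_j \delta_j \Big), \]

where \(\sigma_i\) and \(\mathbf{c}_i\) are the density and color at sample \(i\) and \(\delta_i\) is the spacing between samples. The sum is differentiable with respect to every \(\sigma_i\) and \(\mathbf{c}_i\), so it works the same whether those values come from an MLP or from trilinear interpolation into a sparse voxel grid.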
> Original one: "Nerf, Representing scenes as neural radiance fields for view synthesis"

> https://scholar.google.com/scholar?cites=9378169911033868166...

What I find most exciting about it is that a NeRF represents images as neural nets, one neural net for each image (in the OP paper generalised to image + deformations). Evaluating the net at various pixel coordinates gives the color.
Up until now learning to replicate the input exactly was called overfitting and considered a bug, not a feature, but they showed a completely new way to wield neural nets.
An interesting detail is that they depend on a Fourier encoding of the input coordinates. A variant called SIREN instead uses `sin` as the activation function throughout the net.
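A minimal sketch of such a coordinate network, assuming PyTorch (layer sizes and band count are illustrative; the real NeRF maps 5D ray coordinates through a larger MLP, this is just the (x, y) -> RGB image case described above):

```python
import torch
import torch.nn as nn

class FourierFeatures(nn.Module):
    def __init__(self, num_bands: int = 8):
        super().__init__()
        # Octave-spaced frequencies, as in NeRF's positional encoding.
        self.register_buffer(
            "freqs", 2.0 ** torch.arange(num_bands).float() * torch.pi
        )

    def forward(self, xy: torch.Tensor) -> torch.Tensor:
        # xy: (N, 2) pixel coordinates scaled to [0, 1]
        ang = xy[..., None] * self.freqs                 # (N, 2, num_bands)
        return torch.cat([ang.sin(), ang.cos()], dim=-1).flatten(1)

model = nn.Sequential(
    FourierFeatures(num_bands=8),                        # (N, 2) -> (N, 32)
    nn.Linear(32, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 3), nn.Sigmoid(),                     # RGB in [0, 1]
)

# "Training" = overfitting to one image: minimize the MSE between
# model(pixel_coords) and that image's colors. SIREN would instead drop
# the Fourier layer and use sin activations in every layer.
```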
Maybe neural nets will become the data compressors of tomorrow? Shoot a picture, send a neural net around. Game assets could be NeRFs.

Long a topic of research, with many interesting ramifications: https://ai.googleblog.com/2016/09/image-compression-with-neu... In analogy to standard compression techniques, you can think of the neural net as a very large "dictionary".
For text compression, there is of course the famous Hutter Prize, launched in 2006: https://en.wikipedia.org/wiki/Hutter_Prize ("Prediction is the golden key that opens all locks". Compressing each byte of Wikipedia text is equivalent to predicting it; to compactly represent its knowledge is to understand it.)
You could always "cheat" with a one-byte archive and a decompressor that works like this (sketched in code after the list):

1. If the first byte is 0, insert the text of Wikipedia;

2. if it isn't, ignore it, and all further bytes are interpreted literally.
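Spelled out, that joke decompressor is something like this (an illustrative sketch; `WIKIPEDIA` is a placeholder for the hardcoded corpus):

```python
# A one-byte "archive" that decompresses to Wikipedia, because the
# decompressor itself embeds the entire corpus.
WIKIPEDIA = b"<the full text of Wikipedia, embedded here>"  # placeholder

def decompress(data: bytes) -> bytes:
    if data[:1] == b"\x00":
        return WIKIPEDIA   # rule 1: a single 0 byte yields Wikipedia
    return data[1:]        # rule 2: drop the flag byte, rest is literal

assert decompress(b"\x00") == WIKIPEDIA   # Wikipedia in a one-byte archive
```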
To avoid this sort of "joke" decompressor, the compression competitions evaluate entries on (size of compressed data) + (size of decompressor), last time I checked. That means we won't get a winner based on GPT-3 anytime soon. 350+ GB of weights (175B parameters at 16 bits each) is a lot to overcome :)
Though of course, given enough data to compress, it might well be that full-on neural language models are still worth it.
> Up until now learning to replicate the input exactly was called overfitting and considered a bug, not a feature, but they showed a completely new way to wield neural nets.
Well, for that particular thing there was a predecessor of sorts in Deep Image Prior from 2017:

https://arxiv.org/abs/1711.10925
That was all about overfitting a neural net on a single image, which they used to get impressive inpainting, noise-removal and super-resolution results without any training data at all (though of course it did not beat state-of-the-art training-based approaches, even then).
I had a lot of fun playing around with it when it came out. The idea is dead simple, within reach to implement yourself with no complex mathematical understanding.

I actually had a similar idea and used it to write a lossy image compressor for my bachelor thesis in 2018: https://theses.liacs.nl/pdf/2018-2019-PetersO.pdf
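It really is a small amount of code. A bare-bones sketch of the loop, assuming PyTorch (a toy conv stack stands in for the paper's U-Net, and the hyperparameters are made up):

```python
import torch
import torch.nn as nn

def dip_denoise(noisy: torch.Tensor, steps: int = 1800, lr: float = 0.01):
    """Deep-Image-Prior-style denoising: fit fixed noise to ONE image."""
    # noisy: (1, 3, H, W) image tensor with values in [0, 1]
    net = nn.Sequential(
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
    )
    z = torch.randn(1, 32, *noisy.shape[-2:])  # fixed random input code
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):      # early stopping is the "prior":
        opt.zero_grad()         # structure is fit before the noise is
        loss = ((net(z) - noisy) ** 2).mean()
        loss.backward()
        opt.step()
    return net(z).detach()
```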
> What I find most exciting about it is that a NeRF represents images as neural nets, one neural net for each image (in the OP paper generalised to image + deformations). Evaluating the net at various pixel coordinates gives the color.
Wow. I only get the inner workings at a very basic, intuitive level, but it's really cool to see the progress of this and similar research. Congrats to the researchers.
It's awe-inspiring and even frightening at first, in the usual ways, but IMO it has a lot of long-term promise in other ways.
Spitballing: I like that this kind of result, which clearly calls into question the role or perception of physical identity, may eventually inform (or even necessitate?) the deconstruction of the physical "I" as a permission broker, and further open a many-to-many interface between the dimensions that underlie what we now think of as "self" and the true depth and variety within what we now think of as "individual humans who are not me". That opening process alone ought to be a huge jump for human development.
Right now we're each held, and holding ourselves, way too responsible for maintaining a singular subjective identity, looking at the aggregate. Not only does this compromise our outlook on others based on our subjective perception of the identity match, but it also compromises our ability to reliably consume and metabolize identity-construct-breaking information and experiences. And many of those things, when consumed without so many identity borders--so to speak--will end up being incredibly useful for individuals and groups both.

Thanks for sharing, op. what uses do you anticipate for a more [fluid? porous? plural?] self-regard?
I think the advent of DeepTomCruise [1] should make us rethink the solidity of identity. A majority of those watching the videos appear to believe it is the real Tom Cruise, and really, there is no good way to tell any longer whether it is or not, without reference to external information.
There is no reason now that Tom Cruise even needs to exist as a real person, or needs to ever act in a movie ever again. Tom Cruise can just become an abstract concept, no longer a living object. Perhaps it is Tom Cruise himself in these videos. Perhaps the real Tom Cruise no longer exists. Perhaps the whole thing is an elaborate art project. Our certainty of its falsity is tied solely to whether we believe the story of those who claim to have created the videos. Is it easier to create fake videos of Tom Cruise or to create real videos of Tom Cruise and a fake story?

[1] https://www.tiktok.com/@deeptomcruise

What is a Tom Cruise?
Hmm, that would go into specifics, which IMO are kind of tenuous from the start since the point of a spitball is to be open to unknowns.
So with that said, some ideas could be started around topics like: 1) massive identity theft causing a re-thinking of identity; 2) creativity and constraints around the moderation of physical identity; and 3) technical-presentational dynamics surrounding physical presence and the moderation of identity presentation in a physical-presence context.
Any one of these is a great setup for the question: How do we interpret personal identity?
And this--again just IMO--would be an amazing point at which to say, "look, if the only word-tool we can use is 'identity' to describe this crisis/opportunity, then maybe all we have is a metaphorical hammer and we have all these endless annoying nails--in the form of identity questions--to hammer down. But if we had maybe some other word-tools to use instead of 'identity', maybe this really would look more like an opportunity to move humanity one more step up the evolutionary ladder."
We already moderate our identity every day, either consciously or unconsciously. It's been studied for thousands of years. It's in books you've read, movies you've watched. It's been done for fun, for comic relief, and also it's been done to solve mind-shattering problems. But now we start to really unwind this question of physical identity, the one concrete thing we thought was so much more certain...! and things get _really_ interesting. This is a different level, where there's maybe not such a need to hide or hide from this departure from "this one idea of who I am" which is really just a mess of a complex of ideas.
> what uses do you anticipate for a more [fluid? porous? plural?] self-regard?
For one: More, and healthier, exposure to alternatives. Your identity is almost synonymous with your subjective past. To that degree, you're screwed in a lot of ways. To give a personal example, I was born into a cult. I was screwed from birth, in that way.
One of the best tools I had in removing myself from that environment was the concept of an "online identity" which could be moderated, intentionally, into whatever it needed to be to help me explore alternative perceptions of what it was I was involved in. I could even try on a non-cult identity, and write, online, from the perspective of someone who had freed themselves. And then I could consider how that felt, and reflect on what I learned. Did it kill me? No. Am I in hell now? Nope. etc.
Consider the millions of various points of identity just like that. Not just cults, no way! Am I Coke or Pepsi? eh, boring. Am I...which race am I? Is that a tricky question in the future? And from the outside, will I get better treatment from medical professionals if I can moderate my physical presentation at will? Wow so many random questions that can be asked for learning's sake.
But again, to emphasize--I love and respect the unknown. I don't have answers, only openness where I don't want to have certainty anymore, because that c-word makes it a little too hard to solve big problems, or a little too easy to avoid them. Don't leave the cult, man, you'll lose all your certainty.

Hope that helps a little, or at least offers some examples.
In the right circumstances, the NFL would spend $10M to $100M on this or similar tech. It’s wild to think about how much of what we see on screens in a decade or so will be “computationally inferred”.
I'm sure there are also people who would love to computationally infer the preferred ending to an NFL game of their choice, too. Or change the ending to a movie of which they can only tolerate the first hour.
It would enable some really cool ideation and modeling, maybe even some that could be used for psychology work, or sports psychology in the case of the NFL (I'm reminded of those "imagine yourself winning" tricks).
Could someone help me outline what's most interesting here? Maybe applications?

If you have a limited number of images of the same scene, with NeRF you can generate new images from different positions and angles (novel view synthesis).
But this only works with rigid scenes: e.g. if you apply NeRF to images of a person, they cannot move between the pictures.
This is what HyperNeRF is trying to solve. If there are pictures of a person, smiling in one but not in another, 1. this method will not fail, and 2. it looks like it will give reasonable new views/images.
For those of you who may not get this reference: it's from a popular YouTube channel, Two Minute Papers, run by Dr. Károly Zsolnai-Fehér, which recently featured OP's link: https://www.youtube.com/c/K%C3%A1rolyZsolnai/videos
The channel is excellent and I recommend subscribing to it if you like this kind of stuff.
In my opinion, everything NeRF-related gets a lot of attention because it's highly graphical and thus easy to present. But there are few practical applications, and it tends to be super slow and to fail on more challenging scenes where traditional 20-year-old methods like global penalty block matching still work reasonably well.
And for this paper in particular, I fail to see how they improve over other NeRF approaches with deformation terms, like Nerfies or D-NeRF.
Deformation fields would struggle to fundamentally change the topological type, particularly where the transformation would need to “tear” the manifold, such as turning a sphere into a donut or dividing a cell into two child cells. HyperNeRF exploits and extends Level Set Methods, which are rooted in Morse Theory:

https://en.wikipedia.org/wiki/Level-set_method

https://en.wikipedia.org/wiki/Morse_theory
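To make the Morse-theory point concrete with a textbook example (mine, not the paper's): slice the quadric \(f(x, y, z) = x^2 + y^2 - z^2\) at level \(c\),

\[ S_c = \{ (x, y, z) : x^2 + y^2 - z^2 = c \}. \]

For \(c > 0\), \(S_c\) is a one-sheeted hyperboloid, a single connected surface; for \(c < 0\), a two-sheeted hyperboloid with two components. Sweeping \(c\) through the critical value 0 "tears" one surface into two. No smooth deformation field applied to the surface itself can do that, but it costs nothing if the scene is a level set and the network is free to move the slicing coordinate, which is essentially HyperNeRF's higher-dimensional trick.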
The team photos double as the demo; that's neat. (Mouse over to see the depth colouring.)
I presume something along these lines will make its way to the Pixel 6 camera software, given the origin of the research and the onboard edgeTPU block.