Gaussian Point Splatting
Posted by ibobev 6 days ago
Comments
Comment by keyle 6 days ago
Reminds me of Ecstatica [1], a 1994 game that had intense visuals with a very odd/different rendering engine made of 3D ellipsoids; in a way, really crude splats with Gouraud shading.
Comment by dagmx 5 days ago
1. Gaussian Splats are very expensive to render. They capture a lot of detail, which makes them seem cheaper than an equivalent raster render of that quality, but they wouldn't meet real-time AAA game performance requirements.
2. Gaussian Splats don't have a concrete surface. Want to cast shadows or do physics? It's doable but very tricky. Want to relight them? Also tricky. What is the exact surface point that you want to affect or sample for any particular operation? Deformations also become very difficult to do well.
3. Gaussian Splats are not sharp. You can get sharper with different kernel types or higher density of points, but your costs go up as well.
4. Gaussian splats are awful for any kind of path tracing. You can do it but you go back to the issues above. So mixing and matching traditional content with splats becomes a performance bottleneck.
I don't think you'll see a AAA game use splats for more than something like cinematics in the near term.
Comment by Brusco_RF 5 days ago
I know nothing about the technology but the alternative is creating a 3d model of my living room which is also outside my skill-set.
Comment by KaiserPro 5 days ago
Yes, it's what the autonomous car people are doing.
However, you might want to do photogrammetry first (https://github.com/alicevision/meshroom, open source), as that produces a mesh which makes collision detection easier. The downside is that transparent objects render really badly, but it is a lot faster to render.
Comment by namibj 5 days ago
Comment by KaiserPro 5 days ago
Comment by dagmx 5 days ago
For training you can do a hybrid geometry plus splats workflow. Have geometry that you can constantly raycast against and have as an input to your vision training or to get accurate depth buffers.
The workflows for splats and photogrammetry are very similar.
Comment by cyber_kinetist 6 days ago
The contributions of 3DGS lie in how fast you can render them on modern GPU hardware (tiling + sorting with threads), and in how to make the pipeline differentiable so that you can fit the Gaussian splats to photogrammetry data. Similar to the history of deep learning, it became technically feasible once the GPU hardware was powerful enough.
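To make the "sorting" part concrete, here is a minimal CPU sketch of the per-pixel blending step: splats are sorted by depth and composited front to back. It assumes isotropic 2D Gaussians already projected to screen space; the function names and the tuple layout are made up for illustration and are not taken from any 3DGS codebase. Real implementations use anisotropic covariances and do this per tile on the GPU.

```python
import math

def gaussian_alpha(px, py, cx, cy, sigma, opacity):
    """Opacity contributed at pixel (px, py) by an isotropic 2D Gaussian (assumed shape)."""
    d2 = (px - cx) ** 2 + (py - cy) ** 2
    return opacity * math.exp(-0.5 * d2 / (sigma * sigma))

def composite_pixel(px, py, splats):
    """Blend splats front to back.

    Each splat is a hypothetical tuple: (depth, cx, cy, sigma, opacity, (r, g, b)).
    Returns the accumulated colour and the remaining transmittance.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    # Nearest splat first, so closer splats attenuate the ones behind them.
    for depth, cx, cy, sigma, opacity, rgb in sorted(splats, key=lambda s: s[0]):
        a = gaussian_alpha(px, py, cx, cy, sigma, opacity)
        for i in range(3):
            color[i] += transmittance * a * rgb[i]
        transmittance *= 1.0 - a
        if transmittance < 1e-4:
            break  # pixel is effectively opaque; later splats cannot contribute
    return tuple(color), transmittance
```

Every operation in the loop is a smooth function of the splat parameters, which is roughly what lets the same blend be differentiated when fitting splats to photos.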
Comment by sqrt_1 6 days ago
People have also converted some small sections of Unreal 5 demos into splats https://superspl.at/scene/692c4f91
Or perhaps use a real world scan - it was suggested this one would make an ideal setting for zombies https://superspl.at/scene/6359774f
Comment by monkpit 5 days ago
Comment by speps 5 days ago
Comment by cyber_kinetist 5 days ago
Vanilla 3DGS cannot do any specular lighting or reflections - the color is basically baked into the splats. There's some active research going on to create richer Gaussian splats so we can do shading (or even ray tracing) on them - but I haven't seen anyone using it in production yet.
Comment by monkpit 5 days ago
Comment by grumbel 6 days ago
Comment by cubefox 6 days ago
Comment by Cieric 6 days ago
Comment by cubefox 6 days ago
Comment by grumbel 5 days ago
Comment by cubefox 5 days ago
Comment by selimthegrim 5 days ago
Comment by modeless 6 days ago
Comment by avaer 6 days ago
If you mean the technique of splatting specifically, Dreams for PS4 [1] is prior art.
If you mean pre-rendering, there's Myst and games like the original FF7 for PS1.
Comment by accrual 5 days ago
> The game does not actually model three-dimensional volumes of voxels. Instead, it models the ground as a surface, which may be seen as being made up of voxels. The ground is decorated with objects that are modeled using texture-mapped polygons. When Outcast was developed, the term "voxel engine", when applied to video games, commonly referred to a ray casting engine (for example the Voxel Space engine). On the engine technology page of the game's website, the landscape engine is also referred to as the "Voxels engine". The engine is purely software based; it does not rely on hardware-acceleration via a 3D graphics card.
Comment by jayd16 6 days ago
It's honestly very hard to work with this stuff because you ultimately need to be able to put meshes inside these triangle-sea scenes, and you need to do it in a way that plausibly fits in the world. You can't have unlit characters walking around a baked-lit scene and have them fit in. That's just from a visual design perspective.
You also always want to have bounce light from your dynamic things onto the baked scene and depending on the tech, you might not even be able to spatially place a dynamic thing and have it properly occlude what splats it needs to occlude.
As is, it's a niche technology for games. That might change one day.
https://github.com/googlevr/seurat https://www.youtube.com/watch?v=Pf5Q3bvXj8E
Comment by jamwise 5 days ago
Comment by dagmx 5 days ago
Both can be done locally or in the cloud? The comparison point becomes moot if you change the parameters that drastically.
Comment by jamwise 5 days ago
Comment by boppo1 6 days ago
Comment by Yen 6 days ago
I captured a video on a smartphone camera, using the OpenCamera app. Specifically, this video was captured with exposure locked, framerate locked, focus locked, fairly high framerate and resolution. I walked slowly and carefully around an outdoor scene, trying to get fairly good coverage from multiple angles. I took roughly 20 minutes of video, weighing 19GB.
This video was sampled into individual image frames at about 5fps using ffmpeg. There's room for experimentation and improvement here; an adaptive, coverage-aware sampling strategy would be better. But fixed 5fps was Good Enough (tm). This resulted in roughly 8,000 images at 4k. This was a pretty hefty dataset for my limited 1080, but I made it work.
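For anyone wanting to reproduce the frame-sampling step, a minimal sketch follows. It only wraps ffmpeg's `fps` video filter; the file names, the output pattern and the function names are placeholders I made up, not what was actually used for this capture.

```python
import subprocess
from pathlib import Path

def build_ffmpeg_sample_cmd(video, out_dir, fps=5):
    """Build an ffmpeg command that writes one PNG per sampled frame."""
    # frame_%06d.png is an assumed naming scheme; ffmpeg fills in the counter.
    pattern = str(Path(out_dir) / "frame_%06d.png")
    return ["ffmpeg", "-i", str(video), "-vf", f"fps={fps}", pattern]

def sample_frames(video, out_dir, fps=5):
    """Create the output directory and run ffmpeg (requires ffmpeg on PATH)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_sample_cmd(video, out_dir, fps), check=True)
```

Calling `sample_frames("walk.mp4", "frames")` would fill `frames/` with numbered PNGs. PNG output at 4k gets large quickly, so JPEG may be a reasonable trade-off if disk space is tight.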
I then generated masks for these images, to ignore transient objects during the splat training (i.e. to cut out people who transiently walked through the scene). For this I used Cutie (https://github.com/hkchengrex/Cutie). For outdoor scenes, it can also make sense to mask out low-parallax areas like faraway mountains or especially the sky, as these are difficult to train correctly. If masks are generated for some images, you'll need at least placeholder masks for all of them. In the end I've got about 8,000 PNGs that are monochrome black/white masks.
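A rough sketch of the placeholder-mask step, using only the standard library: it finds images that have no mask and builds a solid single-colour grayscale PNG of a given size. The naming convention (mask has the same stem as the image) and the choice of 255 as "keep everything" are assumptions; check which polarity your trainer expects.

```python
import struct
import zlib

def _chunk(kind: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)

def solid_mask_png(width: int, height: int, value: int = 255) -> bytes:
    """Return the bytes of an 8-bit grayscale PNG filled with a single value."""
    # IHDR: width, height, bit depth 8, colour type 0 (grayscale), no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    row = b"\x00" + bytes([value]) * width  # filter byte 0, then the pixels
    idat = zlib.compress(row * height)
    return (b"\x89PNG\r\n\x1a\n" + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", idat) + _chunk(b"IEND", b""))

def missing_masks(image_names, mask_names):
    """List images with no mask, assuming masks share the image's file stem."""
    have = {name.rsplit(".", 1)[0] for name in mask_names}
    return [n for n in image_names if n.rsplit(".", 1)[0] not in have]
```

For each name returned by `missing_masks`, writing `solid_mask_png(width, height)` to the masks directory gives a mask for every image.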
Then the images are handed to COLMAP (https://github.com/colmap/colmap), using the 'global mapper' option. This registers the camera positions in 3D space, and generates a crude point cloud that's good for sanity-checking. This step required a fair bit of iteration to get right. The full reconstructed output from COLMAP is not necessary, only the pose-estimate .bin files. The output directory here was about 500MB for this step for me.
With COLMAP registration done, the next step is the actual training. I found two useful pieces of software for this, with different tradeoffs.
Brush (https://github.com/ArthurBrussee/brush) was very straightforward to install and use, requiring very little in external dependencies and setup. It was also pretty speedy on training, and gave good results. Minor modifications to the training process were possible by editing source, though I didn't get too wild here. Brush takes the *.bin files from COLMAP, plus the original images directory, and the masks directory if it exists. Run on its own, this could produce gaussian splat .ply files, 500-800MB in size, containing 1-10M splats. More than that and my poor little 8GB of VRAM OOM'd.
nerfstudio (https://github.com/nerfstudio-project/nerfstudio) was also useful, as many research papers get implemented in its framework. In particular, for this outdoor scene, I used wild-gaussians (https://github.com/jkulhanek/wild-gaussians/) to generate just a sky sphere (to help seed low-parallax areas in my particular dataset), stopped training, and used this as an init.ply to pass to Brush.
I then set up a very simple viewer website, using SuperSplat (https://github.com/playcanvas/supersplat). I used supersplat's editor to align the splat's coordinate system with the rotation and scaling that I wanted, and then exported an optimized .sog file, roughly 1/10th the size. .sog is nominally open-standards, though I'm not aware of any other projects using the format. This gave fairly good framerates and adequate controls across a variety of platforms.
As a little bit extra, supersplat's splat-transform CLI tool was used to generate a crude collision mesh for the scene, enabling a walking mode that respected object boundaries.
If there's interest I can post my results. I got a bit sidetracked with other projects and other splats, and this particular one still needs some more cleanup, but I can get it up with a few more hours of work. Hopefully that's a good start: all of these are fully FOSS, and resulted in a good-looking splat.
Comment by dimitri-vs 6 days ago
Comment by Epitaque 6 days ago
Comment by dpark 6 days ago
Comment by phrotoma 6 days ago
<3
Comment by zokier 6 days ago
Umm on my machine it has 560px margin on both sides with the content being only 474px sliver in the middle?
Comment by simonklitj 6 days ago
Comment by docheinestages 6 days ago
Comment by HexDecOctBin 6 days ago
Comment by jasonjmcghee 6 days ago
Comment by cubefox 6 days ago
Comment by cubefox 6 days ago
Point splatting does introduce a lot of noise though, and their denoiser introduces ghosting, but they say a more sophisticated denoiser would give considerably better quality.
Comment by samch 6 days ago
Comment by jerf 6 days ago
Personally I suspect they are getting a bit more attention than they "deserve"; people aren't talking about their weaknesses very much and I think that's resulting in some overexcitement. Some of the "we can replace everything with splats!" reminds me of the people who still don't understand why "if GPUs are thousands of times faster than CPUs why don't we run everything on GPUs?" is basically not even a sensible question. I don't see them as ever being the foundation of a graphics stack, but they definitely have a place as part of a well-rounded menu of techniques that can be brought to bear on a wide range of problems.
Comment by zokier 6 days ago
This is the big thing imho. Sure, you can do traditional photogrammetry to capture meshes and textures but getting the shaders exactly right is afaik non-trivial etc, and if you want real-time rendering then you likely need some further post-processing of the assets. With 3dgs you can pretty much bypass all that complexity and the whole pipeline from photos to rendered frame is much more straightforward.
Comment by djmips 6 days ago
Comment by sorenjan 6 days ago
Comment by andybak 6 days ago
I'm probably being a bit of a grinch about it but the abstract doesn't address performance or hardware constraints either so I guess I'm going to have to read the damn paper.
Comment by cyber_kinetist 6 days ago
I think future papers would probably continue improving on this method and focus on how to sample the points more efficiently while being unbiased (similar to how ray-tracing solved their performance issues). Or maybe... we can just add a deep-learning based denoiser and call it a day!
Comment by lucamark 6 days ago
Comment by pixelesque 6 days ago
At least if it's progressive (so refines and resolves over time), this has been done with pointclouds in the VFX industry in GPU shaders for years in terms of stochastically drawing different points so eventually the whole point set gets rasterised to a fidelity threshold.
Comment by lucamark 6 days ago
Comment by pixelesque 6 days ago
Or the per-pixel coord atomic I guess?
Comment by lucamark 6 days ago
Comment by avaer 6 days ago
Comment by cyber_kinetist 6 days ago
Comment by convolvatron 6 days ago
Comment by bnolsen 5 days ago
Comment by MattCruikshank 6 days ago
Kind of like Minecraft... but with user-generated gaussian-splat blocks.
Comment by jamilton 5 days ago
Comment by MattCruikshank 5 days ago
Yes, you're right that composing the best picture for an eye point could (and does) use splats from all over the scene.
But I think if you limit to splats that are (entirely, mostly, partially?) inside the 1m^3 block, you'll do pretty well. And you're absolutely right that reflective surfaces would probably be the first to suffer.
Well, it's worse than that. Because if you make a 1m^3 pond cube, and then I go putting trees around it, a naive rendering would still show YOUR reflections in the pond, rather than rendering from that pond's point of view, etc, like traditional rendering.
One of Gaussian splats' strengths, that it doesn't care... becomes a problem for me.
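A tiny sketch of the "limit to splats inside the block" idea, under the simplest possible assumption: a splat belongs to a block if its center falls inside the cube. The function name and the plain-tuple representation are made up for illustration. Splats that straddle the boundary (the "mostly, partially?" case) are not handled here.

```python
def splats_in_block(centers, block_min, size=1.0):
    """Return indices of splats whose centers lie inside an axis-aligned cube.

    centers: iterable of (x, y, z) splat centers, in metres.
    block_min: (x, y, z) of the cube's minimum corner.
    size: edge length of the cube (1.0 for a 1m^3 block).
    """
    bx, by, bz = block_min
    return [
        i for i, (x, y, z) in enumerate(centers)
        # Half-open intervals, so a splat on a shared face belongs to one block only.
        if bx <= x < bx + size and by <= y < by + size and bz <= z < bz + size
    ]
```

This does nothing about the baked-reflection problem; it only decides which splats travel with a block.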
Comment by praveen9920 6 days ago
Comment by xyzsparetimexyz 6 days ago
Comment by zokier 6 days ago
Comment by pixelesque 6 days ago
Really?! What OSs can handle that many native threads?
Also, this seems quite similar to stochastic progressive drawing of pointclouds for realtime that has been done for > 15 years in the VFX industry with GPU shaders in a tiled/bucketed fashion, unless this isn't progressive maybe? (The fact it's been accepted for Siggraph likely indicates it's slightly different).
Comment by Calavar 6 days ago
Comment by pixelesque 6 days ago
Future proofing I guess...
Comment by cyber_kinetist 6 days ago
Comment by ks6g10 6 days ago
Comment by zipy124 6 days ago
Comment by DamnInteresting 6 days ago
Ordinarily I don't prefer video, but the visuals are helpful here.
Also, an online interactive, but it seems to only work in Chrome: https://superspl.at/scene/ff1d0393