Gaussian Point Splatting

Posted by ibobev 6 days ago

Comments

Comment by keyle 6 days ago

It will be interesting to see the first AAA game that uses these methods instead of rendering a 3D world. Even if made from CGI worlds, it would be a very interesting approach, with somewhat predictable performance.

Reminds me of Ecstatica [1], a 1994 game that had intense visuals with a very odd/different rendering engine made of 3D ellipsoids; in a way really crude splats in gouraud shading.

[1] https://ecstatica.fandom.com/wiki/Ecstatica

Comment by dagmx 5 days ago

I know this comes up a lot on HN because it's not primarily a graphics community, but:

1. Gaussian Splats are very expensive to render. They capture a lot of detail which makes them seem cheaper than an equivalent raster render of that quality, but they wouldn't meet real time AAA game performance requirements

2. Gaussian Splats don't have a concrete surface. Want to cast shadows or do physics? It's doable but very tricky. Want to relight them? Also tricky. What is the exact surface point that you want to affect or sample for any particular operation? Deformations also become very difficult to do well.

3. Gaussian Splats are not sharp. You can get sharper with different kernel types or higher density of points, but your costs go up as well.

4. Gaussian splats are awful for any kind of path tracing. You can do it but you go back to the issues above. So mixing and matching traditional content with splats becomes a performance bottleneck.

I don't think you'll see a AAA game use splats for more than something like cinematics in the near term.

Comment by Brusco_RF 5 days ago

I'm working on the vision component of a drone racing stack. Could I use GS to render my living room as a digital playground to train my vision models in?

I know nothing about the technology but the alternative is creating a 3d model of my living room which is also outside my skill-set.

Comment by KaiserPro 5 days ago

> Could I use GS to render my living room as a digital playground to train my vision models in?

Yes, it's what the autonomous car people are doing.

However, you might want to do photogrammetry first (https://github.com/alicevision/meshroom, open source), as that produces a mesh that makes collision detection easier. The downside is that transparent objects render really badly, but a mesh is a lot faster to render.

Comment by namibj 5 days ago

Ehhh, if it's just for looking and you don't have any lidar, just go for splats. They're way better behaved, mostly because they don't need to understand the concept of a "surface"; they just understand "splat with spherical harmonics of view-dependent color".
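To make "spherical harmonics of view-dependent color" concrete, here is a minimal sketch that evaluates one color channel from four coefficients (degrees 0 and 1). The function name is hypothetical, and the sign convention and the 0.5 offset are assumptions based on how the reference 3DGS code is commonly described, so check them against whichever trainer you use.

```python
import math

SH_C0 = 0.28209479177387814  # 1 / (2 * sqrt(pi)), the constant (degree 0) basis
SH_C1 = 0.4886025119029199   # sqrt(3 / (4 * pi)), scale of the degree 1 basis


def view_dependent_channel(sh, direction):
    """Evaluate one color channel from four SH coefficients (degrees 0 and 1).

    sh        -- [dc, c1, c2, c3], hypothetical per-splat coefficients
    direction -- vector from the camera to the splat (normalized here)
    """
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    # Assumed sign convention: -y, +z, -x for the three degree 1 terms.
    value = SH_C0 * sh[0] - SH_C1 * y * sh[1] + SH_C1 * z * sh[2] - SH_C1 * x * sh[3]
    return max(value + 0.5, 0.0)  # assumed offset, then clamp to non-negative
```

With only the first coefficient set, the color is the same from every direction; the degree 1 terms are what let a splat look brighter from one side than the other, without any notion of a surface.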

Comment by KaiserPro 5 days ago

True, for visual-only stuff they do work really well.

Comment by dagmx 5 days ago

Splats would work and it’s what a lot of automation folks use. But the risk is that your splats aren’t tight to the surface and that can cause false positives.

For training you can do a hybrid geometry plus splats workflow. Have geometry that you can constantly raycast against and have as an input to your vision training or to get accurate depth buffers.

The workflows for splats and photogrammetry are very similar.

Comment by cyber_kinetist 6 days ago

Note that the first published work of rendering Gaussian Volumes was in this 1991 paper (https://articles.tomasparks.name/publications/Westover1991.p...) - so 3DGS is really a rehash of an old method from the 90s!

The contributions of 3DGS lie in how fast you can make them in modern GPU hardware (tiling + sorting with threads), and how to make the pipeline differentiable so that you can fit the Gaussian splats with photogrammetry data. Similar to the history of deep learning, it became technically feasible once the GPU hardware was powerful enough.
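As a rough illustration of why sorting matters in this pipeline, here is a minimal single-pixel sketch of front-to-back alpha compositing. It assumes each splat has already been reduced to a (depth, color, alpha) triple for that pixel, where alpha stands for the splat's opacity times its Gaussian falloff at the pixel. The function name and the early-exit threshold are illustrative choices, not taken from any particular implementation.

```python
def composite_front_to_back(splats, background=0.0):
    """Blend (depth, color, alpha) samples covering one pixel, nearest first."""
    color = 0.0
    transmittance = 1.0  # fraction of light still passing through
    for _depth, c, alpha in sorted(splats, key=lambda s: s[0]):
        color += transmittance * alpha * c
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # stop once the pixel is effectively opaque
            break
    return color + transmittance * background
```

Every operation here is a multiply or an add, which is what makes the blend straightforward to differentiate with respect to color and alpha when fitting splats to photos.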

Comment by sqrt_1 6 days ago

There was this FPS demo recently https://playcanv.as/p/qxGSuzYq/

People have also converted some small sections of Unreal 5 demos into splats https://superspl.at/scene/692c4f91

Or perhaps use a real world scan - it was suggested this one would make an ideal setting for zombies https://superspl.at/scene/6359774f

Comment by monkpit 5 days ago

Can someone explain the unreal demo linked here - is the reflection in the street also using splatting, or is it something else?

Comment by speps 5 days ago

Yes splats do reflections

Comment by cyber_kinetist 5 days ago

Yes but in a weird way (at least on vanilla 3DGS) - it duplicates the splats on the mirror side.

Vanilla 3DGS cannot do any specular lighting or reflections - the color is basically baked into the splats. There's some active research going on to create richer Gaussian splats so we can do shading (or even ray tracing) on them - but I haven't seen anyone using it in production yet.

Comment by monkpit 5 days ago

Is that weird, though? I thought that was a pretty common way to do reflections, but I’m not that well-versed in the latest tech in this space.

Comment by grumbel 6 days ago

Many years ago there was a game called Casebook[1], a small detective game where you investigated rooms for clues. But unlike similar FMV games where you jumped from point to point, it had photorealistic environments that you could smoothly walk around in, much like later lightfield or gaussian splatting experiments.

[1] https://www.youtube.com/watch?v=o-VAaC5BgVE

Comment by cubefox 6 days ago

Any idea on how they achieved this?

Comment by Cieric 6 days ago

I can't say I know how they actually did it, but taking a look at the trailer I can point out that it looks like the spaces are confined and your character is on rails. I'm mainly going off of the instant direction changes that don't appear to be 45 degrees off from the camera direction. Once it's constrained down to a single line/path you could do some wild things like cube mapping a video, where the position in the video is tied to the character's position. I can't say I know how they would take that video though; my best guess there is that the scenes were constructed in 3D software and were just too expensive for real-time rendering.

Comment by cubefox 6 days ago

Cube mapping a video sounds plausible, this is commonly known as 360° video. Putting the camera on rails (though I don't really notice rails in this case) and tying the video playback speed to the speed of the rail movement has also been done in the past in some pre-rendered PlayStation games, though without cube mapping. But I think it's not pre-rendered in this case. It looks far too realistic for a game that is at least 17 years old. My best guess: they captured the 360 degree videos with a real camera (stabilized in some way) and edited the equipment out frame by frame.

Comment by grumbel 5 days ago

The image capture was done with a robotic camera rig from what I understand; they photographed 360° images of the room from all possible positions. They restricted the camera movement to a plane, which is why the player height is fixed. I don't know what they did on the software side with all the images.

Comment by cubefox 5 days ago

Oh cool, so the camera wasn't just on 1D "rails", it was on a 2D plane. I never before heard of a game (pre-rendered or photographed) which did that. Impressive.

Comment by selimthegrim 5 days ago

This reminds me of Titanic: Adventure Out of Time

Comment by modeless 6 days ago

Dreams for PS4 used point splatting and has a very unique look as a result. The splats were created from distance fields instead of being scanned, so they don't look like modern gaussian splats. They have a painterly look instead. https://youtu.be/2ltgkcoQzow

Comment by avaer 6 days ago

This is "rendering a 3D world". It's basically the exact same techniques that traditional rendering uses, just with a different primitive that's not triangles. Everything else pretty much carries over.

If you mean the technique of splatting specifically, Dreams for PS4 [1] is prior art.

If you mean pre-rendering, there's Myst and games like the original FF7 for PS1.

[1] https://en.wikipedia.org/wiki/Dreams_(video_game)

Comment by accrual 5 days ago

It's not gaussian splatting, but Outcast (1999) has an interesting voxel-like rendering for the world surface. It has a pretty distinct feeling when walking around in the early areas, and a somewhat clunky but usable UI.

> The game does not actually model three-dimensional volumes of voxels. Instead, it models the ground as a surface, which may be seen as being made up of voxels. The ground is decorated with objects that are modeled using texture-mapped polygons. When Outcast was developed, the term "voxel engine", when applied to video games, commonly referred to a ray casting engine (for example the Voxel Space engine). On the engine technology page of the game's website, the landscape engine is also referred to as the "Voxels engine". The engine is purely software based; it does not rely on hardware-acceleration via a 3D graphics card.

https://en.wikipedia.org/wiki/Outcast_(video_game)

Comment by jayd16 6 days ago

Bladerunner: Revelations used a similar technique to bake down large CGI worlds with expensive lighting into something that ran on a Pixel 1 at VR specs.

It's honestly very hard to work with this stuff because you ultimately need to be able to place meshes inside these scenes (triangle seas), and you need to do it in a way that plausibly fits in the world. You can't have unlit characters walking around a baked lit scene and have them fit in. That's just from a visual design perspective.

You also always want to have bounce light from your dynamic things onto the baked scene and depending on the tech, you might not even be able to spatially place a dynamic thing and have it properly occlude what splats it needs to occlude.

As is, it's a niche technology for games. That might change one day.

https://github.com/googlevr/seurat https://www.youtube.com/watch?v=Pf5Q3bvXj8E

Comment by jamwise 5 days ago

I think it's inevitable it goes there. Right now the level of detail and quality of games is limited by the console/PC hardware you're playing on. But with the splats they can render the whole game's world in a massive server farm at Hollywood Movie quality. I imagine there might be some balance of splat and traditional rendering technology since not all objects will lend themselves well, but this might be truly transformative.

Comment by dagmx 5 days ago

Why would you limit one to your local hardware and one to a cloud infrastructure?

Both can be done locally or in the cloud. The comparison point becomes moot if you change the parameters that drastically.

Comment by jamwise 5 days ago

There wouldn't be any cloud. Splats are still local, but all the lighting and texture are pre-rendered. The problem is they're not interactive, so they'd be good for a lot of the environment but your main character and other things that need to be interactive would need to use a different approach.

Comment by boppo1 6 days ago

I really want to get into splatting and I have the tools: good camera, v comfy in blender, comfy with graphics programming ideas, 4080. But I haven't found a good 'all in one intro' to it yet. Possibly because I'm foss-biased and have dismissed proprietary options. But does anyone know of a good 'vertical tutorial' on this stuff?

Comment by Yen 6 days ago

I recently got into splatting. I looked for some good all-in-one tutorials, but didn't find any, and mostly muddled through with trial and error and LLM assistance. I present this workflow as a straight-line pipeline, though in practice it took a lot of iteration, backtracking, and rework to get the final result. Here's what worked for me:

I captured a video on a smartphone camera, using the OpenCamera app. Specifically, this video was captured with exposure locked, framerate locked, focus locked, fairly high framerate and resolution. I walked slowly and carefully around an outdoor scene, trying to get fairly good coverage from multiple angles. I took roughly 20 minutes of video, weighing 19GB.

This video was sampled into individual image frames at about 5fps using ffmpeg. There's room for experimentation and improvement here; an adaptive, coverage-aware sampling strategy would be better. But fixed 5fps was Good Enough (tm). This resulted in roughly 8,000 images at 4k. This was a pretty hefty dataset for my limited 1080, but I made it work.
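For reference, the fixed-rate sampling step can be sketched as a thin wrapper around ffmpeg's `fps` video filter. The file names, the output pattern, and the helper names below are placeholders, not the ones actually used for this capture.

```python
import subprocess
from pathlib import Path


def ffmpeg_sample_command(video, out_dir, fps=5):
    """Build an ffmpeg command that writes `fps` frames per second as PNGs."""
    out_pattern = str(Path(out_dir) / "frame_%06d.png")  # placeholder naming
    return ["ffmpeg", "-i", str(video), "-vf", f"fps={fps}", out_pattern]


def extract_frames(video, out_dir, fps=5):
    """Run the sampling command; requires ffmpeg on PATH."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(ffmpeg_sample_command(video, out_dir, fps), check=True)
```

Calling `extract_frames("walk.mp4", "frames")` would fill `frames/` with numbered PNGs. PNG output at 4k is large, so a lossy format may be a reasonable trade if disk space is tight.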

I then generated masks for these images, to ignore transient objects during the splat training (i.e. to cut out people who transiently walked through the scene). For this I used Cutie (https://github.com/hkchengrex/Cutie). For outdoor scenes, it can also make sense to mask out low-parallax areas like faraway mountains or especially the sky, as these are difficult to train correctly. If masks are generated for some images, you'll need at least placeholder masks for all of them. In the end I've got about 8,000 PNGs that are monochrome black/white masks.
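The placeholder-mask step can be sketched with the standard library alone. This assumes that masks share the file name of their image, that frames are PNGs of one known size, and that white means "keep this pixel"; the last point is tool-dependent, so check the convention your trainer expects. Both helper names are hypothetical.

```python
import struct
import zlib
from pathlib import Path


def solid_gray_png(width, height, value=255):
    """Return the bytes of an 8-bit grayscale PNG filled with one value."""
    def chunk(tag, data):
        crc = zlib.crc32(tag + data) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)

    # IHDR: width, height, bit depth 8, color type 0 (grayscale), no interlace
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    rows = (b"\x00" + bytes([value]) * width) * height  # filter byte 0 per row
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", header)
            + chunk(b"IDAT", zlib.compress(rows)) + chunk(b"IEND", b""))


def fill_missing_masks(image_dir, mask_dir, width, height, keep_value=255):
    """Write a placeholder mask for every image lacking one; return names written."""
    image_dir, mask_dir = Path(image_dir), Path(mask_dir)
    mask_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(image_dir.glob("*.png")):
        mask = mask_dir / image.name
        if not mask.exists():
            mask.write_bytes(solid_gray_png(width, height, keep_value))
            written.append(image.name)
    return written
```

Existing masks are never overwritten, so this can be rerun safely after generating more real masks.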

Then the images are handed to COLMAP (https://github.com/colmap/colmap), using the 'global mapper' option. This registers the camera positions in 3D space, and generates a crude point cloud that's good for sanity-checking. This step required a fair bit of iteration to get right. The full reconstructed output from COLMAP is not necessary, only the pose-estimate .bin files. The output directory here was about 500MB for this step for me.

With COLMAP registration done, the next step is the actual training. I found two useful pieces of software for this, with different tradeoffs.

Brush (https://github.com/ArthurBrussee/brush) was very straightforward to install and use, requiring very little in external dependencies and setup. It was also pretty speedy on training, and gave good results. Minor modifications to the training process were possible by editing source, though I didn't get too wild here. Brush takes the *.bin files from COLMAP, plus the original images directory, and the masks directory if it exists. Run on its own, this could produce gaussian splat .ply files, 500-800MB in size, containing 1-10M splats. More than that and my poor little 8GB of VRAM OOM'd.

nerfstudio (https://github.com/nerfstudio-project/nerfstudio) was also useful, as many research papers get implemented in its framework. In particular, for this outdoor scene, I used wild-gaussians (https://github.com/jkulhanek/wild-gaussians/) to generate just a sky sphere (to help seed low-parallax areas in my particular dataset), stopped training, and used this as an init.ply to pass to brush.

I then set up a very simple viewer website, using SuperSplat (https://github.com/playcanvas/supersplat). I used supersplat's editor to align the splat's coordinate system with the rotation and scaling that I wanted, and then exported an optimized .sog file, roughly 1/10th the size. .sog is nominally open-standards, though I'm not aware of any other projects using the format. This gave fairly good framerates and adequate controls across a variety of platforms.

As a little bit extra, supersplat's splat-transform CLI tool was used to generate a crude collision mesh for the scene, enabling a walking mode that respected object boundaries.

If there's interest I can post my results. I got a bit sidetracked with other projects and other splats, and this particular one still needs some more cleanup, but I can get it up with a few more hours' work. Hopefully that's a good start; all of these are fully FOSS, and resulted in a good-looking splat.

Comment by boppo1 5 days ago

Awesome, thank you! this is a good starting point!

Comment by ireadmevs 5 days ago

Thank you for sharing!

Comment by dimitri-vs 6 days ago

Maybe not exactly the kind of tutorial you're looking for but very enjoyable none the less: https://youtu.be/eekCQQYwlgA

Comment by Epitaque 6 days ago

Did not read the paper (sorry) but I wonder how this compares to mesh splatting (https://meshsplatting.github.io/). I feel like mesh splatting can produce higher quality results because triangles are very good at representing sharp features, and gaussians aren't.

Comment by dpark 6 days ago

But only in the same sense that triangles are bad at representing curves, right? It seems that’s a wash.

Comment by phrotoma 6 days ago

I love this site design. It uses the entire width of the monitor rather than a slender column of pixels down the middle with large blocks of unused space on either side, with a font for my old man eyes.

Comment by zokier 6 days ago

> It uses the entire width of the monitor rather than a slender column of pixels down the middle with large blocks of unused space on either side

Umm on my machine it has 560px margin on both sides with the content being only 474px sliver in the middle?

Comment by simonklitj 6 days ago

Imo they need to pad it just a bit. My scrollbar overlaps.

Comment by docheinestages 6 days ago

Maybe use Tampermonkey?

Comment by HexDecOctBin 6 days ago

Can someone point to a resource/tutorial for learning point splatting (the 90s rendering technique)? Gaussian Splatting has completely overtaken the search results, and the original technique is now nearly impossible to find.

Comment by jasonjmcghee 6 days ago

Westover’s thesis https://www.cs.unc.edu/techreports/91-029.pdf

Comment by cubefox 6 days ago

It's going to be even more impossible to find now because the present paper introduces "Gaussian point splatting".

Comment by cubefox 6 days ago

Their point splatting method is orthogonal to level-of-detail rendering (they reference a few papers which try to do this), so both point splatting and LoD could be combined in the future for an even greater performance gain during rendering. They already implement occlusion and frustum culling.

Point splatting does introduce a lot of noise though, and their denoiser introduces ghosting, but they say a more sophisticated denoiser would give considerably better quality.

Comment by samch 6 days ago

It seems like there are fairly regular posts on HN about splatting, and most appear to be fairly technical or proof-of-concept level. While the outputs look nice, I’m not sure that I could distinguish them from a nice ray-traced scene. What I think I’m missing is the “why?” of splatting. What are the material benefits of this area of research?

Comment by jerf 6 days ago

At the moment, combining your statement "I’m not sure that I could distinguish them from a nice ray-traced scene" and adding "your graphics card can move through them in real time so cheaply that it can easily be used as a component in other tech even at high frame rates" covers it pretty nicely. There's some research into how to make them move or do other things they don't do very well, but the fact that you can swoop through them in real time on cell-phone level of power means they fit a lot of niches. Plus the fact you can "record" them from a real-world physical environment without ever having to "model" it opens up a lot of utility too.

Personally I suspect they are getting a bit more attention than they "deserve"; people aren't talking about their weaknesses very much and I think that's resulting in some overexcitement. Some of the "we can replace everything with splats!" reminds me of the people who still don't understand why "if GPUs are thousands of times faster than CPUs why don't we run everything on GPUs?" is basically not even a sensible question. I don't see them as ever being the foundation of a graphics stack, but they definitely have a place as part of a well-rounded menu of techniques that can be brought to bear on a wide range of problems.

Comment by zokier 6 days ago

> Plus the fact you can "record" them from a real-world physical environment without ever having to "model" it opens up a lot of utility too.

This is the big thing imho. Sure, you can do traditional photogrammetry to capture meshes and textures but getting the shaders exactly right is afaik non-trivial etc, and if you want real-time rendering then you likely need some further post-processing of the assets. With 3dgs you can pretty much bypass all that complexity and the whole pipeline from photos to rendered frame is much more straightforward.

Comment by djmips 6 days ago

Could this be a new direction for Google Streetview perhaps?

Comment by sorenjan 6 days ago

When looking at their linked interactive viewer it looks like they need 128 spp for the image quality to equal 3dgs. Maybe you can reduce that with some temporal tricks and noise reduction filtering, but that's still a lot of samples.

Comment by andybak 6 days ago

People are rendering huge splat scenes on mobile devices using LOD. This (currently) requires CUDA and an NVidia GPU to work. I would have been much more impressed to see a demo where it was running on low end mobile hardware faster than current splat renderers can.

I'm probably being a bit of a grinch about it but the abstract doesn't address performance or hardware constraints either so I guess I'm going to have to read the damn paper.

Comment by cyber_kinetist 6 days ago

Really nice idea for 3DGS rendering - though the main problem is the noise (an unfortunate issue for all Monte-Carlo based methods).

I think future papers would probably continue improving on this method and focus on how to sample the points more efficiently while being unbiased (similar to how ray-tracing solved their performance issues). Or maybe... we can just add a deep-learning based denoiser and call it a day!

Comment by lucamark 6 days ago

This feels like Monte Carlo rendering applied to rasterization. I'm wondering if it's brand-new or a well-established methodology.

Comment by pixelesque 6 days ago

It's not new - that was sort of my point with my other comment.

At least if it's progressive (so refines and resolves over time), this has been done with pointclouds in the VFX industry in GPU shaders for years in terms of stochastically drawing different points so eventually the whole point set gets rasterised to a fidelity threshold.

Comment by lucamark 6 days ago

Okay, thanks for the clarification! So, the interesting part here seems to be the 3DGS-specific opacity correction and GPU workload mapping. Am I wrong?

Comment by pixelesque 6 days ago

Possibly yeah.

Or the per-pixel coord atomic I guess?

Comment by lucamark 6 days ago

Right, that part seems to be based on Schütz et al. 2021 https://arxiv.org/abs/2104.07526

Comment by avaer 6 days ago

Monte Carlo in 3dgs is established enough that Spark [1] has been doing it for a while in the browser.

[1] https://github.com/sparkjsdev/spark

Comment by cyber_kinetist 6 days ago

Cannot find anything related to Monte Carlo methods in the source code. I thought Spark implemented a conventional 3DGS pipeline with LoD optimizations (And it seems they do the sorting on the CPU using Rust/WebAssembly because of WebGL limitations)

Comment by convolvatron 6 days ago

that goes all the way back to the Kajiya rendering equation https://en.wikipedia.org/wiki/Rendering_equation

Comment by bnolsen 5 days ago

I was doing this for remote sensing orthorectification work back in 2004/2005. It works very well across multiple types of imaging sensors.

Comment by MattCruikshank 6 days ago

My dumb idea... do outdoor scans, and then convert the contents into 1m^2 blocks... And then, just dumbly stitch them together.

Kind of like Minecraft... but with user-generated gaussian-splat blocks.

Comment by jamilton 5 days ago

1m^3, right? I can picture what you mean, but I'm not sure it works technically, since I think the splats for a given region are not actually bound to the region they represent. Like, for example, reflections work by having the reflection being physically behind the reflective surface. And they're all transparent, so it'd blend together.

Comment by MattCruikshank 5 days ago

Sure, you could think in terms of 1m^3.

Yes, you're right that composing the best picture for an eye point could (and does) use splats from all over the scene.

But I think if you limit to splats that are (entirely, mostly, partially?) inside the 1m^3 block, you'll do pretty well. And you're absolutely right that reflective surfaces would probably be the first to suffer.

Well, it's worse than that. Because if you make a 1m^3 pond cube, and then I go putting trees around it, a naive rendering would still show YOUR reflections in the pond, rather than rendering from that pond's point of view, etc, like traditional rendering.

One of Gaussian Splats' strengths, that it doesn't care... becomes a problem for me.
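The "limit to splats inside the block" idea can be sketched as a simple spatial bucketing of splat centers. This minimal version assigns each splat by its center only, which is exactly where the problem raised above shows up: a splat's extent, or a baked reflection sitting behind a surface, can land in a neighboring block. The function name and the center-only rule are assumptions for illustration.

```python
import math
from collections import defaultdict


def bucket_by_block(centers, block_size=1.0):
    """Group splat indices by the axis-aligned block containing their center."""
    blocks = defaultdict(list)
    for index, (x, y, z) in enumerate(centers):
        key = (
            math.floor(x / block_size),
            math.floor(y / block_size),
            math.floor(z / block_size),
        )
        blocks[key].append(index)
    return dict(blocks)
```

`math.floor` is used rather than `int()` so that negative coordinates fall into the correct block instead of being rounded toward zero.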

Comment by praveen9920 6 days ago

Sorting the gaussians is the compute-heavy part of gaussian splatting. So, I'm guessing this will give only a marginal improvement in terms of rendering speed.

Comment by xyzsparetimexyz 6 days ago

I'm not sure it does a sort. Each group of threads only handles a select number of gaussians

Comment by zokier 6 days ago

Yea, I think avoiding sorting is kinda the whole point here

Comment by pixelesque 6 days ago

> millions of threads

Really?! What OSs can handle that many native threads?

Also, this seems quite similar to stochastic progressive drawing of pointclouds for realtime that has been done for > 15 years in the VFX industry with GPU shaders in a tiled/bucketed fashion, unless this isn't progressive maybe? (The fact it's been accepted for Siggraph likely indicates it's slightly different).

Comment by Calavar 6 days ago

I believe they mean GPU threads. Plenty of cuda files in their repository.

Comment by pixelesque 6 days ago

Fair enough, but that's then only absolutely max 1024 threads per SM, which wouldn't get anywhere near 1 million, given 5090 only has 192 SMs...

Future proofing I guess...

Comment by cyber_kinetist 6 days ago

You can launch many more logical threads than there are physical threads available. The GPU scheduler will automatically dispatch the work to the SMs.
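A back-of-the-envelope sketch of that arithmetic, using the figures quoted upthread (1024 resident threads per SM, 192 SMs) purely as inputs rather than verified hardware specs. The block size of 256 and both function names are illustrative assumptions.

```python
import math


def blocks_to_launch(logical_threads, threads_per_block=256):
    """Number of thread blocks needed to cover every logical thread."""
    return math.ceil(logical_threads / threads_per_block)


def waves_needed(logical_threads, resident_per_sm, sm_count):
    """How many full-occupancy 'waves' the scheduler must cycle through."""
    return math.ceil(logical_threads / (resident_per_sm * sm_count))


# Figures as quoted in the thread: 1024 * 192 = 196,608 threads resident at once.
blocks = blocks_to_launch(1_000_000)       # 3907 blocks of 256 threads
waves = waves_needed(1_000_000, 1024, 192)  # 6 waves
```

So a million logical threads is only a handful of scheduling rounds under those numbers; blocks simply queue until an SM has room for them.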

Comment by ks6g10 6 days ago

Just like 2 threads can execute on the same core at the "same" time, i.e. no synchronization, the same is true for GPU threads/ thread groups.

Comment by zipy124 6 days ago

I guess they never say that they execute at the same time technically haha

Comment by DamnInteresting 6 days ago

Video overview of the technology: https://www.youtube.com/watch?v=X8yRlA7jqEQ

Ordinarily I don't prefer video, but the visuals are helpful here.

Also, an online interactive, but it seems to only work in Chrome: https://superspl.at/scene/ff1d0393