Hacker News: darhodester's comments

Grin Machine knocked this out of the park!

Great job, Chris and crew!


Cool aesthetic!


My wife said the same thing, but it gets better after the intro.


For sure!


Hi,

I'm David Rhodes, Co-founder of CG Nomads, developer of GSOPs (Gaussian Splatting Operators) for SideFX Houdini. GSOPs was used in combination with OTOY OctaneRender to produce this music video.

If you're interested in the technology and its capabilities, learn more at https://www.cgnomads.com/ or AMA.

Try GSOPs yourself: https://github.com/cgnomads/GSOPs (example content included).


I’m fascinated by the aesthetic of this technique. I remember early versions that were completely glitched out and presented 3d clouds of noise and fragments to traverse through. I’m curious if you have any thoughts about creatively ‘abusing’ this tech? Perhaps misaligning things somehow or using some wrong inputs.


There's a ton of fun tricks you can perform with Gaussian splatting!

You're right that you can intentionally under-construct your scenes. These can create a dream-like effect.

It's also possible to stylize your Gaussian splats to produce NPR effects. Check out David Lisser's amazing work: https://davidlisser.co.uk/Surface-Tension.

Additionally, you can intentionally introduce view-dependent ghosting artifacts. In other words, if you take images from a certain angle that contain an object, and remove that object for other views, it can produce a lenticular/holographic effect.


Y'all did such a good job with this. It captivated HN and was the top post for the entire day, and will probably last for much of tomorrow.

If you don't know already, you need to leverage this. HN is one of the biggest channels of engineers and venture capitalists on the internet. It's almost pure signal (minus some grumpy engineer grumblings - we're a grouchy lot sometimes).

Post your contact info here. You might get business inquiries. If you've got any special software or process in what you do, there might be "venture scale" business opportunities that come your way. Certainly clients, but potentially much more.

(I'd certainly like to get in touch!)

--

edit: Since I'm commenting here, I'll expand on my thoughts. I've been rate limited all day long, and I don't know if I can post another response.

I believe volumetric is going to be huge for creative work in the coming years.

Gaussian splats are a huge improvement over point clouds and NeRFs in terms of accessibility and rendering, but the field has so many potential ways to evolve.

I was always in love with Intel's "volume", but it was impractical [1, 2] and got shut down. Their demos are still impressive, especially from an equipment POV, but A$AP Rocky's music video is technically superior.

During the pandemic, to get over my lack of in-person filmmaking, I wrote Unreal Engine shaders to combine the output of several Kinect point clouds [3] to build my own lightweight version inspired by what Intel was doing. The VGA resolution of consumer volumetric hardware was a pain, and I was faced with either FPGA solutions for higher real-time resolution or going 100% offline.

World Labs and Apple are doing exciting work with image-to-Gaussian models [4, 5], and World Labs created the fantastic Spark library [6] for viewing them.

I've been leveraging splats to do controllable image gen and video generation [7], where they're extremely useful for consistent sets and props between shots.

I think the next steps for Gaussian splats are good editing tools, segmenting, physics, etc. The generative models are showing a lot of promise too. The Hunyuan team is supposedly working on a generative Gaussian model.

[1] https://www.youtube.com/watch?v=24Y4zby6tmo (film)

[2] https://www.youtube.com/watch?v=4NJUiBZVx5c (hardware)

[3] https://www.twitch.tv/videos/969978954?collection=02RSMb5adR...

[4] https://www.worldlabs.ai/blog/marble-world-model

[5] https://machinelearning.apple.com/research/sharp-monocular-v...

[6] https://sparkjs.dev/

[7] https://github.com/storytold/artcraft (in action: https://www.youtube.com/watch?v=iD999naQq9A or https://www.youtube.com/watch?v=f8L4_ot1bQA )


First, all credit for execution and vision of Helicopter go to A$AP, Dan Streit, and Grin Machine (https://www.linkedin.com/company/grin-machine/about/). Evercoast and Wild Capture were also involved.

Second, it's very motivating to read this! My background is in video game development (only recently transitioning to VFX). My dream is to make a Gaussian splatting content creation and game development platform with social elements. One of the most exciting aspects of Gaussian splatting is that it democratizes high quality content acquisition. Let's make casual and micro games based on the world around us and share those with our friends and communities.


Thanks darhodester! It was definitely a broad team effort that started with Rocky and Streit's creative genius, which was then made possible by Evercoast's software to capture and generate all the 4D splat data (www.evercoast.com). That data then flowed to the incredible people at Grin Machine and Wild Capture, who used GSOPs and OctaneRender.


What do you think about the sparse voxel approach? Shouldn't it be more compute-efficient than computing zillions of ellipsoids? My understanding of CGI is probably too shallow, but I wonder why it hasn't caught on much.


I believe most of the "voxel" approaches also require some type of inference (MLP). This limits the use case and ability to finely control edits. Gaussian splatting is amazing because each Gaussian is just a point in space with a rotation and non-uniform scale.

The most expensive part of Gaussian splatting is depth sorting.
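To make "a point in space with a rotation and non-uniform scale" concrete, here is a minimal sketch assuming the common 3DGS parameterization (center, unit quaternion, per-axis scale, opacity, color). The class and function names are hypothetical and are not GSOPs' API. It builds each splat's 3x3 covariance as R S S^T R^T and orders splats by depth; real renderers do this sort on the GPU (often per screen tile), so treat this as an illustration of why a sort is needed, not how it is implemented in practice.

```python
from dataclasses import dataclass

@dataclass
class Splat:
    position: tuple  # (x, y, z) world-space center
    rotation: tuple  # unit quaternion (w, x, y, z)
    scale: tuple     # per-axis extent (sx, sy, sz), i.e. standard deviations
    opacity: float   # peak opacity in [0, 1]
    color: tuple     # (r, g, b); real 3DGS typically stores view-dependent color

def quat_to_matrix(q):
    """Rotation matrix of a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance(s):
    """3x3 covariance Sigma = R S S^T R^T: the ellipsoid's shape and orientation."""
    r = quat_to_matrix(s.rotation)
    m = [[r[i][j] * s.scale[j] for j in range(3)] for i in range(3)]  # M = R S
    return [[sum(m[i][k] * m[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]  # Sigma = M M^T

def sort_front_to_back(splats, camera_pos, view_dir):
    """Order splats nearest-first along a unit viewing direction."""
    def depth(s):
        return sum((p - c) * d for p, c, d in zip(s.position, camera_pos, view_dir))
    return sorted(splats, key=depth)
```

Alpha blending is order-dependent, and the order changes whenever the camera moves, so the sort has to be redone every frame for millions of splats, which is one reason it is the costly step.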


The ghost effect is pretty cool, too! https://www.youtube.com/watch?v=DQGtimwfpIo


https://youtu.be/eyAVWH61R8E?t=3m53s

Superman is what comes to mind for this


I remember splatting being introduced as a way to capture real-life scenes, but one of the links you have provided in this discussion seems to have used a traditional polygon mesh scene as training input for the splat model. How common is this, and why would one do it that way over, e.g., vertex shader effects that give the mesh a splatty aesthetic?


Yes, it's quite trivial to convert traditional CG to Gaussian splats. We can render our scenes/objects just as we would capture physical spaces. An additional benefit of using synthetic data is 100% accurate camera poses (alignment), which means the structure from motion (SfM) step can be bypassed.

It's also possible to splat from textured meshes directly, see: https://github.com/electronicarts/mesh2splat. This approach yields high quality, PBR compatible splats, but is not quite as efficient as a traditional training workflow. This approach will likely become mainstream in third party render engines, moving forward.

Why do this?

1. Consistent, streamlined visuals across a massive ecosystem, including content creation tools, the web, and XR headsets.
2. High-fidelity, compressed visuals. With SOGs compression, splats are going to become the dominant 3D representation on the web (see https://superspl.at).
3. E-commerce (product visualizations, tours, real estate, etc.).
4. Virtual production (replace green screens with giant LED walls).
5. View-dependent effects without (traditional) shaders or lighting.

It's not just about the aesthetic, it's also about interoperability, ease of use, and the entire ecosystem.


From the article:

>Evercoast deployed a 56 camera RGB-D array

Do you know which depth cameras they used?


We (Evercoast) used 56 RealSense D455s. Our software can run with any camera input, from depth cameras to machine vision to cinema REDs. But for this, RealSense did the job. The higher end the camera, the more expensive and time consuming everything is. We have a cloud platform to scale rendering, but it’s still overall more costly (time and money) to use high res. We’ve worked hard to make even low res data look awesome. And if you look at the aesthetic of the video (90s MTV), we didn’t need 4K/6K/8K renders.


You may have explained this elsewhere, but if not, what kind of post-processing did you do to upscale or refine the RealSense video?

Can you add any interesting details on the benchmarking done against the RED camera rig?


This is a great question; I would love some feedback on this.

I assume they stuck with RealSense for proper depth maps. However, those cameras are limited to a 6-meter range, and their depth imaging can't resolve features smaller than their native resolution allows (it gets worse beyond 3 m too, as there is less and less parallax, among other issues). I wonder how they approached that as well.
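The "less and less parallax" point can be made concrete with the standard pinhole stereo relation z = f * B / d (focal length f in pixels, baseline B, disparity d). Assuming a stereo depth camera, a fixed disparity error turns into a depth error that grows with the square of distance. The function name and all parameter values below are hypothetical and for illustration only; they are not RealSense datasheet figures.

```python
def stereo_depth_error(z_m, focal_px, baseline_m, disparity_err_px):
    """Approximate depth uncertainty of a stereo pair at distance z (meters).

    From z = f * B / d, a disparity error delta_d gives
    delta_z ~= z**2 * delta_d / (f * B): error grows with distance squared.
    """
    return z_m ** 2 * disparity_err_px / (focal_px * baseline_m)

# Hypothetical parameters for illustration only (not datasheet values).
F_PX, BASELINE_M, DISP_ERR_PX = 640.0, 0.1, 0.08

for z in (1.0, 3.0, 6.0):
    err_mm = stereo_depth_error(z, F_PX, BASELINE_M, DISP_ERR_PX) * 1000
    print(f"{z:.0f} m -> ~{err_mm:.1f} mm")
```

Whatever the real constants are, doubling the distance from 3 m to 6 m roughly quadruples the depth error under this model.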



I was not involved in the capture process with Evercoast, but I may have heard somewhere they used RealSense cameras.

I recommend asking https://www.linkedin.com/in/benschwartzxr/ for accuracy.


Couldn’t you just use iPhone Pros for this? I developed an app specifically for photogrammetry capture using AR and the depth sensor, as it seemed like a cheap alternative.

EDIT: I realize a phone is not on the same level as a RED camera, but I just saw iPhones as a massively cheaper option than the alternatives in the field I worked in.


ASAP Rocky has a fervent fanbase who's been anticipating this album. So I'm assuming that whatever record label he's signed to gave him the budget.

And when I think back to another iconic hip hop video (iconic in that genre) where they used practical effects and military helicopters chasing speedboats in the waters off of Santa Monica... I bet they had change to spare.


Is there any reason to think https://thebaffler.com/salvos/the-problem-with-music doesn't apply here?


A single camera only captures the side of the object facing the camera. Knowing how far away the camera-facing side of a Rubik's Cube is helps if you are making educated guesses (novel view synthesis), but it won't solve the problem of actually photographing the back side.

There are usually six sides on a cube, which means you need a minimum of six iPhones around an object to capture all of its sides and then move freely around it. You might as well seek open-source alternatives rather than relying on Apple surprise boxes for that.

If your subject is static, such as a building, you can of course wave a single iPhone around and get a result comparable to more expensive rigs.


The minimum is four RGB-only cameras (if you want RGB data) but adding lidar really helps.

The standard pipeline can infer a huge amount of data, and there are a few AI tools now for hallucinating missing geometry and backfaces based on context recognition, which can then be converted back into a splat for fast, smooth rendering.


I think it's because they already had proven capture hardware, harvest, and processing workflows.

But yes, you can easily use iPhones for this now.


Looks great, by the way. I was wondering if there's a file format for volumetric video captures.


Some companies have a proprietary file format for compressed 4D Gaussian splatting. For example: https://www.gracia.ai and https://www.4dv.ai.

Check this project, for example: https://zju3dv.github.io/freetimegs/

Unfortunately, these formats are currently closed behind cloud processing, so adoption is rather low.

Before Gaussian splatting, textured mesh caches would be used for volumetric video (e.g. Alembic geometry).


https://developer.apple.com/av-foundation/

https://developer.apple.com/documentation/spatial/

Edit: As I'm digging, this seems to be focused on stereoscopic video as opposed to actual point clouds. It appears applications like cinematic mode use a monocular depth map, and their lidar outputs raw point cloud data.


A LIDAR point cloud from a single point of view is a monocular depth map. Unless the LIDAR in question is, like, using supernova-level gamma rays or neutrino generators for the laser part to get density and albedo volumetric data for its whole distance range.

You just can't see the back of a thing by knowing the shape of the front side with current technologies.


Right! My terminology may be imprecise here, but I believe there is still an important distinction:

The depth map stored for image processing is image metadata, meaning it calculates one depth per pixel from a single position in space. Note that it doesn't have the ability to measure that many depth values, so it measures what it can using LIDAR and focus information and estimates the rest.

On the other hand, a point cloud is not image data. It isn't necessarily taken from a single position; in theory the device could be moved around to capture additional angles, and the result is a sparse point cloud of depth measurements. Also, raw point cloud data doesn't necessarily come tagged with point metadata such as color.

I also note that these distinctions start to vanish when dealing with video or using more than one capture device.
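The link between the two can be sketched directly: a per-pixel depth map becomes a point cloud by back-projecting each pixel through a pinhole camera model. This is a minimal sketch assuming known intrinsics (fx, fy, cx, cy) and depth measured along the optical axis; the function name is hypothetical.

```python
def depth_map_to_points(depth, fx, fy, cx, cy):
    """Back-project a per-pixel depth map into 3D points in the camera's frame.

    depth: list of rows; depth[v][u] is distance along the optical axis (meters),
    or 0 where nothing was measured. Pinhole model: X = (u - cx) * Z / fx, etc.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # no measurement for this pixel
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

Every point produced this way lies on a ray from one camera center, so the result only describes surfaces visible from that viewpoint. Covering a whole object means transforming several such clouds into a shared world frame (using each device's pose) and merging them.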


No, LIDAR data are necessarily taken from a single position. They are 3D, but literally single-eyed. You can't tell from LIDAR data whether you're looking at a half-cut apple or an intact one. This becomes obvious the moment you try to rotate a LIDAR capture: it's just the skin. You need depth maps from all angles to reconstruct the complete skin.

So you need a minimum of two, for the front and back of a dancer. Actually, the seams are kind of dubious, so let's say three, 120 degrees apart. Well, we need ones looking down as well as up for baggy clothing, so more like nine: 30 degrees apart vertically and 120 degrees horizontally, ...

and ^ this goes far enough down that installing a few dozen identical non-Apple cameras in a monstrous sci-fi cage starts making a lot more sense than an iPhone, for a video.
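The back-of-envelope count above is easy to play with in code. This is a rough sketch assuming cameras evenly spaced on horizontal rings at fixed elevations around a subject at the origin; the function name, radius, and layout are hypothetical, and real rigs also weigh overlap between neighboring views, which this ignores.

```python
import math

def ring_rig(radius_m, elevations_deg, azimuths_per_ring):
    """Place cameras on rings around a subject at the origin, all aimed at the center."""
    cams = []
    for elev in elevations_deg:
        e = math.radians(elev)
        for i in range(azimuths_per_ring):
            a = 2 * math.pi * i / azimuths_per_ring
            pos = (radius_m * math.cos(e) * math.cos(a),
                   radius_m * math.cos(e) * math.sin(a),
                   radius_m * math.sin(e))
            # Unit vector from the camera toward the subject.
            look = tuple(-c / radius_m for c in pos)
            cams.append((pos, look))
    return cams

# Three rings 30 degrees apart vertically, three cameras per ring
# (120 degrees apart horizontally).
rig = ring_rig(3.0, elevations_deg=(-30, 0, 30), azimuths_per_ring=3)
print(len(rig))  # 9
```

Raising `azimuths_per_ring` or adding elevations shows how quickly the count climbs toward the few dozen cameras mentioned above.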


Recording point clouds over time, I guess I mean. I’m not going to pretend to understand video compression, but could the frame-to-frame movement aspect be handled in 3D the same way as in 2D?


Why would they go for the cheapest option?


It was more the point that the technology is much cheaper. The company I worked for had completely missed it while trying to develop in-house solutions.


Kinect Azure


Would such a plugin be possible for DaVinci Resolve, merging a scene captured from two iPhones with spatial data into a 3D scene? With an M4 that shouldn’t be a problem, right?


Yes: https://irrealix.com/plugin/gaussian-splatting-davinci-resol...

(I'm not the author.)

You can train your own splats using Brush or OpenSplat


Great work! I’d love to see a proper BTS or case study.


I do believe a BTS is being developed.


Stay tuned


Hi David, have you looked into alternatives to 3DGS like https://meshsplatting.github.io/ that promise better results and faster training?


I have. Personally, I'm a big fan of hybrid representations like this. An underlying mesh helps with relighting, deformation, and effective editing operations (a mesh is a sparse node graph for an otherwise unstructured set of data).

However, surface-based constraints can prevent thin surfaces (hair/fur) from reconstructing as well as vanilla 3DGS. It might also inhibit certain reflections and transparency from being reconstructed as accurately.


Random question, since I see your username is green.

How did you find out this was posted here?

Also, great work!


My friend and colleague shared a link with me. Pretty cool to see this trending here. I'm very passionate about Gaussian splatting and developing tools for creatives.

And thank you!


I've been mesmerized by the visuals of Gaussian splatting for a while now; congratulations on your great work!

Do you have any benchmarks on the geometric precision of these reproductions?


Thank you!

Geometric analysis for Gaussian splatting is a bit like comparing apples and oranges. Gaussian splats are not really discrete geometry, and their power lies in overlapping semi-transparent blobs. In other words, their benefit is as a radiance field and not as a surface representation.

However, assuming good camera alignment and real world scale enforced at the capture and alignment steps, the splats should match real world units quite closely (mm to cm accuracy). See: https://www.xgrids.com/intl?page=geomatics.


nice work.

I can see that relighting is still a work in progress, as the virtual spotlights tend to look flat and fake. I understand that you are just making the splats that fall inside the spotlight cone brighter, and the ones behind lots of other splats darker.

Do you know if there are plans for Gaussian splats to capture unlit albedo, roughness, and metalness, so we can relight in a more realistic manner?

Also, environment radiosity doesn't seem to translate to the splats, am I right?

Thanks
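The naive scheme described in the question can be sketched as below. This is the commenter's reading, not GSOPs' documented implementation; the function name, intensity, and ambient values are hypothetical. A splat inside the cone is brightened, attenuated by the opacities of splats between it and the light, and no normals or material properties are involved, which is why this kind of lighting tends to look flat.

```python
import math

def naive_spot_relight(color, point, light_pos, light_dir, half_angle_deg,
                       occluder_alphas, intensity=2.0, ambient=0.3):
    """Brighten a splat inside a spotlight cone, dimmed by splats in front of it.

    light_dir must be a unit vector. occluder_alphas are opacities of splats
    lying between the light and this splat. No normals, materials, or bounce
    light are used.
    """
    to_point = [p - l for p, l in zip(point, light_pos)]
    dist = math.sqrt(sum(c * c for c in to_point))
    if dist == 0.0:
        return tuple(color)  # splat sits on the light; leave it unchanged
    cos_angle = sum(t * d for t, d in zip(to_point, light_dir)) / dist
    inside = cos_angle >= math.cos(math.radians(half_angle_deg))
    transmittance = 1.0
    for a in occluder_alphas:
        transmittance *= 1.0 - a  # each splat in front absorbs some light
    gain = ambient + (intensity * transmittance if inside else 0.0)
    return tuple(min(1.0, c * gain) for c in color)
```

Encoding albedo, roughness, and metalness per splat, as asked above, would let a shading model replace the single `gain` factor.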


Thank you!

There are many ways to relight Gaussian splats. However, the highest quality results are currently coming from raytracing/path tracing render engines (such as Octane and VRay), with 2D diffusion models in second place. Relighting with GSOPs nodes does not yield as high quality, but can be baked into the model and exported elsewhere. This is the only approach that stores the relit information in the original splat scene.

That said, you are correct that in order to relight more accurately, we need material properties encoded in the splats as well. I believe this will come sooner rather than later with inverse rendering and material decomposition, or technology like Beeble Switchlight (https://beeble.ai). This data can ultimately be predicted from multiple views and trained into the splats.

"Also, environment radiosity doesnt seem to translate to the splats, am I right?"

Splats do not have their own radiosity in that sense, but if you have a virtual environment, its radiosity can be translated to the splats.



Back in 2001 I was the math consultant for "A Beautiful Mind". One spends a lot of time waiting on a film set. Eventually one wonders why.

The majority of wait time was the cinematographer lighting each scene. I imagined a workflow where secondary digital cameras captured 3D information, and all lighting took place in post production. Film productions hemorrhage money by the second; this would be a massive cost saving.

I described this idea to a venture capitalist friend, who concluded one already needed to be a player to pull this off. I mentioned this to an acquaintance at Pixar (a logical player) and they went silent.

Still, we don't shoot movies this way. Not there yet...


Really cool work!


[flagged]


Is it possible you didn’t comprehend which parts were 3D?

Or if you did, perhaps a critique is better rather than just a low effort diss.


I viewed on a flat monitor, so perhaps I missed some 4D and 5D too.

/i


That's hurtful.


Take the money and never admit to selling this shit. Why would you ever willingly associate your name with this?


Read the room. Plenty of people are interested in the aesthetics and the technology.


Just because people want to give you money doesn't mean you toss your dignity out the window.


Gaussian splatting is not NeRF (neural radiance field), but it is a type of radiance field, and supports novel view synthesis. The difference is in an explicit point cloud representation (Gaussian splatting), versus a process that needs to be inferred by a neural network.


It's not a type of radiance field.


It’s literally the name of gaussian splatting. 3D Gaussian Splatting for Real Time Radiance Fields

https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/


Hmm if gaussian splatting is radiance field rendering then so is any 3D rendering, and what's the point of using the name? Though having looked up the name it seems like it isn't well defined enough to mean much anyway tbh.


A drone path would not allow for such seamless transitions, never mind the planning required to nail all that choreography, effects, etc.

This approach is 100% flexible, and I'm sure at least part of the magic came from the process of play and experimentation in post.


The aesthetic here is at least partially an intentional choice to lean into the artifacts produced by Gaussian splatting, particularly dynamic (4DGS) splatting. There is temporal inconsistency when capturing performances like this, which is exacerbated by relighting.

That said, the technology is rapidly advancing and this type of volumetric capture is definitely sticking around.

The quality can also be really good, especially for static environments: https://www.linkedin.com/posts/christoph-schindelar-79515351....


Hi, I'm one of the creators of GSOPs for SideFX Houdini.

The gist is that Gaussian splats can replicate reality quite effectively with many 3D ellipsoids (stored as a type of point cloud). Houdini is software that excels at manipulating vast numbers of points, and renderers (such as Octane) can now leverage this type of data to integrate with traditional computer graphics primitives, lights, and techniques.


Can you put "Gaussian splats" in some kind of real-world metaphor so I can understand what it means? Either that or explain why "Gaussian" and why "splat".

I am vaguely aware of stuff like Gaussian blur on Photoshop. But I never really knew what it does.


Sure!

Gaussian splatting is a bit like photogrammetry. That is, you can record video or take photos of an object or environment from many angles and reproduce it in 3D. Gaussians have the capability to "fade" their opacity based on a Gaussian distribution. This allows them to blend together in a seamless fashion.

The splatting process is achieved by using gradient descent from each camera/image pair to optimize these ellipsoids (Gaussians) such that they reproduce the original inputs as closely as possible. Given enough imagery and sufficient camera alignment, performed using Structure from Motion, you can faithfully reproduce the entire space.

Read more here: https://towardsdatascience.com/a-comprehensive-overview-of-g....
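A toy 1D analogue of that optimization: real 3DGS optimizes millions of 3D Gaussians (position, covariance, opacity, color) against rendered images through a differentiable rasterizer, while this sketch fits a single 1D Gaussian's amplitude and center to sampled "pixels" by gradient descent on squared error. The width is held fixed to keep the example small; the names, learning rate, and step count are hypothetical choices.

```python
import math

def gaussian(x, amp, mean, sigma):
    return amp * math.exp(-((x - mean) ** 2) / (2 * sigma ** 2))

def fit_gaussian(xs, targets, amp, mean, sigma, lr=0.5, steps=2000):
    """Gradient descent on mean squared error, updating amp and mean (sigma fixed)."""
    n = len(xs)
    for _ in range(steps):
        grad_amp = grad_mean = 0.0
        for x, t in zip(xs, targets):
            g = math.exp(-((x - mean) ** 2) / (2 * sigma ** 2))
            residual = amp * g - t
            grad_amp += 2 * residual * g / n
            # d/d(mean) of amp * g is amp * g * (x - mean) / sigma**2
            grad_mean += 2 * residual * amp * g * (x - mean) / (sigma ** 2) / n
        amp -= lr * grad_amp
        mean -= lr * grad_mean
    return amp, mean

# "Ground truth" signal: one Gaussian observed at 41 sample positions.
xs = [i * 0.1 for i in range(41)]
targets = [gaussian(x, 1.0, 2.0, 0.5) for x in xs]
amp, mean = fit_gaussian(xs, targets, amp=0.5, mean=1.5, sigma=0.5)
print(round(amp, 3), round(mean, 3))
```

Starting from a dimmer Gaussian placed in the wrong spot, the updates pull it onto the target, which is the same idea as training splats to match the input photos.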


I think this means that you could produce more versions of this music video from other points of view without having to shoot the video again. For example, the drone-like effects could take a different path through the scene. Or you could move people/objects around and still get the lighting right.

Given where this technology is today, you could imagine 5-10 years from now people will watch live sports on TV, but with their own individual virtual drone that lets them view the field from almost any point.


> I am vaguely aware of stuff like Gaussian blur on Photoshop. But I never really knew what it does.

Blurring is a convolution or filter operation. You take a small patch of image (5x5 pixels) and you convolve it with another fixed matrix, called a kernel. Convolution says multiply element-wise and sum. You replace the center pixel with the result.

https://en.wikipedia.org/wiki/Box_blur is the simplest kernel: all ones, divided by the number of elements. Every pixel becomes the average of itself and its neighbors, which looks blurry. Gaussian blur is calculated in an identical way, but the matrix elements follow the "height" of a 2D Gaussian with some amplitude. It gives a smoother-looking result, since farther pixels have less influence. The bigger the kernel, the blurrier the result. There are a lot of these basic operations:

https://en.wikipedia.org/wiki/Kernel_(image_processing)
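The box and Gaussian blurs described above, sketched in plain Python on a list-of-lists grayscale image. The function names are hypothetical, and clamping at the borders is an assumption (image editors offer several edge modes). Both kernels are symmetric, so skipping the kernel flip of a true convolution gives the same result.

```python
import math

def box_kernel(size):
    """size x size kernel of equal weights summing to 1."""
    return [[1.0 / (size * size)] * size for _ in range(size)]

def gaussian_kernel(size, sigma):
    """size x size kernel whose weights follow a 2D Gaussian, normalized to sum to 1."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          for x in range(-half, half + 1)] for y in range(-half, half + 1)]
    total = sum(sum(row) for row in k)
    return [[w / total for w in row] for row in k]

def convolve(image, kernel):
    """Multiply element-wise with each neighborhood and sum; edges clamp to the border."""
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i, row in enumerate(kernel):
                for j, weight in enumerate(row):
                    rr = min(max(r + i - half, 0), h - 1)
                    cc = min(max(c + j - half, 0), w - 1)
                    acc += weight * image[rr][cc]
            out[r][c] = acc  # the center pixel is replaced by the weighted sum
    return out
```

Blurring a single bright pixel with each kernel shows the difference: the box kernel spreads it into a flat square, while the Gaussian kernel spreads it into a smooth hill that fades with distance.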

If you see "Gaussian", it implies the distribution is used somewhere in the process, but splatting and image kernels are very different operations.

For what it's worth I don't think the Wikipedia article on Gaussian Blur is particularly accessible.


> explain why "Gaussian" and why "splat".

Happily. Gaussian splats are a technique for 3D images, related to point clouds. They do the same job (take a 3D capture of reality and generate pictures later from any point of view "close enough" to the original).

The key idea is that instead of a bunch of points, it stores a bunch of semi-transparent blobs, or "splats". The transparency increases quickly with distance from each blob's center, following a normal distribution, also known as the "Gaussian distribution."

Hence, "Gaussian splats".
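To make "semi-transparent blobs" concrete: each splat's opacity falls off with a Gaussian, and overlapping splats are blended in depth order. This is a minimal sketch assuming isotropic splats already projected to 2D and sorted nearest-first; real 3DGS projects anisotropic 3D ellipsoids to 2D ellipses. The function names and the early-stop threshold are hypothetical.

```python
import math

def splat_alpha(opacity, dx, dy, sigma):
    """Opacity of an isotropic 2D splat at offset (dx, dy) from its center."""
    return opacity * math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))

def composite_front_to_back(samples):
    """Blend (rgb, alpha) pairs sorted nearest-first; stop once nearly opaque."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light from behind that still gets through
    for rgb, alpha in samples:
        for k in range(3):
            color[k] += transmittance * alpha * rgb[k]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:
            break  # anything further back is hidden
    return tuple(color), transmittance
```

Because each blob fades out smoothly instead of ending at a hard edge, neighboring splats blend into continuous surfaces rather than showing gaps between points.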


Somehow this hit right in the sweet spot at my level of knowledge. Thanks!


How can you expect someone to tailor a custom explanation, when they don’t know your level of mathematical understanding, or even your level of curiosity. You don’t know what a Gaussian blur does; do you know what a Gaussian is? How deeply do you want to understand?

If you’re curious start with the Wikipedia article and use an LLM to help you understand the parts that don’t make sense. Or just ask the LLM to provide a summary at the desired level of detail.


There's a Corridor Digital video being shared that explains it perfectly. With very little math.

https://youtube.com/watch?v=cetf0qTZ04Y


Amazing video, thanks for sharing this.


> How can you expect someone to tailor a custom explanation, when they don’t know your level of mathematical understanding, or even your level of curiosity.

The other two replies did a pretty good job!

