Show HN: Run TRELLIS.2 Image-to-3D generation natively on Apple Silicon
Posted by shivampkumar 1 day ago
I ported Microsoft's TRELLIS.2 (4B parameter image-to-3D model) to run on Apple Silicon via PyTorch MPS. The original requires CUDA with flash_attn, nvdiffrast, and custom sparse convolution kernels, none of which work on a Mac.
I replaced the CUDA-specific ops with pure-PyTorch alternatives: a gather-scatter sparse 3D convolution, SDPA attention for sparse transformers, and a Python-based mesh extraction replacing CUDA hashmap operations. Total changes are a few hundred lines across 9 files.
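The gather-scatter convolution works roughly like the sketch below (illustrative only, not the code in the repo; the function name and shapes are made up, and a real version vectorizes the coordinate lookup instead of looping in Python):

    import torch

    def sparse_conv3d(coords, feats, weight):
        # Illustrative sketch of a submanifold-style sparse 3D convolution.
        # coords: (N, 3) integer coordinates of active voxels
        # feats:  (N, C_in) features per active voxel
        # weight: (3, 3, 3, C_in, C_out) kernel
        n, c_out = feats.shape[0], weight.shape[-1]
        out = feats.new_zeros(n, c_out)

        # Plain Python dict stands in for the CUDA hashmap: coordinate -> row index.
        index = {tuple(c): i for i, c in enumerate(coords.tolist())}

        coord_list = coords.tolist()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    src, dst = [], []
                    for i, (x, y, z) in enumerate(coord_list):
                        j = index.get((x + dx, y + dy, z + dz))
                        if j is not None:
                            src.append(j)   # gather features from this neighbour...
                            dst.append(i)   # ...and accumulate into voxel i
                    if not src:
                        continue
                    w = weight[dx + 1, dy + 1, dz + 1]  # (C_in, C_out) slice for this offset
                    gathered = feats[torch.tensor(src, device=feats.device)] @ w
                    out.index_add_(0, torch.tensor(dst, device=feats.device), gathered)
        return out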
Generates ~400K-vertex meshes from single photos in about 3.5 minutes on an M4 Pro (24 GB). Not as fast as an H100 (where it takes seconds), but it works offline with no cloud dependency.
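The attention swap is the simplest part: flash_attn isn't available on MPS, but PyTorch's built-in scaled_dot_product_attention covers the same use. A minimal sketch (assuming flash_attn's (batch, seq, heads, head_dim) layout; the actual shim in the repo may differ):

    import torch.nn.functional as F

    def sdpa_attention(q, k, v):
        # flash_attn_func takes (batch, seq, heads, head_dim); SDPA expects
        # (batch, heads, seq, head_dim), so transpose in and back out.
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v)
        return out.transpose(1, 2)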
Comments
Comment by shivampkumar 1 day ago
From what I've seen personally, and from community benchmarks, it fares well on geometry and visual fidelity among open-source options, but I agree it's not perfect for every use case.
Meshy is solid, I used it to print my girlfriend a mini 3D model of her for her birthday last year!
Though it's worth noting Meshy is a paid service and its free tier has usage limits, while TRELLIS.2 is MIT-licensed with unlimited local generation. Different tradeoffs for different workflows. Hopefully the open-source side keeps improving.
Comment by pcoyne 20 hours ago
https://github.com/apple/ml-sharp
No matter what, it is cool seeing so many of them work on different devices.
Comment by petargyurov 1 day ago
Out of curiosity, how did you go about replacing the CUDA-specific ops? Any resources you relied on, or just experience? Would love to learn more.
Comment by shivampkumar 1 day ago
It IS significantly slower, about 3.5 minutes on my MacBook vs seconds on an H100. That's partly the pure-PyTorch backend overhead and partly just the hardware difference.
For my use case the tradeoff works -- iterate locally without paying for cloud GPUs or waiting in queues.
Comment by shivampkumar 1 day ago
I'm still working on this to try to replicate nvdiffrast better. Found an open-source port, might look at it tonight.
Comment by shivampkumar 1 day ago
If you're not working with 3D on Apple Silicon this isn't relevant to you. For the subset of people who are, running this 4B parameter 3D generation model locally on a Mac was previously blocked by hard CUDA dependencies with no workaround.