3D LUT IP Core for FPGA
The 3D LUT (three-dimensional lookup table) is a widely adopted technique in the cinema and display industries and a powerful tool for precise color correction. A 3D LUT maps each input color to an output color across the full R/G/B volume. Changing any one of the R/G/B input components can change all three output components, so a 3D LUT can adjust hue, saturation, and luminance anywhere in the color space. 3D LUT filters give images a more refined, professional rendering and are used extensively in cinematic post-production to create rich, detailed, stylized looks. Many other artistic and cinematic effects, including separating a foreground from a colored background, can also be achieved with 3D LUTs.
BergLogic's 3D LUT IP for FPGA uses the standard tetrahedral interpolation method: the most-significant bits (MSBs) of the R/G/B input select 4 spatially adjacent R/G/B values from the LUT (stored in FPGA Block RAM), and the least-significant bits (LSBs) drive the interpolation that produces the final output. The hardest part of designing a 3D LUT IP that handles multiple pixels in parallel is controlling Block RAM usage. BergLogic's 3D LUT IP is deeply optimized for this: at the same processing throughput it uses less than half the BRAM of competing implementations, making it the economical choice for FPGA-based, footprint-sensitive cameras and monitors.
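The MSB/LSB split described above can be modeled in software. The Python sketch below is an illustration only, not BergLogic's implementation: the function names, the nested-list LUT layout, and the choice of 5 index bits plus 5 fraction bits for a 33-point cube with 10-bit input are all assumptions made for the example.

```python
def make_identity_lut(size, bits=10):
    """Hypothetical helper: a LUT whose grid nodes map each color to itself."""
    idx_bits = (size - 1).bit_length() - 1      # 5 for size 33, 4 for size 17
    step = 1 << (bits - idx_bits)               # code values per grid cell
    return [[[(i * step, j * step, k * step) for k in range(size)]
             for j in range(size)]
            for i in range(size)]


def tetrahedral_lookup(lut, size, r, g, b, bits=10):
    """Software model of tetrahedral interpolation (assumed bit split).

    lut[ri][gi][bi] is an (R, G, B) tuple; size is 17 or 33.
    """
    idx_bits = (size - 1).bit_length() - 1
    frac_bits = bits - idx_bits
    scale = 1 << frac_bits
    mask = scale - 1

    # MSBs pick the cube cell, LSBs give the position inside it.
    base = [r >> frac_bits, g >> frac_bits, b >> frac_bits]
    fracs = [r & mask, g & mask, b & mask]

    # Sorting the fractions (largest first) selects one of six tetrahedra.
    order = sorted(range(3), key=lambda axis: fracs[axis], reverse=True)
    f = [fracs[axis] for axis in order]
    weights = [scale - f[0], f[0] - f[1], f[1] - f[2], f[2]]  # sums to scale

    # Walk from the cell's base corner to the opposite corner, one axis at a
    # time, collecting the 4 vertices of the selected tetrahedron.
    pos = list(base)
    corners = [tuple(pos)]
    for axis in order:
        pos[axis] += 1
        corners.append(tuple(pos))

    out = []
    for ch in range(3):
        acc = sum(w * lut[i][j][k][ch] for w, (i, j, k) in zip(weights, corners))
        out.append((acc + scale // 2) >> frac_bits)  # round to nearest
    return tuple(out)
```

With an identity LUT the model returns its input unchanged, which makes a convenient sanity check. Only 4 LUT reads are needed per pixel, compared with 8 for trilinear interpolation.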
BergLogic's 3D LUT IP uses AXI4-Stream interfaces for both input and output, so users can drop the 3D LUT IP straight into their designs. The IP accepts the same .cube files used by software tools such as Photoshop and produces the same visual result. BergLogic's 3D LUT supports 33×33×33 and 17×17×17 cube sizes.
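For readers unfamiliar with the .cube format, the following host-side Python sketch shows how such a file might be read and quantized before loading. It is an assumption-laden illustration: `parse_cube`, `quantize`, and `node_index` are hypothetical names, only the `LUT_3D_SIZE` keyword is interpreted, and how the IP itself ingests the data is not described in this document. The ordering assumed here (red index changing fastest, blue slowest) follows the published .cube convention.

```python
def parse_cube(text):
    """Parse .cube text into (size, list of (r, g, b) floats)."""
    size = None
    values = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or comment
        parts = line.split()
        if parts[0] == "LUT_3D_SIZE":
            size = int(parts[1])
        elif parts[0][0].isalpha():
            continue                      # TITLE, DOMAIN_MIN, ... ignored in this sketch
        else:
            values.append(tuple(float(p) for p in parts[:3]))
    if size is None or len(values) != size ** 3:
        raise ValueError("missing LUT_3D_SIZE or wrong number of entries")
    return size, values


def quantize(values, bits):
    """Convert floats (assumed to span 0.0..1.0) to integer codes, clamped."""
    top = (1 << bits) - 1
    return [tuple(min(top, max(0, round(c * top))) for c in rgb)
            for rgb in values]


def node_index(size, r, g, b):
    """Flat index of grid node (r, g, b); red varies fastest in a .cube file."""
    return r + size * (g + size * b)
```

A 33×33×33 file parsed this way yields 35,937 entries; a 17×17×17 file yields 4,913.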
The 3D LUT IP has been thoroughly validated and deployed at scale on Xilinx UltraScale and UltraScale+ families, and is in stable commercial use.
- 33×33×33 and 17×17×17 cube file support
- High-quality tetrahedral interpolation
- Bit depth: 8 / 10 / 12 bit
- Performance: 2-pixel parallel processing at 300 MHz, supporting 4K@60fps throughput
- Low latency: 60 clock cycles
- Dynamic cube-file reloading
- Split-screen comparison: leave the left side unprocessed and apply 3D LUT to the right side for A/B comparison
- AXI4-Stream video interface
- Reduced resource consumption; no external memory required
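The performance and memory figures in the list above can be checked with simple arithmetic. The Python sketch below makes several assumptions that do not come from this document: the standard 4K60 total raster of 4400×2250 pixels (594 MHz pixel clock), 3840×2160 as the active "4K" resolution, the 60-cycle latency being counted at the 300 MHz clock, and a single unreplicated copy of the LUT. The storage result is raw table size only and says nothing about how the IP actually arranges its Block RAM.

```python
def pixel_rate(pixels_per_clock, clock_hz):
    """Pixels per second the datapath can accept."""
    return pixels_per_clock * clock_hz


def video_rate(width, height, fps):
    """Pixels per second required by a raster of the given size."""
    return width * height * fps


def lut_bits(size, bit_depth):
    """Raw storage for one copy of a size^3 LUT with 3 channels per entry."""
    return size ** 3 * 3 * bit_depth


capacity = pixel_rate(2, 300_000_000)          # 600,000,000 pixels/s
active_4k60 = video_rate(3840, 2160, 60)       # 497,664,000 pixels/s
total_4k60 = video_rate(4400, 2250, 60)        # 594,000,000 pixels/s (assumed raster)

# Latency of 60 cycles, assuming those cycles run at 300 MHz.
latency_ns = 60 * 1_000_000_000 // 300_000_000  # 200 ns

bits_33_12 = lut_bits(33, 12)                  # 1,293,732 bits, about 1.29 Mbit
bits_17_12 = lut_bits(17, 12)                  # 176,868 bits

print(capacity >= total_4k60, latency_ns, bits_33_12, bits_17_12)
```

Under these assumptions, 600 Mpixel/s of capacity covers the 594 Mpixel/s 4K60 raster including blanking, and even the largest supported table (33×33×33 at 12 bits) is on the order of a megabit, which is consistent with keeping the LUT entirely in on-chip memory.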
