A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.



Denoising diffusion probabilistic models

11 minute read


The majority of deep generative models proposed in the last few years have broadly fallen under three categories: generative adversarial networks (GANs), variational autoencoders (VAEs), and normalizing flows. There are a few others, such as autoregressive models and those based on transformers, but they are much slower to sample from. As a result, they are not widely used, particularly when the distribution being modeled is very high-dimensional.

Least Squares

less than 1 minute read


Here’s a little piece on the different pictures of linear least squares I wrote for Towards Data Science. All the code used to generate the plots can be found in this GitHub repo.


less than 1 minute read


Back in 2016, I took part in a large open-source program called Google Summer of Code (GSoC). Specifically, I worked with an organization called BRL-CAD, which specialised in computer-aided design (CAD) software. Here is the official project page from the GSoC archives.



Parallel Computing for Iterative Hill Climbing Algorithm to solve TSP

Published in Student Research Symposium (SRS), IEEE 24th High Performance Computing (HiPC), 2017

Traveling Salesman Problem (TSP) is an NP-hard combinatorial optimization problem. Approximation algorithms have been used to reduce the TSP factorial time complexity to non-deterministic polynomial time successfully. However, approximation methods result in a suboptimal solution as they do not cover the entire search space adequately. Further, approximation methods are too time consuming for large input instances. GPUs have been shown to be effective in exploiting data and memory level parallelism in large, complex problems.

Recommended citation: Pramod Yelmewad, Param Hanji, Amogha Udupa, Parth Shah and Basavaraj Talawar. "Parallel computing for iterative hill climbing algorithm to solve TSP". In Student Research Symposium (SRS), IEEE 24th High Performance Computing (HiPC). 2017. /files/papers/pihc-tsp.pdf

Noise-Aware Merging of High Dynamic Range Image Stacks without Camera Calibration

Published in Advances in Image Manipulation Workshop @ ECCV, 2020

A near-optimal reconstruction of the radiance of a High Dynamic Range scene from an exposure stack can be obtained by modeling the camera noise distribution. The latent radiance is then estimated using Maximum Likelihood Estimation. But this requires a well-calibrated noise model of the camera, which is difficult to obtain in practice. We show that an unbiased estimation of comparable variance can be obtained with a simpler Poisson noise estimator, which does not require the knowledge of camera-specific noise parameters. We demonstrate this empirically for four different cameras, ranging from a smartphone camera to a full-frame mirrorless camera. Our experimental results are consistent for simulated as well as real images, and across different camera settings.

Recommended citation: Param Hanji, Fangcheng Zhong and Rafał K. Mantiuk. "Noise-aware merging of high dynamic range image stacks without camera calibration". In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, pages 376–391. Springer, 2020.
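
For readers curious what a calibration-free merge can look like, here is a minimal sketch, assuming pure Poisson noise, linear (RAW-like) pixel values, and known exposure times. Under those assumptions, the maximum-likelihood radiance estimate is the sum of the unsaturated pixel values divided by the sum of their exposure times. The function name `merge_poisson` and the saturation threshold `sat` are illustrative and not taken from the paper, whose estimator may differ in its details.

```python
import numpy as np

def merge_poisson(stack, times, sat=0.95):
    """Merge an exposure stack under an assumed pure-Poisson noise model.

    stack: array of shape (n, ...) with linear pixel values in [0, 1].
    times: n exposure times.
    sat:   hypothetical saturation threshold; pixels at or above it are ignored.
    """
    stack = np.asarray(stack, dtype=np.float64)
    # Reshape times to (n, 1, ..., 1) so they broadcast against the images.
    times = np.asarray(times, dtype=np.float64).reshape(
        -1, *([1] * (stack.ndim - 1))
    )
    valid = stack < sat
    # Poisson maximum likelihood: sum of values / sum of exposure times,
    # both restricted to unsaturated pixels.
    num = np.sum(np.where(valid, stack, 0.0), axis=0)
    den = np.sum(np.where(valid, times, 0.0), axis=0)
    # Pixels saturated in every exposure have no estimate and are set to NaN.
    return np.divide(num, den, out=np.full(num.shape, np.nan), where=den > 0)
```

Note that no camera-specific noise parameters appear anywhere in the function; only the exposure times and a saturation mask are needed.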

HDR4CV: High dynamic range dataset with adversarial illumination for testing computer vision methods

Published in Imaging for Deep Learning @ London Imaging Meeting, 2021

Benchmark datasets used for testing computer vision methods often contain little variation in illumination. The methods that perform well on these datasets have been observed to fail under challenging illumination conditions encountered in the real world, in particular when the dynamic range of a scene is high. We present a new dataset for evaluating computer vision methods in challenging illumination conditions such as low-light, high dynamic range, and glare. The main feature of the dataset is that each scene has been captured in all the adversarial illuminations. Moreover, each scene includes an additional reference condition with uniform illumination, which can be used to automatically generate labels for the tested computer vision methods. We demonstrate the usefulness of the dataset in a preliminary study, by evaluating the performance of popular face detection, optical flow, and object detection methods under adversarial illumination conditions. We further assess whether the performance of these applications can be improved if a different transfer function is used.

Recommended citation: Param Hanji, Muhammad Z. Alam, Nicola Giuliani, Hu Chen and Rafał K. Mantiuk. "HDR4CV: High dynamic range dataset with adversarial illumination for testing computer vision methods". In Journal of Imaging Science and Technology. 2021.

How to cheat with metrics in single-image HDR reconstruction

Published in 2nd Learning for Computational Imaging (LCI) Workshop @ ICCV, 2021

Single-image high dynamic range (SI-HDR) reconstruction has recently emerged as a problem well-suited for deep learning methods. Each successive technique demonstrates an improvement over existing methods by reporting higher image quality scores. This paper, however, highlights that such improvements in objective metrics do not necessarily translate to visually superior images. The first problem is the use of disparate evaluation conditions in terms of data and metric parameters, calling for a standardized protocol to make it possible to compare between papers. The second problem, which forms the main focus of this paper, is the inherent difficulty in evaluating SI-HDR reconstructions since certain aspects of the reconstruction problem dominate objective differences, thereby introducing a bias. Here, we reproduce a typical evaluation using existing as well as simulated SI-HDR methods to demonstrate how different aspects of the problem affect objective quality metrics. Surprisingly, we found that methods that do not even reconstruct HDR information can compete with state-of-the-art deep learning methods. We show how such results are not representative of the perceived quality and that SI-HDR reconstruction needs better evaluation protocols.

Recommended citation: Gabriel Eilertsen, Saghi Hajisharif, Param Hanji, Apostolia Tsirikoglou, Rafał K. Mantiuk and Jonas Unger. "How to cheat with metrics in single-image HDR reconstruction". In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops. 2021.

Reproducing Reality with a High-Dynamic-Range Multi-Focal Stereo Display

Published in ACM SIGGRAPH Asia, 2021

With well-established methods for producing photo-realistic results, the next big challenge of graphics and display technologies is to achieve perceptual realism — producing imagery indistinguishable from real-world 3D scenes. To deliver all necessary visual cues for perceptual realism, we built a High-Dynamic-Range Multi-Focal Stereo Display that achieves high resolution, accurate color, a wide dynamic range, and most depth cues, including binocular presentation and a range of focal depth. The display and associated imaging system have been designed to capture and reproduce a small near-eye three-dimensional object and to allow for a direct comparison between virtual and real scenes. To assess our reproduction of realism and demonstrate the capability of the display and imaging system, we conducted an experiment in which the participants were asked to discriminate between a virtual object and its physical counterpart. Our results indicate that the participants can only detect the discrepancy with a probability of 0.44. With such a level of perceptual realism, our display apparatus can facilitate a range of visual experiments that require the highest fidelity of reproduction while allowing for the full control of the displayed stimuli.

Recommended citation: Fangcheng Zhong, Akshay Jindal, Ali Özgür Yöntem, Param Hanji, Simon J. Watt and Rafał K. Mantiuk. "Reproducing Reality with a High-Dynamic-Range Multi-Focal Stereo Display". In ACM Transactions on Graphics, 40(6). 2021.

Comparison of single image HDR reconstruction methods — the caveats of quality assessment

Published in ACM SIGGRAPH Conference Proceedings, 2022

As the problem of reconstructing high dynamic range (HDR) images from a single exposure has attracted much research effort, it is essential to provide a robust protocol and clear guidelines on how to evaluate and compare new methods. In this work, we compared six recent single image HDR reconstruction (SI-HDR) methods in a subjective image quality experiment on an HDR display. We found that only two methods produced results that are, on average, more preferred than the unprocessed single exposure images. When the same methods are evaluated using image quality metrics, as typically done in papers, the metric predictions correlate poorly with subjective quality scores. The main reason is a significant tone and color difference between the reference and reconstructed HDR images. To improve the predictions of image quality metrics, we propose correcting for the inaccuracies of the estimated camera response curve before computing quality values. We further analyze the sources of prediction noise when evaluating SI-HDR methods and demonstrate that existing metrics can reliably predict only large quality differences.

Recommended citation: Param Hanji, Rafał K. Mantiuk, Gabriel Eilertsen, Saghi Hajisharif and Jonas Unger. "Comparison of single image HDR reconstruction methods — the caveats of quality assessment". In ACM SIGGRAPH 2022 Conference Proceedings (pp. 1-8). 2022.

Distilling Style from Image Pairs for Global Forward and Inverse Tone Mapping

Published in ACM SIGGRAPH Conference on Visual Media Production (CVMP), 2022

Many image enhancement or editing operations, such as forward and inverse tone mapping or color grading, do not have a unique solution, but instead a range of solutions, each representing a different style. Despite this, existing learning-based methods attempt to learn a unique mapping, disregarding this style. In this work, we show that information about the style can be distilled from collections of image pairs and encoded into a 2- or 3-dimensional vector. This gives us not only an efficient representation but also an interpretable latent space for editing the image style. We represent the global color mapping between a pair of images as a custom normalizing flow, conditioned on a polynomial basis of the pixel color. We show that such a network is more effective than PCA or VAE at encoding image style in low-dimensional space and lets us obtain an accuracy close to 40 dB, which is about 7-10 dB improvement over the state-of-the-art methods.

Recommended citation: Aamir Mustafa, Param Hanji and Rafał K. Mantiuk. "Distilling Style from Image Pairs for Global Forward and Inverse Tone Mapping." In Proceedings of the 19th ACM SIGGRAPH European Conference on Visual Media Production (CVMP). 2022.

Robust estimation of exposure ratios in multi-exposure image stacks

Published in IEEE Transactions on Computational Imaging (TCI), 2023

Merging multi-exposure image stacks into a high dynamic range (HDR) image requires knowledge of accurate exposure times. When exposure times are inaccurate, for example, when they are extracted from a camera’s EXIF metadata, the reconstructed HDR images reveal banding artifacts at smooth gradients. To remedy this, we propose to estimate exposure ratios directly from the input images. We derive the exposure time estimation as an optimization problem, in which pixels are selected from pairs of exposures to minimize estimation error caused by camera noise. When pixel values are represented in the logarithmic domain, the problem can be solved efficiently using a linear solver. We demonstrate that the estimation can be easily made robust to pixel misalignment caused by camera or object motion by collecting pixels from multiple spatial tiles. The proposed automatic exposure estimation and alignment eliminates banding artifacts in popular datasets and is essential for applications that require physically accurate reconstructions, such as measuring the modulation transfer function of a display. The code for the method is available.

Recommended citation: Param Hanji and Rafał K. Mantiuk. "Robust estimation of exposure ratios in multi-exposure image stacks." In IEEE Transactions on Computational Imaging (TCI), 9, pp. 721-731, 2023.
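
To illustrate why the logarithmic domain makes this a linear problem, here is a rough sketch. For linear pixel values, the log of a pixel in image j minus the log of the same pixel in image i equals the difference of the two log exposures, so every image pair contributes one linear equation. The sketch uses plain unweighted least squares with a hypothetical function name and thresholds (`estimate_log_exposures`, `lo`, `hi`); it omits the noise-aware pixel selection and spatial tiling described in the abstract, so it should not be read as the paper's method.

```python
import numpy as np

def estimate_log_exposures(stack, lo=0.05, hi=0.95):
    """Estimate log exposures relative to the first image of a stack.

    stack: list of linear images with identical shape and values in [0, 1].
    lo/hi: hypothetical thresholds that discard very dark or saturated pixels.
    """
    n = len(stack)
    rows, rhs = [], []
    for i in range(n):
        for j in range(i + 1, n):
            a, b = stack[i], stack[j]
            valid = (a > lo) & (a < hi) & (b > lo) & (b < hi)
            if not valid.any():
                continue  # this pair shares no well-exposed pixels
            # One equation per pair: e_j - e_i = mean log-ratio of the pixels.
            row = np.zeros(n)
            row[j], row[i] = 1.0, -1.0
            rows.append(row)
            rhs.append(np.mean(np.log(b[valid]) - np.log(a[valid])))
    # Anchor the first exposure (e_0 = 0) to remove the global scale ambiguity.
    anchor = np.zeros(n)
    anchor[0] = 1.0
    rows.append(anchor)
    rhs.append(0.0)
    e, *_ = np.linalg.lstsq(np.vstack(rows), np.array(rhs), rcond=None)
    return e
```

Exponentiating the result gives exposure ratios relative to the first image, for example `np.exp(estimate_log_exposures(stack))`.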

Neural Fields with Hard Constraints of Arbitrary Differential Order

Published in Advances in Neural Information Processing Systems, 2023

While deep learning techniques have become extremely popular for solving a broad range of optimization problems, methods to enforce hard constraints during optimization, particularly on deep neural networks, remain underdeveloped. Inspired by the rich literature on meshless interpolation and its extension to spectral collocation methods in scientific computing, we develop a series of approaches for enforcing hard constraints on neural fields, which we refer to as Constrained Neural Fields (CNF). The constraints can be specified as a linear operator applied to the neural field and its derivatives. We also design specific model representations and training strategies for problems where standard models may encounter difficulties, such as conditioning of the system, memory consumption, and capacity of the network when being constrained. Our approaches are demonstrated in a wide range of real-world applications. Additionally, we develop a framework that enables highly efficient model and constraint specification, which can be readily applied to any downstream task where hard constraints need to be explicitly satisfied during optimization.

Recommended citation: Fangcheng Zhong, Kyle Fogarty, Param Hanji, Tianhao Wu, Alejandro Sztrajman, Andrew Spielberg, Andrea Tagliasacchi, Petra Bosilj and Cengiz Öztireli. "Neural Fields with Hard Constraints of Arbitrary Differential Order." In Advances in Neural Information Processing Systems. 2023.

Perceptual Quality Assessment of NeRF and Neural View Synthesis Methods for Front-Facing Views

Published in Eurographics, 2024

Neural view synthesis (NVS) is one of the most successful techniques for synthesizing free viewpoint videos, capable of achieving high fidelity from only a sparse set of captured images. This success has led to many variants of the techniques, each evaluated on a set of test views typically using image quality metrics such as PSNR, SSIM, or LPIPS. There has been a lack of research on how NVS methods perform with respect to perceived video quality. We present the first study on perceptual evaluation of NVS and NeRF variants. For this study, we collected two datasets of scenes captured in a controlled lab environment as well as in-the-wild. In contrast to existing datasets, these scenes come with reference video sequences, allowing us to test for temporal artifacts and subtle distortions that are easily overlooked when viewing only static images. We measured the quality of videos synthesized by several NVS methods in a well-controlled perceptual quality assessment experiment as well as with many existing state-of-the-art image/video quality metrics. We present a detailed analysis of the results and recommendations for dataset and metric selection for NVS evaluation.

Recommended citation: Hanxue Liang, Tianhao Wu, Param Hanji, Francesco Banterle, Hongyun Gao, Rafał K. Mantiuk and Cengiz Öztireli. "Perceptual Quality Assessment of NeRF and Neural View Synthesis Methods for Front-Facing Views." In Computer Graphics Forum. 2024.


Inverse problems in graphics and vision


In this 10-minute talk, I summarised the research conducted in the first year of my PhD and identified a few directions for the future.

Unsupervised domain translation using normalizing flows


I introduce a probabilistic framework to learn a bijective mapping between two domains using only unpaired data. The framework uses invertible layers with tractable Jacobian determinants to find the mapping using exact likelihood training. You can find some preliminary code at this GitHub repo. If you’ve got a nifty application that stands to benefit from this invertible framework, get in touch.
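
As a generic illustration of an invertible layer whose Jacobian determinant is cheap to compute, here is a small NumPy sketch of an affine coupling layer together with the exact log-likelihood it gives under a standard normal base distribution. It is not the code from the talk or the repository: the class name, the tiny linear maps `Ws` and `Wt` standing in for neural networks, and the choice of base distribution are all assumptions made for the example.

```python
import numpy as np

class AffineCoupling:
    """Affine coupling layer: the first half of the input passes through
    unchanged and parameterizes a scale and shift of the second half."""

    def __init__(self, dim, rng):
        self.d = dim // 2
        # Hypothetical stand-ins for the scale and shift networks.
        self.Ws = 0.1 * rng.standard_normal((self.d, dim - self.d))
        self.Wt = 0.1 * rng.standard_normal((self.d, dim - self.d))

    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        s = np.tanh(x1 @ self.Ws)  # bounded log-scale
        t = x1 @ self.Wt           # shift
        y2 = x2 * np.exp(s) + t
        # The Jacobian is triangular, so log|det J| is just the sum of s.
        return np.concatenate([x1, y2], axis=1), s.sum(axis=1)

    def inverse(self, y):
        y1, y2 = y[:, :self.d], y[:, self.d:]
        s = np.tanh(y1 @ self.Ws)
        t = y1 @ self.Wt
        return np.concatenate([y1, (y2 - t) * np.exp(-s)], axis=1)

def log_likelihood(layer, x):
    """Exact log-likelihood via the change-of-variables formula."""
    z, logdet = layer.forward(x)
    log_pz = -0.5 * np.sum(z ** 2, axis=1) - 0.5 * z.shape[1] * np.log(2 * np.pi)
    return log_pz + logdet
```

Maximizing `log_likelihood` over the layer parameters is what exact likelihood training means here; stacking several such layers, with the roles of the two halves swapped between them, gives a more expressive bijection.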

ML-Bootcamp for Finance


I worked with Cambridge Spark on their ML-Bootcamp for people working in finance. First, I tutored and helped improve existing modules on various topics including optimization, generative modelling, and meta-learning. I then delivered part of the lecture on "Normalizing flows", where I discussed popular invertible architectures with tractable Jacobians.

Generative modelling (L335)


This was an invited lecture for L335: Machine Visual Perception, an MPhil Course at the Department of Computer Science and Technology, University of Cambridge. I discussed popular deep generative architectures, with an emphasis on how they are used in Computer Vision and Graphics.