Portfolio item number 1
Short description of portfolio item number 1
Portfolio item number 2
Short description of portfolio item number 2
Published in Student Research Symposium (SRS), IEEE 24th International Conference on High Performance Computing (HiPC), 2017
The Traveling Salesman Problem (TSP) is an NP-hard combinatorial optimization problem. Approximation algorithms reduce the factorial time complexity of exhaustive search to polynomial time, but at a cost: they return suboptimal solutions because they do not cover the entire search space adequately, and they remain too time-consuming for large input instances. GPUs have been shown to be effective at exploiting data- and memory-level parallelism in large, complex problems.
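The local-search loop that the paper parallelizes can be sketched on the CPU. Below is a minimal, illustrative version of iterative hill climbing with random restarts; the 2-opt neighborhood, function names, and restart scheme are assumptions for illustration, not the paper's GPU implementation.

```python
import random

def tour_length(tour, dist):
    """Total length of a closed tour given a distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def iterative_hill_climb(dist, restarts=10, seed=0):
    """Hill climbing with 2-opt moves, restarted from random tours."""
    rng = random.Random(seed)
    n = len(dist)
    best_tour, best_len = None, float("inf")
    for _ in range(restarts):
        tour = list(range(n))
        rng.shuffle(tour)
        improved = True
        while improved:                      # climb until a local optimum
            improved = False
            for i in range(1, n - 1):
                for j in range(i + 1, n):
                    # 2-opt move: reverse the segment tour[i..j].
                    cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                    if tour_length(cand, dist) < tour_length(tour, dist):
                        tour, improved = cand, True
        length = tour_length(tour, dist)
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len
```

On a GPU, the inner double loop over (i, j) pairs is the natural target for parallelization, since each 2-opt move can be scored independently.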
Recommended citation: Pramod Yelmewad, Param Hanji, Amogha Udupa, Parth Shah and Basavaraj Talawar. "Parallel computing for iterative hill climbing algorithm to solve TSP". In Student Research Symposium (SRS), IEEE 24th International Conference on High Performance Computing (HiPC). 2017. /files/papers/pihc-tsp.pdf
Published in Advances in Image Manipulation Workshop @ ECCV, 2020
A near-optimal reconstruction of the radiance of a High Dynamic Range scene from an exposure stack can be obtained by modeling the camera noise distribution. The latent radiance is then estimated using Maximum Likelihood Estimation. However, this requires a well-calibrated noise model of the camera, which is difficult to obtain in practice. We show that an unbiased estimate of comparable variance can be obtained with a simpler Poisson noise estimator, which does not require knowledge of camera-specific noise parameters. We demonstrate this empirically for four different cameras, ranging from a smartphone camera to a full-frame mirrorless camera. Our experimental results are consistent for simulated as well as real images, and across different camera settings.
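To make the estimator concrete, here is a minimal sketch of merging under a pure Poisson noise model, where a pixel value behaves like y_i ~ Poisson(t_i * phi) for radiance phi and exposure time t_i; the maximum-likelihood estimate is then the sum of observed values divided by the sum of usable exposure times. Variable names and the saturation handling are illustrative, not the paper's code.

```python
import numpy as np

def merge_poisson(images, exposure_times, saturation=0.95):
    """Merge a stack of linear (RAW-like) frames without noise calibration.

    Under y_i ~ Poisson(t_i * phi), the MLE of the radiance phi is
    sum_i(y_i) / sum_i(t_i), summed over unsaturated pixels only.
    """
    y = np.stack([np.asarray(im, dtype=np.float64) for im in images])
    t = np.asarray(exposure_times, dtype=np.float64).reshape(-1, 1, 1)
    valid = y < saturation                          # mask clipped pixels
    num = np.where(valid, y, 0.0).sum(axis=0)       # summed observations
    den = np.where(valid, t, 0.0).sum(axis=0)       # summed exposure times
    return num / np.maximum(den, 1e-12)             # estimated radiance
```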
Recommended citation: Param Hanji, Fangcheng Zhong and Rafał K. Mantiuk. "Noise-aware merging of high dynamic range image stacks without camera calibration". In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, pages 376–391. Springer, 2020. https://www.cl.cam.ac.uk/research/rainbow/projects/noise-aware-merging/2020-ppne-mle.pdf
Published in Imaging for Deep Learning @ London Imaging Meeting, 2021
Benchmark datasets used for testing computer vision methods often contain little variation in illumination. Methods that perform well on these datasets have been observed to fail under the challenging illumination conditions encountered in the real world, in particular when the dynamic range of a scene is high. We present a new dataset for evaluating computer vision methods under challenging illumination conditions such as low light, high dynamic range, and glare. The main feature of the dataset is that each scene has been captured under all of the adversarial illumination conditions. Moreover, each scene includes an additional reference condition with uniform illumination, which can be used to automatically generate labels for the tested computer vision methods. We demonstrate the usefulness of the dataset in a preliminary study by evaluating the performance of popular face detection, optical flow, and object detection methods under adversarial illumination. We further assess whether the performance of these applications can be improved if a different transfer function is used.
Recommended citation: Param Hanji, Muhammad Z. Alam, Nicola Giuliani, Hu Chen and Rafał K. Mantiuk. "HDR4CV: High dynamic range dataset with adversarial illumination for testing computer vision methods". In Journal of Imaging Science and Technology. 2021. https://www.cl.cam.ac.uk/research/rainbow/projects/hdr4cv-dataset/2021-hdr4cv-data.pdf
Published in 2nd Learning for Computational Imaging (LCI) Workshop @ ICCV, 2021
Single-image high dynamic range (SI-HDR) reconstruction has recently emerged as a problem well-suited for deep learning methods. Each successive technique demonstrates an improvement over existing methods by reporting higher image quality scores. This paper, however, highlights that such improvements in objective metrics do not necessarily translate to visually superior images. The first problem is the use of disparate evaluation conditions in terms of data and metric parameters, calling for a standardized protocol that makes comparisons across papers possible. The second problem, which forms the main focus of this paper, is the inherent difficulty of evaluating SI-HDR reconstructions, since certain aspects of the reconstruction problem dominate objective differences and thereby introduce a bias. Here, we reproduce a typical evaluation using existing as well as simulated SI-HDR methods to demonstrate how different aspects of the problem affect objective quality metrics. Surprisingly, we found that methods that do not even reconstruct HDR information can compete with state-of-the-art deep learning methods. We show that such results are not representative of perceived quality and that SI-HDR reconstruction needs better evaluation protocols.
Recommended citation: Gabriel Eilertsen, Saghi Hajisharif, Param Hanji, Apostolia Tsirikoglou, Rafał K. Mantiuk and Jonas Unger. "How to cheat with metrics in single-image HDR reconstruction". In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops. 2021. https://arxiv.org/abs/2108.08713
Published in ACM SIGGRAPH Asia, 2021
With well-established methods for producing photo-realistic results, the next big challenge of graphics and display technologies is to achieve perceptual realism — producing imagery indistinguishable from real-world 3D scenes. To deliver all the necessary visual cues for perceptual realism, we built a High-Dynamic-Range Multi-Focal Stereo Display that achieves high resolution, accurate color, a wide dynamic range, and most depth cues, including binocular presentation and a range of focal depths. The display and associated imaging system have been designed to capture and reproduce a small near-eye three-dimensional object and to allow for a direct comparison between virtual and real scenes. To assess our reproduction of realism and demonstrate the capability of the display and imaging system, we conducted an experiment in which participants were asked to discriminate between a virtual object and its physical counterpart. Our results indicate that participants could detect the discrepancy with a probability of only 0.44. With such a level of perceptual realism, our display apparatus can facilitate a range of visual experiments that require the highest fidelity of reproduction while allowing full control of the displayed stimuli.
Recommended citation: Fangcheng Zhong, Akshay Jindal, Ali Özgür Yöntem, Param Hanji, Simon J. Watt and Rafał K. Mantiuk. "Reproducing Reality with a High-Dynamic-Range Multi-Focal Stereo Display". In ACM Transactions on Graphics, 40(6). 2021. https://www.cl.cam.ac.uk/research/rainbow/projects/hdrmfs/Reproducing_reality_HDR_MF_S_display.pdf
Published in ACM SIGGRAPH Conference Proceedings, 2022
As the problem of reconstructing high dynamic range (HDR) images from a single exposure has attracted much research effort, it is essential to provide a robust protocol and clear guidelines on how to evaluate and compare new methods. In this work, we compared six recent single image HDR reconstruction (SI-HDR) methods in a subjective image quality experiment on an HDR display. We found that only two methods produced results that are, on average, more preferred than the unprocessed single exposure images. When the same methods are evaluated using image quality metrics, as typically done in papers, the metric predictions correlate poorly with subjective quality scores. The main reason is a significant tone and color difference between the reference and reconstructed HDR images. To improve the predictions of image quality metrics, we propose correcting for the inaccuracies of the estimated camera response curve before computing quality values. We further analyze the sources of prediction noise when evaluating SI-HDR methods and demonstrate that existing metrics can reliably predict only large quality differences.
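As one concrete (assumed) form of such a correction: if the residual error of the estimated camera response is approximately a per-channel gain and gamma, it can be fitted between reconstruction and reference by least squares in the log domain and removed before computing a metric. The exact correction used in the paper may differ; this is an illustrative sketch.

```python
import numpy as np

def correct_response(recon, ref, floor=1e-6):
    """Fit and remove a per-channel gain-gamma error before metric computation.

    In the log domain, log(ref) ~= gamma * log(recon) + log(gain) per channel,
    so a degree-1 polynomial fit recovers both parameters.
    """
    recon = np.maximum(np.asarray(recon, dtype=np.float64), floor)
    ref = np.maximum(np.asarray(ref, dtype=np.float64), floor)
    out = np.empty_like(recon)
    for c in range(recon.shape[-1]):
        x = np.log(recon[..., c]).ravel()
        y = np.log(ref[..., c]).ravel()
        gamma, log_gain = np.polyfit(x, y, 1)       # slope, intercept
        out[..., c] = np.exp(log_gain) * recon[..., c] ** gamma
    return out
```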
Recommended citation: Param Hanji, Rafał K. Mantiuk, Gabriel Eilertsen, Saghi Hajisharif and Jonas Unger. "Comparison of single image HDR reconstruction methods — the caveats of quality assessment". In ACM SIGGRAPH 2022 Conference Proceedings (pp. 1-8). 2022. https://www.cl.cam.ac.uk/research/rainbow/projects/sihdr_benchmark/2022-sihdr-benchmark-raw.pdf
Published in ACM SIGGRAPH Conference on Visual Media Production (CVMP), 2022
Many image enhancement and editing operations, such as forward and inverse tone mapping or color grading, do not have a unique solution but rather a range of solutions, each representing a different style. Despite this, existing learning-based methods attempt to learn a unique mapping, disregarding style. In this work, we show that information about style can be distilled from collections of image pairs and encoded into a 2- or 3-dimensional vector. This gives us not only an efficient representation but also an interpretable latent space for editing image style. We represent the global color mapping between a pair of images as a custom normalizing flow, conditioned on a polynomial basis of the pixel color. We show that such a network is more effective than PCA or VAE at encoding image style in a low-dimensional space, and it lets us achieve an accuracy close to 40 dB, about a 7-10 dB improvement over state-of-the-art methods.
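The conditioning idea can be sketched as follows: a low-dimensional style vector generates the coefficients of a global color mapping expressed in a polynomial basis of the pixel color. This toy version omits the invertibility and exact-likelihood training of the paper's normalizing flow; the class and dimension choices are illustrative.

```python
import torch

def poly_basis(rgb):
    """Degree-2 polynomial basis of pixel color (10 terms)."""
    r, g, b = rgb.unbind(-1)
    one = torch.ones_like(r)
    return torch.stack([one, r, g, b, r*r, g*g, b*b, r*g, r*b, g*b], dim=-1)

class StyleColorMap(torch.nn.Module):
    """A global color mapping whose coefficients come from a style vector z."""
    def __init__(self, style_dim=3, basis_dim=10):
        super().__init__()
        self.gen = torch.nn.Linear(style_dim, basis_dim * 3)
    def forward(self, rgb, z):
        # rgb: (B, N, 3) pixels; z: (B, style_dim) per-image style code.
        W = self.gen(z).view(-1, 10, 3)     # style -> 10x3 mapping matrix
        return poly_basis(rgb) @ W          # mapped colors, shape (B, N, 3)
```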
Recommended citation: Aamir Mustafa, Param Hanji and Rafał K. Mantiuk. "Distilling Style from Image Pairs for Global Forward and Inverse Tone Mapping." In Proceedings of the 19th ACM SIGGRAPH European Conference on Visual Media Production (CVMP). 2022. https://www.cl.cam.ac.uk/research/rainbow/projects/distil_style/paper.pdf
Published in IEEE Transactions on Computational Imaging (TCI), 2023
Merging multi-exposure image stacks into a high dynamic range (HDR) image requires knowledge of accurate exposure times. When exposure times are inaccurate, for example, when they are extracted from a camera’s EXIF metadata, the reconstructed HDR images reveal banding artifacts at smooth gradients. To remedy this, we propose to estimate exposure ratios directly from the input images. We derive the exposure time estimation as an optimization problem, in which pixels are selected from pairs of exposures to minimize estimation error caused by camera noise. When pixel values are represented in the logarithmic domain, the problem can be solved efficiently using a linear solver. We demonstrate that the estimation can be easily made robust to pixel misalignment caused by camera or object motion by collecting pixels from multiple spatial tiles. The proposed automatic exposure estimation and alignment eliminates banding artifacts in popular datasets and is essential for applications that require physically accurate reconstructions, such as measuring the modulation transfer function of a display. The code for the method is available.
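The key observation can be sketched compactly. In the log domain, a pixel p seen in exposures i and i+1 satisfies log y_i(p) - log y_{i+1}(p) ~ log(t_i / t_{i+1}), so exposure ratios can be recovered from the images alone. The paper selects pixels to minimize the noise-induced error and solves a linear system; the simplified version below takes a robust median over valid pixels per consecutive pair. Thresholds and names are illustrative.

```python
import numpy as np

def estimate_exposure_ratios(images, saturation=0.95, floor=1e-4):
    """Estimate exposure ratios (relative to the first frame) from the stack.

    For consecutive frames, median(log y_next - log y_cur) over well-exposed
    pixels approximates log(t_next / t_cur); cumulating gives all ratios.
    """
    ratios = [1.0]
    for cur, nxt in zip(images[:-1], images[1:]):
        cur, nxt = np.asarray(cur, np.float64), np.asarray(nxt, np.float64)
        valid = ((cur > floor) & (cur < saturation) &
                 (nxt > floor) & (nxt < saturation))
        log_ratio = np.median(np.log(nxt[valid]) - np.log(cur[valid]))
        ratios.append(ratios[-1] * np.exp(log_ratio))
    return np.asarray(ratios)               # t_i / t_0 for each exposure i
```

Merging with ratios estimated this way, rather than with exposure times read from EXIF metadata, is what removes the banding artifacts described above.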
Recommended citation: Param Hanji and Rafał K. Mantiuk. "Robust estimation of exposure ratios in multi-exposure image stacks." In IEEE Transactions on Computational Imaging (TCI), 9, pp. 721-731, 2023. https://arxiv.org/abs/2308.02968
Published in Advances in Neural Information Processing Systems, 2023
While deep learning techniques have become extremely popular for solving a broad range of optimization problems, methods to enforce hard constraints during optimization, particularly on deep neural networks, remain underdeveloped. Inspired by the rich literature on meshless interpolation and its extension to spectral collocation methods in scientific computing, we develop a series of approaches for enforcing hard constraints on neural fields, which we refer to as Constrained Neural Fields (CNF). The constraints can be specified as a linear operator applied to the neural field and its derivatives. We also design specific model representations and training strategies for problems where standard models may encounter difficulties, such as conditioning of the system, memory consumption, and capacity of the network when being constrained. Our approaches are demonstrated in a wide range of real-world applications. Additionally, we develop a framework that enables highly efficient model and constraint specification, which can be readily applied to any downstream task where hard constraints need to be explicitly satisfied during optimization.
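For intuition, here is a toy instance of the collocation idea for the simplest case: hard value constraints (differential order zero) imposed exactly on an arbitrary neural field by a kernel-based correction. The RBF kernel, solver, and function names are illustrative; the paper's framework generalizes this to constraints of arbitrary differential order and addresses the conditioning and memory issues a naive version runs into.

```python
import torch

def rbf(a, b, eps=1.0):
    """Gaussian RBF kernel matrix between point sets a (n, d) and b (m, d)."""
    return torch.exp(-(eps * torch.cdist(a, b)) ** 2)

def with_hard_constraints(field, x_c, y_c):
    """Wrap a neural field so it exactly interpolates y_c at locations x_c."""
    def constrained(x):
        K = rbf(x_c, x_c)                              # (m, m) collocation matrix
        lam = torch.linalg.solve(K, y_c - field(x_c))  # correction coefficients
        return field(x) + rbf(x, x_c) @ lam            # exact at x_c by design
    return constrained
```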
Recommended citation: Fangcheng Zhong, Kyle Fogarty, Param Hanji, Tianhao Wu, Alejandro Sztrajman, Andrew Spielberg, Andrea Tagliasacchi, Petra Bosilj and Cengiz Öztireli. "Neural Fields with Hard Constraints of Arbitrary Differential Order." In Advances in Neural Information Processing Systems. 2023. https://cnf2023.netlify.app/
Published in Eurographics, 2024
Neural view synthesis (NVS) is one of the most successful techniques for synthesizing free-viewpoint videos, capable of achieving high fidelity from only a sparse set of captured images. This success has led to many variants of the technique, each evaluated on a set of test views, typically using image quality metrics such as PSNR, SSIM, or LPIPS. There has been little research on how NVS methods perform with respect to perceived video quality. We present the first study on the perceptual evaluation of NVS and NeRF variants. For this study, we collected two datasets of scenes captured in a controlled lab environment as well as in the wild. In contrast to existing datasets, these scenes come with reference video sequences, allowing us to test for temporal artifacts and subtle distortions that are easily overlooked when viewing only static images. We measured the quality of videos synthesized by several NVS methods in a well-controlled perceptual quality assessment experiment, as well as with many existing state-of-the-art image and video quality metrics. We present a detailed analysis of the results and recommendations for dataset and metric selection for NVS evaluation.
Recommended citation: Hanxue Liang, Tianhao Wu, Param Hanji, Francesco Banterle, Hongyun Gao, Rafał K. Mantiuk and Cengiz Öztireli. "Perceptual Quality Assessment of NeRF and Neural View Synthesis Methods for Front-Facing Views." In Computer Graphics Forum. 2024. https://arxiv.org/abs/2303.15206
Published in ACM SIGGRAPH, 2024
ColorVideoVDP is a video and image quality metric that models spatial and temporal aspects of vision for both luminance and color. The metric is built on novel psychophysical models of chromatic spatiotemporal contrast sensitivity and cross-channel contrast masking. It accounts for the viewing conditions and the geometric and photometric characteristics of the display. It was trained to predict common video-streaming distortions (e.g., video compression, rescaling, and transmission errors) as well as 8 new distortion types related to AR/VR displays (e.g., light source and waveguide non-uniformities). To address the latter application, we collected a novel XR-Display-Artifact-Video quality dataset (XR-DAVID), comprising 336 distorted videos. Extensive testing on XR-DAVID, as well as on several datasets from the literature, indicates a significant gain in prediction performance compared to existing metrics. ColorVideoVDP opens the door to many novel applications that require joint automated spatiotemporal assessment of luminance and color distortions, including video streaming, display specification and design, visual comparison of results, and perceptually guided quality optimization.
Recommended citation: Rafał K. Mantiuk, Param Hanji, Maliha Ashraf, Yuta Asano and Alexandre Chapiro. "ColorVideoVDP: A visual difference predictor for image, video and display distortions." In ACM Transactions on Graphics (TOG), Volume 43, Issue 4. 2024. https://www.cl.cam.ac.uk/~rkm38/pdfs/mantiuk2024_ColorVideoVDP.pdf
Published in European Conference on Computer Vision, 2024
We propose FrePolad: frequency-rectified point latent diffusion, a point cloud generation pipeline integrating a variational autoencoder (VAE) with a denoising diffusion probabilistic model (DDPM) for the latent distribution. FrePolad simultaneously achieves high quality, diversity, and flexibility in point cloud cardinality for generation tasks while maintaining high computational efficiency. The improvement in generation quality and diversity is achieved through (1) a novel frequency rectification via spherical harmonics designed to retain high-frequency content while learning the point cloud distribution; and (2) a latent DDPM to learn the regularized yet complex latent distribution. In addition, FrePolad supports variable point cloud cardinality by formulating the sampling of points as conditional distributions over a latent shape distribution. Finally, the low-dimensional latent space encoded by the VAE contributes to FrePolad’s fast and scalable sampling. Our quantitative and qualitative results demonstrate FrePolad’s state-of-the-art performance in terms of quality, diversity, and computational efficiency.
Recommended citation: Chenliang Zhou, Fangcheng Zhong, Param Hanji, Zhilin Guo, Kyle Fogarty, Alejandro Sztrajman, Hongyun Gao and Cengiz Öztireli. "FrePolad: Frequency-Rectified Point Latent Diffusion for Point Cloud Generation." In Proceedings of the European Conference on Computer Vision (ECCV). 2024. https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08460.pdf
Published:
Presented our paper on HDR merging at the Advances in Image Manipulation (AIM) workshop, part of ECCV 2020.
Published:
In this 10-minute talk, I summarised the research conducted in the first year of my PhD and identified a few directions for the future.
Published:
The theme of LIM 2021 was "Imaging for deep learning". I presented our journal submission, HDR4CV, which describes an evaluation dataset for automatically testing new computer vision methods.
Published:
I introduce a probabilistic framework to learn a bijective mapping between two domains using only unpaired data. The framework uses invertible layers with tractable Jacobian determinants to find the mapping through exact likelihood training. You can find some preliminary code at this GitHub repo. If you’ve got a nifty application that stands to benefit from this invertible framework, get in touch.
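As a flavor of what such invertible layers look like, here is a minimal RealNVP-style affine coupling layer, where the triangular Jacobian makes the log-determinant a simple sum of predicted log-scales. This is an illustrative sketch, not the code in the repo.

```python
import torch

class AffineCoupling(torch.nn.Module):
    """One invertible affine coupling layer with a tractable log-determinant."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.net = torch.nn.Sequential(
            torch.nn.Linear(self.half, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 2 * (dim - self.half)))
    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(x1).chunk(2, dim=1)
        log_s = torch.tanh(log_s)            # keep scales numerically tame
        y2 = x2 * torch.exp(log_s) + t       # affine transform of one half
        return torch.cat([x1, y2], dim=1), log_s.sum(dim=1)  # y, log|det J|
    def inverse(self, y):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        log_s, t = self.net(y1).chunk(2, dim=1)
        log_s = torch.tanh(log_s)
        return torch.cat([y1, (y2 - t) * torch.exp(-log_s)], dim=1)
```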
Published:
I worked with Cambridge Spark on their ML-Bootcamp for people working in finance. First, I tutored and helped improve existing modules on various topics, including optimization, generative modelling, and meta-learning. I then delivered part of the lecture on "Normalizing flows", where I discussed popular invertible architectures with tractable Jacobians.
Published:
This was an invited lecture for L335: Machine Visual Perception, an MPhil course at the Department of Computer Science and Technology, University of Cambridge. I discussed popular deep generative architectures, with an emphasis on how they are used in computer vision and graphics.