Not every camera is equipped with an excellent image signal processing (ISP) pipeline that converts raw sensor data into color images. In this paper, we present a novel learning-based model that replaces the built-in ISP and synthesizes images that match the image quality of high-end professional cameras. Our approach does not rely on the sub-optimal built-in ISP at all; instead, it uses a fully convolutional network with content-aware conditional convolutions to act as the ISP. To train the deep learning model, we collect a large-scale dataset of raw and RGB data pairs captured by two popular smartphones and one high-end camera. Our model takes the raw sensor data from a smartphone as input and generates an RGB image that is optimized to match the image quality produced by the high-end camera's ISP. Experimental results show that our model produces perceptually better images than the popular smartphones do when using the same sensor data.
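To give a rough intuition for what a content-aware conditional convolution can look like, here is a minimal NumPy sketch. It is not the architecture from the paper: it assumes one common formulation in which the kernel is a mixture of several "expert" kernels, weighted by routing values computed from globally pooled input features. The function name `conditional_conv2d`, the parameters `experts`, `routing_w`, and `routing_b`, and all shapes are hypothetical choices made for illustration.

```python
import numpy as np


def conditional_conv2d(x, experts, routing_w, routing_b):
    """Illustrative content-aware convolution (an assumed formulation).

    x:         (C_in, H, W) input feature map
    experts:   (K, C_out, C_in, k, k) bank of K expert kernels, k odd
    routing_w: (K, C_in) routing weights (hypothetical parameter)
    routing_b: (K,) routing bias (hypothetical parameter)
    Returns the output map (C_out, H, W) and the routing values (K,).
    """
    # Content descriptor: global average pooling over the spatial dims.
    pooled = x.mean(axis=(1, 2))  # (C_in,)

    # One sigmoid routing value per expert, computed from the content.
    alpha = 1.0 / (1.0 + np.exp(-(routing_w @ pooled + routing_b)))  # (K,)

    # Input-dependent kernel: weighted sum of the expert kernels.
    kernel = np.tensordot(alpha, experts, axes=1)  # (C_out, C_in, k, k)

    # "Same" convolution: reflect-pad, then gather k x k windows.
    k = kernel.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(
        xp, (k, k), axis=(1, 2)
    )  # (C_in, H, W, k, k)

    # Cross-correlate every window with the mixed kernel.
    out = np.einsum("chwij,ocij->ohw", windows, kernel)
    return out, alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 16, 16))           # stand-in feature map
    experts = rng.standard_normal((3, 8, 4, 3, 3))  # 3 experts, 4 -> 8 channels
    routing_w = rng.standard_normal((3, 4))
    routing_b = np.zeros(3)
    y, alpha = conditional_conv2d(x, experts, routing_w, routing_b)
    print(y.shape, alpha)
```

The point of the sketch is only that the effective kernel changes with the image content, so different scenes can be processed with different filters. In a trained network the expert kernels and routing parameters would be learned, and the real model may condition its convolutions in a different way.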
Two examples from our dataset. Each data triplet is captured with a Mi phone, an iPhone 6S, and a Nikon Z6.
Misalignment analysis. In our dataset, most patches have a misalignment of 0.4 to 0.7 pixels. Repeating the analysis under different illuminations gives results consistent with the overall misalignment distribution, as seen in (b).
We compare the results of our model with the smartphones' built-in ISPs and other baseline methods.
Quantitative comparison between our model and all baseline methods. All perceptual metrics show that our proposed ISP model outperforms the baselines.
Quantitative comparison for our controlled experiments.
If you find this helpful, please cite our work:
@inproceedings{xing2022ISPDataset,
title={A Well-aligned Dataset for Learning Image Signal Processing on Smartphones from a High-end Camera},
author={Xing, Yazhou and Li, Changlin and Zhang, Xuaner and Chen, Qifeng},
booktitle={ACM SIGGRAPH 2022 Posters},
pages={1--2},
year={2022}
}