MossNet

MossNet is a neural network for single-image super-resolution (SR) designed to preserve memory efficiency.

Abstract

Current state-of-the-art super-resolution methods are based on diffusion models, which are computationally expensive. At the same time, many models rely on text-to-image backbones, which makes inference even more complex. MossNet aims to remove this complexity by using a U-Net architecture without a text-to-image model.

Architecture

MossNet proposes a context-aware, patch-local upscaling mechanism that uses a single embedding "describing" the image for the whole upscaling process. At the same time, operating on smaller, fixed-size patches reduces the memory footprint.

The pipeline (reconstructed from the diagram, whose SVG did not export):

1. Cut the input image into fixed NxN patches.
2. Feed the image to ContextNet to obtain a single embedding ("description") of the whole image, then decode this context.
3. For each patch:
   - concatenate the decoded context to the patch;
   - apply an initial convolution, expanding the NxN patch to 2Nx2N;
   - refine the 2Nx2N patch with a U-Net;
   - add the result to the reconstructed image.
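To make the data flow concrete, here is a minimal numpy sketch of the patch-local pipeline described in the architecture section. The shapes and the context size are illustrative assumptions, and a nearest-neighbour upsample stands in for the initial convolution plus per-patch U-Net; it is not the actual MossNet implementation.

```python
import numpy as np

def extract_patches(img, n):
    """Cut an (H, W, C) image into non-overlapping n x n patches, row-major."""
    h, w, c = img.shape
    assert h % n == 0 and w % n == 0
    patches = img.reshape(h // n, n, w // n, n, c).swapaxes(1, 2)
    return patches.reshape(-1, n, n, c)  # (num_patches, n, n, c)

def upscale(img, context, n=8):
    """Patch-local 2x upscaling with a single shared context embedding.

    `context` is a 1-D vector standing in for the decoded ContextNet
    output; the nearest-neighbour upsample below is a placeholder for
    the initial convolution + per-patch U-Net.
    """
    h, w, c = img.shape
    patches = extract_patches(img, n)
    out = np.zeros((2 * h, 2 * w, c))
    # Broadcast the shared embedding to per-pixel context planes.
    ctx_planes = np.broadcast_to(context.reshape(1, 1, -1), (n, n, context.size))
    for i, patch in enumerate(patches):
        x = np.concatenate([patch, ctx_planes], axis=-1)  # concat context to patch
        # Placeholder "U-Net": NxN -> 2Nx2N nearest-neighbour upsample.
        up = x[:, :, :c].repeat(2, axis=0).repeat(2, axis=1)
        r, col = divmod(i, w // n)
        out[2 * n * r:2 * n * (r + 1), 2 * n * col:2 * n * (col + 1)] = up
    return out

img = np.random.rand(32, 32, 3)
ctx = np.random.rand(16)  # hypothetical context embedding size
sr = upscale(img, ctx)
print(sr.shape)  # (64, 64, 3)
```

Note that because the context is computed once per image and each patch has a fixed size, peak memory is bounded by a single 2Nx2N patch rather than the full output image.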

Preliminary results

Experiment setup

Model training did not include denoising or deblurring targets. The reason behind this choice is that the model is aimed at handsets and embedded devices, whose users may want blurred images displayed as-is, and denoising algorithms are usually applied separately. Metrics are therefore limited to LPIPS and PSNR. The model was trained for 50,000 iterations on the OpenImages dataset and was then distilled into an XS version.

The model was evaluated on the OpenImages dataset, which was also used for training. To assess out-of-sample generalization, arbitrary images from DIV2K and Unsplash were additionally used during visual testing.

GFLOPS figures were measured for super-resolving a 256x256-pixel image.
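Of the two reported metrics, PSNR is straightforward to compute directly (LPIPS requires a pretrained perceptual network, so it is omitted here). A minimal sketch, assuming images normalized to [0, 1]:

```python
import numpy as np

def psnr(reference, distorted, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    mse = np.mean((reference - distorted) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * np.log10(max_val ** 2 / mse)

ref = np.zeros((8, 8))
noisy = ref + 0.1  # uniform error of 0.1 -> MSE = 0.01
print(round(psnr(ref, noisy), 2))  # 20.0
```

Higher PSNR is better, which is why the results table marks it with an upward arrow.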

Quantitative results

Model               LPIPS↓  PSNR↑  GFLOPS  Param count
MossNet-XS          0.132   27.64  3.32    64.2K
SRGAN ⚠️            0.126   26.63  -       5.949M
StableSR (200 steps) 0.311  23.26  -       approx. 200M
GuideSR             0.265   24.76  -       approx. 200M
EDSR ⚠️             0.133   34.64  -       1.37M

Important notice: the quantitative results above may be biased, since MossNet was never trained on the DIV2K dataset, unlike models from the papers it is compared against. Models whose numbers are likely affected by this bias are marked with a warning ⚠️ icon.

Resolving this issue requires re-training this model (among others) specifically on DIV2K, but problems with my GPU setup will likely take time to fix. For the same reason, GFLOPS figures are currently unavailable for most models in the table above.

Visual results

DIV2K

OpenImages

Unsplash

A random image from Unsplash.

Notice

As this research is still in progress, this page is published under a restrictive CC license.

Copyright

This page is licensed under the CC BY-NC-ND license.

DOI: 10.5281/zenodo.15665470

Author

Rust-loving, Python-purring Rubyist with a taste for clean UI and warm naps in the sun.