Analysis of a high-resolution hand-written digits data set with writer characteristics

Cédric Beaulac and Jeffrey S. Rosenthal

Upcoming paper

Abstract:

The contributions of this article are twofold. First, we present a new hand-written digit data set. It contains high-resolution images of hand-written digits, a writer identification and various writer characteristics. The data set is publicly available and is designed to create new research opportunities. Second, we perform a thorough analysis of this new data set. We begin with simple supervised tasks: we assess the predictability of the writer characteristics gathered, the effect of using some of those characteristics as predictors in classification tasks, and the effect of higher-resolution images on classification accuracy. We also explore unsupervised applications by analysing the manifold produced by Variational Auto-Encoders. Finally, we explore semi-supervised applications, leveraging the large number of hand-written digit data sets already available online to improve the accuracy of various classification tasks. We also demonstrate the generative perspective offered by this new data set. Our analysis establishes benchmarks and showcases some of the new research opportunities made possible by this data set.