© Springer Nature Switzerland AG 2018. This paper presents a benchmark data set for evaluating ball detection algorithms in the RoboCup Soccer Standard Platform League. We created a labelled data set of images with and without a ball, derived from vision log files recorded by multiple NAO robots under various lighting conditions. The data set contains 5209 labelled ball image regions and 10924 non-ball regions. Every non-ball region contains features that an existing ball detector had classified as a ball candidate. The data set was used to train and evaluate 252 different deep convolutional neural network (CNN) architectures for ball detection. To limit computational requirements, this evaluation focused on networks with 2–5 layers that could feasibly run in the vision and cognition cycle of a NAO robot using two cameras at full frame rate (2 × 30 Hz). The results show that the classification performance of the networks is largely insensitive to the details of the network design, including input image size, number of layers, and number of outputs at each layer. To reduce the computational requirements of CNNs further, we evaluated XNOR-Net architectures, which quantize the weights and activations of a neural network to binary values. We examined the XNOR-Nets corresponding to the real-valued CNNs we had already tested in order to quantify the effect on classification metrics. The results indicate that ball classification performance degrades by 12% on average when moving from a real-valued CNN to the corresponding XNOR-Net.
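To make the binarization idea concrete, the following is a minimal sketch of XNOR-Net-style quantization applied to a single dot product. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the function names `binarize` and `binary_dot` are hypothetical, the scaling factors are taken as the mean absolute value of each vector (as in the general XNOR-Net formulation), and the example values are made up.

```python
def binarize(values):
    """Approximate a real-valued vector as alpha * signs, with signs in {-1, +1}.

    Assumption: alpha is the mean absolute value of the vector, the scaling
    used in the general XNOR-Net formulation (not confirmed for this paper).
    """
    alpha = sum(abs(v) for v in values) / len(values)
    signs = [1 if v >= 0 else -1 for v in values]
    return alpha, signs


def binary_dot(weights, inputs):
    """Approximate dot(weights, inputs) using only sign agreement and two scales."""
    alpha, w_signs = binarize(weights)
    beta, x_signs = binarize(inputs)
    # For values in {-1, +1}, each product is +1 when the signs agree and -1
    # otherwise. In a bit-packed implementation this count corresponds to
    # XNOR followed by popcount.
    agreements = sum(1 for w, x in zip(w_signs, x_signs) if w == x)
    n = len(weights)
    return alpha * beta * (2 * agreements - n)


# Hypothetical example values, chosen only for illustration.
weights = [0.5, -0.25, 0.75, -1.0]
inputs = [1.0, -2.0, 3.0, 2.0]

exact = sum(w * x for w, x in zip(weights, inputs))
approx = binary_dot(weights, inputs)
print("exact:", exact, "binary approximation:", approx)
```

The gap between the exact and the approximated value in such a sketch gives an intuition for why replacing real-valued weights and activations with binary ones can reduce classification performance, while the inner loop reduces to counting sign agreements.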