In this work Statistical Graphical Language Models (SGLMs), a technique adapted from Statistical Language Models (SLMs), are applied to the task of graphical object recognition. SLMs are used in Natural Language Processing for tasks such as Speech Recognition and Information Retrieval. SGLMs view graphical objects as belonging to graphical languages and use this view to compute probabilistic distributions of graphical objects within graphical documents. SGLMs such as N-grams require large corpora of training data, which consist of graphical objects in contextual use (real world graphical documents). Constructing corpora is an important stage in developing the models and many issues need to be addressed. This paper discusses the development of graphical corpora and presents approaches to some of the problems encountered.