When a larger-than-required dimension, such as the memory depth or the order of nonlinearity, is specified during behavioral model extraction, redundant terms are calculated when the weights of the model are determined. Extraction of a behavioral model can therefore benefit from a priori knowledge of the system to be modeled. Conversely, if the hardware that calculates the model outputs is limited, a maximum number of weights can be imposed. In this letter, an approach is proposed that reduces the input delay vector to a sparse vector containing only the delayed samples that are most important in constructing the power amplifier (PA) model. Simulations of behavioral models using experimentally measured data from two different PAs demonstrate that the sparse models extracted in this way are as accurate as a full model, while having a more compact and, as a result, more computationally efficient structure.
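The abstract does not state the criterion used to choose the delayed samples. The sketch below is therefore only an illustration of the general idea under stated assumptions, not the authors' algorithm. It assumes a memory-polynomial basis in which each delay tap m contributes K columns x[n-m]|x[n-m]|^(k-1), and it substitutes a greedy forward search (in the spirit of orthogonal matching pursuit) that repeatedly adds the delay tap that most reduces the least-squares residual. The hardware limit on the number of weights is modeled by a hypothetical `max_weights` budget. The names `delayed`, `tap_basis`, `greedy_sparse_delays` and `nmse_db` are illustrative only.

```python
import numpy as np

def delayed(x, m):
    """Return x[n-m], with zeros for n < m."""
    return np.concatenate([np.zeros(m, dtype=x.dtype), x[: len(x) - m]])

def tap_basis(x, m, K):
    """Assumed memory-polynomial columns x[n-m]*|x[n-m]|^(k-1), k = 1..K, for one tap m."""
    xd = delayed(x, m)
    return np.stack([xd * np.abs(xd) ** (k - 1) for k in range(1, K + 1)], axis=1)

def nmse_db(y, y_hat):
    """Normalized mean-square error in dB."""
    return 10 * np.log10(np.sum(np.abs(y - y_hat) ** 2) / np.sum(np.abs(y) ** 2))

def greedy_sparse_delays(x, y, max_delay, K, max_weights):
    """Hypothetical greedy selection of delay taps from {0, ..., max_delay}.

    Each selected tap costs K weights, so at most max_weights // K taps are kept.
    At every step, the tap whose columns give the lowest least-squares residual is added.
    """
    n_taps = min(max_weights // K, max_delay + 1)
    if n_taps == 0:
        raise ValueError("max_weights must be at least K")
    selected, weights, Phi = [], None, None
    for _ in range(n_taps):
        best = None
        for m in range(max_delay + 1):
            if m in selected:
                continue
            cand = np.hstack([tap_basis(x, d, K) for d in selected + [m]])
            w, *_ = np.linalg.lstsq(cand, y, rcond=None)
            err = np.sum(np.abs(y - cand @ w) ** 2)
            if best is None or err < best[0]:
                best = (err, m, w, cand)
        _, m_best, weights, Phi = best
        selected.append(m_best)
    return selected, weights, nmse_db(y, Phi @ weights)

# Usage on a synthetic (not measured) PA whose memory uses only delays 0, 2 and 5.
rng = np.random.default_rng(0)
N = 4000
x = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
y = x + 0.1 * delayed(x, 2) * np.abs(delayed(x, 2)) ** 2 - 0.05 * delayed(x, 5)

# Full search space: 9 candidate delays x 3 orders = 27 weights; budget allows only 9.
taps, w, err = greedy_sparse_delays(x, y, max_delay=8, K=3, max_weights=9)
print(sorted(taps), len(w), err)  # taps should be [0, 2, 5]
```

In this toy case, the search keeps only the three delays that actually drive the output, so 9 weights are used instead of the 27 of the full model. The greedy search is chosen here for simplicity. It is not guaranteed to find the optimal subset on real PA data.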