Language-brain encoding experiments evaluate the ability of language models
to predict brain responses elicited by language stimuli. The evaluation
scenarios for this task have not yet been standardized, which makes it difficult
to compare and interpret results. We perform a series of evaluation experiments
with a consistent encoding setup and compute the results for multiple fMRI
datasets. In addition, we test the sensitivity of the evaluation measures to
randomized data and analyze the effect of voxel selection methods. Our
experimental framework is publicly available to make modelling decisions more
transparent and to support reproducibility in future comparisons.

