Readability assessment lets educators summarize text complexity as a single numerical score, which helps in selecting materials suited to different proficiency levels. Accurate calibration of difficulty, however, requires an understanding of the linguistic features underlying that score, especially in high-stakes assessments. This study examined the relationships between 26 lexical indices and readability scores across four categories of CSAT English reading items. The data comprised 335 reading items from high-difficulty tests (2017–2024), analyzed with CAREC and TAALES 2.0. The correlation between age of acquisition and readability scores varied systematically by item category: discourse structure inference (r = 0.712), contextual inference (r = 0.674), main idea comprehension (r = 0.513), and language component analysis (r = 0.504). Linear mixed-effects models showed significant differences in explanatory power across categories: contextual inference (R² = 63.9%) was influenced by multiple predictors, discourse structure inference (R² = 50.1%) by vocabulary maturity, main idea comprehension (R² = 39.1%) by processing efficiency, and language component analysis (R² = 32.8%) by semantic complexity. These results indicate that lexical features contribute differently to readability across item categories, suggesting the need for item-type-specific lexical approaches. The findings should nonetheless be interpreted within the CAREC framework, because the readability measure itself incorporates lexical features similar to those analyzed as predictors.