Abstract: Predicting the onset of psychosis is crucial for early intervention and improved outcomes. This review examines the current state of prediction models based on clinical, neurocognitive, and linguistic factors. Clinical predictors, including sociodemographic characteristics, family history, and subthreshold psychotic symptoms, have shown promise in identifying people at risk, and some models achieve concordance indices of 0.79-0.80 in external validation. Neurocognitive evaluation, particularly of verbal learning, processing speed, and attention/vigilance, has emerged as a cost-effective predictor, although the effect sizes remain modest. Recent advances in natural language processing have enabled automated analysis of speech patterns, with reduced semantic coherence and specific linguistic features predicting the transition to psychosis with an accuracy of up to 83%. Although these approaches show promise individually, the integration of multiple predictors may maximize predictive accuracy. Current limitations include small sample sizes in many studies, especially for linguistic analyses, and the need for broader population-level applicability beyond clinical high-risk groups. Dynamic prediction models that account for temporal changes in risk factors show improved performance over static approaches. More research is needed, particularly external validation studies in diverse populations, to develop comprehensive preventive strategies that can be implemented at the primary level. The field continues to evolve with emerging variables and advanced analytical methods, working toward an individualized application of prediction tools.
Keywords: psychotic disorder, precision medicine, clinical relevance, neurobehavioral cognitive status examination, linguistics.
Review article
Predicting psychosis based on clinical, neurocognitive, and linguistic factors

Received: 16 January 2025
Accepted: 03 March 2025
A psychotic disorder refers to a significant impairment in the way an individual interprets or perceives reality (1). The negative impact of psychosis can be considered from both a global and an individual perspective. Globally, psychotic disorders have a prevalence of 6% among individuals with psychiatric disorders (2) and are the third most important cause of disability-adjusted life years related to mental disorders (3). At the individual level, psychosis adversely affects personal development: it is associated with a higher risk of substance abuse (51.7%) (4) and suicide attempts (on average, 30%) (5), as well as with reduced levels of employment and education (6, 7).
To avoid these detrimental outcomes, early detection of people at risk of developing a psychotic episode is crucial. Prediction models of psychosis are a key tool for this purpose (8). A prediction model for psychosis uses a set of markers, known as predictors, to estimate an individual's risk of psychosis onset. The predictors relate to individual-specific attributes, disorder-related variables, and therapeutic considerations (9). Such models could support better-tailored decisions by clinicians (e.g., choosing therapeutic strategies and anticipating treatment efficacy) and appropriate shared decision-making with patients (9, 10).
A key concept in this field is the clinical high-risk for psychosis (CHR-P) paradigm, which holds that subtle changes occur before the onset of a psychotic episode. More precisely, these changes are mild or subthreshold manifestations of delusions, formal thought disorder, and hallucinations, accompanied by decreased functioning (11). The risk of transition to psychosis in this population is 25% within the first 3 years (12).
Hence, prediction models of psychosis have been widely applied to individuals with CHR-P (13). These models have used many indicators, including clinical symptoms and biomarkers (neurocognitive function, neuroimaging, cortisol levels, and electrophysiological tests) (14, 15). The sensitivity and specificity of the models for predicting the transition to a psychotic episode are 67% and 78%, respectively (16). In a high-stakes scenario like this, where prediction errors may lead to unnecessary treatment or to a failure to provide necessary interventions for psychosis, it is also crucial to consider the concordance index (C-index). This metric evaluates a model’s ability to distinguish between individuals who transition to psychosis and those who do not. Its values range from 0.5 (random chance) to 1.0 (perfect discrimination). It is calculated by comparing pairs of individuals in which one has transitioned to psychosis and the other has not, or did so later; a pair counts as concordant when the model assigns the higher predicted risk to the individual who transitioned earlier. Notably, this metric is time-dependent, as it considers the timing of events. Additionally, it incorporates elements of sensitivity and specificity in assessing model performance (17, 18). A value above 0.79 (79%) is considered good (19, 20).
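As a rough illustration of the pairwise logic described above, the following Python sketch computes a Harrell-style C-index on toy data. The function name, follow-up times, event flags, and risk scores are all hypothetical and are not taken from any of the cited studies; pairs with tied follow-up times are simply skipped, which is a simplification.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Fraction of comparable pairs in which the earlier event has the higher risk.

    times  -- follow-up time per individual (e.g., months)
    events -- True if the individual transitioned, False if censored
    risks  -- model-predicted risk score per individual
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the individual with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times (simplification) and pairs where the earlier
        # individual was censored, since their ordering is unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: five individuals, three observed transitions.
toy_times = [6, 12, 18, 24, 30]
toy_events = [True, True, False, True, False]
toy_risks = [0.9, 0.4, 0.7, 0.5, 0.1]
print(concordance_index(toy_times, toy_events, toy_risks))  # 0.75
```

In this toy cohort, 6 of the 8 comparable pairs are ordered correctly by the risk score, which gives a C-index of 0.75.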
One of the main difficulties of predictive models in psychosis is their clinical application, that is, translating group-level conclusions into real-world clinical practice (21, 22). Therefore, in this review, we discuss psychosis prediction models in terms of clinical symptoms, neurocognitive alterations, and linguistic factors, and address the main research gaps.
In this narrative review, we evaluated the existing evidence on prediction models for psychosis in the CHR-P population that was retrievable from the following research databases: PubMed/MEDLINE, EMBASE, Scopus, and ScienceDirect. The following search terms were used (in all fields): “clinical high-risk psychosis”, “predict”, “predictive model”, “psychosis”, “neurocognitive”, and “language”. Only articles in English were considered, with preference given to recent publications.
One of the pioneering large-scale studies was the North American Prodrome Longitudinal Study (NAPLS) (23), with 291 clinical high-risk patients and 134 controls. Over a follow-up of 2.5 years, 35% of individuals in the high-risk group transitioned to psychosis, with an average conversion time of 275.5 days (standard deviation: 243.7 days). The variables significantly associated with the transition to psychosis were social dysfunction (hazard ratio [HR]: 1.79), genetic predisposition to schizophrenia combined with a recent decrease in functionality (HR: 1.96), suspicion and/or paranoia (HR: 2.12), previous substance abuse (HR: 2.08), and unusual thought content (HR: 1.98). The highest positive predictive values in the model were linked to the following factor combinations: genetic predisposition to schizophrenia with a recent decline in functionality, paired with unusual thought content and either suspicion/paranoia (74%) or social dysfunction (81%). Although this study identified relevant clinical variables, it was applied to a sample that was already affected to some extent rather than to the general population, and it used qualitative instead of quantitative measures.
NAPLS had a second phase (NAPLS-2), which recruited 596 people at clinical high risk of psychosis (24). The likelihood of developing psychosis during the 2-year follow-up was 16%. The average time from the initial assessment to the onset of psychosis was 7.3 months. Predictors significantly associated with the risk of conversion to psychosis were unusual thought content and suspiciousness (HR: 2.1), impaired social performance (HR: 1.3), and cognitive symptoms (reduced verbal learning and memory abilities accompanied by slower processing speed) (HR: 0.8). This model achieved a C-index of 0.71. The strengths of this second study, compared with the first, are its larger sample (almost twice as many participants, which improves statistical power and makes the observations more generalizable) (20); the evaluation of more variables, such as cognition (verbal learning, memory, and processing speed); a high C-index (20); and the external validation of the model in a sample of 176 clinical high-risk patients (25), in which the C-index improved to 0.79 and the variables used in NAPLS-2 remained significantly predictive of psychosis. External validation is a necessary step to assess whether an original model for psychosis is transportable, meaning that it performs well in an independent group with distinct attributes, and reproducible, meaning that it remains valid for new individuals resembling the original cohort (8, 9, 26).
Another noteworthy study (8), with a large sample of 91,199 individuals, applied analytical methods to extensive health records to determine the risk of psychosis. It found that the overall probability of a psychotic disorder at six years was 3.02%. Several variables were significantly associated with a higher risk of psychosis: age (HR: 1.01), male sex (HR: 1.76), ethnicity (Black, HR: 2.82; Asian, HR: 1.67; Mixed, HR: 1.84; and others, HR: 1.50), and a diagnosis of bipolar disorder (HR: 4.63) or acute and transient psychotic disorder (HR: 5.46). The C-index was 0.80. It should be noted that only 5.19% of the cases that transitioned to psychosis corresponded to CHR-P. Within this group, the brief limited intermittent psychotic symptoms subgroup had a significantly higher risk of transition to psychosis (HR: 1.94). Furthermore, an external validation was performed, in which 1010 cases transitioned to psychosis (1.19% corresponded to CHR-P). The C-index for this validation was 0.79. Overall, the study yielded two clinically relevant findings. First, the CHR-P designation is appropriate for forecasting a psychotic episode (27, 28), given that other diagnoses (such as depression and anxiety) did not confer a risk of transitioning to psychosis (except for bipolar and acute and transient psychotic disorders). Second, the CHR-P classification alone is not enough to address the risk load of psychosis in a secondary mental health service, as it accounted for only 5.19% of all cases. This means that this group does not account for most individuals susceptible to psychosis, and some authors recommend that the parameters of CHR-P be broadened (29, 30).
As stated above, this research provided relevant clinical conclusions; however, it also has limitations. One is that diagnoses were extracted from health care records. While this gives the study great ecological value as a real-world setting, the diagnoses were not systematically verified against research standards (e.g., structured diagnostic assessments) (31, 32). Another is that the model cannot be extrapolated to other clinical settings, such as primary health services, because the study was carried out in a secondary mental health service and the risk factors identified apply to patients evaluated in that type of health centre.
With regard to cultural factors, evidence suggests that perceived discrimination (based on religious affiliation, disability status, physical traits, sexual orientation, or skin colour) among CHR-P individuals is significantly linked with an increased likelihood of psychosis (HR: 1.1) (33). Another psychosocial factor to underscore is socioeconomic status, which encompasses employment, financial status, and educational attainment (34). Research indicates that low socioeconomic status correlates with psychosis through two distinct mechanisms. The first is a concurrent relationship, where living in economically disadvantaged environments correlates with increased psychosis incidence (age-standardized incidence rate: 72.4 per 100,000 person-years) (35). The second involves transgenerational effects, with low paternal socioeconomic status significantly associated with a higher risk of psychosis in offspring (OR range: 1.56) (36). While evidence from cultural and socioeconomic investigations indicates potential utility in psychosis risk assessment, establishing causal relationships requires both larger samples and a thorough examination of the putative biological mechanisms underlying these associations (37).
A recent systematic review (38) identified several significant predictors of progression to psychosis in patients with CHR-P. These included psychosocial factors (male sex, history of trauma, and baseline living status and employment), clinical factors (severity of attenuated psychotic symptoms, negative and disorganized/cognitive symptoms, and reduced global functioning at baseline), and biomarkers (verbal memory deficits, structural and functional abnormalities on magnetic resonance imaging, and specific variations in mismatch negativity measured by electroencephalography). However, many of these predictors were characterized by small effect sizes or limited supporting evidence.
In general, there is suggestive evidence that clinical factors can effectively predict a psychotic episode. These include psychosocial and sociodemographic factors, such as age, male sex, and a family history of schizophrenia, as well as clinical factors, such as recent declines in social functioning, subthreshold psychotic symptoms (e.g., unusual thought content, suspiciousness, disorganization), and negative symptoms. These factors are practical to assess during routine clinical evaluations, providing clinicians with additional tools for a more personalized assessment of a patient's risk (8, 14, 39). However, further external validation of these models is necessary for their implementation in clinical practice guidelines for people at risk of psychosis (39, 40).
Another feasible and critical approach to the prediction of psychosis is neurocognitive assessment (41). This involves the use of a battery of tests to evaluate multiple cognitive functions. The specific areas affected in the CHR-P population are memory, attention, and executive functioning (42). Furthermore, these impairments are often more pronounced in individuals with CHR-P who later progress to psychosis (43).
A recent systematic review and meta-analysis of neurocognitive performance (44), which included 5162 people with CHR-P, found that this population exhibited significant impairments in several cognitive domains. The neurocognitive measures most strongly associated with the transition to psychosis were verbal learning, processing speed, attention/vigilance, and intelligence quotient, each with a medium effect size. Another meta-analysis (45) found that processing speed was associated with the risk of psychosis (medium effect size) but found no significant differences for verbal learning or attention/vigilance. This discrepancy could be due to the small amount of data in the second study (k=3, 44 patients with CHR-P who transitioned to psychosis). A limitation shared by both studies is the overall small sample size and the modest effect sizes observed, which may limit the precision of predictions (10). Furthermore, data from both studies are drawn from a specific population of CHR-P patients, making it difficult to generalize the findings to the general population.
In general, neurocognitive evaluation of specific domains emerges as a relevant clinical predictor of psychosis. The strengths of cognitive measures for this purpose include their cost-effectiveness (46): they require less time and no expensive equipment (47) while still delivering substantial benefits, such as the prediction of psychosis. Additionally, their correlation with neuroimaging findings (48, 49) provides information on the neurobiological processes underlying psychosis. Finally, their practical applicability makes them a clinician-friendly tool (44). To overcome the limitations identified, it is recommended to increase sample sizes and to include a broader range of variables, such as cultural background; larger samples would enhance statistical power, and multicenter collaborations could be a viable way to achieve them (50). Furthermore, incorporating machine learning techniques could help improve the modest effect sizes and generalize predictions across different populations (39).
A defining feature of psychosis is the disruption of logical or coherent thought processes as conveyed through speech, a phenomenon called formal thought disorder (51). Natural language processing (NLP) is an automated technique used to analyze speech disorganization, drawing from the fields of computational science, language processing, and artificial intelligence (e.g., machine learning) (52). In essence, NLP converts words, sentences, or speech excerpts into vectors (quantifiable representations) that allow the cohesion of both syntactic and semantic structures to be evaluated on a larger scale (51, 52).
A study by Corcoran et al. (53) used NLP to identify that reduced semantic coherence, high variance in coherence, and less frequent use of possessive pronouns were associated with the transition to psychosis in a cohort of patients with CHR-P. An advantage of this study, beyond achieving a high internal validation accuracy of 83%, was its robust external validation, with an accuracy of 79% in an independent data set.
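To make the idea of semantic coherence more concrete, the sketch below scores coherence as the cosine similarity between the vectors of consecutive sentences and then summarizes the mean, minimum, and variance of those scores. It is a simplified illustration and not the pipeline used in (53); the function names and the two-dimensional vectors are invented, and in practice the vectors would come from a word- or sentence-embedding model.

```python
import math
from statistics import mean, pvariance


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def coherence_features(sentence_vectors):
    """Summarize similarity between each sentence and the one that follows it."""
    if len(sentence_vectors) < 2:
        raise ValueError("need at least two sentences")
    sims = [cosine(u, v) for u, v in zip(sentence_vectors, sentence_vectors[1:])]
    return {"mean": mean(sims), "min": min(sims), "variance": pvariance(sims)}


# Hypothetical sentence vectors (assumption: produced by some embedding model).
on_topic = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
derailed = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

print(coherence_features(on_topic))
print(coherence_features(derailed))
```

In this toy case, the speech sample that abruptly changes topic has a lower mean coherence and a higher variance than the on-topic sample, which mirrors the direction of the features reported above.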
Another study (50) showed that CHR-P individuals who later developed a psychotic episode exhibited significantly reduced scores on the largest strongly connected component (LSC) and its corresponding ratio. This method provides an alternative approach to evaluating semantic language through the analysis of graph connectivity.
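The graph-based measure can be sketched as follows, assuming that each distinct word is a node and that each pair of consecutive words adds a directed edge. The construction, the function names, and the normalization of LSC size by the number of nodes are assumptions made for illustration; they may differ from the exact procedure and ratio used in (50).

```python
def word_graph(tokens):
    """Directed graph: one node per distinct word, one edge per consecutive pair."""
    graph = {token: set() for token in tokens}
    for current, following in zip(tokens, tokens[1:]):
        graph[current].add(following)
    return graph


def reachable(graph, start):
    """All nodes reachable from `start` (including itself), via depth-first search."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def largest_strongly_connected_component(graph):
    """Largest set of nodes in which every node can reach every other node."""
    reach = {node: reachable(graph, node) for node in graph}
    best = set()
    for node in graph:
        component = {other for other in reach[node] if node in reach[other]}
        if len(component) > len(best):
            best = component
    return best


# Hypothetical transcript fragment, already lower-cased and tokenized.
tokens = "i went home and i went out".split()
graph = word_graph(tokens)
lsc = largest_strongly_connected_component(graph)
lsc_ratio = len(lsc) / len(graph)  # assumed normalization: LSC size / node count
print(sorted(lsc), lsc_ratio)
```

The brute-force reachability approach is quadratic in the number of words, which is acceptable for short transcripts; a linear-time algorithm such as Tarjan's would be preferable for long ones.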
Both studies indicated that semantic coherence measurements could potentially predict psychosis in CHR-P individuals. However, a significant limitation in the studies was the small sample size (93 and 24 CHR-P patients in the first and second studies, respectively). Another noteworthy point is that the Corcoran & Cecchi (54) study used different techniques for eliciting speech across the two cohorts, which could introduce methodological bias due to a lack of standardization.
Overall, evaluating language with the support of digital technologies offers clinicians new opportunities in predicting psychosis (54). This emerging pathway has several advantages, including the potential to use language as a biomarker for psychosis (e.g., its association with neuroanatomic structures such as the temporal lobe) and its putative role in the identification of CHR-P individuals (54, 55), its feasibility in terms of cost and time (56), and the integration of machine learning, which allows inferences at the individual level (52). To address the limitations of these studies, particularly small sample sizes, a large-scale multicenter study involving a diverse and heterogeneous population could provide a feasible solution (50).
Most clinical prediction models for people at risk of psychosis currently rely on estimations made at a fixed point in time (a baseline snapshot). However, research highlights the dynamic nature of the stage preceding the development of psychosis (57, 58). Addressing this methodological limitation, a recent publication (59) proposed a novel dynamic transdiagnostic model with comprehensive internal and external validation. The study demonstrated significant improvements in risk assessment compared to traditional static models. Specifically, the C-index values revealed enhanced predictive performance: the dynamic model showed a baseline score of 0.90 and a final key time point score of 0.79, compared to the static model's baseline score of 0.87 and final time point score of 0.76. In addition, among the predictive features generated by the NLP techniques, paranoia was the most substantial risk factor (HR = 1.25). The calibration slope, which ranged between 0.97 and 1.1, was acceptable and indicated well-calibrated predictions (a slope of 1 reflects perfect alignment between predicted and observed risks, a slope <1 indicates predictions that are too extreme, and a slope >1 indicates predictions that are too moderate). Finally, the model exhibited superior clinical utility, with a net benefit that was particularly pronounced at later time points (≥24 months).
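Net benefit, the quantity behind the clinical utility statement above, is commonly computed in decision curve analysis as the proportion of true positives minus the proportion of false positives weighted by the odds of the chosen risk threshold. The sketch below applies that standard formula to invented outcomes and predicted risks; the data, the 20% threshold, and the function names are hypothetical and unrelated to the model in (59).

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on everyone whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    flagged = [(y, p) for y, p in zip(y_true, y_prob) if p >= threshold]
    true_pos = sum(1 for y, _ in flagged if y == 1)
    false_pos = sum(1 for y, _ in flagged if y == 0)
    # False positives are weighted by the odds of the risk threshold.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: intervene on every individual regardless of risk."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


# Hypothetical cohort of ten individuals (1 = transitioned to psychosis).
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_prob = [0.8, 0.1, 0.3, 0.6, 0.2, 0.05, 0.4, 0.5, 0.1, 0.25]

print(net_benefit(y_true, y_prob, 0.2))
print(net_benefit_treat_all(y_true, 0.2))
```

With these toy numbers, the model's net benefit at a 20% threshold is about 0.20, compared with 0.125 for treating everyone; a model is usually considered clinically useful at a given threshold only when it exceeds both the treat-all and treat-none (net benefit of 0) strategies.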
In addition to focusing on the dynamic features of early psychosis, there is a growing body of research that examines broader factors, such as air pollution and climate change (60), which impact mental health at the population level. Unlike studies limited to specific groups, such as CHR-P individuals, these approaches aim to extend predictive insights to the general population, potentially facilitating broader applicability.
In the field of neuroimaging, structural and functional magnetic resonance imaging (MRI) seems to offer promising applications for psychosis prediction among CHR-P patients. Structural analyses reveal that volumetric attenuation across several cerebral regions (frontal, temporal, and parietal cortices, anterior cingulate, insular cortex, and cerebellum) exhibits potential predictive value for future psychosis onset. By contrast, CHR-P patients who transitioned to psychosis manifested augmented pituitary dimensions (61). Functional MRI studies have identified altered thalamocortical connectivity in CHR-P patients who later experienced a psychotic episode. Additionally, increased activity has been observed in the course of language processing in the striate, precentral, caudate, and temporal regions, as well as in the hippocampus, brainstem, and frontal regions throughout verbal fluency assessments. Moreover, heightened activation has been reported in the right temporal, left frontal, and left inferior parietal regions throughout verbal memory retrieval. All of these changes have been associated with CHR-P individuals who subsequently progressed to psychosis (38). Nevertheless, there are several challenges that require attention, such as the need to increase the sample size of CHR-P patients, the population's inherent heterogeneity, the variability of findings based on the task performed, and the challenges associated with implementing this approach in clinical settings due to its economic cost (47, 61).
It is possible to predict psychosis using models that incorporate clinical, neurocognitive, and linguistic variables (see Figure 1). These variables are cost-effective and could serve as a valuable tool to assist clinicians in making personalized decisions for individuals at risk of psychosis. The field of psychosis prediction is constantly evolving, with emerging research pointing out new variables such as serum biomarkers and advanced analysis methods such as machine learning (39). Integration of an appropriate set of variables is paramount to maximize predictive accuracy (14). More research, including additional external validation studies, is needed to develop preventive strategies that can be applied at the population level.

https://revistas.upch.edu.pe/index.php/RNP/article/view/6251/6067 (pdf)
Corresponding author: Joshep Revilla Zúñiga. E-mail: joshep.revilla.z@upch.pe
