Cardiology Research, ISSN 1923-2829 print, 1923-2837 online, Open Access
Article copyright, the authors; Journal compilation copyright, Cardiol Res and Elmer Press Inc
Journal website https://cr.elmerpub.com

Original Article

Volume 16, Number 5, October 2025, pages 385-393


Discriminative Accuracy of CHA2DS2-VASc Score, and Development of Predictive Accuracy Model Using Machine Learning for Ischemic Stroke Risk in Cardiac Amyloidosis and Atrial Fibrillation

Waqas Ullah (a, i), Abhinav Nair (b), Eric Warner (c), Salman Zahid (d), Mansoor Rahman (e), Palwasha Khan (f), Indranee Rajapreyar (c), Sridhara S. Yaddanapudi (c), M. Chadi Alraies (g), Said Ashraf (h), Jeffery Van Hook (h), Yegeny Brailovsky (c)

(a) Department of Interventional Cardiology, University of Massachusetts, Worcester, MA, USA
(b) Cooper Medical School of Rowan University, Camden, NJ, USA
(c) Thomas Jefferson University Hospitals, Philadelphia, PA, USA
(d) Oregon Health & Science University (OHSU), Portland, Oregon, USA
(e) Hamilton Medical Center, Dalton, GA, USA
(f) Midland Metropolitan University Hospital, Smethwick, UK
(g) Detroit Medical Center, Detroit, MI, USA
(h) Atlanticare Regional Medical Center, Atlantic City, NJ, USA
(i) Corresponding Author: Waqas Ullah, Department of Interventional Cardiology, University of Massachusetts, Worcester, MA, USA

Manuscript submitted June 8, 2025, accepted September 13, 2025, published online October 10, 2025
Short title: CHA2DS2-VASc for Ischemic Stroke CA and AF
doi: https://doi.org/10.14740/cr2101

Abstract

Background: CHA2DS2-VASc score in cardiac amyloidosis (CA) with atrial fibrillation (AF) is believed to underestimate ischemic stroke risk, necessitating a better predictive model.

Methods: Data were obtained from the Nationwide Readmissions Database (NRD). Outcomes between CA-AF and no-CA-AF were compared using multivariate regression analysis to calculate adjusted odds ratios (aORs). AutoScore, an interpretable machine learning framework, was used to develop a stroke risk prediction model, and its predictive accuracy was evaluated with the area under the curve (AUC) from receiver operating characteristic analysis.

Results: A total of 11,860,804 patients (CA-AF 22,687 (0.19%) and no-CA-AF 11,838,117) were identified from 2015 to 2019. The adjusted odds of mortality (aOR: 1.41 and 1.29), stroke (aOR: 1.78 and 1.74), non-intracranial hemorrhage (aOR: 2.10 and 1.85), and intracranial hemorrhage (aOR: 14.4 and 4.26) were significantly higher in CA-AF compared with no-CA-AF at both index admission and 30 days, respectively. The CHA2DS2-VASc score had a poor discriminative accuracy for stroke at 30 days in CA-AF (AUC 49%, 95% confidence interval (CI): 47 - 51, P = 0.54). The AutoScore machine learning model showed excellent predictive ability of our newly proposed E-CHADS score (end-stage renal disease (ESRD), congestive heart failure (CHF), hypertension (HTN), cancer, dementia, diabetes mellitus (DM), and prior stroke) for 30-day risk of ischemic stroke in CA-AF (cutoff of 52 points random forest score) with an AUC of 80% (95% CI: 74 - 86).

Conclusions: CA with AF carries a high risk of ischemic stroke that is not accurately predicted by the CHA2DS2-VASc score. Our proposed model (E-CHADS) identifies three new variables (ESRD, dementia, and cancer) that have higher discriminative accuracy for ischemic stroke in these patients.

Keywords: Cardiac amyloidosis; Atrial fibrillation; Ischemic stroke; CHA2DS2-VASc score; End-stage renal disease; Dementia; Cancer

Introduction

Atrial fibrillation (AF) significantly burdens society and the healthcare system. The current prevalence of AF in the USA is about 5.2 million and is estimated to rise to 12.1 million by 2030 [1]. Despite strong efforts by professionals and the scientific community, AF remains a leading cause of heart failure (HF) related hospitalizations, ischemic stroke, and mortality [1]. It is the most commonly encountered arrhythmia in patients with HF, particularly those with cardiac amyloidosis (CA) (in about 70% of patients) [2]. The association between AF and CA has important clinical implications. AF-related loss of atrial contribution to an already hypertrophied and dysfunctional left ventricle is often poorly tolerated, which not only results in profound clinical deterioration but also leads to recurrent hospitalizations [3]. CA-associated atrial dilatation markedly increases the risk of intracardiac thrombus, systemic embolism, and ischemic stroke [4, 5].

Given the complex interplay of AF and CA, the traditional risk assessment tool (CHA2DS2-VASc score) poorly estimates the risk of ischemic stroke [6]. Some evidence supports the use of oral anticoagulation (OAC) in CA-AF patients regardless of the CHA2DS2-VASc score and even after restoration of sinus rhythm [6, 7]. However, no study has assessed the accuracy of CHA2DS2-VASc or identified a better predictive risk assessment tool for ischemic stroke in patients with CA-AF. Providers are often left to rely on expert consensus and clinical experience in decision-making. The current study aimed to assess the thrombotic and bleeding outcomes of CA-AF in the context of OAC and to evaluate the discriminatory accuracy of the CHA2DS2-VASc score. Using a machine learning algorithm, this study also sought to develop a model with a better predictive ability for ischemic stroke.

Materials and Methods

Data source

The data were obtained from the Nationwide Readmissions Database (NRD), which is part of the Healthcare Cost and Utilization Project. It was established by the federal-state-industry partnership and monitored by the Agency for Healthcare Research and Quality. The NRD is an all-payer database that contains data from more than 18 million discharges from 22 US states each year. The unweighted data of NRD account for 49% and 51% of the total US hospitalizations and population, respectively. NRD contains unique identification codes that can link patients across the same year, allowing us to capture readmission. The data were anonymized and hence exempted from the approval of the Institutional Review Board (IRB). This study did not involve any human or animal subjects, and therefore ethical compliance considerations were not applicable.

Study design and population

Using the International Classification of Diseases-10th Revision-Clinical Modification (ICD-10-CM) codes, all hospitalizations for AF were identified between September 2015 and November 2019 (Supplementary Material 1, cr.elmerpub.com). As the data are annualized, hospitalizations in December of each year were excluded, to enable 30-day outcomes for each index admission. Using the standard ICD-10 code for organ-specific amyloidosis and cardiac failure, patients with CA were identified. All patients with AF were divided into two groups, those with CA and without CA. Each group was stratified based on the risk of ischemic stroke determined by the CHA2DS2-VASc (congestive heart failure (CHF), arterial hypertension (HTN), age ≥ 75 years, diabetes mellitus (DM), stroke, vascular disease, age 65 - 74 years, sex). The CHA2DS2-VASc score was categorized into three groups: low risk (CHA2DS2-VASc = 0 - 1), moderate risk (CHA2DS2-VASc = 2), and high risk (CHA2DS2-VASc ≥ 3).
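The stratification described above can be sketched in code. This is a minimal illustration that assumes the standard published CHA2DS2-VASc point weights (2 points each for age ≥ 75 years and prior stroke, 1 point for every other component) together with the three risk categories defined in this section. The function and argument names are hypothetical and are not taken from the study's analysis code.

```python
def cha2ds2_vasc(chf, htn, age, dm, prior_stroke, vascular, female):
    """Return the CHA2DS2-VASc score using the standard point weights."""
    score = 0
    score += 1 if chf else 0           # C: congestive heart failure
    score += 1 if htn else 0           # H: hypertension
    if age >= 75:                      # A2: age >= 75 years scores 2 points
        score += 2
    elif age >= 65:                    # A: age 65 - 74 years scores 1 point
        score += 1
    score += 1 if dm else 0            # D: diabetes mellitus
    score += 2 if prior_stroke else 0  # S2: prior stroke scores 2 points
    score += 1 if vascular else 0      # V: vascular disease
    score += 1 if female else 0        # Sc: sex category (female)
    return score


def risk_category(score):
    """Map a score to the three categories used in this study."""
    if score <= 1:
        return "low"
    if score == 2:
        return "moderate"
    return "high"


# Hypothetical patient: 78-year-old woman with hypertension and diabetes.
example = cha2ds2_vasc(chf=False, htn=True, age=78, dm=True,
                       prior_stroke=False, vascular=False, female=True)
print(example, risk_category(example))
```

For the hypothetical patient shown, the components sum to 5 points (age 2, HTN 1, DM 1, female sex 1), which falls in the high-risk category.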

Study objectives and outcomes

The major objectives and outcomes of our study can be categorized into three major sections: 1) Comparing the index-admission and 30-day adjusted risk of in-hospital mortality, ischemic stroke, non-intracranial hemorrhage (non-ICH), and ICH between patients with CA vs. without CA; stratified by CHA2DS2-VASc score and long-term use of OAC; 2) Assessing the discriminative accuracy of CHA2DS2-VASc score for 30-day ischemic stroke in patients with AF and concomitant diagnosis of CA; 3) Developing a predictive model with a machine learning algorithm for the risk of ischemic stroke in patients with AF and CA.

Statistical analysis

Outcomes of CA-AF vs. non-CA-AF

Categorical variables were summarized as frequencies and percentages and compared using the Chi-squared test. Randomly missing data accounted for < 1% of records, which were therefore excluded. A binomial multivariable logistic regression model was used to estimate adjusted odds ratios (aORs) for mortality, stroke, ICH, and non-ICH. The results were further stratified based on the CHA2DS2-VASc score and history of prior use of OAC therapy. An interaction analysis (P value for interaction) was used to evaluate the impact of potential effect modifiers. The cumulative incidence of major outcomes was assessed using Kaplan-Meier (KM) curves.
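As a reminder of how an adjusted odds ratio follows from a fitted logistic regression, the sketch below converts a coefficient and its standard error into an odds ratio with a Wald 95% confidence interval. The coefficient and standard error used in the example are hypothetical values chosen for illustration; they are not estimates from the study, and the function name is an assumption.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a logistic regression coefficient into an odds ratio.

    beta: log-odds coefficient for the exposure (e.g. CA vs. no CA)
    se:   standard error of that coefficient
    z:    normal quantile for the interval (1.96 for a 95% CI)
    Returns (odds_ratio, lower_bound, upper_bound).
    """
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))


# Hypothetical inputs: a coefficient of log(1.78) with a standard error of 0.03.
or_, lower, upper = odds_ratio_with_ci(math.log(1.78), 0.03)
print(f"aOR {or_:.2f} (95% CI {lower:.2f} - {upper:.2f})")
```

Because the interval is built on the log-odds scale and then exponentiated, it is asymmetric around the odds ratio, and an interval that excludes 1 corresponds to a two-sided P < 0.05.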

Discriminative accuracy of CHA2DS2-VASc score for ischemic stroke in CA-AF

Next, the performance of the CHA2DS2-VASc risk score and CHA2DS2-VASc categories in predicting the risk of ischemic stroke in CA-AF was determined. The area under the curve (AUC) was calculated using receiver operating characteristic (ROC) analysis to assess how well the score discriminates between patients with and without ischemic stroke.
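The AUC has a direct probabilistic reading: it is the probability that a randomly chosen patient with stroke has a higher score than a randomly chosen patient without stroke, with ties counted as one half. The sketch below computes it that way on a small made-up dataset. It is a pure-Python illustration of the definition, not the pROC routine used in the study, and the function name is hypothetical.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    scores: risk scores, one per patient
    labels: 1 for patients with the outcome, 0 for those without
    Ties between a case and a non-case count as half a correct ranking.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    correct = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                correct += 1.0
            elif c == n:
                correct += 0.5
    return correct / (len(cases) * len(controls))


# Made-up example: four patients, two of whom had a stroke.
print(auc_from_scores([1, 2, 3, 4], [0, 1, 0, 1]))  # 0.75
```

A score that carries no information about the outcome ranks cases above non-cases about half the time, which is why an AUC near 0.5 is read as no better than chance.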

Development of machine learning predictive model for ischemic stroke in CA-AF

Finally, a stroke prediction model was created using the set guidelines by TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) [8]. The final 30-day readmission data were divided into three non-overlapping cohorts in a proportion of 70%, 20%, and 10% for training-, validation-, and testing-sets, respectively. The training set was used for model development, while the validation set guided hyperparameter tuning, including optimization of the number of variables and cut-off thresholds for predictor selection. The final model performance was then evaluated in the independent testing set. We did not apply k-fold cross-validation or bootstrapping, as the large dataset and split-sample design allowed sufficient internal validation while maintaining interpretability. The continuous variables in the dataset were transformed into categorical variables with the cut-off values determined by the quantiles of the data points.
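The split-sample design can be illustrated with a short sketch. It assumes a simple random shuffle followed by a 70/20/10 partition; whether the study's split was stratified by outcome is not specified in this paragraph, so no stratification is shown. The function name and the seed are hypothetical.

```python
import random


def split_70_20_10(records, seed=0):
    """Shuffle records and split them into non-overlapping
    training (70%), validation (20%), and testing (remainder) sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n = len(shuffled)
    n_train = n * 70 // 100  # integer arithmetic avoids float rounding
    n_valid = n * 20 // 100
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever is left, about 10%
    return train, valid, test


train, valid, test = split_70_20_10(range(1000))
print(len(train), len(valid), len(test))  # 700 200 100
```

Keeping the testing set untouched until the scoring table is final is what allows it to serve as an internal check on the model built from the other two sets.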

A combined regression and machine learning approach was used to create a predictive accuracy model. For the latter, an AutoScore framework was utilized to automate the derivation of point-based risk scores (range 1 - 100) for variables that had a biologically plausible relation with the outcome (ischemic stroke) [9, 10]. A total of 42 variables based on clinical relevance comprising components of the CHA2DS2-VASc score, demographics, and major baseline comorbidities were fed into the model as potential predictors. Random forest (RF), a well-validated machine learning algorithm, was utilized to rank the variables in order of their importance for ischemic stroke prediction [11]. The RF model is a non-parametric method and consists of multiple tree-structured classifiers which evolve to a maximum size by training on a random selection of variables and a bootstrap sample. The RF ranking is based on the virtue of variables being heterogeneous and the absence of nonlinearity, which provides an advantage over LASSO regression methods to allow for the optimal selection of variables for risk prediction.

A multivariable logistic regression model was used to obtain coefficient values for score weighting. Among the variables, the lowest coefficient was selected as our reference to eliminate variables with a negative coefficient value. Parsimony and AUC plots were generated to illustrate the cumulative RF-ranking scores of variables and model performance, respectively. A variable at the tail end of the plot signified that adding it to the model would not improve the predictive accuracy.

For the optimal predictive model, multiple iterations and fine-tuning enabled selection of the best cut-off value of the RF-ranking score for each variable and of the final number of variables in the model. To enhance the interpretability of the model, the RF-ranking score was converted into a probability (as shown by AUC), using weighted logistic regression analysis on the validation set. The prediction performance (accuracy, sensitivity, specificity, negative and positive predictive value (NPV, PPV)) of the model was computed using the cumulative score cut-off values of the included variables against the risk of ischemic stroke. The predictability and performance of the model were assessed using ROC analysis and the AUC with its 95% confidence interval. A two-sided P < 0.05 was considered statistically significant. Lastly, the performance of the model was evaluated on the testing set, which was blinded to the scoring algorithm. Data characteristics and adjusted estimates (no ischemic stroke vs. ischemic stroke) for each predictor in the model were also reported. The model was built using the R packages “AutoScore” and “pROC”, and all analyses were performed in R version 4.0.2.
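The performance metrics named above all derive from the 2 x 2 table obtained by comparing each patient's cumulative score with a cut-off. The sketch below shows that calculation on a small made-up dataset. The function name is hypothetical, and treating a score strictly greater than the cut-off as a positive prediction is an assumption made for this illustration.

```python
def metrics_at_cutoff(scores, labels, cutoff):
    """Accuracy, sensitivity, specificity, PPV, and NPV at a score cut-off.

    A patient is predicted positive when the score is strictly above cutoff.
    labels: 1 for patients with the outcome, 0 for those without.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score > cutoff
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Made-up scores and outcomes for six patients, evaluated at a cut-off of 52.
result = metrics_at_cutoff([60, 55, 40, 70, 30, 20], [1, 0, 1, 1, 0, 0], 52)
print(result)
```

Raising the cut-off generally trades sensitivity for specificity, so each candidate cut-off yields its own set of these five metrics.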

Results

Selection of cases

From 2015 to 2019, a total of 11,860,804 weighted samples of AF patients were identified at index admission. Of these, 11,838,117 (99.8%) had no history of CA, while 22,687 (0.2%) had a diagnosis of CA. The flow diagram of the selection of cases is presented here (Supplementary Material 2, cr.elmerpub.com). The ICD-10 codes used to identify cases and characteristics are presented (Supplementary Material 1, cr.elmerpub.com).

Baseline characteristics

The detailed intergroup comparison of demographics and key comorbidities is presented here (Supplementary Material 3, cr.elmerpub.com). In both non-CA (87.1%) and CA (92.1%) groups, the predominant CHA2DS2-VASc designation was high risk, most commonly a CHA2DS2-VASc score of 4 (27.2% vs. 32.7%, respectively) (Supplementary Materials 4, 5, cr.elmerpub.com). The most common contributors to the CHA2DS2-VASc score were a history of HTN (87.8% vs. 84.1%), followed by age > 75 years (65.5% vs. 68.3%) (Supplementary Material 6, cr.elmerpub.com). Only 35.9% of the total population with CA-AF were on long-term OAC therapy at the index admission. As the CHA2DS2-VASc risk category increased from low to moderate to high, the proportion of anticoagulation use increased from 18.0% to 31.7% to 35.9%, respectively (Supplementary Material 7, cr.elmerpub.com). The latter percentage increased to 37.1% at 30-day readmission (Supplementary Material 8, cr.elmerpub.com). On a yearly trend analysis, the annual index admission rate for patients with CA-AF per total CA-AF in the study increased significantly from 18.4% in 2016 to 30.6% in 2019 (Supplementary Materials 9, 10, cr.elmerpub.com).

Overall outcomes at index admission and 30 days

Patients with CA-AF (vs. non-CA-AF) had significantly higher unadjusted odds of all major complications. Similarly, the adjusted odds of mortality (aOR: 1.41, 95% CI: 1.34 - 1.48 and aOR: 1.29, 95% CI: 1.19 - 1.40), stroke (aOR: 1.78, 95% CI: 1.68 - 1.89 and aOR: 1.74, 95% CI: 1.56 - 1.94), non-ICH major bleeding (aOR: 2.10, 95% CI: 1.97 - 2.24 and aOR: 1.85, 95% CI: 1.67 - 2.17), and ICH (aOR: 14.4, 95% CI: 13.91 - 15.01 and aOR: 4.26, 95% CI: 3.83 - 4.73) remained significantly higher in CA-AF compared with non-CA-AF at both index admission and 30 days, respectively (Figure 1, Supplementary Material 11, cr.elmerpub.com). The major cardiovascular conditions among the stroke patients are presented here (Supplementary Material 12, cr.elmerpub.com).


Figure 1. Proportion of major events and estimates of outcomes between CA-AF and no-CA-AF at index admission and 30-day readmission. CA: cardiac amyloidosis; AF: atrial fibrillation.

Adjusted index admission and 30-day outcomes stratified by the CHA2DS2-VASc risk and long-term anticoagulation use

The stratified frequency and estimates of outcomes are provided (Supplementary Materials 12, 13, cr.elmerpub.com). In concordance with the overall outcomes, the CA-AF group continued to have a higher risk of mortality, ischemic stroke, non-ICH, and ICH across all CHA2DS2-VASc categories and irrespective of the use of OAC at both index admission and 30 days. The exception was ischemic stroke among patients on anticoagulation: in medium-risk (aOR: 0.64, 95% CI: 0.29 - 1.41 and aOR: 0.74, 95% CI: 0.27 - 1.99) and high-risk (aOR: 1.56, 95% CI: 0.74 - 1.90 and aOR: 1.18, 95% CI: 0.94 - 1.47) patients, the incidence did not differ significantly between CA-AF and non-CA-AF at index admission and 30 days, respectively. OAC did not significantly increase the risk of ICH in CA-AF. Moreover, CA was independently associated with a higher risk of stroke even in the absence of AF (Supplementary Material 14, cr.elmerpub.com). Interaction analysis showed that patients not on OAC had a significantly higher risk of stroke in CA-AF compared with both CA-AF on OAC and those without CA-AF, at index admission as well as 30 days (Supplementary Materials 15-17, cr.elmerpub.com).

Receiver operating characteristics of the CHA2DS2-VASc score and risk category for ischemic stroke

In CA-AF patients, both CHA2DS2-VASc score (AUC = 0.493, 95% CI: 0.470 - 0.516, P = 0.54) and CHA2DS2-VASc risk category (AUC = 0.509, 95% CI: 0.487 - 0.532, P = 0.423) had poor discriminatory accuracy for ischemic stroke (Supplementary Materials 18, 19, cr.elmerpub.com).

Machine learning and predictive model development

Score development

Forty-five variables were used to generate a ranking list in the predictive model. After adjusting for coefficient and fine-tuning the model, a scoring table ranging from 0 to 100 was obtained. Using the AutoScore algorithm, the RF-scores were ranked in order of their importance (Supplementary Materials 20, 21, cr.elmerpub.com). A parsimony plot was generated based on the predictive ability of the RF-score (Fig. 2). The top seven variables that achieved the maximum AUC were selected in the final model.


Figure 2. Parsimony plot of the new model’s performance predicting stroke (as indicated by the area under the curve (AUC)) as a function of the model’s complexity (as noted in the number of variables included in the model). The AUC (accuracy of the model) increased with the addition of variables till HTN; no further increase in accuracy could be achieved with the addition of other variables beyond variable 8 (AKI). This indicates that the model’s highest accuracy was attained with the first seven variables. HTN: hypertension; ESRD: end-stage renal disease; DM: diabetes mellitus; AKI: acute kidney injury; PUD: peptic ulcer disease; COPD: chronic obstructive pulmonary disease; PVD: peripheral vascular disease.

A higher score indicated a higher risk of ischemic stroke. HF (23), HTN (19), end-stage renal disease (ESRD, 19), and prior stroke (15) were associated with the highest likelihood of ischemic stroke, followed by dementia (8), DM (4), and cancer (4) (Supplementary Material 22, cr.elmerpub.com). Addition of other variables or components of the CHA2DS2-VASc score (age, female sex, vascular disease) did not increase the predictive ability of the model (Fig. 2). For a specific score cut-off threshold, the probability of a 30-day ischemic stroke (predicted risk) and the accompanying metrics (accuracy, sensitivity, specificity, NPV, and PPV) were recorded in Table 1. For specific probability thresholds, their respective score cutoffs were recorded (Supplementary Material 23, cr.elmerpub.com). At a threshold cut-off > 52, our proposed E-CHADS model performance was the best, with sensitivity of 0.71 (95% CI: 0.54 - 0.88), specificity of 0.77 (95% CI: 0.73 - 0.81) and AUC of 0.80 (95% CI: 0.74 - 0.86) (Fig. 3, Supplementary Material 24, cr.elmerpub.com). The Shapley additive explanations (SHAP) analyses confirmed that HF, ESRD, and HTN were the strongest contributors to ischemic stroke risk, with smaller effects from prior stroke, depression, diabetes, and cancer (Supplementary Material 25, cr.elmerpub.com). Finally, the adjusted odds of having an outcome for each of the potential predictors that were fed into the model are shown here (Supplementary Materials 26, 27, cr.elmerpub.com), which confirm a higher risk of stroke with the seven selected components of the E-CHADS.
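The additive scoring described above can be sketched as a simple lookup and sum. The point values and the > 52 threshold are those quoted in this paragraph; the full scoring table in the supplementary material is not reproduced here, so treating each component as a present/absent item with a single point value is an assumption of this illustration. The dictionary keys and function names are hypothetical.

```python
# Point values as quoted in the text for the seven E-CHADS components.
E_CHADS_POINTS = {
    "heart_failure": 23,
    "hypertension": 19,
    "esrd": 19,
    "prior_stroke": 15,
    "dementia": 8,
    "diabetes": 4,
    "cancer": 4,
}
E_CHADS_CUTOFF = 52  # scores strictly above this were flagged in the text


def e_chads_score(conditions):
    """Sum the points for the conditions present in one patient."""
    present = set(conditions)  # each condition counts once
    unknown = present - set(E_CHADS_POINTS)
    if unknown:
        raise ValueError(f"unknown condition(s): {sorted(unknown)}")
    return sum(E_CHADS_POINTS[name] for name in present)


def exceeds_cutoff(score, cutoff=E_CHADS_CUTOFF):
    """True when the cumulative score is above the reported threshold."""
    return score > cutoff


# Hypothetical patient with heart failure, hypertension, and ESRD.
score = e_chads_score(["heart_failure", "hypertension", "esrd"])
print(score, exceeds_cutoff(score))  # 61 True
```

With these point values, heart failure and hypertension alone give 42 points, below the threshold, whereas adding ESRD gives 61 points, above it.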

Table 1. Model Performance at Specified Score Cutoffs and Predicted Risk of Stroke
 


Figure 3. ROC curve for E-CHADS model showing a high AUC at 80% for a cut-off RF score of 52 for identifying 30-day risk of ischemic stroke in patients with CA-AF. AUC: area under the curve; CA: cardiac amyloidosis; AF: atrial fibrillation.

Performance evaluation

The testing data set was utilized to assess the E-CHADS prediction performance. The proposed model achieved similar sensitivity and specificity, with a high discrimination power of AUC 0.79 (95% CI: 0.71 - 0.86).

Discussion

The current study is the first to compare outcomes, assess the discriminative power of the CHA2DS2-VASc score, and develop a predictive model (E-CHADS) for ischemic stroke in patients with CA and concomitant AF. The major findings are summarized as follows: 1) Patients with CA-AF had a significantly increased adjusted risk of mortality, ischemic stroke, non-ICH, and ICH at index admission for AF and at 30-day follow-up; 2) The higher risk of non-ICH and ICH in CA-AF was independent of the use of OAC and CHA2DS2-VASc risk category; however, the incidence of ischemic stroke was attenuated 1.57-fold with the use of anticoagulation, especially in medium- and high-risk patients with CA-AF; 3) The CHA2DS2-VASc score had a poor discriminative accuracy for ischemic stroke at 30 days, reaching an AUC of only about 50%; 4) Our proposed E-CHADS model, which adds ESRD, active cancer, and dementia to four components of the CHA2DS2-VASc score (HF, HTN, DM, prior stroke), exhibited excellent discrimination, increasing the predictive accuracy for stroke to 80% (at a threshold of RF-score > 52).

The short-term prognosis of patients with cardioembolic stroke and atherothrombotic stroke is generally poorer than that of other ischemic stroke subtypes, due to higher risks of early recurrence, severe neurological deficits, and mortality. Specific cardiac disorders associated with a higher risk of stroke include recent myocardial infarction (MI), structural heart disease, and dilated cardiomyopathy. Our study showed that CA was independently associated with a higher risk of in-hospital ischemic stroke [6]. Patients with CA had a 5% risk of stroke even in the absence of AF, compared with 2.5% in patients with AF without CA. This could be attributed to a heightened risk of cardiac thrombus formation, endothelial injury, coagulation pathway defect, hyposplenism-induced thrombocytosis, and nephrotic syndrome that are frequently associated with CA [12]. Despite an approximately 50% (statistically significant) reduction in the risk of ischemic stroke with OAC, only 37% of the total patients with CA-AF were on anticoagulation at 30 days after the index AF admission, of whom 92.8% belonged to the high-risk, 6.5% to the moderate-risk, and only approximately 1% to the low-risk category. This not only indicates the efficacy of OAC in these patients but also highlights the fact that the decision to start OAC was probably based on the CHA2DS2-VASc score, which we found to be a very poor predictor of stroke. These findings are supported by a recent study by Vilches et al, which also found a substantially increased risk of systemic embolism in non-anticoagulated patients with AF and CA [7]. Furthermore, that study also demonstrated CHA2DS2-VASc to be a poor predictor of systemic embolism [7]. Our study confirms these findings and thus identifies the unmet need for better awareness and advocacy for OAC therapy in all patients with CA-AF, irrespective of the CHA2DS2-VASc score, to achieve the optimal antithrombotic benefits and reduce the risk of stroke.

The biggest fear around OAC use in the clinical community is the augmented risk of major bleeding in CA. The current study showed a 21% higher risk of ICH in the absence of AF, and up to 12-fold increased odds of ICH compared with no-CA-AF (Supplementary Materials 19, 20, cr.elmerpub.com). However, it is important to note that this bleeding might be due to the inherent nature of CA-induced vascular fragility, clotting abnormalities, and concomitant cerebral amyloid angiopathy rather than the use of OAC therapy, as the risk of bleeding in CA-AF remained consistently high even in patients not on OAC. In the interaction analysis, the antithrombotic benefits of anticoagulation seemed to outweigh the potential risk of bleeding, as indicated by a decrease in the risk of ischemic stroke but no increase in ICH with OAC use. Nonetheless, together, these findings underscore the importance of careful evaluation, judicious use of anticoagulation, individualization of care, and assessment of the dual (thrombotic and bleeding) risk of CA in patients with AF.

Due to the poor predictive ability of the CHA2DS2-VASc score, we developed a predictive model (E-CHADS) for ischemic stroke using a machine learning algorithm coupled with a well-validated RF regression analysis. Four of the E-CHADS variables (HF, HTN, DM, and prior stroke) were components of the CHA2DS2-VASc score, while we identified three new variables: ESRD, dementia, and active cancer. The proposed components of E-CHADS carry a strong biological plausibility in the pathogenesis of stroke. Studies have shown that HF increases the risk of ischemic stroke through the cardioembolic phenomenon, accumulation of traditional risk factors, and higher prevalence of permanent AF [12]. Similarly, HTN and DM have both been shown to induce vasculopathy by endothelial dysfunction, vascular remodeling, and inflammation, leading to an increased risk of stroke [13, 14]. For their part, cancer and ESRD induce a hypercoagulable state through the release of thrombogenic substances such as extracellular vesicles and factor X [15]. The latter also accentuates the activity of clotting factors due to the loss of function of antithrombin factors among other mechanisms [16]. Patients with vascular dementia and prior stroke possibly provide a substrate for future ischemic cerebrovascular events [17]. The point-based scoring structure in our model not only identifies the role of these factors but also ranks their relative importance in the model, which corresponds with clinical intuition. For instance, HF and HTN had the highest weightage on the model compared with dementia, indicating that the former two are the most important variables in estimating the risk of stroke.

Overall, the findings of the current study have important clinical implications. Our study affirms that the CHA2DS2-VASc score in its totality is a poor predictor of stroke in AF patients with concurrent CA. Consistent with the emerging notion, female sex was not found to be a strong predictor of stroke [18]. Some components of CHA2DS2-VASc (CHF, HTN, DM, and prior stroke) had a strong predictive power when combined with our three newly proposed variables (ESRD, active cancer, and dementia). Given the paucity of evidence on a predictive score for stroke, and the higher prevalence of AF in CA, our proposed E-CHADS model reinforces the identification of key comorbidities, where simple addition of the RF-scores in the model can predict the risk of ischemic stroke. Since it is easily accessible and readily obtainable, physicians could choose a cut-off tailored to their applications based on the likelihood of a stroke and metrics such as sensitivity and specificity. Whether this score helps in decision-making for anticoagulation therapies requires future randomized studies.

Limitations

Given the retrospective observational study design, there is a possibility of selection bias and residual confounding. For the same reason, we could not establish a causal relationship but could only report temporal associations between CA and outcomes. Despite regression-based adjusted analysis, the impact of unmeasured covariates could not be determined. Because the NRD is a readmission database linked only to inpatient discharge records, we were unable to capture events occurring outside the hospital, determine the causes of death, or account for competing risks of readmission, such as mortality in community or ambulatory settings. Furthermore, the lack of data on disease severity, type of amyloidosis, long-term follow-up, echocardiographic parameters, functional status, and medication use precluded our ability to perform a more robust stratified analysis or include them in our machine learning model. Although all codes were verified using the standard recommended sources and against prior literature, the possibility of inadvertent coding error due to the lack of coding precision could not be entirely excluded. The diagnostic workup of CA is not widely available, so it is plausible that the perceived low incidence of CA may be due to underdiagnosis or under-coding. We employed RF given its robustness with high-dimensional structured data and compatibility with the AutoScore framework, though other algorithms (e.g., XGBoost, logistic regression, K-nearest neighbors) may also perform well. Stroke is a rare outcome in AF cohorts; to address class imbalance we applied stratified sampling within the AutoScore framework rather than oversampling, thereby preserving real-world prevalence. A further limitation is that calibration metrics (e.g., calibration slope, Brier score) were not assessed, and external validation was not performed. As this study is based on US claims data, applicability to other healthcare systems and populations may be limited. Future work will compare alternative machine learning methods, incorporate calibration analyses, and conduct external and international validation.

Conclusions

CA is independently associated with a high risk of ischemic stroke and ICH, with the former not being predicted by the CHA2DS2-VASc score. Anticoagulation lowers the risk of ischemic stroke without further increasing the incidence of major bleeding. We propose an E-CHADS score, a readily accessible risk prediction tool for ischemic stroke probability estimation in patients with CA-AF. Compared with the CHA2DS2-VASc score, it presents the best performance that can help in the identification of high-risk patients (Fig. 4). Future large-scale prospective randomized studies are needed to validate our findings.


Figure 4. The CHA2DS2-VASc model (on the left) was a poor predictor of stroke in AF patients with comorbid cardiac amyloidosis, with an AUC = 0.50. The new model E-CHADS score (on the right) appreciably improves in predictive accuracy for stroke in AF patients with cardiac amyloidosis, with an AUC = 0.80. Variables in each model’s scoring system are listed with their individual random forest scores; the high cumulative score represents a higher risk of ischemic stroke in CA-AF at 30-day readmission. CHF: congestive heart failure; HTN: hypertension; DM: diabetes mellitus; Vasc: vascular disease; AUC: area under the curve. CA: cardiac amyloidosis; AF: atrial fibrillation.

Supplementary Material

Suppl 1. ICD-10 codes used to identify cases, baseline characteristics, and outcomes.

Suppl 2. Flow diagram for selection of cases.

Suppl 3. Baseline characteristics of index patients.

Suppl 4. AF patients stratified by risk of stroke.

Suppl 5. CHADSVasc category by amyloidosis status.

Suppl 6. CHA2DS2-VASc risk score components in CA-AF vs. no-CA-AF patients.

Suppl 7. OAC use by CHADSVasc category and amyloidosis status.

Suppl 8. Patients on anticoagulation therapy.

Suppl 9. Index admission by year.

Suppl 10. Yearly trend analysis.

Suppl 11. Outcomes for CA-AF vs. no-CA-AF for index and readmitted patients.

Suppl 12. Frequency of cardiovascular conditions in patients with stroke.

Suppl 13. Frequency of outcomes by stroke risk.

Suppl 14. Outcomes for CA-AF vs. no-CA-AF stratified by stroke risk and anticoagulation status.

Suppl 15. Risk of ICH and stroke for AF, CA, CA-AF patients stratified by CHADSVasc category.

Suppl 16. Estimated marginal means of ICH and stroke across amyloidosis status for index patients.

Suppl 17. Estimated marginal means of ICH and stroke across amyloidosis status for readmitted patients.

Suppl 18. Interaction analysis.

Suppl 19. ROC curves for the CHA2DS2-VASc model.

Suppl 20. RF score importance ranking.

Suppl 21. Parsimony plot based on RF score importance ranking.

Suppl 22. AUC for CHA2DS2-VASc models.

Suppl 23. Random forest score table for new model.

Suppl 24. ROC curves for the new E-CHADS model.

Suppl 25. SHAP plot.

Suppl 26. New model’s performance at specified risk cutoffs and requisite score.

Suppl 27. Putative predictor variables and odds of stroke in CA-AF vs. no-CA-AF patients.

Acknowledgments

None to declare.

Financial Disclosure

None to declare.

Conflict of Interest

All authors declare no conflict of interest.

Informed Consent

Not applicable.

Author Contributions

Waqas Ullah, MD contributed to the conception of the study and data analysis; Abhinav Nair, MPH, Eric Warner, MD, Salman Zahid, MD, Mansoor Rahman, MD and Palwasha Khan contributed to data collection; Indranee Rajapreyar, MD, Sridhara S. Yaddanapudi, MD, M. Chadi Alraies, MD, Said Ashraf, MD, and Jeffery Van Hook, DO reviewed the manuscript; and Yegeny Brailovsky, MD provided supervision. All authors take responsibility for all aspects of reliability and freedom from bias of the data presented and their discussed interpretation.

Data Availability

The data supporting the findings of this study have been deposited and can be accessed.


References
  1. Colilla S, Crow A, Petkun W, Singer DE, Simon T, Liu X. Estimates of current and future incidence and prevalence of atrial fibrillation in the U.S. adult population. Am J Cardiol. 2013;112(8):1142-1147.
  2. Mints YY, Doros G, Berk JL, Connors LH, Ruberg FL. Features of atrial fibrillation in wild-type transthyretin cardiac amyloidosis: a systematic review and clinical experience. ESC Heart Fail. 2018;5(5):772-779.
  3. Wolf PA, Abbott RD, Kannel WB. Atrial fibrillation as an independent risk factor for stroke: the Framingham Study. Stroke. 1991;22(8):983-988.
  4. Feng D, Edwards WD, Oh JK, Chandrasekaran K, Grogan M, Martinez MW, Syed IS, et al. Intracardiac thrombosis and embolism in patients with cardiac amyloidosis. Circulation. 2007;116(21):2420-2426.
  5. January CT, Wann LS, Alpert JS, Calkins H, Cigarroa JE, Cleveland JC, Jr., Conti JB, et al. 2014 AHA/ACC/HRS guideline for the management of patients with atrial fibrillation: executive summary: a report of the American College of Cardiology/American Heart Association Task Force on practice guidelines and the Heart Rhythm Society. Circulation. 2014;130(23):2071-2104.
  6. Bukhari S, Khan SZ, Bashir Z. Atrial Fibrillation, Thromboembolic Risk, and Anticoagulation in Cardiac Amyloidosis: A Review. J Card Fail. 2023;29(1):76-86.
  7. Vilches S, Fontana M, Gonzalez-Lopez E, Mitrani L, Saturi G, Renju M, Griffin JM, et al. Systemic embolism in amyloid transthyretin cardiomyopathy. Eur J Heart Fail. 2022;24(8):1387-1396.
  8. Moons KG, Altman DG, Reitsma JB, Ioannidis JP, Macaskill P, Steyerberg EW, Vickers AJ, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med. 2015;162(1):W1-73.
  9. Xie F, Chakraborty B, Ong MEH, Goldstein BA, Liu N. AutoScore: a machine learning-based automatic clinical score generator and its application to mortality prediction using electronic health records. JMIR Med Inform. 2020;8(10):e21798.
  10. Xie F, Ning Y, Yuan H, Saffari E, Chakraborty B, Liu N. Package ‘AutoScore’: an interpretable machine learning-based automatic clinical score generator, R package version 0.2.0, 2021. Available from https://cran.r-project.org/package=AutoScore.
  11. Verikas A, Gelzinis A, Bacauskiene M. Mining data with random forests: a survey and results of new tests. Pattern Recognit. 2011;44(2):330-349.
  12. Seol H, Kim JS. Prevalence, mechanisms, and management of ischemic stroke in heart failure patients. Semin Neurol. 2021;41(4):340-347.
  13. Cipolla MJ, Liebeskind DS, Chan SL. The importance of comorbidities in ischemic stroke: Impact of hypertension on the cerebral circulation. J Cereb Blood Flow Metab. 2018;38(12):2129-2149.
  14. Bradley SA, Spring KJ, Beran RG, Chatzis D, Killingsworth MC, Bhaskar SMM. Role of diabetes in stroke: Recent advances in pathophysiology and clinical management. Diabetes Metab Res Rev. 2022;38(2):e3495.
  15. Navi BB, Iadecola C. Ischemic stroke in cancer patients: A review of an underappreciated pathology. Ann Neurol. 2018;83(5):873-883.
  16. Casserly LF, Dember LM. Thrombosis in end-stage renal disease. Semin Dial. 2003;16(3):245-256.
  17. Kalaria RN, Akinyemi R, Ihara M. Stroke injury, cognitive impairment and vascular dementia. Biochim Biophys Acta. 2016;1862(5):915-925.
  18. Noubiap JJ, Feteh VF, Middeldorp ME, Fitzgerald JL, Thomas G, Kleinig T, Lau DH, et al. A meta-analysis of clinical risk factors for stroke in anticoagulant-naive patients with atrial fibrillation. Europace. 2021;23(10):1528-1538.


This article is distributed under the terms of the Creative Commons Attribution Non-Commercial 4.0 International License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Cardiology Research is published by Elmer Press Inc.