Relationship between MRI scoring systems and neurodevelopmental outcome at 2 years in infants with Neonatal encephalopathy

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Introduction
Neonatal encephalopathy (NE) affects around 1 to 9 per 1,000 live births worldwide and is an important cause of neurodevelopmental abnormalities including intellectual impairment, blindness and deafness, and cerebral palsy (1)(2). Therapeutic hypothermia (TH), which involves cooling the infant's core temperature to between 33°C and 34°C for 72 hours, starting within the first 6 hours of life, is the only proven effective neuroprotective therapy available.
Randomized controlled trials (RCTs) of TH have demonstrated reduced risk of mortality and neurodevelopmental disability for infants with moderate to severe NE (3)(4)(5)(6); however, the incidence of death or moderate/severe disability remains high (7).
Early prediction of outcome remains challenging but essential to accurately counsel parents and determine the appropriate level of care. The most accurate 2-year neurological prognosis is based on the integration of neuroimaging, (amplitude-integrated) electroencephalography (aEEG) and clinical neurologic examination (8). Magnetic resonance imaging (MRI) is the gold standard in neuroimaging for evaluating the severity, extent and location of brain injury. MRI findings in this population have shown parasagittal watershed (WS) infarcts between intervascular boundary zones with the cortex and subcortical white matter also involved; and injury to basal ganglia, thalami, putamen, hippocampi, brainstem, corticospinal tracts, and sensorimotor cortex (9). Basal ganglia and thalamus injury, and abnormal signal intensity in the posterior limb of the internal capsule (PLIC) have been associated with poor neuromotor outcomes such as dyskinetic cerebral palsy and hemiplegia, impaired cognitive outcome and epilepsy (10)(11)(12)(13)(14)(15), and cortical WS injuries have been related to verbal cognitive outcomes (16) with clinical neurological presentations at birth that are mild or delayed (17). Despite studies showing a reduction in deep gray matter and cortical injury on MRI in infants treated with TH, the accuracy of MRI to predict outcome is not altered by TH (18).
Several MRI scoring systems, which are used to quantify brain injury severity, have been developed to standardize the classification of MRI findings in infants with NE. The Barkovich (10) and NICHD NRN (19) scoring systems use conventional MRI (T1- and T2-weighted sequences) to assess patterns of injury involving basal ganglia/thalamus (BGT) and WS areas, with an assessment of PLIC abnormalities also included in the NICHD NRN score. Diffusion weighted imaging (DWI) can show restricted diffusion in ischemic brain regions earlier than conventional MRI (20) and proton magnetic resonance spectroscopy (¹H-MRS) has been shown to be the most accurate MR biomarker for predicting outcome (21).
The MRI score described by Weeke et al. includes DWI and ¹H-MRS in describing brain abnormalities of the deep gray matter, white matter, cerebellum, and intracranial haemorrhages. The Barkovich score has been shown to predict adverse outcome at 2 years (22) and the NICHD NRN score predicted death or disability at 18-22 months (19) and 6-7 years (15). Additionally, the Weeke score was found to have high predictive value for outcome at 2 years and at school age (24). The use of more detailed scoring systems such as the Weeke score has been previously shown to increase the ability to assess MRI abnormalities compared with Barkovich and NICHD NRN scores (24); however, it is not known how this relates to outcome. Here, we aimed to assess the relationship between Barkovich, NICHD NRN and Weeke MRI scores and neurodevelopmental outcome at 2 years in infants with NE. [...] MRI scans were scored using the Barkovich (10), NICHD NRN (19) and Weeke (23) scoring systems by a single assessor (M.Ní.B) blinded to the infant's clinical course.
Barkovich score: T1- and T2-weighted images were used to score infants using the combined Barkovich BG/W score. The BG/W score grades severity of injury from 0 to 4, with a score of 0 indicating normal MRI; a score of 1 representing abnormal signal intensity in the basal ganglia or thalamus; a score of 2 representing abnormal signal in the cortex; a score of 3 representing abnormal signal in both the basal ganglia/thalamus and the cortex; and a score of 4 representing abnormal signal intensity in the entire cortex and basal nuclei (10).
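The BG/W grading described above can be read as a simple decision rule. The Python sketch below is illustrative only: the function name and the three boolean inputs are assumptions introduced here, not part of the published score, and in practice each grade is assigned by a trained reader from the images.

```python
def barkovich_bgw_score(bgt_abnormal: bool,
                        cortex_abnormal: bool,
                        extensive: bool = False) -> int:
    """Return the combined BG/W grade (0-4) from simplified reader findings.

    bgt_abnormal    -- abnormal signal in the basal ganglia or thalamus
    cortex_abnormal -- abnormal signal in the cortex
    extensive       -- abnormal signal in the entire cortex and basal nuclei
                       (hypothetical flag used here to represent grade 4)
    """
    if extensive:
        return 4  # entire cortex and basal nuclei involved
    if bgt_abnormal and cortex_abnormal:
        return 3  # both basal ganglia/thalamus and cortex
    if cortex_abnormal:
        return 2  # cortex only
    if bgt_abnormal:
        return 1  # basal ganglia or thalamus only
    return 0      # normal MRI


if __name__ == "__main__":
    print(barkovich_bgw_score(bgt_abnormal=True, cortex_abnormal=False))
```

Reducing each grade to boolean flags discards the reader's judgement of signal intensity, so the sketch is only a way of making the ordering of the five grades explicit.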
NICHD NRN score: T1- and T2-weighted images were also used for the NICHD NRN score in describing brain injury patterns as: 0, normal; 1A, minimal cerebral lesions; 1B, more [...] whether the injury is focal (score of 1), or extensive (score of 2) and if unilateral (score of 1) or bilateral (score of 2). Intraventricular haemorrhage (IVH), subdural haemorrhage (SDH) and cerebral sinovenous thrombosis (CSVT) were also recorded and included in the additional subscore. The total score (maximum 57) was calculated by adding the 4 subscores (DGM + WM + cerebellum + additional) with increasing scores reflecting greater involvement of brain injury.

Neurodevelopmental outcome
Neurodevelopmental outcome was assessed using the Bayley-III at 2 years of age administered by a clinical psychologist unaware of the MRI scores (28). Rates of developmental delay were determined using Bayley-III test norms (mean=100, SD=15) with Bayley-III score 70-84 (-1 SD) indicating mild impairment, and <70 (-2 SD) indicating severe impairment.
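The cut-offs above can be expressed as a small helper. This is a minimal sketch, not the authors' analysis code: the function names are hypothetical, and the label used for scores of 85 or above ("no impairment") is an assumption, since the text only defines the mild and severe bands.

```python
BAYLEY_MEAN = 100.0  # Bayley-III test norm mean (from the text)
BAYLEY_SD = 15.0     # Bayley-III test norm standard deviation (from the text)


def bayley_sd_units(composite: float) -> float:
    """Express a composite score as standard deviations from the norm mean."""
    return (composite - BAYLEY_MEAN) / BAYLEY_SD


def classify_bayley(composite: float) -> str:
    """Classify a Bayley-III composite score using the cut-offs in the text."""
    if composite < 70:   # more than 2 SD below the mean
        return "severe impairment"
    if composite < 85:   # 70-84, i.e. 1 to 2 SD below the mean
        return "mild impairment"
    return "no impairment"  # assumed label; not defined in the text


if __name__ == "__main__":
    for score in (65, 70, 84, 85, 100):
        print(score, bayley_sd_units(score), classify_bayley(score))
```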

Statistical Analyses
Infant characteristics were compared using Student's t-tests, Mann-Whitney U tests, χ² or Fisher exact tests where appropriate. Normality was assessed using the Shapiro-Wilk test.

Clinical characteristics
A total of 135 infants presented with NE between February 2011 and June 2015. Two died before MRI completion, and a further 43 infants either did not have MRI available for further analysis (n=35) or had MRI performed at more than 14 days of life (n=8). We report findings from 90 neonatal MR scans and 86 follow-up assessments at 2 years. Combined data for both neonatal MR scans and follow-up assessments were available in 66 subjects.
The grades of encephalopathy according to Sarnat and Sarnat were as follows: 3 (3%) infants were exposed to perinatal asphyxia but with no neurological signs, 13 (15%) had Mild NE, 61 (70%) had Moderate NE, and 10 (11%) had Severe NE. Sarnat grading information was not available for 1 infant. 67 (74%) infants required TH in accordance with the TOBY criteria and clinical seizures developed in 56 (63%) infants.
Bayley-III, which was performed at 2 years, was obtained in 86/135 infants; and of these, 66 (73%) patients underwent Bayley-III and neonatal MRI. Twenty-four patients who underwent neonatal MRI were lost to follow-up, four of whom died in the early perinatal period.
Compared with infants lost to follow-up (n=24), patients with complete data (neonatal MRI + follow-up) (n=66) were significantly more likely to have the following features: higher gestational and postmenstrual age, more cases of moderate NE, fewer cases of severe NE and lower cord base excess values. Neonatal characteristics for the whole sample, baseline and follow-up analysis subsamples are described in Table 1.
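As a quick consistency check, the cohort numbers reported in this section can be re-derived with a few lines of arithmetic. The variable names below are ours, chosen for illustration; the counts themselves are taken from the text.

```python
# Counts reported in the text
presented = 135          # infants presenting with NE
died_before_mri = 2      # died before MRI completion
mri_unavailable = 35     # MRI not available for further analysis
mri_after_14_days = 8    # MRI performed at more than 14 days of life
lost_to_follow_up = 24   # had neonatal MRI but no 2-year assessment

# Infants with a usable neonatal MRI
with_mri = presented - died_before_mri - mri_unavailable - mri_after_14_days

# Infants with both neonatal MRI and 2-year follow-up
complete_data = with_mri - lost_to_follow_up


def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


if __name__ == "__main__":
    print(with_mri, complete_data)
    print(pct(complete_data, with_mri), pct(lost_to_follow_up, with_mri))
```

The derived values (90 scans, 66 infants with complete data, 73% followed up and 27% lost to follow-up) agree with the figures stated in the text.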

Relationship between MRI scores and neurodevelopmental outcome
Analysis of infants with complete data (MRI scores and follow-up) revealed a significant correlation between Bayley-III cognitive composite score and Barkovich BGT/WS score (adjusted R² = 0.1269, p=0.0102), NICHD NRN score (adjusted R² = 0.2236, p=0.0026), and Weeke total score (adjusted R² = 0.2645, p<0.0001). There was no association between Bayley-III language composite scores and Barkovich BGT/WS and NICHD NRN scores (p>0.05 in both cases). However, we did find a significant correlation between Bayley-III language composite scores and Weeke total score (adjusted R² = 0.1747, p=0.0022).
Lastly, we found a significant association between Bayley-III motor composite score and Barkovich
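For readers less familiar with the adjusted R² statistic reported in this section, the sketch below shows the standard adjustment for sample size and number of predictors. It is a generic illustration: the function name is ours, and the text does not state the exact model specification or number of predictors used, so the example values are purely illustrative and are not a reanalysis of the study data.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Standard adjusted R-squared.

    r2 -- unadjusted coefficient of determination
    n  -- number of observations
    p  -- number of predictors (excluding the intercept)
    """
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


if __name__ == "__main__":
    # Illustrative values only (not taken from the study)
    print(adjusted_r2(r2=0.5, n=11, p=1))
```

The adjustment shrinks R² more strongly when the sample is small relative to the number of predictors, which is why adjusted values are preferred when models with different numbers of terms are compared.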

Discussion
We investigated the association of three MRI scoring systems (Barkovich, NICHD NRN and Weeke scores) with Bayley-III composite scores at 2 years in infants with NE. The inter-rater agreement was good for the Barkovich score, and excellent for NICHD NRN and Weeke scores. We found that the Weeke scoring system identified MRI abnormalities with the highest frequency. We also show that Barkovich, NICHD NRN and Weeke scoring systems were associated with Bayley-III cognitive and motor composite scores at 2 years. However, only the Weeke score was associated with Bayley-III language composite scores. These results lend support to the use of MRI scoring systems for prognosticating cognitive and motor outcome in infants with NE, and suggest that a detailed assessment of all MRI abnormalities has added predictive value for language outcome.

Several MRI studies have identified a range of brain abnormalities that are useful in evaluating brain injury severity. Mild to moderate hypoxic-ischemic injury produces brain lesions in WS areas, parasagittal cortex and subcortical white matter, while severe hypoxic-ischemic injury causes lesions in the thalamus, putamen, hippocampus, brainstem, corticospinal tracts and sensorimotor cortex (9). Many MRI scores have been developed and validated in this population, with a variable level of these brain abnormalities included (10)(19)(22)(23)(29)(30)(31)(32). While the overall objective of MRI scoring systems is to guide interpretation in order to reduce subjectivity, standardize communication and aid reproducibility, there is no clear consensus on which system describes the most common and significant abnormalities that are relevant to predicting neurodevelopmental outcome. Two of the most commonly used scoring systems are Barkovich and NICHD NRN, which group patterns of injury in BG and WS areas and are quick and easy to use, whereas the MRI score recently developed by Weeke et al. (23) assesses all brain abnormalities observed on MRI and is more time intensive. These scoring systems also vary in the sequences used, with conventional sequences used in the Barkovich and NICHD NRN scores, and DWI and ¹H-MRS utilised in the Weeke score. Furthermore, these scores were developed and validated in infants with variable severities of NE and described for application beyond the first week of life for Barkovich and NICHD NRN scores, and within the first week of life for the Weeke score. While we acknowledge that we cannot make a direct fair comparison of the predictive ability of each MRI score, we are not intending to do so, nor are we attempting to promote one scoring system over the other. The aim of this paper was to illustrate the strengths of each scoring system for predicting outcome, and the added advantage of a more detailed approach (23).
There is a clinical need to determine the strengths of these MRI scoring systems in predicting outcome, as these insights carry important implications for the development of behavioural interventions in the care of children with NE.

A previous study compared the ability of Barkovich, NICHD NRN, and Weeke MRI scores to detect brain abnormalities in infants with NE (24). The authors showed that the Weeke score detected MRI abnormalities with the highest frequency compared with Barkovich and NICHD NRN scores. Similar to Machie et al. (24), we also found the Weeke score detected brain abnormalities with a higher frequency than Barkovich and NICHD NRN scoring systems.
Here, we extend this research to determine how this finding relates to predicting neurodevelopmental outcome.
We found Barkovich and NICHD NRN scores were associated with Bayley-III cognitive and motor composite scores at 2 years. This is consistent with previous studies which have established these MRI scores as good predictors of cognitive and motor outcomes in infants with NE. The Barkovich BG/W score has been associated with poor neuromotor and cognitive outcomes (10), and BGT pattern has been associated with severely impaired motor (13)(14)(33) and cognitive outcomes (13) and WS pattern with cognitive impairments (13,16).
Additionally, the NICHD NRN score has been shown to be a marker of moderate or severe disability at 18-22 months (19), and death or IQ<70 at 6-7 years of age (15). However, there are limitations to the use of Barkovich and NICHD NRN scores, as classification of injury by these systems provides only a limited number of brain injury severity levels and is subject to inter-rater variability. While we report the inter-rater reliability of all three MRI scores to be good, we found that the Barkovich score had the lowest inter-rater reliability of the three scores assessed. The Weeke score, which showed excellent inter-rater reliability in our study, assesses all brain abnormalities observed on MRI on a comprehensive point-by-point basis with distinct descriptions of brain injury. Similar to Barkovich and NICHD NRN scores, we also found that the Weeke score was associated with Bayley-III cognitive and motor composite scores at 2 years. This is consistent with the Weeke et al. study, which showed the Weeke grey matter subscore had high predictive value for adverse outcome at 2 years (defined as death, GMFCS ≥ II, or Bayley-III <85 for motor or cognitive composite score).
Despite the large number of children with impaired language development after NE (34), there have been fewer studies attempting to identify biomarkers of impaired language ability.
Interestingly, the Weeke score was the only scoring system associated with Bayley-III language composite scores. The Weeke score includes an assessment of brain regions including the corpus callosum, brainstem, hippocampus, and cerebellum, not accounted for by Barkovich and NICHD NRN scores. The corpus callosum, which supports the acquisition of spoken language in healthy infants (35), has been identified in a recent systematic review by Dibble et al. as a particular region of interest with altered diffusion in infants with NE (36). Additionally, microstructural abnormalities of the corpus callosum have been associated with language outcomes in infants born preterm (37) and with NE (38). Lesions of the brainstem have also been reported in NE, and are associated with the most severe outcomes (14), with speech, language and communication problems commonly seen in infants with these types of lesions. In infants with NE, reduced volumes of the hippocampus have been identified (39) and are associated with visuospatial memory at 9-10 years of age (40), and injury to the hippocampus has been associated with neurocognition and memory at school age in NE (41). The cerebellum, which is also important for language functioning (42), is increasingly being identified as a brain region vulnerable to hypoxia in infants with NE (43), and in preterm infants, cerebellar abnormalities have been associated with neurodevelopmental outcomes (44). Reduced structural connectivity involving both the hippocampus and cerebellum has also been previously identified in populations at risk of neurodevelopmental impairments (45).

The Weeke score also assesses IVH, SDH and CSVT as part of the additional subscore.
While we were not able to assess the association of this subscore with outcome due to the limited number of infants presenting with these abnormalities, a previous study found intracranial haemorrhages had no association with outcome in infants with NE treated with TH (46). It will therefore be important for future studies to assess whether the inclusion of this subscore adds additional prognostic information.
We were unable to compare the predictive ability of MRI scores for neurodevelopmental outcome between infants with MRIs performed early (≤7 days) or late (>7 days), because only a limited number of MRIs were performed in the first week of life. It is thought that early MRI indicates the timing of injury and later MRI the extent of the injury (47), but there is no consensus on which is more valuable in determining prognosis. Late MRI has been shown to be a good predictor of outcome (18)(19) and a previous study found no difference between early and late MRI scans in predicting outcome (29). However, a meta-analysis by Ouwehand et al. (2020) found MRI injury scoring methods were more predictive during the first week than the second week of life, and a recent study showed better predictive ability of Barkovich and NICHD NRN scores when MRI was performed early (≤7 days) compared with late (>7 days) (48). Moreover, pseudonormalization of DWI can occur after the first week of life, restricting the value of the Weeke score to the first week (49)(50). Further studies are needed to determine the optimal timing of imaging in this population to most accurately quantify brain injury severity using MRI scoring systems and predict outcome.
Finally, it is worth considering the inclusion of resting state functional MRI (rs-fMRI) as another modality which may offer additional predictive utility to assess the impact of brain injury. rs-fMRI noninvasively measures the temporal correlation of low frequency (<0.1Hz) fluctuations in blood oxygen level-dependent (BOLD) signal across the brain while the subject is at rest. A recent systematic review identified altered functional connectivity in brain regions important for motor and language functioning that were associated with motor and language outcomes in individuals after perinatal brain injury (51). They also identified altered brain network functional connectivity involving a number of resting-state networks that were associated with motor outcomes. Disrupted functional connectivity has been previously shown to predict motor impairment better than structural MRI score (52); therefore, therapeutic approaches aimed at measuring rs-fMRI may also facilitate early identification of infants with NE at risk of neurodevelopmental impairments.
Our study has some limitations. First, it is limited by its retrospective design, small sample size and lack of follow-up in a number of infants (24 out of 90, 27%). In addition, 37 infants did not have MRI data available, two of whom died. The study cohort was heterogeneous and included varying severities of NE. Unfortunately, our sample size was not large enough to assess whether these findings would generalize to infants with varied NE severity, and further studies with larger sample sizes are needed to investigate this. Our study did not include identification and classification of cerebral palsy, deafness or blindness, and the findings should be interpreted accordingly. Furthermore, not all infants had MRI scanning performed in the first week of life: 31 out of 90 (34%) infants with baseline MRI data and 23 out of 66 (35%) infants with complete data had MRI scanning performed in the second week of life.
Neurodevelopmental follow-up was obtained at 2 years of age; however, additional studies are needed to assess the predictive value of these MRI scores at later school ages. Finally, due to the heterogeneity of the study cohort showing various brain injury patterns, the number of infants with specific injury patterns was limited. Limitations aside, our findings add to the literature by suggesting that (1) Barkovich, NICHD NRN and Weeke scores are associated with Bayley-III cognitive and motor composite scores at 2 years, but that (2) only the more detailed MRI score by Weeke et al. is associated with Bayley-III language composite scores at 2 years. Accordingly, more detailed scoring systems may have added prognostic value for neurodevelopmental outcome in newborns with NE.

Conclusion
This study investigated the association of three different MRI scoring systems (Barkovich, NICHD NRN, Weeke) with neurodevelopmental outcome at 2 years in infants with NE. We confirm the predictive value of existing MRI scoring systems for cognitive and motor outcomes, and show that more detailed scoring systems have added prognostic value for language outcome. Future studies should examine how generalizable these findings are to infants with varied severity levels of NE.