Human circulating small non-coding RNA signature as a non-invasive biomarker in clinical diagnosis of acute myeloid leukaemia

Background: Acute myeloid leukaemia (AML) is the most common acute leukaemia in adults; it is highly heterogeneous and involves abnormalities at multiple omics levels. Small non-coding RNAs (sncRNAs) present in body fluids are important regulatory molecules and are considered promising non-invasive biomarkers for clinical diagnosis. However, the alterations of the sncRNA profile in the serum and bone marrow supernatant of patients with AML remain to be characterized. Methods: We performed high-throughput small RNA sequencing on blood and bone marrow samples from 80 consecutive, newly diagnosed patients with AML and 12 healthy controls. Differentially expressed sncRNAs were analysed to reveal distinct patterns between AML patients and controls. Machine learning methods were used to evaluate how efficiently specific sncRNAs discriminated individuals with AML from controls. Altered expression of individual sncRNAs was validated by RT-PCR, Q-PCR, and northern blot. Correlation analysis was used to compare sncRNA patterns between serum and bone marrow supernatant. Results: We identified more than 20 sncRNA categories beyond miRNAs in both serum and bone marrow supernatant, with highly coordinated expression patterns between the two. Non-classical sncRNAs, including rsRNAs (62.86%), ysRNAs (14.97%), and tsRNAs (4.22%), dominated the serum sncRNA population and showed sensitive alteration patterns in AML patients. In machine learning-based models, a tsRNA-based signature robustly discriminated subjects with AML from controls and was more reliable than a miRNA-based signature. Our data also showed that serum tsRNAs were closely associated with AML prognosis, suggesting that serum tsRNAs could serve as biomarkers to assist in AML diagnosis. Conclusions: We comprehensively characterized the expression pattern of circulating sncRNAs in blood and bone marrow and how it differs between healthy controls and AML patients.
This study extends research on sncRNAs in the regulation of AML and provides insights into their role in the disease.


Ethics committee approval and patient consent
All experiments were performed in accordance with the principles set forth in the World Medical Association Declaration of Helsinki. This study was approved by the Institutional Ethics Committees of Xinqiao Hospital, and informed consent was signed by each participant.

Clinical data and sample collection
In total, 92 subjects, including 80 newly diagnosed patients with de novo AML and 12 healthy controls (NC), were enrolled in this study from the Hematology Medical Center, Xinqiao Hospital (Table S1). Peripheral blood samples were collected from 50 AML patients and the 12 healthy controls.
Paired peripheral blood and bone marrow samples were collected from the remaining 30 AML patients. In total, 122 samples of either peripheral blood serum or bone marrow supernatant were prepared for small RNA library construction and high-throughput sequencing. All subjects were of Han Chinese descent. AML was diagnosed based on morphology, immunophenotyping, cytogenetics, and molecular genetics.
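The subject and specimen counts above can be reconciled with a small tally: 62 subjects contributed one serum specimen each and 30 contributed paired specimens. This is only a sketch; the cohort labels follow the cohort-1/cohort-2 naming used in the machine learning section, and the dictionary layout and field names are hypothetical.

```python
# Cohort layout as described in the text; field names are hypothetical.
COHORTS = {
    "cohort-1": {"AML": 50, "NC": 12, "specimens_per_subject": 1},  # serum only
    "cohort-2": {"AML": 30, "NC": 0, "specimens_per_subject": 2},   # paired serum + bone marrow supernatant
}


def tally(cohorts):
    """Return (number of subjects, number of sequenced specimens)."""
    subjects = sum(c["AML"] + c["NC"] for c in cohorts.values())
    specimens = sum((c["AML"] + c["NC"]) * c["specimens_per_subject"]
                    for c in cohorts.values())
    return subjects, specimens


print(tally(COHORTS))  # (92, 122)
```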

Peripheral blood serum and bone marrow supernatant isolation
Five millilitres of peripheral blood was collected from the median cubital vein of each subject, and 5 ml of bone marrow was aspirated from each of the 30 AML patients with paired samples under general anaesthesia. All samples were left at room temperature for 30 min to allow clotting and were then centrifuged at 2,000 × g for 10 min at 4 °C to separate the serum (or bone marrow supernatant). The isolated fluid was transferred to new tubes and centrifuged again at 8,500 × g for 10 min at 4 °C to remove residual cell debris. All samples were stored at −80 °C until RNA extraction.
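Because the protocol specifies centrifugation as relative centrifugal force (× g), the rotor speed to set depends on the rotor radius. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm² (r in cm); the 8.5 cm radius is a hypothetical example, not a value from this study.

```python
import math

# Standard conversion between relative centrifugal force and rotor speed:
#   RCF (x g) = 1.118e-5 * r_cm * rpm**2
K = 1.118e-5


def rpm_for_rcf(rcf_g: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach rcf_g at rotor radius radius_cm."""
    if rcf_g <= 0 or radius_cm <= 0:
        raise ValueError("rcf_g and radius_cm must be positive")
    return math.sqrt(rcf_g / (K * radius_cm))


def rcf_for_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) produced by rpm at radius_cm."""
    return K * radius_cm * rpm ** 2


# The 8.5 cm radius is a hypothetical example value, not taken from the study.
for step_g in (2_000, 8_500):
    print(f"{step_g} x g -> {rpm_for_rcf(step_g, 8.5):.0f} rpm")
```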

Serum RNA extraction
Serum RNA was extracted using TRIzol LS reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's protocol. Briefly, 0.25 ml of serum was mixed with 0.75 ml of TRIzol LS reagent in a 1.5 ml tube by swirling. Then, 1 μl of cel-miR-39 was added and the mixture was vortexed vigorously. The mixtures were incubated on ice for 2 h with occasional vortexing to ensure complete lysis of the serum. Next, 0.2 ml of chloroform was added, and the mixtures were vortexed and incubated at room temperature for 10 min. The samples were centrifuged at 12,000 × g for 15 min at 4 °C, and the aqueous phase was transferred to a new tube containing an equal volume of isopropanol. After gentle mixing with 1 μl of glycogen (Invitrogen, Carlsbad, CA, USA), the mixtures were kept at −80 °C for at least 30 min to precipitate RNA. The RNA was pelleted by centrifugation at 12,000 × g for 25 min at 4 °C and washed with cold 75% ethanol. Finally, the pellet was dried completely, dissolved in RNase-free water, and stored at −80 °C for small RNA library construction, northern blot, Q-PCR, and RT-PCR.

Small RNA library construction and high-throughput sequencing
Small RNA library construction and sequencing were performed by BGI.

Processing and annotation of small non-coding RNA sequencing data
To obtain high-quality small non-coding RNA (sncRNA) profiles for healthy controls and AML patients, stringent criteria were adopted for data processing.
Raw sequencing reads were processed using SPORTS1.0 [1].

Detection of differentially expressed sncRNAs in AML
We compared sncRNA expression between healthy controls and AML patients using the R package edgeR (v3.36.0) [4]. Briefly, lowly expressed sncRNAs (counts per million mapped reads (CPM) < 1 in each individual sample) were removed. Differences in sequencing depth among healthy control and AML samples were normalized using the trimmed mean of M-values (TMM) method, and dispersion was estimated using the quantile-adjusted conditional maximum likelihood (qCML) method implemented in the package. Differentially expressed sncRNAs between healthy controls and AML patients were identified with the exactTest function, and p values were corrected for multiple testing using the Benjamini-Hochberg method. A sncRNA was considered significantly differentially expressed when the p value was ≤ 0.05 and the absolute log2 fold change was ≥ 1.
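The filtering and thresholding steps above can be sketched in plain Python. This is not edgeR: TMM normalization, qCML dispersion estimation, and the exact test are left to edgeR, and the sketch takes the resulting p values and log2 fold changes as inputs. Two readings are assumptions: an sncRNA is dropped only when its CPM is below 1 in every sample, and the 0.05 cut-off is applied to the Benjamini-Hochberg adjusted p value. The sncRNA names and values are hypothetical.

```python
from typing import Dict, List


def cpm(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Counts per million, using each sample's column total as its library size."""
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {name: [row[j] / lib_sizes[j] * 1e6 for j in range(n_samples)]
            for name, row in counts.items()}


def filter_low(counts: Dict[str, List[int]], min_cpm: float = 1.0) -> Dict[str, List[int]]:
    """Drop sncRNAs whose CPM is below min_cpm in every sample (assumed reading)."""
    values = cpm(counts)
    return {name: row for name, row in counts.items()
            if any(v >= min_cpm for v in values[name])}


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value downwards
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_de(names: List[str], log2fc: List[float], padj: List[float],
              alpha: float = 0.05, min_abs_lfc: float = 1.0) -> List[str]:
    """Keep sncRNAs with adjusted p <= alpha and |log2FC| >= min_abs_lfc."""
    return [n for n, lfc, q in zip(names, log2fc, padj)
            if q <= alpha and abs(lfc) >= min_abs_lfc]


# Hypothetical exact-test output for three sncRNAs.
names = ["tsRNA-a", "miRNA-b", "rsRNA-c"]
padj = bh_adjust([0.001, 0.04, 0.2])
print(select_de(names, [2.3, -1.4, 0.8], padj))  # ['tsRNA-a']
```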

RT-PCR and quantitative RT-PCR
Reverse transcription for sncRNA validation was performed as previously described [5]. Briefly, serum total RNA from each subject was polyadenylated and then reverse-transcribed into cDNA with a unique adaptor using the M-MuLV Reverse Transcriptase Reaction system (NEB, USA). sncRNAs were amplified from the cDNA using sncRNA-specific primers (Table S2).
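The section does not state how Q-PCR data were quantified. A common choice is the 2^−ΔΔCt (Livak) method, sketched below under that assumption. Using the cel-miR-39 spike-in added during RNA extraction as the reference is also an assumption, and all Ct values are invented for illustration.

```python
from statistics import mean


def delta_ct(ct_target: float, ct_reference: float) -> float:
    """dCt = Ct(target sncRNA) - Ct(reference)."""
    return ct_target - ct_reference


def fold_change(ct_target: float, ct_reference: float, control_dct: float) -> float:
    """2^-ddCt relative expression versus the mean dCt of the control group.

    Assumes roughly 100% amplification efficiency for target and reference.
    """
    return 2.0 ** -(delta_ct(ct_target, ct_reference) - control_dct)


# Hypothetical Ct values (target, reference) for three healthy controls;
# using the cel-miR-39 spike-in as the reference is an assumption.
controls = [(26.0, 20.0), (26.4, 20.2), (25.8, 19.9)]
control_dct = mean(delta_ct(t, r) for t, r in controls)
print(round(fold_change(25.0, 20.0, control_dct), 2))
```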

Northern blot
Northern blotting was performed as previously described [6] to verify the expression patterns of a tsRNA (tsRNA-GlyCCC) and ysRNAs (ysRNA-RNY4 and ysRNA-RNY5) in peripheral blood serum of healthy controls. In brief, serum total RNA was separated on a 15% urea-PAGE gel and stained with SYBR Gold for imaging under a UV transilluminator. The RNA was then transferred to Nytran Super Charged membranes (Roche, Switzerland) in TBE buffer (Invitrogen, Carlsbad, CA, USA) and UV cross-linked. The membrane was pre-hybridized in DIG pre-hybridization buffer (Roche, Basel, Switzerland) and then incubated with DIG-labeled oligonucleotide probes (Table S3). After the hybridization solution was discarded, the membrane was washed sequentially with low-stringency buffer, high-stringency buffer, and washing buffer. The membrane was then blocked at room temperature for 3 h in blocking buffer (Roche, Basel, Switzerland) and incubated with anti-DIG antibody (Roche, Basel, Switzerland) diluted in blocking buffer for another 1 h. After washing with DIG wash buffer, the membrane was incubated in developing buffer for 10 min. Finally, the membrane was incubated with CSPD chemiluminescent substrate (Roche, Basel, Switzerland) for 15 min at 37 °C in the dark and imaged with a Bio-Rad system (California, USA).

Machine learning-based discovery of marker sncRNAs and performance evaluation
We used random forest (R package randomForest, v4.6-14) [7], a widely used and efficient machine learning method, to develop and evaluate binary classifiers based on several sncRNA datasets. Our strategy for selecting the sncRNA signature comprised the following steps. First, the differentially expressed miRNAs (242 in total) and tsRNAs (88 in total) identified above were used to develop prediction models with random forest using 1,000 randomizations. Next, we narrowed these to panels of 75 miRNAs and 39 tsRNAs that were differentially expressed in AML and had an average expression level ≥ 10, and developed prediction models from these panels.
To further reduce the number of variables and improve model efficiency, we then selected panels of 20 miRNAs and 19 tsRNAs that, in addition to the criteria above, had logistic regression coefficient p values ≤ 0.05, and developed prediction models once again. Two participant cohorts were used in this study: 1) a discovery cohort (cohort-1) of serum samples from 12 healthy individuals and 50 AML patients (62 samples in total) and 2) a validation cohort (cohort-2) of paired serum and bone marrow supernatant samples from 30 AML patients (60 samples in total). The discovery cohort was used for machine learning and cross-validation. Briefly, in each iteration of the random forest model, 10 samples were randomly left out and the remaining 52 samples were used as the training dataset for sncRNA modelling; the left-out samples then served as an internal testing dataset for evaluating the prediction accuracy of the sncRNA model. A total of 1,000 iterations were performed, and the averaged coefficients were used. The validation cohort was used for independent validation. Finally, prediction error rates (out-of-bag errors) and the area under the receiver operating characteristic curve (AUC) were calculated to evaluate the performance of the tsRNA and miRNA prediction models.
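The resampling scheme above (1,000 iterations, 10 samples held out, the remaining 52 used for training) can be sketched as repeated random hold-out validation with a rank-based AUC. To stay dependency-free, the sketch swaps in a simple nearest-centroid scorer for the random forest actually used in the study and runs on synthetic data; `repeated_holdout`, `centroid_scorer`, and `auc` are hypothetical helper names.

```python
import random
from statistics import mean


def auc(scores, labels):
    """Rank-based ROC AUC: P(AML score > control score), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # undefined when the held-out set contains only one class
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_scorer(train_x, train_y):
    """Stand-in classifier (not a random forest): nearest-centroid margin.

    Returns a scoring function; higher scores mean closer to the AML centroid.
    Assumes the training set contains both classes.
    """
    def centroid(label):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        return [mean(col) for col in zip(*rows)]

    c_aml, c_ctrl = centroid(1), centroid(0)

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return lambda x: dist2(x, c_ctrl) - dist2(x, c_aml)


def repeated_holdout(x, y, fit, n_test=10, n_iter=1000, seed=0):
    """Hold out n_test random samples per iteration, train on the rest, and
    return (mean held-out AUC, number of iterations with a defined AUC)."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    aucs = []
    for _ in range(n_iter):
        test = set(rng.sample(idx, n_test))
        train = [i for i in idx if i not in test]
        score = fit([x[i] for i in train], [y[i] for i in train])
        value = auc([score(x[i]) for i in test], [y[i] for i in test])
        if value is not None:
            aucs.append(value)
    return mean(aucs), len(aucs)


# Synthetic data only: 12 controls (0) and 50 AML samples (1), 5 features each.
data_rng = random.Random(42)
y = [0] * 12 + [1] * 50
x = [[data_rng.gauss(3.0 * label, 1.0) for _ in range(5)] for label in y]
mean_auc, n_valid = repeated_holdout(x, y, centroid_scorer)
print(f"mean held-out AUC over {n_valid} iterations: {mean_auc:.3f}")
```

With 12 controls among 62 samples, roughly one random draw in ten holds out no controls; AUC is undefined for such an iteration, so it is skipped. To move closer to the described setup in Python, `fit` could wrap scikit-learn's `RandomForestClassifier`; constructing it with `oob_score=True` exposes the out-of-bag accuracy as `oob_score_`, from which the OOB error is `1 - oob_score_`.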

Correlation analysis of sncRNAs in PBS and BMS
To further assess the similarity of sncRNA expression profiles between peripheral blood serum (PBS) and bone marrow supernatant (BMS), we calculated the correlation between the PBS and BMS sncRNA expression profiles of each individual AML patient. Pearson's correlation coefficient (r), the p value (p), and the number of expressed sncRNAs (n) were calculated for each pair.
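A per-patient version of this correlation is sketched below. The choice of sncRNAs (those with CPM ≥ 1 in both fluids) and the log2(CPM + 1) transform are assumptions, not details from the text, and the CPM values are hypothetical. The function returns r, n, and the t statistic r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom, from which the p value follows; `scipy.stats.pearsonr` returns r and a two-sided p value directly.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_profile_correlation(pbs, bms, min_cpm=1.0):
    """Correlate one patient's PBS and BMS profiles (dicts: sncRNA -> CPM).

    Assumptions (not stated in the text): only sncRNAs with CPM >= min_cpm in
    both fluids are used, and values are log2(CPM + 1) transformed.
    Returns (r, n, t), where t has n - 2 degrees of freedom.
    """
    shared = sorted(k for k in pbs.keys() & bms.keys()
                    if pbs[k] >= min_cpm and bms[k] >= min_cpm)
    n = len(shared)
    if n < 3:
        raise ValueError("need at least 3 shared expressed sncRNAs")
    x = [math.log2(pbs[k] + 1) for k in shared]
    y = [math.log2(bms[k] + 1) for k in shared]
    r = pearson(x, y)
    if abs(r) >= 1:
        t = math.copysign(math.inf, r)
    else:
        t = r * math.sqrt((n - 2) / (1 - r * r))
    return r, n, t


# Hypothetical CPM values for one patient.
pbs = {"rsRNA-1": 5200.0, "ysRNA-1": 1300.0, "tsRNA-1": 310.0, "miRNA-1": 45.0, "miRNA-2": 0.4}
bms = {"rsRNA-1": 4100.0, "ysRNA-1": 1500.0, "tsRNA-1": 260.0, "miRNA-1": 60.0}
r, n, t = paired_profile_correlation(pbs, bms)
print(f"r = {r:.3f}, n = {n}, t = {t:.2f}")
```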

Statistical analysis
Statistical significance for the analyses in this study was determined by Student's t test, one-way ANOVA, and two-way ANOVA with Fisher's LSD test.
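For reference, the test statistics named above can be computed as follows. This sketch covers Student's t test (pooled variance), one-way ANOVA, and the Fisher's LSD pairwise statistic that reuses the ANOVA mean square error; two-way ANOVA is not shown. P values then come from the t and F distributions, for example via `scipy.stats.ttest_ind` (equal variances assumed by default) and `scipy.stats.f_oneway`. The group values are hypothetical.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb)), na + nb - 2


def one_way_anova(groups):
    """Return (F, df_between, df_within, MSE) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean(v for g in groups for v in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_b, df_w = k - 1, n_total - k
    mse = ss_within / df_w
    return (ss_between / df_b) / mse, df_b, df_w, mse


def fisher_lsd_t(g1, g2, mse):
    """Fisher's LSD pairwise t statistic using the ANOVA MSE (df = df_within)."""
    return (mean(g1) - mean(g2)) / math.sqrt(mse * (1 / len(g1) + 1 / len(g2)))


# Hypothetical relative expression values for three groups.
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
f_stat, df_b, df_w, mse = one_way_anova(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")  # F(2, 6) = 27.0
print(f"LSD t (group 1 vs 2) = {fisher_lsd_t(groups[0], groups[1], mse):.3f}")
```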

Data sharing statement
Part of the sncRNA sequencing datasets is available in the Genome Sequence

represent healthy controls, and the inner circles (n = 50) indicate AML patients. Each circle represents one sample.