Online platform offers reference gene database for biomolecular research
January 06, 2021
By Maria Fernanda Ziegler | Agência FAPESP – Researchers at the University of Campinas (UNICAMP) in the state of São Paulo, Brazil, have created an open-access online platform offering a database with 2,176 human and 3,277 mouse housekeeping genes. Defined as genes required for the maintenance of basic cellular functions and essential to the existence of all cell types, housekeeping genes are used as a reference in experiments that measure variations in gene expression.
The tool will be highly useful for the scientific community in biology and related fields, since almost all biomolecular research, from drug development to the investigation of diseases, must take housekeeping genes into account. Simply put, to detect and quantify variations in gene expression in cells due to infection, inflammation or a tumor, for example, it is first necessary to find out what does not change. Only then can housekeeping genes be used as a benchmark to measure with precision the variations in the expression of other genes. Alterations in gene expression are important because they influence the production of proteins, which in turn influence practically all cellular processes, including the response to pathogens and medications.
“The Atlas came out of a need I perceived during my PhD research,” said Bidossessi Wilfried Hounkpe, first author of a paper on the platform published in Nucleic Acids Research. “I was having trouble choosing the housekeeping genes best suited to the experiments I wanted to perform, and after talking to colleagues I realized the problem wasn’t just mine. On the contrary, it was shared by many researchers.”
Hounkpe conducted a doctoral study of the pathophysiological mechanisms of hypercoagulability in sickle cell disease with a scholarship from FAPESP. The study and creation of the Atlas were part of two Thematic Projects supported by FAPESP, one led by Joyce Maria Annichino-Bizzacchi, and the other by Fernando Ferreira Costa.
According to the researchers, housekeeping genes are essential to the design of a biomolecular study. Although some of them have long been well known to scientists, their expression can vary from one condition to another and from one cell type to another. “The wrong choice can ruin a study or make it irreproducible,” Hounkpe said.
In the absence of a tool like the Atlas, researchers around the world tend to choose housekeeping genes on the basis of previous research on the same topic. “The Atlas was developed in parallel with the doctoral research. Perception of the problem, construction of the solution, structuring of the database and all the programming were initiatives taken by Hounkpe and Francine Chenou, for whom I acted as joint thesis advisor and who has always been interested in bioinformatics,” said Erich de Paula, a professor of hematology in the Department of Clinical Medicine at UNICAMP’s Medical School.
One of the main examples of an important biomedical tool that requires calibration using specific housekeeping genes is real-time RT-PCR. The COVID-19 pandemic has made the technique famous as the most trusted test to detect viral RNA in nasal swabs, and it is one of the most widely used techniques in molecular biology labs and biotech companies worldwide.
Besides identifying the presence of RNA from SARS-CoV-2, RT-PCR can estimate the amount of RNA expressed from any gene of interest under a range of experimental conditions. However, as the researchers stressed, housekeeping genes are used as calibrators in this kind of analysis, which makes them vital to the accuracy of the results: these are expressed relative to the quantity of RNA from housekeeping genes.
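To make the calibration step concrete, the sketch below uses the widely taught 2^-ΔΔCt calculation for relative quantification. The article does not say which formula the researchers use, so this is an illustrative assumption: the function name and all Ct (cycle threshold) values are hypothetical, and the calculation assumes roughly 100% amplification efficiency.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt calculation.

    Assumes ~100% PCR efficiency and a reference (housekeeping) gene
    whose expression does not change between the two conditions.
    """
    # Normalize the target gene to the reference gene in each condition.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # Compare the normalized values between conditions.
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the reference gene is stable (Ct 18 in both conditions).
fold_stable = relative_expression(22.0, 18.0, 24.0, 18.0)

# Same target measurements, but the reference gene itself shifts by one cycle
# in the treated samples, which distorts the reported fold change.
fold_drifting = relative_expression(22.0, 17.0, 24.0, 18.0)

print(fold_stable, fold_drifting)
```

With these made-up numbers, a one-cycle drift in the reference gene halves the reported fold change, which illustrates why an unstable reference can compromise a result.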
Genes and important molecules
Another technique increasingly used to quantify variations in cellular gene expression is RNA-seq, part of a set of strategies known as next-generation sequencing. Its key advantage is the possibility of measuring the expression of thousands of genes at the same time, producing a transcriptome – the full range of messenger RNA (mRNA) molecules expressed by an organism or tissue.
The growing use of RNA-seq permitted the construction of this database, built by mining more than 11,000 high-quality RNA-seq datasets obtained from public resources of tissue-specific gene expression such as GTEx and ARCHS4, among others.
“We didn’t do loads of experiments to get this information,” De Paula said. “We did bioinformatics. We used a public database with RNA-seq data from a large number of samples involving different cell types. The tool mines databases and identifies expression of these genes in a large number of experiments with different cells. We applied an algorithm that delivers this information in a structured manner via a platform designed for use by researchers.”
The Atlas was generated by mining massive human and mouse RNA-seq datasets, including over 11,000 samples from 52 human non-disease tissues or cells and 14 healthy tissues or cells from mice. The user can view the expression of 2,158 transcripts from 2,176 human housekeeping genes and 3,024 transcripts from 3,277 mouse housekeeping genes, downloading those considered relevant to the experiment concerned.
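As a rough illustration of what screening expression data for stable genes can look like, the sketch below keeps genes that are detected in every tissue and vary little across tissues. It is not the Atlas's actual algorithm: the coefficient-of-variation criterion, both thresholds, the function name and the gene names are assumptions chosen only for the example.

```python
from statistics import mean, pstdev


def stable_genes(expression, min_expr=1.0, max_cv=0.25):
    """Return (gene, cv) pairs for genes with steady expression.

    expression: dict mapping a gene name to a list of expression values
    (for example TPM), one value per tissue. Thresholds are illustrative.
    """
    selected = []
    for gene, values in expression.items():
        # A candidate must be detectably expressed in every tissue.
        if min(values) < min_expr:
            continue
        # Coefficient of variation: spread relative to the mean level.
        cv = pstdev(values) / mean(values)
        if cv <= max_cv:
            selected.append((gene, round(cv, 3)))
    # Most stable candidates first.
    return sorted(selected, key=lambda item: item[1])


# Toy data with placeholder gene names (not real measurements).
toy = {
    "GENE_A": [50.0, 52.0, 48.0, 51.0],  # steady across tissues
    "GENE_B": [5.0, 80.0, 0.2, 40.0],    # nearly absent in one tissue
    "GENE_C": [10.0, 30.0, 20.0, 15.0],  # expressed everywhere but variable
}
result = stable_genes(toy)
print(result)
```

In this toy input only GENE_A passes both filters. A real pipeline would work on thousands of samples and would likely apply additional criteria at the transcript level.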
Besides housekeeping genes, the Atlas offers lists of other factors and conditions (proteins, enzymes, hormones, drugs, and diseases) that may modify transcript expression in experiments. The platform also offers a list of primers (short stretches of nucleotides that can be used to start DNA replication) designed and validated by the researchers to facilitate experiment planning.
“Being based on RNA-seq, the Atlas can also show which gene transcript is best suited to a given experiment,” De Paula said. “It’s a more precise selection because it uses a detailed magnifying glass that not only points to the best reference genes but also indicates, within a given gene, the appropriate transcripts for use as a standard in a specific cell type.”
The article “HRT Atlas v1.0 database: redefining human and mouse housekeeping genes and candidate reference transcripts by mining massive RNA-seq datasets” (doi: 10.1093/nar/gkaa609) by Bidossessi Wilfried Hounkpe, Francine Chenou, Franciele de Lima and Erich Vinicius De Paula can be read at: academic.oup.com/nar/advance-article/doi/10.1093/nar/gkaa609/5871367.
Agência FAPESP licenses news reports under Creative Commons license CC-BY-NC-ND so that they can be republished free of charge and in a straightforward manner by other digital media or by print media. The name of the author or reporter (when applied) must be cited, as must the source (Agência FAPESP).