Moore Foundation grantees at Carnegie Mellon University have developed a new search method that helps biologists and medical researchers cut search times for DNA and RNA sequences from days to a matter of minutes. The method will help researchers investigate basic biological processes and potential targets for cancer.
The method uses a new indexing data structure, called Sequence Bloom Trees, to sift through DNA and RNA sequences generated by high-throughput sequencing techniques. Currently, these data are stored by the National Institutes of Health in a database of DNA and RNA sequences totaling three quadrillion base pairs, which makes it difficult to search for “short reads,” stretches of 50 to 200 base pairs each.
Just as an index can speed searches through a book or catalog, the Sequence Bloom Tree index can greatly speed up searches of this bioinformatics database.
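The article does not describe how the index works internally. As a rough illustration only, the sketch below assumes the general idea behind a Sequence Bloom Tree: each sequencing experiment is summarized by a Bloom filter of its k-mers (short substrings of length k), each parent node holds the union of its children's filters, and a search abandons any branch whose filter contains too few of the query's k-mers. The filter size, hash scheme, k-mer length, threshold, sample names and toy sequences are all assumptions made for illustration, not the researchers' implementation.

```python
import hashlib

K = 4            # k-mer length (toy value, assumed for illustration)
M = 256          # bits per Bloom filter (toy value)
NUM_HASHES = 2   # hash functions per k-mer (toy value)


def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bit_positions(kmer):
    """Yield NUM_HASHES bit positions derived from salted SHA-256 digests."""
    for salt in range(NUM_HASHES):
        digest = hashlib.sha256(f"{salt}:{kmer}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % M


class Node:
    """Tree node whose Bloom filter is stored as a Python int bitmask."""

    def __init__(self, name=None, bits=0, left=None, right=None):
        self.name, self.bits, self.left, self.right = name, bits, left, right


def make_leaf(name, sequences):
    """Build a leaf filter from every k-mer in one experiment's sequences."""
    bits = 0
    for seq in sequences:
        for km in kmers(seq):
            for pos in bit_positions(km):
                bits |= 1 << pos
    return Node(name=name, bits=bits)


def build_tree(leaves):
    """Pair nodes level by level; a parent filter is the OR of its children."""
    level = list(leaves)
    while len(level) > 1:
        nxt = [Node(bits=level[i].bits | level[i + 1].bits,
                    left=level[i], right=level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd node out moves up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]


def contains(node, kmer):
    """Bloom filter membership: may give false positives, never false negatives."""
    return all((node.bits >> pos) & 1 for pos in bit_positions(kmer))


def search(node, query, threshold=0.8):
    """Return names of leaves whose filters hold >= threshold of the query k-mers."""
    query_kmers = kmers(query)
    if not query_kmers:
        return []
    hits = sum(contains(node, km) for km in query_kmers)
    if hits / len(query_kmers) < threshold:
        return []                   # prune this entire subtree
    if node.left is None:
        return [node.name]
    return search(node.left, query, threshold) + search(node.right, query, threshold)


# Hypothetical toy experiments; real inputs would be sets of short reads.
experiments = {
    "sample_a": ["ACGTACGTGGA"],
    "sample_b": ["TTTTCCCCAAAA"],
    "sample_c": ["GGGACGTACGTT"],
}
root = build_tree(make_leaf(name, seqs) for name, seqs in experiments.items())
results = search(root, "ACGTACG")
print(results)
```

Because a Bloom filter can report a k-mer that was never inserted, the result list may include an occasional false match, but an experiment that truly contains the query is never missed. The speedup comes from pruning: when a parent's filter fails the threshold, none of the experiments beneath it need to be examined.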