Databases of discovery

James Ostell. Databases of discovery. ACM Queue, 3(3):40-48, 2005.

Abstract

Open-ended database ecosystems promote new discoveries in biotech. Can they help your organization, too?

The National Center for Biotechnology Information (NCBI), part of the National Institutes of Health (NIH), is responsible for massive amounts of data. A partial list includes the largest public bibliographic database in biomedicine (PubMed); the U.S. national DNA sequence database (GenBank); a free online full-text research article database (PubMed Central); the assembly, annotation, and distribution of a reference set of genes, genomes, and chromosomes (RefSeq); online text search and retrieval systems (Entrez); and specialized molecular biology data search engines (BLAST, CDD search, and others). At this writing, NCBI receives about 50 million Web hits per day, at peak rates of about 1,900 hits per second, and about 400,000 BLAST searches per day from about 2.5 million users. The Web site transfers about 0.6 terabytes per day, and people who want local copies of bulk data download about 1.2 terabytes per day over FTP.
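To put the traffic figures above on a common per-second scale, the arithmetic can be sketched as follows. This is a rough back-of-the-envelope calculation, not something taken from the article: it assumes a terabyte means 10^12 bytes and that load is spread evenly over a 24-hour day, and all variable names are illustrative.

```python
# Back-of-the-envelope conversion of the abstract's daily figures.
# Assumptions (not from the article): 1 terabyte = 1e12 bytes,
# and traffic is averaged evenly over a 24-hour day.

SECONDS_PER_DAY = 24 * 60 * 60


def per_second(per_day):
    """Convert a per-day quantity to an average per-second rate."""
    return per_day / SECONDS_PER_DAY


# Figures quoted in the abstract.
hits_per_day = 50_000_000
peak_hits_per_second = 1_900
web_bytes_per_day = 0.6e12
ftp_bytes_per_day = 1.2e12

# Derived averages.
avg_hits_per_second = per_second(hits_per_day)
peak_to_average = peak_hits_per_second / avg_hits_per_second
avg_bytes_per_hit = web_bytes_per_day / hits_per_day
web_mbit_per_second = per_second(web_bytes_per_day) * 8 / 1e6
ftp_mbit_per_second = per_second(ftp_bytes_per_day) * 8 / 1e6

if __name__ == "__main__":
    print(f"average Web hits per second: {avg_hits_per_second:.0f}")
    print(f"peak-to-average hit ratio:   {peak_to_average:.1f}")
    print(f"average bytes per Web hit:   {avg_bytes_per_hit:.0f}")
    print(f"average Web throughput:      {web_mbit_per_second:.1f} Mbit/s")
    print(f"average FTP throughput:      {ftp_mbit_per_second:.1f} Mbit/s")
```

Under these assumptions the quoted peak of about 1,900 hits per second is a little over three times the daily average rate, and bulk FTP transfers account for roughly twice the sustained bandwidth of the Web site.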