A Message from SfN President David Van Essen


Databases and bioinformatics tools dealing with sequence data benefit tremendously from the stereotyped, one-dimensional nature of nucleotide and protein sequences. From an informatics perspective, the nervous system poses a starkly different set of challenges owing to its amazing complexity and diversity at many spatial scales and organizational levels. Understanding the brain entails knowing about thousands of brain structures, billions of constituent neurons, exquisitely complex patterns of connectivity, and sophisticated computations mediated by synaptic inputs and spike trains that in turn rely on intricate molecular signaling cascades. Whether one considers individual synapses, neurons, brain nuclei, or neural circuits, each has some features that are highly stereotyped, whereas other features differ in ways that may be critical for understanding development, plasticity, individual variability, and disease state or progression.

Neuroinformatics is an emerging field that tackles the unique challenges posed by neuroscience data. The overarching objective is to provide neuroscientists with powerful tools for searching, visualizing, and analyzing information about the nervous system and for integrating knowledge from different levels of analysis. Before discussing the current state of neuroinformatics, it is instructive to dream for a moment about a vision that may become reality a decade or more hence.

A PEEK INTO THE FUTURE
Imagine that your computer could accept a wide range of questions about the brain posed in natural language, spoken or typed. You would immediately receive detailed, accurate answers presented in an informative combination of graphical and textual displays. For example, in response to "where are dopamine D2 receptors in the mouse brain?" you would see a 3-D atlas of the mouse brain containing a detailed map of D2 receptor distribution. This atlas would be easy to navigate at scales ranging from the whole brain to the microscopic; it would provide links to the specific publications and databases from which the information was extracted; and it could be queried for ancillary information (e.g., differences among mouse strains and mutants) and for comparisons with other molecular constituents. Queries about the cellular structure, molecular signature, anatomical connectivity, and/or patterns of neural activity of specific cell types that harbor the D2 receptor would yield informative 3-D displays and tabulations.

A different sequence of questions might start with "what parts of the brain are abnormal in individuals with autism?", followed by "what is the function of these regions in normal individuals?", "what genes underlie high risk for autism?", and "what functions of these genes have been revealed using mouse mutants?" Each query would yield information in an appropriate format that sets the stage for sensible follow-up questions. Such tools would improve the efficiency and reliability of searches and would facilitate critical analysis and thinking by making it easier to compare results, identify discrepancies, and find commonalities.

The wish list of desired capabilities to serve students, researchers, and clinicians could go on and on. Those who doubt that such capabilities will come to pass in our lifetimes should consider recent historical examples. In the 1980s, requests to molecular biologists to submit gene sequence data to a database elicited widespread skepticism about the utility and reliability of the endeavor. Now, two decades later, it is easy and fast to compare gene sequences across more than 500 species whose entire genomes are accessible in online databases. Such capabilities have transformed the field of genomics and launched the field of bioinformatics. From this perspective, it is just a matter of time before powerful neuroinformatics capabilities like those described above become available. An optimist might predict their arrival within a decade, but even a skeptic would be hard pressed to rule out such advances in the coming half-century.

A BIT OF HISTORY
Neuroscience databases and neuroinformatics tools began to emerge in the 1990s, largely spurred by the Human Brain Project that was supported by NIH and other federal agencies. By 2003, many neuroinformatics tools and databases had been developed, but awareness of these resources in the general neuroscience community was low. In 2003, the SfN Council, led by President Huda Akil, appointed the Brain Information Group (BIG) task force to consider the information infrastructure needs of neuroscience research. The BIG, chaired by Floyd Bloom, surveyed available neuroinformatics resources and identified more than 70 databases and software tools that were specifically relevant to neuroscientists. The Neuroscience Database Gateway (NDG), with seed funding from NIDA, NINDS, and NIMH, was developed as a common portal to aid neuroscientists in identifying useful resources. Currently, the NDG (http://ndg.sfn.org/) lists 175 separate resources and has received almost 700,000 hits since its inception. Notably useful resources include the Allen Brain Atlas (with expression patterns for ~24,000 genes), the Biomedical Informatics Research Network (BIRN, emphasizing neuroimaging data), and SenseLab (emphasizing ion channels, receptors, olfaction, and computational models).

The BIG task force was succeeded by a standing SfN Neuroinformatics Committee, currently chaired by Rob Williams. This committee surveys the informatics needs of the neuroscience community and oversees the scientific content of the NDG. It serves as an honest broker to raise awareness of neuroinformatics resources, promote data sharing, and encourage the development of common neuroscience terminologies. Another important initiative is the International Neuroinformatics Coordinating Facility (INCF), a 12-nation consortium directed by Jan Bjaalie that will develop guidelines for the generation, use, reuse, and stability of openly accessible neuroscience data and resources.

WHAT'S NEEDED NOW
Many steps must occur before neuroinformatics becomes as useful for neuroscience as bioinformatics is for genomics. Several key needs warrant emphasis.

  • Databases - more numerous, more robust, and more fully populated. Available databases do not adequately serve the diverse needs of the neuroscience community. In establishing and populating new databases, it makes sense to focus on data types that are relatively tractable (e.g., neuroimaging and microarray data), but also to consider all data types of broad use to the community. Investigators with useful reference datasets (e.g., expression data, brain atlases, and spike time-series data) should consider submitting them to appropriate databases.
  • Community buy-in. Obstacles to putting one's own data into a database come in many forms: a proprietary perspective, inertia, lack of suitable databases, and the difficulty of annotation and data submission. The first two obstacles can be overcome by demonstrating concrete advantages: inclusion in a database gives one's research greater visibility and more citations, so data sharing serves the investigator as well as the community at large. The last two obstacles require major software development and refinement, which in turn requires both skilled personnel and adequate funding.
  • Federated databases. Neuroscience data are far too complex and diverse to be packaged into a single, comprehensive mega-database. Instead, much effort is going into methods for coordinated mining of data that reside in a 'federation' of databases. An important example is the Neuroscience Information Framework (NIF), a multi-institutional effort led by Dan Gardner at Cornell and funded by the NIH Blueprint (a cooperative effort among neuroscience-related institutes and centers). The NIF is an evolutionary next step that builds upon the Neuroscience Database Gateway and benefits from contributions by many neuroscientists.
  • Terminology - confronting neurobabble. The technical terms and abbreviations used by neuroscientists continue to expand and evolve rapidly. For example, OR111-7, NGFI-A, NAc, DAMGO, and MAGUK (taken from a recent Journal of Neuroscience issue) are not exactly household names for most neuroscientists. Use of jargon, while needed for technical precision, frequently leads to uncertainty and bewilderment. This problem is compounded when computers are asked to extract information from journal articles and databases using terminology that is imprecisely or ambiguously defined. To address this growing problem, the neuroscience community needs to engage with neuroinformatics experts to clarify what terms are in current use, what they mean, and how they relate to older or alternative terminology. These issues are being addressed by the aforementioned NIF, BIRN, and INCF groups and by broader bioinformatics groups.
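At bottom, the terminology problem is a mapping problem: many names for one concept. As a minimal sketch, assuming a small hand-built lexicon, a computer can map an abbreviation such as NGFI-A onto a preferred term and expand a search to all of that term's known aliases. The entries and the helper names `normalize` and `expand` are hypothetical. They are not taken from any NIF, BIRN, or INCF resource.

```python
# A minimal, illustrative synonym lookup. The lexicon is a hypothetical
# hand-built example, not drawn from any real neuroscience vocabulary.
from typing import Optional

# Preferred term -> known synonyms and abbreviations.
LEXICON: dict[str, list[str]] = {
    "nucleus accumbens": ["NAc", "accumbens nucleus"],
    "early growth response 1": ["Egr1", "NGFI-A", "Zif268", "Krox-24"],
    "membrane-associated guanylate kinase": ["MAGUK"],
}


def build_index(lexicon: dict[str, list[str]]) -> dict[str, str]:
    """Invert the lexicon so every name (case-insensitive) points to its preferred term."""
    index: dict[str, str] = {}
    for preferred, synonyms in lexicon.items():
        for name in [preferred, *synonyms]:
            index[name.casefold()] = preferred
    return index


INDEX = build_index(LEXICON)


def normalize(term: str) -> Optional[str]:
    """Return the preferred term for `term`, or None if the lexicon does not know it."""
    return INDEX.get(term.strip().casefold())


def expand(term: str) -> list[str]:
    """Return every known name for `term`, so a search can match any alias."""
    preferred = normalize(term)
    if preferred is None:
        return [term]  # unknown term: search for it as typed
    return [preferred, *LEXICON[preferred]]


if __name__ == "__main__":
    print(normalize("ngfi-a"))  # early growth response 1
    print(expand("NAc"))        # ['nucleus accumbens', 'NAc', 'accumbens nucleus']
```

A flat table like this cannot capture relationships between terms (e.g., part-of or is-a). It also cannot resolve an abbreviation that means different things in different subfields. That is why shared, curated terminologies remain necessary.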

SYNERGIES BETWEEN DATABASES AND ONLINE PUBLICATIONS

Databases and online publications are inherently synergistic rather than competitive. Each publication distills an enormous amount of experimental data into a small number of figures and tables accompanied by explanatory text. Much of the underlying data would be valuable for data mining if it were properly annotated and organized. On the other hand, inadequately annotated data can be useless or even dangerous, because they invite misinterpretation. An attractive strategy is to include part of the requisite annotation (metadata) in the database itself and to rely on the relevant journal articles (methods sections and figure and table legends) for invaluable explanatory information.

To enhance synergies between online journals and databases, a leadership conference titled "PubMed Plus" will be held at Washington University in St. Louis in June 2007. This meeting was proposed by the SfN Neuroinformatics Committee and is the main Presidential Initiative of my term. It will bring together 60 invited neuroscientists and informaticians; journal editors and publishers; and representatives of foundations, societies, government institutes, and the library community. The agenda will focus on four major issues:

  • Capturing data in ways that facilitate data mining. How can information be acquired efficiently at the time of manuscript acceptance to facilitate searching journal articles for content and exporting data to databases?
  • Linking databases and journal publications. How can synergies between databases and online journals be enhanced using bi-directional links between specific journal articles and specific datasets within databases?
  • Databases and journal supplementary materials - standardization and sustainability. How can access to journal supplementary material be improved? What are best practices for ensuring database stability, sustainability, and ease of citing in journal articles?
  • A common manuscript and peer review system? Would standardization among related journals streamline manuscript submission and peer review?

THE WAY FORWARD
If neuroinformatics resources and tools fulfill their potential in the coming decade, they will greatly improve the efficiency and accuracy of research and allow a variety of new questions to be addressed. Even navigating the SfN annual meeting might benefit from sophisticated neuroinformatics-based itinerary planners. Widespread engagement of the neuroscience community as users will encourage innovation and the development of progressively more powerful neuroinformatics tools.

In preparation for this transition, the Neuroscience Database Gateway (and its successor, the Neuroscience Information Framework) can familiarize you with the resources currently available. Another strategy is to improve how you store, organize, annotate, and access data from your own laboratory. The classical lab notebook is no longer adequate when essential data (both primary and processed) are stored across many files, folders, and computers. More systematic handling of data will reduce uncertainty about what was done, by whom, to what, and when. This in turn can ease manuscript preparation and the subsequent deposition of data into databases.
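One way to make "what was done, by whom, to what, and when" explicit is to store a small structured record alongside each dataset. The sketch below is illustrative only. It assumes one JSON sidecar file per dataset. The class `DatasetRecord`, its field names, and the example values are all hypothetical and do not follow any community metadata standard.

```python
# A minimal provenance record for one lab dataset. Field names and example
# values are hypothetical; this is not a community metadata standard.
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class DatasetRecord:
    dataset_id: str      # lab-internal identifier
    procedure: str       # what was done
    experimenter: str    # by whom
    subject: str         # to what (animal, slice, culture, or a parent dataset)
    acquired_on: date    # when
    raw_files: list[str] = field(default_factory=list)     # primary data files
    derived_from: list[str] = field(default_factory=list)  # parent dataset ids, for processed data

    def to_json(self) -> str:
        """Serialize to JSON, writing the date in ISO format (YYYY-MM-DD)."""
        record = asdict(self)
        record["acquired_on"] = self.acquired_on.isoformat()
        return json.dumps(record, indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "DatasetRecord":
        """Rebuild a record from the JSON produced by to_json."""
        data = json.loads(text)
        data["acquired_on"] = date.fromisoformat(data["acquired_on"])
        return cls(**data)


# Hypothetical example: a primary recording and a processed dataset derived from it.
raw = DatasetRecord(
    dataset_id="ephys-001",
    procedure="whole-cell patch-clamp recording",
    experimenter="A. Student",
    subject="mouse 17, slice 3",
    acquired_on=date(2007, 3, 14),
    raw_files=["ephys-001/cell1.dat"],
)
spikes = DatasetRecord(
    dataset_id="ephys-001-spikes",
    procedure="spike detection",
    experimenter="A. Student",
    subject="ephys-001",
    acquired_on=date(2007, 3, 15),
    derived_from=["ephys-001"],
)

if __name__ == "__main__":
    print(spikes.to_json())
```

The `derived_from` links let processed data be traced back to the primary recordings. A database curator would likely want that information at deposition time.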

We particularly need to engage the next generation of neuroscientists in this undertaking, both as creators and as avid users of neuroinformatics tools. An integrated neuroscience gateway should be the first place our students turn when learning about a new field or exploring their own in greater depth. This will entail incorporating informatics courses into neuroscience graduate curricula and providing other venues for exposure, including SfN meetings and workshops.

Neuroscientists and neuroinformaticians must work in collaboration to bring neuroinformatics into the mainstream. Federal agencies and private foundations need to recognize the importance of funding to develop and sustain neuroinformatics tools and resources. The Society for Neuroscience, in partnership with NIH and NSF, can continue to serve as an honest broker in efforts to formulate sensible guidelines for data sharing and best practices for communicating information. Altogether, neuroinformatics offers excellent opportunities for neuroscientists to make better use of their data and of their time, leaving more room to ponder the fabulous mysteries of the brain and the insights to be gleaned from the staggering amounts of information emerging from neuroscience laboratories around the world.