REMINDERS
2nd Exam on Nov. 17
Coverage:
Central Dogma of DNA
Replication
Transcription
Translation
Cell structure and function
Recombinant DNA technology and molecular biology
Protein analysis
BIOINFORMATICS
Study of the structure of biological information and biological systems
Integrates theories and tools of mathematics/statistics, computer science and information technology
Involves the use of hardware and software to study vast amounts of biological data
What is Bioinformatics?
the field of science in which biology, computer science, and information technology merge to form a single discipline
application of information technology to the storage, management and analysis of biological information
facilitated by the use of computers
A picture showing DNA's helical structure
FUNCTIONS
Data Management
Storage
Retrieval
Data Analysis
*Literature/Bibliography, Sequence, Structure, Taxonomy, Expression, etc.
BIOLOGICAL DATABASES
Systematic data storage/retrieval
Maintained on a regular basis
Can contain various types of data (integration)
Sequence
Structure
Other pertinent information
Nucleotides and proteins are most common
DATABASES
a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system
Biological databases usually consist of the nucleic acid sequences of the genetic material of various organisms as well as protein sequences and structures
DATABASES
e.g. a nucleotide sequence database typically contains information such as
contact name
the input sequence with a description of the type of molecule
the scientific name of the source organism from which it was isolated
additional requirements
easy access to the information
a method for extracting only the information needed to answer a specific biological question
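The record fields and the "extract only what is needed" requirement above can be sketched in a few lines of Python. This is a toy in-memory illustration, not how GenBank or any real database is implemented; the class name, the `find_by_organism` helper, and all record values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SequenceRecord:
    """One entry, holding the fields listed on the slide (hypothetical layout)."""
    contact_name: str
    sequence: str
    molecule_type: str
    organism: str


# Made-up entries for illustration only; not taken from any real database.
records = [
    SequenceRecord("A. Researcher", "ATGGCCATTGTAATGGGCCGC", "DNA", "Saccharomyces cerevisiae"),
    SequenceRecord("B. Researcher", "AUGGCCAUUGUAAUGGGCCGC", "mRNA", "Homo sapiens"),
    SequenceRecord("C. Researcher", "ATGAAACGCATTAGCACC", "DNA", "Homo sapiens"),
]


def find_by_organism(db: List[SequenceRecord], organism: str,
                     molecule_type: Optional[str] = None) -> List[SequenceRecord]:
    """Return only the records that answer the question being asked."""
    return [
        r for r in db
        if r.organism == organism
        and (molecule_type is None or r.molecule_type == molecule_type)
    ]


hits = find_by_organism(records, "Homo sapiens", molecule_type="DNA")
for hit in hits:
    print(hit.contact_name, hit.sequence)
```

Real databases add indexing, persistence and curated annotation on top of this basic store-and-query idea.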
DATABASES
Sequence
GenBank, European Nucleotide Archive (ENA) and DNA Data Bank of Japan (DDBJ); managed by the International Nucleotide Sequence Database Collaboration (INSDC)
UniGene
Saccharomyces Genome Database (SGD)
UniProtKB (UniProtKB/Swiss-Prot or UniProtKB/TrEMBL)
ExPASy
DATABASES
Structure
Nucleic Acid Database (NDB)
Protein Data Bank (PDB)
Worldwide Protein Data Bank (wwPDB)
ExPASy
DATA MINING
Process by which testable hypotheses are created regarding the function/structure of a gene/protein of interest by identifying similar sequences in “more established” organisms
Tools:
Text-term search
Sequence similarity search
Machine Learning
Studies methods and the design of computer programs that improve based on past experience
Why?
New methods are being introduced
Old ones should be improved
“Units” of Information
DNA (genome)
RNA (transcriptome)
Protein (proteome)
What is Being Analyzed?
Sequence
Structure
Interactions
Pathways
Mutations/Evolutions
Why?
Increasing amount of biological information entails
Organization
Archiving
Global unification/harmonization
More biological discoveries
Functional/Structural similarities
Phylogenetic/Evolutionary patterns
Applications
Medicine
Pharmaceuticals
Biotechnology
Agriculture
STRUCTURE DATABASES
Molecular Data
When you draw a molecule,
You start with atoms
Then proceed with the structure
And the three-dimensional data
What can be stored?
Coordinates
Sequences
Chemical graphs
Atoms and bonds
Databases
Protein Data Bank (PDB)
Molecular Modeling Database (MMDB)
Techniques in the Laboratory
X-ray Crystallography
Nuclear Magnetic Resonance
Formats
PDB
mmCIF
MMDB
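As an illustration of how coordinates are stored, here is a minimal sketch that reads one ATOM record in the fixed-column PDB format. The column positions follow the published PDB format; the `Atom` type, the `parse_atom_line` helper, and the sample line (with made-up coordinates) are assumptions for illustration. Real work would normally use an established parser rather than hand-written slicing.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    residue: str
    chain: str
    residue_number: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse a single fixed-width ATOM/HETATM record (hypothetical helper)."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        serial=int(line[6:11]),           # columns 7-11: atom serial number
        name=line[12:16].strip(),         # columns 13-16: atom name
        residue=line[17:20].strip(),      # columns 18-20: residue name
        chain=line[21],                   # column 22: chain identifier
        residue_number=int(line[22:26]),  # columns 23-26: residue sequence number
        x=float(line[30:38]),             # columns 31-38: x coordinate (angstroms)
        y=float(line[38:46]),             # columns 39-46: y coordinate
        z=float(line[46:54]),             # columns 47-54: z coordinate
    )


# Illustrative record; the coordinate values are invented for this example.
sample = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N"
atom = parse_atom_line(sample)
print(atom)
```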
Structure Viewers
Cn3D
RasMol
WebMol
Mage
VRML
CAD
Swiss PDB Viewer
Promises of bioinformatics
Medicine
Knowledge of protein structure facilitates drug design
Understanding of genomic variation allows the tailoring of medical treatment to the individual’s genetic make-up
Genome analysis allows the targeting of genetic diseases
The effect of a disease or of a therapeutic on RNA and protein levels can be elucidated
The same techniques can be applied to biotechnology, crop and livestock improvement, etc.
Challenges in bioinformatics
Explosion of information
Need for faster, automated analysis to process large amounts of data
Need for integration between different types of information (sequences, literature, annotations, protein levels, RNA levels, etc.)
Need for “smarter” software to identify interesting relationships in very large data sets
Lack of “bioinformaticians”
Software needs to be easier to access, use and understand
Biologists need to learn about the software, its limitations, and how to interpret its results
SEQUENCE ALIGNMENT
Two or More Sequences
Measure similarity
Determine correspondences between residues
Find patterns of conservation
Derive evolutionary relationships
Alignment
Correspondences of nucleotides/amino acids in two or more sequences are assigned
An assignment of correspondences that preserves the order of the residues within the sequences is an alignment
Gaps are used to achieve this
Sequence alignment refers to the identification of residue-residue correspondences
Uses
Homology
Similarities
“Ancestry”
Genome annotation
Assigning structure and function to genes
Database queries
For newly discovered/unknown sequences
Tools
Dot Plots
Diagonal lines of dots showing similarities between two sequences
Scoring Matrices
Score reflects quality of each possible alignment; best possible score is identified
Scoring scheme is crucial
PAM (Point Accepted Mutation) and BLOSUM (BLOcks SUbstitution Matrix)
Dynamic Programming
Algorithmic technique that reuses previous computations
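The dot plot listed under Tools above can be sketched in a few lines. This is a bare-bones version that marks every identical residue pair; the function names and example sequences are illustrative assumptions, and practical dot plot tools usually add a sliding window and threshold to suppress noise.

```python
def dot_plot(seq_a, seq_b):
    """Return a grid where cell [i][j] is True when seq_a[i] == seq_b[j]."""
    return [[a == b for b in seq_b] for a in seq_a]


def render(matrix, seq_a, seq_b):
    """Draw the grid as text: '*' marks a match, '.' marks no match."""
    lines = ["  " + " ".join(seq_b)]
    for residue, row in zip(seq_a, matrix):
        lines.append(residue + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


seq_a = "GATTACA"
seq_b = "ATTAC"
plot = dot_plot(seq_a, seq_b)
print(render(plot, seq_a, seq_b))
```

Because `ATTAC` occurs inside `GATTACA` starting at its second residue, the matches form an unbroken diagonal shifted down by one row.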
Scoring
Penalties/Scores
Match (e.g. A – A)
Mismatch (e.g. A – C)
Gap (e.g. A – _)
Linear Gap Penalty: Uniform
Affine Gap Penalty: Gap Existence vs. Gap Extension
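A small sketch of the two gap penalty schemes and of scoring an already aligned pair. The numeric values are arbitrary assumptions chosen for illustration, and the affine formula below charges the opening cost for the first gap position and the extension cost for each additional one; some texts instead define it as opening plus extension times the full length.

```python
def linear_gap_penalty(length, per_position=2):
    """Uniform cost: every gap position costs the same."""
    return per_position * length


def affine_gap_penalty(length, gap_open=5, gap_extend=1):
    """Opening a gap is expensive; extending an existing gap is cheap."""
    if length == 0:
        return 0
    return gap_open + gap_extend * (length - 1)


def score_alignment(aligned_a, aligned_b, match=1, mismatch=-1, gap=-2):
    """Score two pre-aligned strings column by column using a linear gap cost."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


for n in (1, 3, 10):
    print(n, linear_gap_penalty(n), affine_gap_penalty(n))
print(score_alignment("GATTACA", "GA-TCCA"))
```

With these example values a single long gap is cheaper under the affine scheme than under the linear one, which reflects the idea that one insertion or deletion event can span several residues.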
Local vs. Global Alignments
Global Alignment
Similarities between the majority of two sequences
Local Alignment
Similarities between specific parts of two sequences
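To make the local case concrete, here is a minimal sketch of local alignment scoring. The slides do not name a local alignment algorithm; this uses a Smith-Waterman-style recurrence as an assumption, with arbitrary example scores. The key difference from global alignment is that cell values are never allowed to drop below zero, so a good match can start and end anywhere in either sequence.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score (Smith-Waterman-style sketch)."""
    rows, cols = len(a) + 1, len(b) + 1
    # First row and column stay at zero: a local alignment may start anywhere.
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap
            left = h[i][j - 1] + gap
            # The zero option resets the alignment when the score would go negative.
            h[i][j] = max(0, diag, up, left)
            best = max(best, h[i][j])
    return best


# Both example sequences contain GATTACA surrounded by unrelated residues.
print(local_alignment_score("AAAGATTACATTT", "CCGATTACAGG"))
```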
Programs
Pairwise Sequence Alignment
BLAST
VAST
FASTA
Multiple Sequence Alignment
MAFFT
Needleman-Wunsch Algorithm
Used for global alignments (a modified form, Smith-Waterman, handles local alignments)
Maximum-value function
A simple scoring scheme is assumed
Three steps
Initialization
Matrix fill (scoring)
Traceback (alignment)
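The three steps above can be sketched as follows. This is a minimal illustration assuming a simple scoring scheme (match +1, mismatch -1, linear gap -1); these values and the function name are assumptions, not taken from the slides. When several alignments share the best score, this sketch returns just one of them.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment sketch: returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1

    # Step 1: initialization. First row and column hold cumulative gap penalties.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Step 2: matrix fill. Each cell is the maximum of three possible moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Step 3: traceback from the bottom-right cell to the top-left cell.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("ACGT", "AGT")
print(total)
print(top)
print(bottom)
```

The fill step is where dynamic programming pays off: each cell reuses three previously computed cells instead of re-scoring every possible alignment.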