Neighbour Joining
Summary
TL;DR: The video script discusses the application of distance-based methods in reconstructing genetic trees, focusing on the use of least squares and minimum evolution criteria for optimal tree selection. It highlights the impracticality of exhaustive search due to the exponential growth of possible trees with more leaves, and introduces heuristic and clustering algorithms as alternative approaches. The script provides a detailed explanation of the neighbor-joining method, a popular clustering algorithm, which involves iterative steps to build a tree by combining the nearest nodes based on a specific criterion. The process is illustrated with an example using a distance matrix, demonstrating how the algorithm constructs a tree with branch lengths that match observed distances, although it does not guarantee finding the best possible tree.
Takeaways
- 🌳 The script discusses distance-based methods for reconstructing genetic trees, focusing on finding the optimal tree with the smallest sum of squared differences.
- 🔍 It highlights the impracticality of exhaustive search for large datasets, since the number of possible trees grows exponentially as the number of leaves increases.
- 🔄 An alternative to exhaustive search is heuristic search, which uses a hill climbing algorithm to gradually move towards better trees, starting from a random tree and improving upon it.
- 👉 The script introduces clustering algorithms, specifically neighbor-joining, as a third approach to distance-based tree reconstruction; it is widely used and faster than the other two methods.
- 📊 The starting point for clustering algorithms is a distance matrix, from which the algorithm iteratively builds the tree by combining the two nearest nodes at each step.
- 🔬 The neighbor-joining algorithm computes a specific measure 'u' for each leaf and combines first the pair of nodes whose pairwise observed distance minus both 'u' values is smallest.
- 🔗 After combining nodes, neighbor-joining uses formulas to calculate the branch lengths connecting the nodes to their new ancestral node, rather than simply splitting the observed distance.
- 📈 The algorithm iteratively updates the distance matrix by replacing the rows and columns of the combined nodes with a new row and column for the ancestral node, and recalculating distances.
- 🔚 The process continues until all nodes are combined into a tree, with the algorithm providing a rapid method to obtain an overview of the data set, even though it does not guarantee the best possible tree.
- 📝 The script provides a detailed step-by-step guide for implementing the neighbor-joining algorithm, which can be followed to construct a tree from a given distance matrix (an illustrative Python sketch of these steps appears after this list).
- 📉 While neighbor-joining does not always provide the perfect match between observed and patristic distances, especially for larger datasets, it often yields reasonably good trees for an overview.
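The loop described in these takeaways is compact enough to sketch directly. Below is a minimal, illustrative Python implementation assuming the distance matrix arrives as a dict of dicts; the function name `neighbor_joining` and the `anc0`-style labels for ancestral nodes are invented here for clarity, not taken from the video:

```python
import itertools

def neighbor_joining(dist, names):
    """Neighbor joining on a symmetric distance matrix.

    dist  -- dict of dicts: dist[a][b] is the observed distance between a and b
    names -- list of taxon labels (the keys of dist)
    Returns a list of (child, parent, branch_length) edges.
    """
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # u[i]: summed distance from i to every other node, scaled by (n - 2)
        u = {i: sum(dist[i][j] for j in nodes if j != i) / (n - 2)
             for i in nodes}
        # combine the pair minimizing the criterion d(i, j) - u_i - u_j
        i, j = min(itertools.combinations(nodes, 2),
                   key=lambda p: dist[p[0]][p[1]] - u[p[0]] - u[p[1]])
        # branch lengths to the new ancestral node: an asymmetric split,
        # not a naive halving of the observed distance
        vi = 0.5 * dist[i][j] + 0.5 * (u[i] - u[j])
        vj = dist[i][j] - vi
        anc = f"anc{next_id}"
        next_id += 1
        edges += [(i, anc, vi), (j, anc, vj)]
        # replace the rows/columns for i and j with one for the ancestral node
        dist[anc] = {}
        for m in nodes:
            if m in (i, j):
                continue
            d = 0.5 * (dist[i][m] + dist[j][m] - dist[i][j])
            dist[anc][m] = d
            dist[m][anc] = d
        nodes = [m for m in nodes if m not in (i, j)] + [anc]
    a, b = nodes  # two nodes left: join them with a single branch
    edges.append((a, b, dist[a][b]))
    return edges

# Demo on a small additive matrix (values invented for illustration)
labels = ["A", "B", "C", "D"]
rows = [[0, 5, 9, 9], [5, 0, 10, 10], [9, 10, 0, 8], [9, 10, 8, 0]]
dist = {a: {b: rows[i][k] for k, b in enumerate(labels)}
        for i, a in enumerate(labels)}
for child, parent, length in neighbor_joining(dist, labels):
    print(f"{child} --{length}--> {parent}")
```

Running the demo joins A with B first, then C with D, and prints branch lengths (A: 2, B: 3, C: 4, D: 4, plus an internal branch of 3) whose patristic distances reproduce this small additive matrix exactly; as the takeaways note, real data sets rarely fit this perfectly.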
Q & A
What are the main methods discussed in the transcript for reconstructing genetic trees?
-The transcript discusses three main methods for reconstructing genetic trees: exhaustive search, heuristic search, and clustering algorithms such as neighbor-joining.
What is the primary challenge with using exhaustive search to find the optimal genetic tree?
-The primary challenge with exhaustive search is that the number of possible trees grows exponentially with the number of leaves, making it impractical for more than a small number of leaves.
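To make that growth concrete: the number of distinct unrooted binary trees on n labeled leaves is the double factorial (2n − 5)!! — a standard result, not stated in the transcript — which a short Python check makes vivid:

```python
def num_unrooted_trees(n):
    """Number of distinct unrooted binary trees with n labeled leaves: (2n-5)!!"""
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count

for n in (4, 10, 20):
    print(n, num_unrooted_trees(n))
# 4 leaves -> 3 trees, 10 -> 2,027,025, and 20 already exceed 10**20 trees
```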
How does the heuristic search method differ from exhaustive search?
-The heuristic search method, also known as hill climbing, starts with a random tree and iteratively improves it by choosing the best neighboring tree based on the sum of squared differences, rather than examining all possible trees.
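A minimal skeleton of that idea, with hypothetical `neighbors` and `score` callables standing in for tree rearrangements (e.g. nearest-neighbor interchanges) and the least-squares fit; nothing here beyond the greedy loop is from the transcript:

```python
def hill_climb(start_tree, neighbors, score, max_steps=10_000):
    """Greedy hill climbing: repeatedly move to the best-scoring neighbor.

    neighbors(tree) -- yields rearranged candidate trees (assumed, e.g. NNI moves)
    score(tree)     -- lower is better (assumed: sum of squared differences
                       between observed and patristic distances)
    """
    current = start_tree
    for _ in range(max_steps):
        best = min(neighbors(current), key=score, default=current)
        if score(best) >= score(current):
            break  # local optimum: no neighboring tree improves the score
        current = best
    return current
```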
What is the basic premise of clustering algorithms in genetic tree reconstruction?
-Clustering algorithms, such as neighbor-joining, work by iteratively combining the two 'nearest' nodes into a tree, gradually building up the tree structure until all nodes are included.
How does the neighbor-joining algorithm decide which nodes to combine first?
-The neighbor-joining algorithm decides which nodes to combine first by computing, for each pair of nodes, their pairwise observed distance minus the 'u' values of both nodes, and then selecting the pair with the smallest value.
What is the purpose of the value 'u' in the neighbor-joining algorithm?
-The value 'u' for a leaf is the sum of that leaf's distances to all other leaves, divided by the number of leaves minus two. It is used to determine which pairs of nodes are 'nearest' and should be combined first in the tree.
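A worked illustration on a hypothetical 4-taxon matrix (the same one used in the sketch after the takeaways):

```python
D = {"A": {"B": 5, "C": 9, "D": 9},
     "B": {"A": 5, "C": 10, "D": 10},
     "C": {"A": 9, "B": 10, "D": 8},
     "D": {"A": 9, "B": 10, "C": 8}}
n = len(D)                                       # 4 leaves, so divide by n - 2 = 2
u = {i: sum(D[i].values()) / (n - 2) for i in D}
# u == {'A': 11.5, 'B': 12.5, 'C': 13.5, 'D': 13.5}
pairs = [(i, j) for i in D for j in D if i < j]
best = min(pairs, key=lambda p: D[p[0]][p[1]] - u[p[0]] - u[p[1]])
# d(A,B) - u_A - u_B = 5 - 11.5 - 12.5 = -19 (tied with (C,D); min keeps the
# first pair it encounters), so A and B are combined first
print(best)  # ('A', 'B')
```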
How does the neighbor-joining algorithm compute the branch lengths when combining two nodes?
-The algorithm uses specific formulas to compute the branch lengths from the combined ancestral node to each of the original nodes, based on the observed distances and the 'u' values of the nodes being combined.
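Continuing the numbers from the previous snippet, the standard neighbor-joining branch-length formulas give:

```python
d_ab, u_a, u_b = 5, 11.5, 12.5
v_a = 0.5 * d_ab + 0.5 * (u_a - u_b)   # 2.5 - 0.5 = 2.0
v_b = 0.5 * d_ab + 0.5 * (u_b - u_a)   # 2.5 + 0.5 = 3.0
assert v_a + v_b == d_ab               # the two branches sum to the observed d(A, B)
```

Note the asymmetric split: the node that is on average farther from everything else (here B) receives the longer branch, rather than splitting d(A, B) evenly.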
What is the significance of the patristic distance in the neighbor-joining algorithm?
-The patristic distance is the distance from one node to another in the tree. The neighbor-joining algorithm aims to construct a tree where the patristic distances match the observed distances as closely as possible.
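One way to verify this on the example tree from the sketch above is to sum branch lengths along the unique path between two leaves; the edge list below is the sketch's demo output:

```python
from collections import defaultdict

edges = [("A", "anc0", 2.0), ("B", "anc0", 3.0), ("anc0", "anc1", 3.0),
         ("C", "anc1", 4.0), ("D", "anc1", 4.0)]
adj = defaultdict(list)
for a, b, w in edges:
    adj[a].append((b, w))
    adj[b].append((a, w))

def patristic(a, b):
    """Length of the unique a-to-b path in the tree (depth-first walk)."""
    stack = [(a, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == b:
            return dist
        stack.extend((nxt, node, dist + w) for nxt, w in adj[node] if nxt != parent)

print(patristic("A", "C"))  # 2 + 3 + 4 = 9.0, identical to the observed d(A, C)
```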
Why might the neighbor-joining algorithm not always provide the best possible tree?
-The neighbor-joining algorithm does not guarantee finding the best possible tree because it is an iterative procedure that builds the tree step by step, which can lead to local optima rather than a global optimum.
What is the advantage of using clustering algorithms like neighbor-joining for reconstructing genetic trees?
-Clustering algorithms like neighbor-joining are advantageous because they are much faster than exhaustive or heuristic searches, making them suitable for large datasets where a rapid overview is needed.