Function Distribution
In addition to the species composition of microbial communities, metagenomic sequencing also yields functional information about those communities. As with species data, the volume of functional data is large, so it needs appropriate visualization to show the functional composition of each sample and to make comparison between samples/groups easy. Common visualization methods include stacked charts, abundance heatmaps, GraPhlAn charts, Circos charts, Krona charts, and pie charts.
Stacked Charts and Pie Charts
Stacked charts and pie charts are traditional data visualization tools. Pie charts can only display data for one sample/group, while stacked charts can display data for multiple samples/groups.
Abundance Heatmap
Heatmaps are used to display the distribution of different functions in different samples/groups. By placing different functions and samples/groups in one chart, it is very convenient to compare the differences between functions and samples.
Circos Charts
A Circos chart [1] (also called a chord diagram) draws groups and functions on the same circle and connects each function to the groups in which it occurs, showing both the functional composition and how the abundance of each function differs between samples/groups.
FAQ
Q:How is the function data processed for the distribution charts?
A:Showing too many elements in a single image makes it crowded, and it becomes difficult to get useful information from it. For this reason, the function abundance data used for plotting needs to be screened in one of two ways:
- Classify to others: Keep the functions that meet the filter criteria, and merge the functions that do not meet them into "Others".
- Filtering: Keep the functions that meet the filter criteria, and discard the functions that do not meet them.
Filter criteria:
- The top N functions by relative abundance
- Functions with a relative abundance greater than m
Tips
Both "Classify to others" and "Filtering" use the parameters set in the analysis scheme. However, because of display constraints, some chart types cap the maximum number of functions shown. For details, see the description on the corresponding page.
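The two processing modes and the two filter criteria described in this answer can be sketched in Python as follows. This is a minimal illustration, not the Dr.Tom implementation: the function names, the `Others` label, the example identifiers, and the choice to rank functions within a single sample/group are all assumptions made for the example.

```python
def select_functions(abundance, top_n=None, min_abundance=None):
    """Return the names of the functions that meet the filter criteria.

    abundance: dict mapping function name -> relative abundance (one sample/group).
    top_n: keep only the N most abundant functions (hypothetical parameter name).
    min_abundance: keep only functions with abundance greater than this value m.
    """
    names = sorted(abundance, key=abundance.get, reverse=True)
    if top_n is not None:
        names = names[:top_n]
    if min_abundance is not None:
        names = [name for name in names if abundance[name] > min_abundance]
    return set(names)


def classify_to_others(abundance, keep):
    """Keep the selected functions and merge the rest into one 'Others' entry."""
    kept = {name: value for name, value in abundance.items() if name in keep}
    rest = sum(value for name, value in abundance.items() if name not in keep)
    if rest > 0:
        kept["Others"] = rest
    return kept


def filter_out(abundance, keep):
    """Keep the selected functions and discard the rest."""
    return {name: value for name, value in abundance.items() if name in keep}


# Made-up relative abundances for one sample.
sample = {"K00001": 0.40, "K00002": 0.30, "K00003": 0.20,
          "K00004": 0.06, "K00005": 0.04}

keep = select_functions(sample, top_n=3)
with_others = classify_to_others(sample, keep)   # 3 functions + "Others"
filtered = filter_out(sample, keep)              # 3 functions only
```

With "Classify to others" the plotted values still sum to the original total, whereas "Filtering" drops the low-abundance remainder, so the plotted values may no longer sum to 1.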
Q:What are the clustering distance and clustering method of the heatmap? How do different clustering distances and clustering methods differ? Which clustering distances and clustering methods are commonly used?
A:The purpose of clustering is to identify discontinuous subsets of objects, that is, to partition a data set into groups. Clustering microbial data produces a hierarchical clustering tree with a nested structure. Most clustering methods are based on distance. The Dr.Tom system provides six clustering distances: euclidean, maximum, manhattan, canberra, binary, and correlation. Their definitions and differences are as follows:
Method | Distance | Introduction |
---|---|---|
euclidean | Euclidean distance | The square root of the sum of the squared differences between the two objects. |
maximum | Chebyshev distance | The maximum absolute difference between the coordinates of the two objects. |
manhattan | Manhattan distance | The sum of the absolute differences. This distance can be used when the data contain several variable types, such as age, gender, and height. |
canberra | Canberra distance | This distance can be used when the samples are relatively similar. |
binary | Jaccard dissimilarity | Computed from presence/absence counts: a, number of species present in both samples; b, number of species present only in one sample; c, number of species present only in the other sample; d, number of species present in neither sample. |
correlation | Pearson correlation | Used in correlation heatmaps. |
Reference:
- Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
- Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979) Multivariate Analysis. Academic Press.
- Borg, I. and Groenen, P. (1997) Modern Multidimensional Scaling. Theory and Applications. Springer.
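As a rough guide to how the six distances are computed, the sketch below implements the standard textbook definitions in plain Python for two abundance profiles `x` and `y` of equal length. It is an illustration under stated assumptions, not the Dr.Tom code: the function names are hypothetical, Canberra terms whose denominator is zero are simply skipped, and the correlation distance is taken as 1 minus the Pearson coefficient, which is one common convention that the text above does not confirm.

```python
import math


def euclidean(x, y):
    # Square root of the sum of squared differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def maximum(x, y):
    # Chebyshev distance: largest absolute coordinate difference.
    return max(abs(a - b) for a, b in zip(x, y))


def manhattan(x, y):
    # Sum of absolute differences.
    return sum(abs(a - b) for a, b in zip(x, y))


def canberra(x, y):
    # Sum of |a - b| / (|a| + |b|); terms with a zero denominator are skipped
    # (an assumption made for this sketch).
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def binary(x, y):
    # Jaccard dissimilarity on presence/absence: (b + c) / (a + b + c),
    # where a = present in both, b = only in x, c = only in y.
    both = sum(1 for a, b in zip(x, y) if a != 0 and b != 0)
    only_x = sum(1 for a, b in zip(x, y) if a != 0 and b == 0)
    only_y = sum(1 for a, b in zip(x, y) if a == 0 and b != 0)
    present = both + only_x + only_y
    return (only_x + only_y) / present if present else 0.0


def correlation_distance(x, y):
    # 1 - Pearson r (assumed convention for turning correlation into a distance).
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / math.sqrt(var_x * var_y)


# Two made-up abundance profiles over four functions.
x = [1, 0, 3, 0]
y = [2, 2, 0, 0]
```

Note that `binary` ignores the magnitudes entirely and that positions absent from both profiles (the count d) do not enter the Jaccard dissimilarity.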
Q:What are the commonly used methods for heatmap clustering? What is each method used for?
A:The system provides three types of methods: linkage-based hierarchical clustering, average agglomerative clustering, and minimum variance clustering.
- Linkage-based hierarchical clustering: groups are joined according to the shortest or the longest distance between objects in the two groups. It includes two types: single linkage and complete linkage.
- Average agglomerative clustering: divided into four types, UPGMA, UPGMC, WPGMA, and WPGMC, according to whether the groups are weighted (that is, whether the number of objects in each group is taken into account) and how the distance is calculated (average distance: the average distance between the object being added and the existing objects; centroid distance: the distance to the geometric center of the group).
- Minimum variance clustering: based on the least squares linear model criterion, it minimizes the within-group sum of squares.
Each type includes at least two specific methods, as shown in the table below. Methods marked with an asterisk are commonly used in metagenomics.
Type | Method | Feature |
---|---|---|
Linkage-based hierarchical clustering | single * | |
Linkage-based hierarchical clustering | complete * | |
Average agglomerative clustering | UPGMA * | Arithmetic average - equal weights |
Average agglomerative clustering | UPGMC | Centroid clustering - equal weights |
Average agglomerative clustering | WPGMA | Arithmetic average - unequal weights |
Average agglomerative clustering | WPGMC | Centroid clustering - unequal weights |
Minimum variance clustering | ward.D | |
Minimum variance clustering | ward.D2 | |
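To make the difference between the methods concrete, here is a small, deliberately naive Python sketch of agglomerative clustering that supports three of the methods in the table above: single, complete, and UPGMA (named `average` here). It is an assumption-laden illustration rather than the system's implementation: the function name, the method strings, and the example distance matrix are all hypothetical, and the centroid-based and Ward methods are left out for brevity.

```python
def agglomerate(dist, method="average"):
    """Cluster objects from a symmetric distance matrix (list of lists).

    Returns the merge history as (members_a, members_b, height) tuples,
    where members are tuples of object indices.
    """
    def between(c1, c2):
        # All pairwise distances between the members of the two clusters.
        pairs = [dist[i][j] for i in c1 for j in c2]
        if method == "single":      # shortest distance between the groups
            return min(pairs)
        if method == "complete":    # longest distance between the groups
            return max(pairs)
        if method == "average":     # UPGMA: mean of all pairwise distances
            return sum(pairs) / len(pairs)
        raise ValueError("unsupported method: " + method)

    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen method.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = between(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        merges.append((clusters[p], clusters[q], d))
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return merges


# Made-up distances between four samples.
D = [
    [0, 2, 6, 10],
    [2, 0, 5, 9],
    [6, 5, 0, 4],
    [10, 9, 4, 0],
]

for name in ("single", "complete", "average"):
    heights = [height for _, _, height in agglomerate(D, name)]
    print(name, heights)
```

On this matrix all three methods join the same pairs in the same order; only the height of the final merge differs, which is what changes the branch lengths of the clustering tree drawn next to a heatmap.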
Reference
[1] Krzywinski, M., Schein, J., Birol, İ., Connors, J., Gascoyne, R., Horsman, D., Jones, S. J., & Marra, M. A. (2009). Circos: An Information Aesthetic for Comparative Genomics. Genome Research, 19(9), 1639–1645. https://doi.org/10.1101/gr.092759.109