Artículos
Received: 23 September 2021
Accepted: 25 May 2022
Published: 30 June 2023
DOI: https://doi.org/10.22201/icat.24486736e.2023.21.3.1787
Abstract: Complex networks are in general communities. These communities are especially important. Network communities represent sets of nodes, which are very connected. In this research, we developed a new method to find the community structure in networks. Our method is based on flower pollination algorithm (FPA) which is used in the split-ting process. The splitting of networks in our method maximizes a function of quality called modularity. We provide a general framework for implementing our new method to find community structure in networks. We present the effectiveness of our method by comparison with some known methods on computer-generated and real-world networks.
Keywords: Community detection, networks, flower pollination algorithm, normalized mutual information, modularity.
1. Introduction
Many systems can be represented by network or a graph, which makes them very powerful structure. A network is defined by two sets (Newman, 2010). The first set is node set (node set) and the second is edge set . Nodes share relationships between them. Relationships are represented by edges. In general, the number of nodes is and edges is . Euler’s solution of the seven bridges of Königsberg problem is the first use of networks to represent systems (Hopkins & Wilson, 2004). Today networks are used to illustrate several systems. For instance, in social network, which is an interaction between entities (persons, groups of persons, organizations, web sites, …), can be represented by a network with two sets and . Nodes stand for entities (for example persons) and edges stand for relationships between entities (for examples between persons). Analyzing and understanding a network leads to better understanding the system. Among features that can help to understand the structure of a network, we can find the community structure.
Community structure exists in networks, and it gives more information about the network. For instance, we can understand very well the system, which is represented by a network, by finding its community structure and the relationship between communities. In addition, networks can represent many systems like social networks, electric networks, biological networks, etc. It is vital to develop new methods to find network communities. When we analyze networks by studying relationships between nodes, we can get extra information about networks and systems. In general, nodes in the same community have common properties or insure similar tasks in network. A network has parts that are more densely connected than other parts. In other words, the nodes in these parts share many edges between them. These parts of nodes and edges are called communities (clusters). Finally, many studies have been done around networks and how to find community structure.
Many community structure detection methods have been developed (Fortunato, 2010). According to the type of network, we can find methods for unipartite/bipartite networks, weighted/unweighted networks, and directed/undirected networks. Furthermore, methods can be classified into different classes such as hierarchical methods (merging or splitting), methods that are based on maximization of an objective function. Some methods find disjoint communities, where intersection between communities is empty. However, other methods were designed to find overlapping communities, for instance the method in (Chen et al., 2019), where the intersection between communities is not empty.
In this paper, we address the problem of finding community structure in networks. We present a new method to discover community structure in unweighted and undirected networks. Our method is based on nature-inspired metaheuristics algorithm. We have developed our method based on the pollination process of flowers (Yang, 2012). Our method is a hierarchical one. It is based on the splitting of a given network , which models a system. Splitting step in our method is done by the flower pollination algorithm (FPA) (Yang, 2012) to optimize the function of quality called modularity . The process of splitting will be stopped when the graph has been disconnected, which means that each node of represents a community. Finally, our method builds a dendrogram and finds the most optimal community structure , such as and , (for ).
The paper is organized as follows. The concept of FPA is presented in Section 2. Our approach is detailed in Section 3. Experimental results and discussions are given in Section 4. Finally, Section 5 concludes the paper.
2. FPA Presentation
Flower pollination is an interesting phenomenon in nature. Based on the studying flower pollination process, a new algorithm of optimization was designed by Yang (2012) . The algorithm has been named flower pollination algorithm (FPA). In nature pollination can be abiotic form or biotic form. In general, 90% of flowers have biotic pollination where the pollen is transferred by animals (pollinator) like insects. Biotic pollination by bees for instance can be done over long distances.
FPA has three steps (Yang, 2012) described in the following:
-In the first step, the algorithm initializes its parameters and generates the initial population. The best solution is found also in the first step.
-The second step, flowers in population start doing pollination in d-dimensional search (solution space). Flowers can choose a local or global pollination at every iteration in the search space. The algorithm switch between local pollination and global pollination based on probability . Flowers’ location represents the vector of solutions vector and the value of objective function for every solution estimated. According to the value of objective function the new solution is evaluated and updated at every iteration and the best solution will be improved.
-In the final step, the algorithm stops after some iterations and the best solution will be selected.
FPA can converge very fast and can escape the problem of local minima because it makes the long distances movement based on levy flight (Emary et al., 2019). FPA can be used to solve different optimization problems in various fields such as computer science (cloud computing, data clustering, wireless sensor networks, etc.), bioinformatics, operation research, image processing and engineering (Zhou et al., 2018; Abdel-Basset & Shawky, 2019).
3. A new method to find communities in networks
In this section, we present our hierarchical method to discover community structure in networks. Hierarchical methods can be divisive or agglomerative. Our method is hierarchical divisive method. Network is divided by our method based on the maximization of the function of quality called modularity (Clauset et al., 2004). Our method is designed to find community structure in networks with only a single type of node and undirected, unweighted edge.
We can measure the strength of a community structure by the function of quality called modularity (Clauset et al., 2004). Modularity function is based on the observed edges fraction within communities and the expected edges fraction within the same communities, . Modularity can be estimated for undirected and unweighted graph as:
where is the number of nodes in , is the number of edges in and the community structure is . represents the adjacency matrix of . For any node , is the degree of node and its community. The matrix takes two values or if there is an edge between node and then or if there is not a connection between and . represents the adjacency matrix that corresponds null model. In the null model the probability of an existing edge between nodes and is . Finally, function is given as follows:
Values of are between and . closer to indicates stronger community structures. According to (Clauset et al., 2004), a value above about is a good indicator of significant community structure in a network.
Let be an undirected and unweighted network, where is the set of nodes, is the set of edges. The goal of our community detection method is to partition the network into communities (groups):, where , (for ) and . In addition, our method finds the community structure of the network with the greatest value of modularity . To reach this goal, we used an FPA. Our method splits into two new networks and . Nodes of each new network represent a community. Nodes of represent a community and nodes of represent a community . The splitting is based on FPA to maximize the value of modularity function . Then, and will be split until the network has been disconnected. At the end of our method each node in represents a community. Finally, we get a dendrogram for our method and the community structure will be chosen based on value of modularity or the number of communities.
The general algorithm of our method to find community structure is as follows:

Figure 1 shows the result of splitting Zachary's karate club network (Zachary, 1977) by our method. Numbers from to stand for nodes. Our method gave a community structure with two communities such as:
,which were separated by vertical lines in Figure 1. The community structure that was found by our method on the same network is also represented in Figure 2. In this figure, communities’ nodes have different colors and shapes.


4. Experiments and results
To evaluate our method to find community structures in networks, we have tested it on computer-generated and several real networks (Zackary's karate club (Zachary, 1977), American college football (Girvan & Newman, 2002), dolphins (Lusseau et al., 2003), books about US politics (Krebs, 2021), jazz musicians (Gleiser & Danon, 2003), word adjacencies (Newman, 2006) and Les Misérables (Knuth, 1993). We have compared our method with some well-known methods: fast greedy method (Clauset et al., 2004), label propagation method (Raghavan et al., 2007), and infomap method (Rosvall & Bergstrom, 2008).
The fast greedy method was proposed by Clauset et al. (2004) . It is an improvement of the method of Newman (Newman, 2004). It is based on the greedy optimization of modularity, and it is a hierarchical agglomeration algorithm to detect community structure. The method label propagation was proposed by Raghavan et al. (2007). It is based on label propagation. Initially, every node in the graph is initialized with a unique label and at every step of the method each node takes the label that most of its neighbors currently have. The iterative process converges when labels cannot be changed. Then, nodes having identical labels form a community. Finally, the method that was proposed by Rosvall and Bergstrom (2008), which is known as infomap, uses the concept of random walks and entropy communities to find the community structure in network.
4.1. Normalized mutual Information
The comparison of our method with other methods is based on the normalized mutual information (NMI) function (Danon et al., 2005). The NMI is a powerful function to compare a community structure that was found by methods with the real community structure. The value of NMI is based on defining a confusion matrix , where the rows represent the real communities, and the columns represent the found communities. is the number of nodes in the real community that appears in the found community . For two partitions and , the partition represents the real partition with communities and represents the found partition with communities, The normalized mutual information () is estimated as follows:
NMI values are in the range . Partitions and are identical if .
4.2. Dataset based on computer-generated networks
Our method is evaluated on computer-generated networks benchmark proposed by Lancichinetti et al. 2008. The benchmark parameters are the number of nodes , the exponents and of the degree and community size distribution respectively (both distributions are power laws), the number of average degree , number of communities , and the mixing parameter . Each node shares a fraction of its links with other nodes of its community and a fraction with the other nodes of the network.
Figure 3 shows the variation of the obtained by our method, fast greedy method, label propagation method and infomap method on the benchmark networks, with the parameters: mixing parameter between and , , , , and . The value of obtained by our method is high when changes from to and the same thing with other methods. At this range, nodes share many edges with nodes of its community that makes the community structure clear and easy to find. Methods could group the most nodes in the correct communities when the mixing parameter is in . When is in range, it is difficult for all methods to find the true community structure. At this range nodes share few edges with nodes of its community and many edges with nodes from other communities, which make the community structure unclear and difficult to find. However, our method is still more accurate than the other methods. Our method evaluates the community structure at each step of splitting process and at the end our method selects the best community structure based on modularity value. From Figure 3, we see that our method can discover community structure better than fast greedy, label propagation method and infomap method when is greater than .

Figure 4 illustrates the result of our method on network generated by computer with mixing parameter . Figure 4 shows the different communities that were found by our method. On this network with a mixing parameter , our method found a community structure () with eight communities (). Dendrogram labels stand for nodes. In this example, we have a network with nodes. We mention that the community structure can be found by breaking the dendrogram (Figure 4) at various levels (Abonyi & Feil, 2007). In our case, we have chosen to break the dendrogram at the level which maximizes the modularity function.

4.3. Dataset based on real networks
In this section, we give the simulation results of our method, fast greedy, label propagation and infomap on real networks. We considered some real networks drawn from disparate fields (Zachary 1977), dolphins (Lusseau et al., 2003), football (Girvan & Newman, 2002) and books about US politics (Krebs, 2021), where the community structure is known, which made them suitable to evaluate community detection methods.
Table 1 gives obtained results on networks. In this table, for each network we have estimated the value of modularity function according to equation 1, values (according to equation 3) and we have mentioned the number of communities. As can be seen from Table 1, methods find community structure with different number of communities. According to values, our method can regroup the most nodes in the correct communities on Zachary's karate club, dolphin social network, American college football and books about US politics respectively. The value of modularity by our method on these networks was above 0.3.

Figures 5 and 6 show the community structure that was found by our method on dolphins’ network and Books about US politics network. Each label represents a node and edges stand for the relationship between nodes. The community structure that was found by our method was represented by different shapes and colors. Nodes of the same community are represented by the same color and shape. From these Figures 5 and 6, we can see that nodes in the same community are more connected between them and have a few connections with nodes from other communities.


We evaluated the performance of our method with other different real networks without a known community structure. A brief description of these networks is given below.
-Jazz network is a collaborative network (Gleiser & Danon, 2003), which represents the association between jazz musicians. Jazz musicians are represented by nodes and edge existing between nodes just if two musicians played together. The network has nodes and edges.
-Word adjacencies network represents the adjacency network of common adjectives and nouns in the novel David Copperfield by Charles Dickens (Newman, 2006). It has nodes and edges.
-Les Misérables network is co-appearance network of characters in the novel Les Misérables (Knuth, 1993). The network has nodes and edges.
Table 2 gives results of our method, fast greedy, label propagation and Infomap. The number of communities and the estimated value of modularity were mentioned in Table 2. From Table 2, we can see that our method finds community structures with a high value of modularity. It is difficult to compare methods between them because we do not have a reference (a known community structure).

5. Conclusion and future work
A new hierarchical method to discover the community structure for unweighted and undirected networks was presented in this paper. Our new method was developed based on maximization of function of modularity by FPA. Results obtained on computer-generated networks and real benchmark networks prove the efficiency of our method in terms of finding community structures with high values of modularity and accuracy.
Our method can be tested on large scale networks. We can develop it to find community structure in weighted or directed network. It can be extended to find overlapping communities.
References
Abdel-Basset, M., & Shawky, L. A. (2019). Flower pollination algorithm: a comprehensive review.Artificial Intelligence Review,52, 2533-2557. https://doi.org/10.1007/s10462-018-9624-4
Abonyi, J., & Feil, B. (2007).Cluster analysis for data mining and system identification. Springer Science & Business Media. https://doi.org/10.1007/978-3-7643-7988-9
Chen, J., Liu, M., & Liu, X. (2019). Research on of overlapping community detection algorithm based on tag influence.Cluster Computing,22, 6669-6679. https://doi.org/10.1007/s10586-018-2402-x
Clauset, A., Newman, M. E., & Moore, C. (2004). Finding community structure in very large networks.Physical review E,70(6), 066111. https://doi.org/10.1103/PhysRevE.70.066111
Danon, L., Diaz-Guilera, A., Duch, J., & Arenas, A. (2005). Comparing community structure identification.Journal of statistical mechanics: Theory and experiment,2005(09), P09008. https://doi.org/10.1088/1742-5468/2005/09/P09008
Emary, E., Zawbaa, H. M., & Sharawi, M. (2019). Impact of Lèvy flight on modern meta-heuristic optimizers.Applied Soft Computing,75, 775-789. https://doi.org/10.1016/j.asoc.2018.11.033
Fortunato, S. (2010). Community detection in graphs.Physics reports,486(3-5), 75-174. https://doi.org/10.1016/j.physrep.2009.11.002
Girvan, M., & Newman, M. E. (2002). Community structure in social and biological networks.Proceedings of the national academy of sciences,99(12), 7821-7826. https://doi.org/10.1073/pnas.122653799
Gleiser, P. M., & Danon, L. (2003). Community structure in jazz.Advances in complex systems, 6(04), 565-573. https://doi.org/10.1142/S0219525903001067
Hopkins, B., & Wilson, R. J. (2004). The truth about Königsberg.The College Mathematics Journal,35(3), 198-207. https://doi.org/10.1080/07468342.2004.11922073
Knuth, D. E. (1993). The Stanford GraphBase: a platform for combinatorial computing (Vol. 1). New York: AcM Press.
Krebs, V. (2021). Social & Organizational Network Analysis software & services for organizations, communities, and their consultants. http://www.orgnet.com/divided.html
Lancichinetti, A., Fortunato, S., & Radicchi, F. (2008). Benchmark graphs for testing community detection algorithms.Physical review E,78(4), 046110. https://doi.org/10.1103/PhysRevE.78.046110
Lusseau, D., Schneider, K., Boisseau, O. J., Haase, P., Slooten, E., & Dawson, S. M. (2003). The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations: can geographic isolation explain this unique trait?.Behavioral Ecology and Sociobiology,54, 396-405. https://doi.org/10.1007/s00265-003-0651-y
Newman, M. (2010) Networks: An Introduction. Oxford University Press, Oxford. http://dx.doi.org/10.1093/acprof:oso/9780199206650.001.0001
Newman, M. E. (2004). Fast algorithm for detecting community structure in networks.Physical review E,69(6), 066133. https://doi.org/10.1103/PhysRevE.69.066133
Newman, M. E. (2006). Finding community structure in networks using the eigenvectors of matrices.Physical review E,74(3), 036104. https://doi.org/10.1103/PhysRevE.74.036104
Raghavan, U. N., Albert, R., & Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks.Physical review E,76(3), 036106. https://doi.org/10.1103/PhysRevE.76.036106
Rosvall, M., & Bergstrom, C. T. (2008). Maps of random walks on complex networks reveal community structure.Proceedings of the national academy of sciences,105(4), 1118-1123. https://doi.org/10.1073/pnas.0706851105
Yang, X. S. (2012). Flower pollination algorithm for global optimization. InUnconventional Computation and Natural Computation: 11th International Conference, UCNC 2012, Orléan, France, September 3-7, 2012. Proceedings 11(pp. 240-249). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-32894-7_27
Zachary, W. W. (1977). An information flow model for conflict and fission in small groups.Journal of anthropological research,33(4), 452-473. https://doi.org/10.1086/jar.33.4.3629752
Zhou, G., Wang, R., & Zhou, Y. (2018). Flower pollination algorithm with runway balance strategy for the aircraft landing scheduling problem.Cluster Computing,21, 1543-1560. https://doi.org/10.1007/s10586-018-2051-0
Author notes
∗Corresponding author. E-mail address:bilal340@gmail.com (Bilal Saoud).
Conflict of interest declaration