Citation networks are two-dimensional depictions of research articles laid out over time, in which each article is connected to the earlier articles it cites and to the later articles that cite it. The time dimension can expose interesting patterns that are not easily observable otherwise. This way, one can follow how a research field evolves and identify key papers that spurred the development of an area.
In this post and the ones that follow, we describe simple examples of how analysing various citation networks led to insights into a subject area we are working on, MDR-TB (multidrug-resistant tuberculosis). For more details about the data behind the network, refer to this post.
What the shape of a network can reveal
Before going further, a quick explanation of how citation networks can be read is necessary.
To produce the visualization, the network algorithm positions research articles by publication year (vertical axis, oldest at the bottom) and by source differentiality (horizontal axis). Purple nodes denote source articles within the core collection that was downloaded for this analysis, and green nodes are cited articles that were not themselves part of the core collection. Source differentiality essentially means that articles sharing many sources are placed close together in the network. By inference, articles that cite the same sources can be assumed to discuss similar areas of research. Hence articles at the far right of the visualization focus on a different area of research from those at the far left.
In this example, the network slants diagonally upwards. This suggests that, as time progressed, newer articles cited the older articles at the bottom left of the network less and less frequently. In other words, research interest slowly shifted over the years as some research areas became outdated.
Another clear feature is the thinning out of one area of the network. This is an interesting observation, as it suggests that this area is lagging behind the rest of the network.
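The idea that articles sharing many sources belong close together can be sketched with a simple shared-reference count, a measure usually called bibliographic coupling. This is only an illustration: the post does not say which algorithm the visualization tool actually uses, and the function name `coupling_strength` and the toy article and source identifiers below are hypothetical.

```python
from itertools import combinations


def coupling_strength(references):
    """Count shared cited sources for every pair of articles.

    `references` maps an article id to the set of source ids it cites.
    Returns {(article_a, article_b): shared_count}, keeping only pairs
    that share at least one source.
    """
    strengths = {}
    # Sort the ids so each pair is produced once, in a stable order.
    for a, b in combinations(sorted(references), 2):
        shared = len(references[a] & references[b])
        if shared:
            strengths[(a, b)] = shared
    return strengths


# Hypothetical toy data, not taken from the MDR-TB collection.
references = {
    "paper_A": {"src1", "src2", "src3"},
    "paper_B": {"src2", "src3", "src4"},
    "paper_C": {"src5"},
}

strengths = coupling_strength(references)
print(strengths)
```

In this toy example only paper_A and paper_B share sources, so a layout driven by this measure would pull those two together horizontally and leave paper_C further away.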
Mapping the topics obtained via topic modelling of the data onto the citation network revealed interesting insights into how those topics evolve over time.
From this visualization, we can observe that most topics are found across the network over time, which suggests that they continue to be researched as the field progresses. Below, topic 5 (Treatment optimization) in orange and topic 2 (Operational/public health) in blue can clearly be seen to be embedded within the main body of the network. Topic 5 is larger and contains more nodes, as expected from the earlier topic modelling results. Because it is larger, we had initially identified subtopics for topic 5, which are shown as nodes in lighter shades of orange.
In contrast to the previous two topics, when we visualize topic 1 (Drug-related research), we can see that some articles in this topic are situated in the area that was earlier identified as lagging behind. Again, the different shades of green denote subtopics that had been identified earlier.
This pattern became more pronounced when we visualized topic 3 (Diagnostics). Here, most articles within this topic are situated in the sparse part of the network.
What do some of the articles in this topic discuss? A quick look through the articles in the area revealed that the research covered diagnostic tests such as the MDRTBplus assay, the resazurin microtiter assay and GeneXpert. Of the 22 nodes in this topic, 18 were articles published after 2010, suggesting that there has been a resurgence of interest in the topic lately.
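The "18 of 22" figure is a simple share of nodes published after a cutoff year. A minimal sketch of that calculation follows; the helper name `share_published_after` is hypothetical, and the list of years is a placeholder built only to reproduce the two counts quoted above, not the real publication years of the articles.

```python
def share_published_after(years, cutoff):
    """Return (count, fraction) of publication years strictly after `cutoff`."""
    if not years:
        raise ValueError("years must not be empty")
    recent = sum(1 for year in years if year > cutoff)
    return recent, recent / len(years)


# Placeholder years: 4 nodes up to 2010 and 18 nodes after it.
# These values are assumptions chosen to match the counts in the text.
diagnostics_years = [2008] * 4 + [2012] * 18

recent_count, recent_share = share_published_after(diagnostics_years, 2010)
print(f"{recent_count} of {len(diagnostics_years)} nodes ({recent_share:.0%})")
```

Applied to the real node list for each topic, the same helper would make it easy to compare how recent the Diagnostics articles are relative to the other topics.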
The findings from this analysis corroborate the results of the topic modelling analysis conducted earlier. We could clearly see that the topic Treatment optimization is by far the largest and most prominent in the network. On the other hand, the topic identified as Diagnostics does not appear to be the focus of research in the field of MDR-TB. It would be interesting to see whether this is also the case in a citation network or topic model derived from data on tuberculosis research as a whole. Nonetheless, these findings mirror the results of an analysis of TB R&D funding, in which Diagnostics ranked second to last in research investment, so the pattern is likely a faithful representation of the general situation within TB research.