On the 26th of October, 2018, I was given the opportunity to give a talk and share my findings on mapping knowledge and discourse at the University of Global Health Equity in Kigali, Rwanda. UGHE is a new university built by Partners in Health, and focuses on global health equity.
Below is an abridged version of the talk. Click on images for enlarged version of the slides. For the full presentation, scroll to the bottom of the page for the link.
The main aim of the talk was to give a broad overview of how mapping can be used to extract insights on complex topics, using tuberculosis and global health as examples.
Complex challenges
Throughout my journey across medicine, academic research and biotech, I came to the realization that we are ill-equipped to deal with the complex health challenges of today. This is due both to the nature of the health issues we are dealing with and to the inability of the knowledge and solutions we create to tackle them effectively. These health issues are not only physiological, but also include social and cultural dimensions that need to be considered. Additionally, there are emerging problems, such as the rise of lifestyle illnesses and climate-change related infections, that we are only beginning to comprehend (Figure 1).
The narrow focus of research inherent in scientific culture makes it difficult to create solutions that can deal with these health issues in all their complexity. Even if scientific researchers intend to broaden their focus, the sheer overflow of information makes it almost impossible to get an overview of existing knowledge, making it far safer and easier to retreat into well-established silos.
To add to the complexity, the whole health and solution landscape is framed by stakeholders that have political and economic interests. These interests drive the process of knowledge and solution creation, deciding which problems are pursued and which are ignored (Figure 1).
All of this made me realize that the complex health challenges that we face cannot be solved by simple approaches. We must be able to understand the bigger picture and obtain overviews of the different levels of perspectives within the health problem ecosystem, if we are to create long-lasting solutions.

To find methodologies that would transcend the medical and scientific training I was used to, I went on a journey that led me to diverse fields that incorporate multiple perspectives in the problem-solving process. The first was design thinking, or human-centered design, which introduced me to a process that looks at the problem from the perspective of the user. I also learnt that there are many different layers to the design thinking process, championed by different schools of thought according to the nature of the problem. For the needs of an analyst like myself, I discovered that design thinking on its own was insufficient to deal with the complexities of knowledge needed to frame the problem in the first place.
This led me to the second approach, knowledge mapping, which is inspired by the rapidly growing field of data science. I decided to explore it further and find out what this approach could bring to the analysis of healthcare topics (Figure 2).

Why map?
Before we embark further, one may ask the question, why should we map?
Mapping allows us to visualize intangible phenomena such as knowledge flows and issues in society. Once we visualize, gaps in understanding become clearer, leading to the identification of opportunities that we can take to solve the problem. Mapping also gives us an overview of the field, which helps us direct innovation more efficiently, without reinventing the wheel and wasting resources (which happens often in the research world!) (Figure 3).

Knowledge mapping using articles, patents and ideas is useful for researchers and institutions, whereas discourse mapping, which is about understanding topics of discussion in society, is of interest to organizations and policy-makers for decision-making at the strategic level. Both types of mapping can reveal important insights regarding the structure and evolution of a field or topic of discussion (Figure 4).

When it comes to mapping knowledge, there exists an established field called scientometrics, which is the study of measuring science, technology and innovation. This field is not new in itself, but has recently been modernized by the introduction of tools from data science. The first example, by Fonseca et al., shows how collaboration networks for TB research in Brazil could be mapped and visualized. The paper by Raimbault et al. demonstrates how topic modelling, or semantic analysis, could reveal the structure of research within the field of synthetic biology (Figure 5).

Mapping discourse is becoming more feasible today due to the internet and the rise of social media. The first example shows how Twitter analysis of HPV discourse could reveal topics of discussion and people's sentiments regarding the controversy over this vaccine. Similarly, semantic analysis of the term GMO revealed differences in the framing of the topic depending on the site: federal government sites framed GMOs more positively, whereas news sites were more negative in their coverage (Figure 6).

There exists a variety of data sources which are increasing over the years, as well as many diverse ways of processing and visualizing the information. In my work, I have been interested in using the same tools and approaches, but with the aim of applying these tools to uncover solution directions when dealing with complex health issues. To do that, I collaborated with a data science startup that is developing new ways of transforming and visualizing data. Their vision is to enable people without a technical background in data science to explore data and share their insights with others (Figure 7).
One of the difficulties data scientists face is knowing whether the algorithms they create actually model reality accurately and provide a ‘true’ picture of the knowledge. They therefore need to work with domain experts who are able to evaluate the output, which has been my role in this collaboration. Before trying out these tools on complex new issues, we began working within a field where I have expertise, so that I would be able to analyze and verify the results.

My Explorations
The project revolved around three separate investigations that were based on different data sources. For knowledge mapping, the focus was on multi-drug resistant tuberculosis (MDR-TB). For discourse mapping, Twitter analysis of the term ‘tuberculosis’ was used. This was followed by a more general analysis of the term ‘global health’ on Twitter (Figure 8).

Knowledge mapping: MDR-TB
We queried the Web of Science SCI-Expanded (Science Citation Index Expanded) database using the search term ‘MDR-TB’. We focused on this term because ‘Tuberculosis’ was too general and we wanted to experiment with a more manageable dataset.
We obtained the data as a spreadsheet which, after reconciling and formatting, contained 2101 articles. The data was then visualized using different types of algorithms and analysed by me (Figure 9). Due to time constraints, I will only share some examples of the analysis in this talk (citation analysis showing field evolution and insights can be found in these blog posts).

The first visualization depicts a network of institutions within the MDR-TB research field (Figure 10). Colours denote communities of institutions that are closely related to each other according to how often they publish together. The size of a node depicts its influence in the network. In this visual, node size is calculated from the node's proximity to other nodes that are also influential (based on the PageRank algorithm). Other algorithms, not shown here, calculate other parameters such as betweenness (how a node connects different parts of the network).
There are many ways of analyzing this network. For brevity, the most obvious thing to investigate is the communities themselves. For example, we can see that the light green cluster is very prominent. This community includes Harvard, Partners in Health, McGill University and the CDC, giving us an immediate insight that these institutions collaborate closely within MDR-TB research (Figure 10).
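As a rough illustration of how node size in such an affiliation network could be computed, the sketch below builds co-publication edge weights from a list of papers and runs a weighted PageRank by power iteration. It is a minimal sketch under stated assumptions, not the program used in the talk: the function names, the damping factor of 0.85 and the institution labels "A" to "D" are all hypothetical, and the toy papers are invented.

```python
from collections import defaultdict
from itertools import combinations


def copublication_edges(papers):
    """Count how often each pair of institutions appears on the same paper."""
    weights = defaultdict(int)
    for institutions in papers:
        # A paper listing one institution contributes no edge.
        for a, b in combinations(sorted(set(institutions)), 2):
            weights[(a, b)] += 1
    return weights


def pagerank(weights, damping=0.85, iterations=100):
    """Weighted PageRank on an undirected graph, via power iteration."""
    neighbours = defaultdict(dict)
    for (a, b), w in weights.items():
        neighbours[a][b] = w
        neighbours[b][a] = w
    nodes = sorted(neighbours)
    n = len(nodes)
    # Total edge weight per node; every node here has at least one edge.
    strength = {node: sum(neighbours[node].values()) for node in nodes}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            incoming = sum(
                rank[nb] * w / strength[nb] for nb, w in neighbours[node].items()
            )
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Invented toy data: each inner list is the set of institutions on one paper.
papers = [["A", "B"], ["A", "C"], ["A", "D"], ["B", "C"]]
ranks = pagerank(copublication_edges(papers))
for name, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

In this toy network, institution "A" co-publishes with every other institution and therefore receives the highest score, which is the sense in which a large node is "close to other influential nodes".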

If we look closely, we notice two clusters (circled) that are isolated from the rest of the network. When looking at these clusters closely, it became clear that the bottom left cluster was a network of Turkish institutions which tended to work mostly within their own group (Figure 11). The orange arrow points to the location of Partners in Health within the network.

On the other hand, the top right cluster, which is not connected to the rest of the network, turned out to be associated with the pharmaceutical company Johnson and Johnson. This deserves further investigation, as the data paints a picture of collaboration that runs contrary to the company’s general vision. By analyzing affiliation clusters, we can immediately obtain a general impression of how organizations and companies work with one another within a given field (Figure 12).

The next example explores topics within the MDR-TB research field using topic modelling. This is a text mining approach that uses statistical machine learning techniques to discover hidden semantic structures within a large amount of text. In this case, the structures reveal themselves as topics of research, which I then analysed, verified and labelled (see topic modelling and labelling for more details).
Topic modelling on our collection of articles on MDR-TB revealed that the biggest focus of research was Treatment optimization. This was followed by Drug-related basic research, which included basic drug discovery and drug resistance genes. Trailing behind were Operational research and Diagnostics. Lastly, less well-developed topics such as Molecular typing and Immunology specific to MDR-TB were revealed (Figure 13). In particular, articles within the topic Immunology described specific immune responses to drug-resistant strains that were distinct from those seen in drug-sensitive disease, thus representing a potential new area for biomarker research.
Importantly, the main findings of topic modelling mirrored a recent WHO report on TB research investment, which showed that Drugs (treatment and basic discovery) received the biggest share of funding, whereas Operational and Diagnostics research have been largely neglected, thus confirming that our findings are faithful to the situation in the real world (Figure 13).
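The talk does not say which topic model was used. As one common example, the sketch below implements latent Dirichlet allocation (LDA) with a small collapsed Gibbs sampler on pre-tokenized text. Everything in it is an assumption for illustration: the function name, the hyperparameters `alpha` and `beta`, the number of topics, and the four invented mini "abstracts". The output is only a list of top words per topic; labelling a topic (for example as Diagnostics) remains a manual step for a domain expert, as described above.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]          # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for word in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[word]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                wid = word_id[word]
                z = assignments[d][i]
                # Remove the token, then resample its topic given all others.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    top_words = [
        [vocab[i] for i in sorted(range(vocab_size), key=lambda i: -topic_word[k][i])[:5]]
        for k in range(num_topics)
    ]
    return doc_topic, top_words


# Invented toy corpus, already tokenized.
docs = [
    "drug regimen treatment outcome regimen".split(),
    "treatment outcome drug regimen".split(),
    "assay diagnostic sputum test assay".split(),
    "diagnostic test sputum assay".split(),
]
doc_topic, top_words = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words):
    print(f"topic {k}: {', '.join(words)}")
```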

In the next slide, we visualized the growth of publications within the field over time. If we look closely, there appears to be a turning point around the year 2006, after which the number of publications increased visibly. What happened during this time? Upon further investigation, it turned out that 2006 was the year in which XDR-TB was first described by the CDC, which explains the rapid rise in publications on MDR-TB thereafter (Figure 14).
The round purple circles depict highly cited articles that were published at specific points in time (note the graph is interactive within the program, and details can be obtained by scrolling over the visual).
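A growth curve like the one described above comes down to counting articles per publication year and looking for the year with the largest increase. The sketch below shows one simple way to do that. The function names are hypothetical and the list of years is invented toy data, not the counts from the MDR-TB dataset.

```python
from collections import Counter


def yearly_counts(years):
    """Return (year, number of articles) for every year in the covered range."""
    counts = Counter(years)
    return [(year, counts[year]) for year in range(min(years), max(years) + 1)]


def largest_jump(series):
    """Return (year, increase) for the largest year-over-year increase."""
    best = None
    for (_, previous), (year, current) in zip(series, series[1:]):
        delta = current - previous
        if best is None or delta > best[1]:
            best = (year, delta)
    return best


# Invented toy data: one publication year per article.
years = [2000] * 2 + [2001] * 3 + [2002] * 3 + [2003] * 4 + [2004] * 9 + [2005] * 11
series = yearly_counts(years)
for year, count in series:
    print(year, "#" * count)
print("largest jump:", largest_jump(series))
```

A jump found this way is only a prompt for investigation; explaining it, as with the first description of XDR-TB, still requires domain knowledge.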

We can also use heat maps to identify when communities are most active. In this case, we visualized affiliation communities for the topic of Diagnostics, and we can see that community 101 (c101) was very active in 2009, but that c10 later became more active within this topic. The numbers correspond to the specific communities described earlier in the affiliation network. c101 is associated with a cluster of Indian universities centred around Birla Institute, whereas c10 is centred around Stellenbosch University in South Africa (Figure 15).
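The data behind such a heat map is a community-by-year table of publication counts. The sketch below builds that table from (community, year) records and prints it as text. The function name is hypothetical, and the community labels `c_a` and `c_b` and their counts are invented; they do not correspond to c101 or c10.

```python
from collections import Counter


def activity_matrix(records):
    """Build a community x year count table from (community, year) pairs."""
    counts = Counter(records)
    communities = sorted({community for community, _ in counts})
    years = sorted({year for _, year in counts})
    matrix = [[counts[(c, y)] for y in years] for c in communities]
    return communities, years, matrix


# Invented toy data: one (community, year) pair per article on a given topic.
records = (
    [("c_a", 2009)] * 3
    + [("c_a", 2010)]
    + [("c_b", 2010)] * 2
    + [("c_b", 2011)] * 4
)
communities, years, matrix = activity_matrix(records)
print("      " + " ".join(str(y) for y in years))
for community, row in zip(communities, matrix):
    print(f"{community:<6}" + " ".join(f"{value:>4}" for value in row))
```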

Discourse mapping: Tuberculosis
To find out about the discourse within the field of Tuberculosis, we analysed Twitter data collected over a three-month period (Figure 16). The data was visualized in a variety of ways and analysed for insights.

We constructed a network in which we were able to visualize communities formed around key hashtags. Hashtags may be considered a proxy for discourse. This network also allows us to identify the users that are most engaged in a topic and, where present, the links that are frequently shared within the community, which helps to further identify the prevalent topic of discussion (Figure 17).
In this network, one hashtag appeared to be particularly dominant, #unhlmtb, which led me to investigate further.

Below is a close-up view of the community surrounding the hashtag #unhlmtb (Figure 18).

In the fifteen-week period during which the data was collected, 352 tweets were made with this hashtag. There was a peak around week 23 of the year, which corresponded to June 2018. To learn more, we identified the tweets with the most retweets during this period (Figure 19).
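Finding a peak such as the one in week 23 amounts to grouping tweets by calendar week. The sketch below does this with ISO week numbers from Python's standard `datetime` module; under ISO numbering, the 4th of June 2018 falls in week 23. The function name is hypothetical and the dates are invented toy data, not the 352 collected tweets.

```python
from collections import Counter
from datetime import date


def tweets_per_iso_week(dates):
    """Count tweets per (ISO year, ISO week number)."""
    counts = Counter()
    for day in dates:
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        counts[(iso[0], iso[1])] += 1
    return counts


# Invented toy data: one date per tweet carrying the hashtag of interest.
dates = [date(2018, 5, 28), date(2018, 6, 4), date(2018, 6, 4),
         date(2018, 6, 4), date(2018, 6, 5)]
weekly = tweets_per_iso_week(dates)
peak_week, peak_count = weekly.most_common(1)[0]
print("peak week:", peak_week, "tweets:", peak_count)
```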

From these top tweets, we learnt two things: first, that an important TB meeting involving world leaders would take place on the 26th of September 2018; and second, that civil society had gathered on the 4th of June in preparation for that meeting. The sharp rise in retweets in week 23 corresponded directly with this civil society gathering (Figure 20).

Now that we know about the existence of the meeting on the 26th of September, we can probe if other concepts were also being discussed in association with this event. Analyzing a network based on co-occurrence of hashtags allowed us to identify the hashtags that were most associated with this topic (Figure 21).
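A co-occurrence network of hashtags can be built by counting how often two hashtags appear in the same tweet. The sketch below shows the idea, including a `min_count` cut-off of the kind used to keep a visualization readable. The function names and the cut-off parameter are hypothetical, and the tweets are invented toy text that merely reuses hashtags mentioned in this talk.

```python
import re
from collections import Counter
from itertools import combinations

HASHTAG = re.compile(r"#(\w+)")


def cooccurrence(tweets):
    """Count how often each pair of hashtags appears in the same tweet."""
    pairs = Counter()
    for text in tweets:
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        pairs.update(combinations(tags, 2))
    return pairs


def closest_hashtags(pairs, target, min_count=1):
    """List hashtags linked to `target`, strongest link first.

    Pairs seen fewer than `min_count` times are dropped, which is a cut-off
    that trades completeness for clarity.
    """
    linked = Counter()
    for (a, b), count in pairs.items():
        if count < min_count:
            continue
        if a == target:
            linked[b] += count
        elif b == target:
            linked[a] += count
    return linked.most_common()


# Invented toy tweets.
tweets = [
    "Leaders meet next week #UNHLMTB #NewYork",
    "Two meetings in one week #unhlmtb #newyork #hlm3",
    "Children must not be forgotten #unhlmtb #eweisme",
    "#tb #eweisme",
]
pairs = cooccurrence(tweets)
print(closest_hashtags(pairs, "unhlmtb"))
print(closest_hashtags(pairs, "unhlmtb", min_count=2))
```

Raising `min_count` removes weaker links from the result, which is why a map produced this way cannot be assumed to show all of the discourse.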
When we isolated the hashtags with the closest link to #unhlmtb, we identified the hashtags: #newyork, #eweisme and #hlm3. We know that #newyork is related to the location of the meeting, but what could the other two hashtags be about?

Examining the most popular tweets with these hashtags revealed that #eweisme was concerned about children and women, and in this case, those with tuberculosis. On the other hand, #hlm3 described another high-level UN meeting about non-communicable diseases (NCDs). With a bit of research, a full picture emerged from the analysis of hashtags associated with #tuberculosis (Figure 22).
We now know that there will be a meeting in New York on the 26th of September 2018, where world leaders will get together to discuss strategies to end tuberculosis. This was preceded by a gathering of civil society actors in June in preparation for this meeting. We also learnt about additional perspectives that will be included in the discussion, such as those of women and children with TB. Finally, we learnt about the context surrounding the meeting, which will take place in close proximity to another high-level UN meeting on NCDs. September 2018 thus appears to be an important month in which world leaders will get together to discuss issues of public health importance.
This investigation demonstrates how we can obtain a rapid overview of the discourse within a field without having spent much time monitoring developments in the subject. Such an investigation cannot, by nature, be assumed to be complete, as the cut-offs used during visualization mean we cannot be certain that all discourse was detected. Nonetheless, we may use it to obtain a general overview, and investigate further when something stands out clearly, as in the case of #unhlmtb.

Discourse mapping: Global health
The last investigation we conducted was a Twitter analysis of the term ‘Global health’ in order to identify communities and discourse within the field. As in the previous Twitter analysis of Tuberculosis, we collected data, this time over a period of five months, and visualized it in a variety of ways (Figure 23).

In this co-hashtag network, we selected the hashtags with the largest number of tweets associated with the term ‘Global health’ in order to reveal the general topics of discourse. We can see, for example, that there are communities gathered around the hashtags #ebola, #health, #wha71, #hiv, #hsr2018 and #womeningh. In order to decipher some of these hashtags, it was necessary to identify the most popular tweets within these communities.

Zooming in on the central part of the network, we can see that the most popular tweet within the community associated with #health, from the Global Fund, described the necessity of investments in health. Another popular tweet, within the community around #wha71 (which turned out to stand for the 71st World Health Assembly), discussed the importance of addressing the gender gap in global health. Less prominent, but relevant for this talk, is the hashtag #rwanda, which tells us that Rwanda is leading in Africa when it comes to cancer control (Figure 25). Once again, a quick mapping of hashtags gave us a rapid overview of the topics of discourse within a field.

A co-user network may reveal communities of users that frequently mention each other. In this case, the network shows communities (identified by the colours) that often cluster together around the theme of ‘Global health’. We wanted to find out who the most influential users in this network were, based on the number of mentions received, with the rationale that those who are frequently mentioned may be considered important by the community. The top 10 Twitter users who received the most mentions can be seen in Figure 26.
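Ranking users by mentions received can be sketched as extracting `@handles` from each tweet and counting them. In the sketch below, a handle is counted at most once per tweet and self-mentions are ignored; both choices are assumptions, as are the function name and the invented handles `user_a` to `user_d`. The simple pattern would also match the part after the `@` in an email address, so real data would need more careful parsing.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")


def top_mentioned(tweets, n=10):
    """Rank accounts by the number of tweets that mention them.

    `tweets` is an iterable of (author, text) pairs.
    """
    counts = Counter()
    for author, text in tweets:
        mentioned = {handle.lower() for handle in MENTION.findall(text)}
        mentioned.discard(author.lower())  # ignore self-mentions
        counts.update(mentioned)
    return counts.most_common(n)


# Invented toy data.
tweets = [
    ("user_a", "thanks @user_b and @user_c"),
    ("user_c", "great thread @user_b"),
    ("user_b", "adding to my own thread @user_b cc @user_c"),
    ("user_d", "agree with @user_b"),
]
print(top_mentioned(tweets, n=2))
```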

We also experimented with another visual to identify the most vocal users on Twitter around the theme ‘Global health’ (Figure 27). This visual shows the total number of tweets (Y-axis) versus time since joining Twitter (X-axis). Each node depicts a Twitter user account, the size of the node depicts how frequently the account is mentioned, and colours represent the community the user is associated with.

Zooming in, we could see that users such as @WHO, @DrTedros and @LSHTMpress have been on Twitter for some time. We could therefore expect these users to have produced a large volume of tweets over time (red circles, Figure 28). In contrast, newer users such as @paimadhu, @womeninGH and @ughe_org have been extremely active within a short amount of time, suggesting a vocal presence within the twittersphere (red circles, Figure 28).
From this analysis, we also learn that @LSHTMpress, @LSHTM_alumni, @WHO and @DrTedros are mentioned frequently (based on node size), which again is not surprising, given that they are associated with well-established organizations. Nonetheless, it is interesting to note that @paimadhu, @womeninGH and @ughe_org are also regularly mentioned, suggesting that they have rapidly acquired a presence within the Twitter global health community despite being new to the twittersphere.

Finally, we have recently experimented with extracting images that are highly associated with specific communities. Besides text, images may also contain valuable information about the ideas and opinions prevalent within a community of users. By following references to websites, blogs, images or videos, we may be able to obtain an idea of the broader ecosystem of actors that participate in the debate, including those who may not necessarily be on Twitter.
In this visualization, I’ve picked a user community that is pertinent to this talk (Figure 29). This community is centred around @ughe_org and depicts all the Twitter users that are frequently associated with it.

We scraped the four most liked images from this community and displayed them in a tile format (Figure 30). This analysis revealed that, over the five-month period, the visual most associated with the @ughe_org community came from the tweet about applications to the Masters in Global Health Delivery program at UGHE (Figure 30). Looking at other communities (data not shown here), we observed that the images used were distinctly different according to the discourse prevalent within the specific community (for example, a business-minded community versus a community centred around health equity).

Closing
From my explorations, I found that mapping knowledge and discourse using data science techniques has so far confirmed existing knowledge, extended my horizons by revealing new insights, supported impressions I had that were previously intangible, and generated new leads that could be pursued and developed further, as in the case of biomarker research (Figure 31). Nonetheless, I believe it is important that domain experts are on board to validate findings and identify shortcomings in the data. This is particularly important because the cut-offs used during visualization sacrifice some information in order to gain clarity. Therefore, to use these tools to their fullest, domain experts will need to work closely with data scientists to structure and make sense of the information.

Overall, I feel that this investigation highlights a potential that is still relatively untapped, one that will allow us to understand whole healthcare systems and thereby reveal new directions for dealing effectively with the emerging issues of today. I believe that these types of investigations will become more common and accessible to more people as new sources of data become available, together with tools for their analysis. Targeting such investigations with directed questions will be necessary in order to extract the best possible insights from the information.
It is incredibly exciting to live in an era in which it is possible to make sense of complexity with few means and resources. It is likely that in the near future, even a small group of individuals in a resource-poor setting will be able to be on par with larger organizations with regard to knowledge and intelligence, and subsequently ensure that their voices are heard on the global stage.
Thank you.
Full presentation slides: http://anivation.org/slides/alyahya_ughe_2018.pdf