On the 26th of October 2018, I had the opportunity to give a talk sharing my findings on mapping knowledge and discourse at the University of Global Health Equity (UGHE) in Kigali, Rwanda. UGHE is a new university built by Partners in Health that focuses on global health equity.

Below is an abridged version of the talk. The link to the full presentation slides is at the bottom of the page.

The main aim of the talk was to give a broad overview of how mapping can be used to extract insights on complex topics, using tuberculosis and global health as examples.

Complex challenges

Throughout my journey across medicine, academic research and biotech, I came to realize that we are ill-equipped to deal with today's complex health challenges. This is due in particular to the nature of the health issues we face and to the inability of the knowledge and solutions we create to tackle them effectively. These issues are not only physiological; they also have social and cultural dimensions that need to be considered. In addition, there are emerging problems, such as the rise of lifestyle illnesses and climate-change-related infections, that we are only beginning to comprehend (Figure 1).

The narrow focus of research inherent in scientific culture makes it difficult to create solutions that address these health issues in all their complexity. Even when researchers intend to broaden their focus, the sheer overflow of information makes it almost impossible to get an overview of the available knowledge, making it far safer and easier to retreat into well-established silos.

To add to the complexity, the whole landscape of health problems and solutions is framed by stakeholders with political and economic interests. These interests drive the creation of knowledge and solutions, deciding which problems are pursued and which are ignored (Figure 1).

All of this made me realize that the complex health challenges we face cannot be solved by simple approaches. If we are to create long-lasting solutions, we must understand the bigger picture and obtain an overview of the different perspectives within the health problem ecosystem.

Figure 1

To find methodologies that go beyond the medical and scientific training I was used to, I embarked on a journey through diverse fields that incorporate multiple perspectives into the problem-solving process. The first was design thinking, or human-centered design, which introduced me to a process of looking at a problem from the perspective of the user. I also learnt that the design thinking process has many layers, championed by different schools of thought according to the nature of the problem. However, for the needs of an analyst like myself, design thinking on its own was insufficient to deal with the complexity of knowledge needed to frame the problem in the first place.

This led me to a second approach to mapping knowledge, inspired by data science, a rapidly growing field. I therefore decided to explore what this approach could bring to the analysis of healthcare topics (Figure 2).

Figure 2

Why map?

Before we go further, one may ask: why should we map?

Mapping allows us to visualize intangible phenomena such as knowledge flows and societal issues. Once these are visualized, gaps in understanding become clearer, revealing opportunities to solve the problem. Mapping also gives us an overview of a field, which helps us direct innovation more efficiently instead of reinventing the wheel and wasting resources (which happens often in the research world!) (Figure 3).

Figure 3

Knowledge mapping, based on articles, patents and ideas, is useful for researchers and institutions, whereas discourse mapping, which is about understanding the topics discussed in society, is of interest to organizations and policy-makers making strategic decisions. Both types of mapping can reveal important insights into the structure and evolution of a field or topic of discussion (Figure 4).

Figure 4

When it comes to mapping knowledge, there is an established field called scientometrics, the study of measuring science, technology and innovation. The field itself is not new, but it has recently been modernized by the introduction of tools from data science. The first example, by Fonseca et al., shows how TB research collaboration networks in Brazil can be mapped and visualized. The paper by Raimbault et al. demonstrates how topic modelling, or semantic analysis, can reveal the structure of research within the field of synthetic biology (Figure 5).

Figure 5

Mapping discourse has become more feasible thanks to the internet and the rise of social media. The first example shows how twitter analysis of HPV vaccine discourse revealed the topics people discussed and their sentiments about this controversial vaccine. Similarly, semantic analysis of the term GMO revealed differences in how the topic was framed depending on the site: federal government sites framed GMOs more positively, whereas news sites were more negative in their coverage (Figure 6).

Figure 6

The variety of data sources keeps growing, as do the ways of processing and visualizing the information. In my work, I have been interested in using these same tools and approaches to uncover solution directions for complex health issues. To do that, I collaborated with a data science startup that develops new ways of transforming and visualizing data. Their vision is to enable people without a technical background in data science to explore data and share their insights with others (Figure 7).

One of the difficulties data scientists face is knowing whether the algorithms they create model reality accurately and provide a ‘true’ picture of the knowledge. They therefore need to work with domain experts who can evaluate the output, which has been my role in this collaboration. Before trying these tools on complex new issues, we began with a field in which I have expertise, so that I could analyze and verify the results.

Figure 7

My Explorations

The project consisted of three separate investigations based on different data sources. For knowledge mapping, the focus was on multi-drug resistant tuberculosis (MDR-TB). For discourse mapping, we analysed twitter data on the term ‘tuberculosis’, followed by a more general twitter analysis of the term ‘global health’ (Figure 8).

Figure 8

Knowledge mapping: MDR-TB

We queried the Web of Science SCI-Expanded database using the search term ‘MDR-TB’. We chose this term because ‘Tuberculosis’ was too general and we wanted to experiment with a more manageable dataset.

We obtained the data as a spreadsheet which, after reconciling and formatting, contained 2101 articles. The data were then visualized using different types of algorithms, and I analysed the results (Figure 9). Due to time constraints, I will only share some examples of the analysis in this talk (citation analysis showing the evolution of the field, along with further insights, can be found in these blog posts).

Figure 9
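The talk does not describe how the records were reconciled. As a minimal sketch, assuming each exported row has a DOI and a title (the column names `doi` and `title` are hypothetical, not Web of Science field tags), duplicates could be dropped by DOI, falling back to a normalized title when the DOI is missing:

```python
import csv
import io
import re


def normalize_title(title: str) -> str:
    """Lower-case a title and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def reconcile(rows):
    """Keep the first occurrence of each record.

    Records are keyed by DOI (case-insensitive) or, if the DOI is empty,
    by normalized title. `rows` yields dicts with hypothetical keys 'doi' and 'title'.
    """
    seen = set()
    unique = []
    for row in rows:
        doi = (row.get("doi") or "").strip().lower()
        key = ("doi", doi) if doi else ("title", normalize_title(row.get("title", "")))
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique


# Invented export for illustration only.
export = io.StringIO(
    "doi,title\n"
    "10.1000/abc,Treatment of MDR-TB\n"
    "10.1000/ABC,Treatment of MDR-TB\n"
    ",Drug resistance genes in M. tuberculosis\n"
    ",Drug Resistance Genes in M. Tuberculosis.\n"
)
records = reconcile(csv.DictReader(export))
print(len(records))  # 2
```

A duplicate that carries a DOI in one row but not in another would slip through this sketch; real reconciliation usually also needs fuzzy matching of author and affiliation names.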

The first visualization depicts a network of institutions within the MDR-TB research field (Figure 10). Colours denote communities of institutions that are closely related according to how often they publish together. Node size depicts the influence of each institution in the network. Here, node size is calculated with the PageRank algorithm, which gives a node a high score when it is connected to other nodes that are themselves influential. Other measures not shown here capture different properties, such as betweenness (how much a node connects different parts of the network).
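To make the PageRank idea concrete, here is a small pure-Python sketch. It is not the startup's implementation: it builds a co-publication graph in which two institutions are linked once for every paper they share, then runs weighted PageRank by power iteration with the conventional damping factor of 0.85. The institution names are placeholders, and betweenness is not computed here.

```python
from itertools import combinations


def copublication_graph(papers):
    """Undirected weighted graph: weight = number of papers two institutions share."""
    adj = {}
    for affiliations in papers:
        insts = sorted(set(affiliations))
        for inst in insts:
            adj.setdefault(inst, {})  # single-institution papers still add a node
        for a, b in combinations(insts, 2):
            adj[a][b] = adj[a].get(b, 0) + 1
            adj[b][a] = adj[b].get(a, 0) + 1
    return adj


def pagerank(adj, damping=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank by power iteration.

    A node scores highly when it is strongly linked to other high-scoring nodes.
    Nodes without edges ('dangling') spread their score uniformly over all nodes.
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {v: sum(adj[v].values()) for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for u, w in adj[v].items():
                new[u] += damping * rank[v] * w / out_weight[v]
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


# Placeholder affiliation lists, one list per paper.
papers = [
    ["Univ A", "Univ B", "Univ C"],
    ["Univ A", "Univ D"],
    ["Univ A", "Univ B"],
    ["Univ E"],
]
scores = pagerank(copublication_graph(papers))
for inst, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{inst}: {score:.3f}")
```

Univ A, which shares papers with every other connected institution, ends up with the largest score, while the isolated Univ E receives only the baseline share.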

There are many ways of analyzing this network. For brevity, the most obvious thing to investigate is the communities themselves. For example, the light green cluster is very prominent. This community includes Harvard, Partners in Health, McGill University and the CDC, giving us an immediate insight that these institutions collaborate closely on MDR-TB research (Figure 10).

Figure 10

Looking more closely, we notice two clusters (circled) that are isolated from the rest of the network. On inspection, the bottom left cluster turned out to be a network of Turkish institutions that tended to work mostly within their own group (Figure 11). The orange arrow points to the location of Partners in Health within the network.

Figure 11
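Clusters that are entirely cut off from the rest of a network correspond to separate connected components. A minimal sketch, assuming the network is held as a plain adjacency mapping (the node names below are placeholders, not the real data), finds them with a breadth-first search:

```python
from collections import deque


def connected_components(adj):
    """Connected components of an undirected graph, largest first."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        comp = []
        queue = deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            comp.append(v)
            for u in adj[v]:  # works for dict-of-sets or dict-of-dicts
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        components.append(sorted(comp))
    components.sort(key=len, reverse=True)
    return components


# Hypothetical adjacency: one main network and two detached groups.
adj = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"},
    "T1": {"T2"}, "T2": {"T1", "T3"}, "T3": {"T2"},
    "P1": {"P2"}, "P2": {"P1"},
}
main, *isolated = connected_components(adj)
print(main)      # ['A', 'B', 'C', 'D']
print(isolated)  # [['T1', 'T2', 'T3'], ['P1', 'P2']]
```

Everything outside the largest component is a candidate "isolated cluster"; deciding what each one represents still requires looking at its members, as in the analysis above.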

The top right cluster, which is also not connected to the rest of the network, turned out to be associated with the pharmaceutical company Johnson and Johnson. This deserves further investigation, as the data depicts a picture of collaboration that appears to run contrary to the company’s general vision. By analyzing affiliation clusters, we can quickly obtain a general impression of how organizations and companies work with one another within a given field (Figure 12).

Figure 12

The next example explored research topics within the MDR-TB field using topic modelling. This is a text mining approach that uses statistical machine learning techniques to discover hidden semantic structures within a large body of text. In this case, the structures reveal themselves as research topics, which I then analysed, verified and labelled (see topic modelling and labelling for more details).
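The talk does not specify which topic model the tool uses. Latent Dirichlet allocation (LDA) is a common choice, so the sketch below shows a minimal collapsed Gibbs sampler for LDA in pure Python, run on a toy corpus of invented abstract fragments. A real analysis would use an established library, stop-word removal and far more documents; the labelling step stays a human task.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns (topic_word, doc_topic) probability tables.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]               # tokens of topic k in doc d
    nkw = [defaultdict(int) for _ in range(n_topics)]  # count of word w in topic k
    nk = [0] * n_topics                                # tokens assigned to topic k
    z = []                                             # topic of every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts...
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # ...then resample it from p(z=t) ∝ (n_dt + α)(n_tw + β)/(n_t + Vβ).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    topic_word = [
        {w: (nkw[k][w] + beta) / (nk[k] + V * beta) for w in vocab}
        for k in range(n_topics)
    ]
    doc_topic = [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return topic_word, doc_topic


# Invented fragments standing in for abstracts.
docs = [
    "regimen treatment outcome regimen bedaquiline".split(),
    "treatment regimen outcome adherence".split(),
    "katg rpob mutation resistance gene".split(),
    "rpob mutation gene sequencing katg".split(),
]
topic_word, doc_topic = lda_gibbs(docs, n_topics=2)
for k, dist in enumerate(topic_word):
    print(k, sorted(dist, key=dist.get, reverse=True)[:3])
```

The printed top words per topic are what an analyst would inspect before giving each topic a label such as "Treatment optimization".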

Topic modelling of our collection of MDR-TB articles revealed that the biggest research focus appeared to be Treatment optimization. This was followed by Drug-related basic research, which included basic drug discovery and drug resistance genes. Trailing behind were Operational research and Diagnostics. Lastly, less well-developed topics such as Molecular typing and Immunology specific to MDR-TB were revealed (Figure 13). In particular, articles within the Immunology topic described immune responses specific to drug-resistant strains that were distinct from those in drug-sensitive disease, representing a potential new area for biomarker research.

Importantly, the main findings of topic modelling mirrored a recent WHO report on TB research investment, which showed that Drugs (treatment and basic discovery) received the biggest share of funding, whereas Operational and Diagnostics research have been largely neglected. This supports the view that our findings are faithful to the situation in the real world (Figure 13).

Figure 13

Next, we visualized the growth of publications within the field over time. There appears to be a turning point around 2006, after which the number of publications increased visibly. What happened at this time? On further investigation, it turns out that 2006 is the year in which extensively drug-resistant TB (XDR-TB) was first described by the CDC, which likely explains the rapid rise in publications on MDR-TB thereafter (Figure 14).

The round purple circles depict highly cited articles published at specific points in time (note that the graph is interactive within the program, and details can be obtained by hovering over the visual).

Figure 14
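The turning point above was spotted by eye. One simple way to check such an impression, offered here only as a sketch, is a two-segment linear fit: try each candidate year as a breakpoint, fit a straight line to the annual counts on either side, and keep the year with the lowest total squared error. The publication years below are synthetic, not the MDR-TB data.

```python
from collections import Counter


def linear_sse(xs, ys):
    """Sum of squared residuals of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def best_breakpoint(counts_by_year, min_points=3):
    """First year of the second segment in the best two-segment linear fit."""
    years = sorted(counts_by_year)
    counts = [counts_by_year[y] for y in years]
    best_year, best_sse = None, float("inf")
    for i in range(min_points, len(years) - min_points + 1):
        sse = linear_sse(years[:i], counts[:i]) + linear_sse(years[i:], counts[i:])
        if sse < best_sse:
            best_year, best_sse = years[i], sse
    return best_year


# Synthetic publication years: flat output until 2005, then a jump and steady growth.
pub_years = []
for year in range(1998, 2006):
    pub_years += [year] * 40
for year in range(2006, 2018):
    pub_years += [year] * (100 + 20 * (year - 2006))
counts = Counter(pub_years)
print(best_breakpoint(counts))  # 2006
```

A fitted breakpoint only says where the trend changes; explaining why, as with the first description of XDR-TB, still needs domain knowledge.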

We can also use heat maps to identify when communities are most active. Here we visualized affiliation communities for the topic of Diagnostics: in 2009, community 101 (c101) was very active, but later c10 became more active within this topic. The numbers correspond to the communities in the affiliation network described earlier. c101 is associated with a cluster of Indian universities centred around the Birla Institute, whereas c10 is centred around Stellenbosch University in South Africa (Figure 15).

Figure 15
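A heat map like this is essentially a community-by-year table of counts. As a sketch, assuming each Diagnostics paper has been reduced to hypothetical `(year, community)` pairs, the table and each community's most active year could be computed as follows; the resulting matrix could then be handed to a plotting routine such as matplotlib's `imshow`. The records are invented.

```python
from collections import Counter


def activity_matrix(records, communities=None, years=None):
    """Count records per (community, year).

    records: iterable of (year, community_id) pairs.
    Returns (row labels, column labels, rows of counts).
    """
    counts = Counter(records)
    years = years or sorted({y for y, _ in counts})
    communities = communities or sorted({c for _, c in counts})
    matrix = [[counts[(y, c)] for y in years] for c in communities]
    return communities, years, matrix


def peak_year(communities, years, matrix):
    """Year with the most activity for each community (first year wins ties)."""
    return {c: years[row.index(max(row))] for c, row in zip(communities, matrix)}


# Invented records: c101 active early, c10 active later.
records = [(2009, "c101")] * 6 + [(2010, "c101")] * 2 + [(2009, "c10")] + [(2014, "c10")] * 5
communities, years, matrix = activity_matrix(records)
print(years)  # [2009, 2010, 2014]
for c, row in zip(communities, matrix):
    print(c, row)
print(peak_year(communities, years, matrix))
```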

Discourse mapping: Tuberculosis

To explore discourse within the field of tuberculosis, we analysed twitter data collected over a three-month period (Figure 16). The data were visualized in a variety of ways and analysed for insights.

Figure 16

We constructed a network in which communities form around key hashtags, which can be considered a proxy for discourse. This network also allows us to identify the users most engaged in a topic and, where present, the links most frequently shared within a community, which further clarify the prevailing topic of discussion (Figure 17).

In this network, one hashtag, #unhlmtb, appeared to be particularly dominant, which led me to investigate further.

Figure 17
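The per-hashtag view described above (who is most engaged, which links circulate) can be sketched as follows. This assumes tweets are reduced to hypothetical `(user, text)` pairs and extracts hashtags and links with simple regular expressions; the handles and tweets are invented.

```python
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#(\w+)")
URL = re.compile(r"https?://\S+")


def hashtag_profiles(tweets):
    """For each hashtag, count tweets per user and shares per link.

    tweets: iterable of (user, text) pairs. Hashtags are lower-cased.
    """
    users = defaultdict(Counter)
    links = defaultdict(Counter)
    for user, text in tweets:
        tags = {t.lower() for t in HASHTAG.findall(text)}
        urls = URL.findall(text)
        for tag in tags:
            users[tag][user] += 1
            links[tag].update(urls)
    return users, links


tweets = [
    ("@user_a", "Civil society hearing today #UNHLMTB https://example.org/hearing"),
    ("@user_b", "Ready for New York #unhlmtb #EndTB https://example.org/hearing"),
    ("@user_a", "Countdown to the meeting #unhlmtb"),
    ("@user_c", "New drug regimen results #EndTB"),
]
users, links = hashtag_profiles(tweets)
print(users["unhlmtb"].most_common(1))  # [('@user_a', 2)]
print(links["unhlmtb"].most_common(1))  # [('https://example.org/hearing', 2)]
```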

Below is a close-up view of the community surrounding the hashtag #unhlmtb (Figure 18).

Figure 18

During the fifteen-week period in which the data was collected, 352 tweets were made with this hashtag. There was a peak around week 23 of the year, corresponding to June 2018. To learn more, we identified the tweets with the most retweets during this period (Figure 19).

Figure 19
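Finding the weekly peak and its most retweeted tweets can be sketched as below, assuming tweets are available as dicts with hypothetical `date`, `text` and `retweets` fields. The sample tweets are invented; 4 June 2018 does fall in ISO week 23. Because only the week number is used, the sketch assumes all tweets come from a single year.

```python
from collections import Counter
from datetime import date


def weekly_peak(tweets, tag):
    """(peak ISO week, Counter of tweets per week) for tweets containing `tag`."""
    per_week = Counter(
        t["date"].isocalendar()[1] for t in tweets if tag in t["text"].lower()
    )
    return per_week.most_common(1)[0][0], per_week


def top_retweeted(tweets, tag, week, n=3):
    """The n most retweeted tweets containing `tag` in the given ISO week."""
    in_week = [
        t for t in tweets
        if tag in t["text"].lower() and t["date"].isocalendar()[1] == week
    ]
    return sorted(in_week, key=lambda t: t["retweets"], reverse=True)[:n]


tweets = [
    {"date": date(2018, 6, 4), "text": "Civil society hearing #UNHLMTB", "retweets": 120},
    {"date": date(2018, 6, 5), "text": "Follow the hearing live #unhlmtb", "retweets": 45},
    {"date": date(2018, 6, 6), "text": "Key asks for leaders #unhlmtb", "retweets": 80},
    {"date": date(2018, 7, 20), "text": "Save the date: 26 Sept #unhlmtb", "retweets": 60},
]
week, per_week = weekly_peak(tweets, "#unhlmtb")
print(week)  # 23
top = top_retweeted(tweets, "#unhlmtb", week, n=2)
for t in top:
    print(t["retweets"], t["text"])
```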

From these top tweets, we learnt, firstly, that an important TB meeting involving world leaders would take place on the 26th of September 2018 and, secondly, that civil society had gathered on the 4th of June to prepare for the September meeting. The sharp rise in retweets in week 23 coincided directly with this civil society meeting (Figure 20).

Figure 20

Now that we know about the meeting on the 26th of September, we can probe whether other concepts were being discussed in association with this event. Analyzing a network based on the co-occurrence of hashtags allowed us to identify the hashtags most closely associated with this topic (Figure 21).

When we isolated the hashtags most closely linked to #unhlmtb, we found #newyork, #eweisme and #hlm3. We know that #newyork refers to the location of the meeting, but what could the other two hashtags be about?

Figure 21
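A hashtag co-occurrence network links two hashtags whenever they appear in the same tweet. As a sketch (invented tweet texts, simple regex extraction, counting each pair at most once per tweet), the hashtags most closely linked to a given one could be found like this:

```python
import re
from collections import Counter
from itertools import combinations

HASHTAG = re.compile(r"#(\w+)")


def cooccurrence(texts):
    """Count how many tweets contain each (alphabetically ordered) pair of hashtags."""
    pairs = Counter()
    for text in texts:
        tags = sorted({t.lower() for t in HASHTAG.findall(text)})
        pairs.update(combinations(tags, 2))
    return pairs


def neighbours(pairs, tag, n=3):
    """Hashtags that most often co-occur with `tag`, strongest first."""
    linked = Counter()
    for (a, b), w in pairs.items():
        if a == tag:
            linked[b] += w
        elif b == tag:
            linked[a] += w
    return linked.most_common(n)


texts = [
    "Heading to #NewYork for #UNHLMTB",
    "#unhlmtb side event #HLM3 #newyork",
    "Women and children with TB #EWEisMe #unhlmtb",
    "#unhlmtb #newyork countdown",
    "#EndTB research update",
]
closest = neighbours(cooccurrence(texts), "unhlmtb")
print(closest)
```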

Examining the most popular tweets with these hashtags revealed that #eweisme concerned women and children, in this case those with tuberculosis. #hlm3, on the other hand, referred to another high-level UN meeting, on non-communicable diseases (NCDs). With a bit of research, a full picture emerged from the analysis of hashtags associated with #tuberculosis (Figure 22).

We now know that world leaders would meet in New York on the 26th of September 2018 to discuss strategies to end tuberculosis, and that this was preceded by a gathering of civil society actors in June to prepare for the meeting. We also learnt about additional perspectives to be included in the discussion, such as angles on women and children with TB. Finally, we learnt about the context surrounding the meeting, which would take place in close proximity to another high-level UN meeting on NCDs. September 2018 therefore appears to have been an important month in which world leaders gathered to discuss issues of public health importance.

This investigation demonstrates how we can obtain a rapid overview of the discourse within a field without spending much time monitoring developments in the subject. By its nature, such an investigation cannot be assumed to be complete, as we cannot be certain that we detected all the discourse, given the cut-offs used during visualization. Nonetheless, we can use it to obtain a general overview and to investigate further when something stands out clearly, as in the case of #unhlmtb.

Figure 22

Discourse mapping: Global health

The last investigation was a twitter analysis of the term ‘Global health’ to identify communities and discourse within the field. As with the previous analysis of tuberculosis, we collected data, this time over five months, and visualized it in a variety of ways (Figure 23).

Figure 23

In this co-hashtag network, we selected the hashtags with the largest number of tweets associated with the term ‘Global health’ in order to reveal the general topics of discourse. We can see, for example, communities gathered around the hashtags #ebola, #health, #wha71, #hiv, #hsr2018 and #womeningh. To decipher some of these hashtags, it was necessary to identify the most popular tweets within these communities.
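Selecting only the most-used hashtags is the kind of cut-off that the talk warns can drop information. A minimal sketch of a top-k selection that also reports how much of all hashtag usage survives the cut (invented tweets; each hashtag counted once per tweet):

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def top_hashtags(texts, k):
    """Keep the k hashtags used in the most tweets.

    Returns (kept hashtags, share of all hashtag uses covered by the kept ones).
    """
    counts = Counter()
    for text in texts:
        counts.update({t.lower() for t in HASHTAG.findall(text)})
    kept = [tag for tag, _ in counts.most_common(k)]
    total = sum(counts.values())
    coverage = sum(counts[t] for t in kept) / total if total else 0.0
    return kept, coverage


texts = [
    "#GlobalHealth #ebola response",
    "#globalhealth #WHA71 gender gap",
    "#globalhealth #hiv",
    "#globalhealth #rwanda cancer control",
    "#ebola update",
]
kept, coverage = top_hashtags(texts, k=2)
print(kept, round(coverage, 2))  # ['globalhealth', 'ebola'] 0.67
```

Reporting the coverage alongside the filtered network is one way to make the loss from a cut-off explicit rather than invisible.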

Figure 24

Zooming in on the central part of the network, we can see that the most popular tweet within the community associated with #health described the need for investment in health by the Global Fund. Another popular tweet, within the #wha71 community (an acronym for the 71st World Health Assembly), discussed the importance of addressing the gender gap in global health. Less prominent, but relevant for this talk, is the hashtag #rwanda, whose tweets reported that Rwanda is leading in Africa when it comes to cancer control (Figure 25). Once again, mapping hashtags enabled us to obtain a quick overview of the topics of discourse within a field.

Figure 25

A co-user network can reveal communities of users who frequently mention each other. In this case, the network shows communities (identified by colour) that often cluster together around the theme of ‘Global health’. We wanted to find out who the most influential users in this network were, based on the number of mentions received, on the rationale that users who are frequently mentioned may be considered important by the community. The top 10 twitter users who received the most mentions can be seen in Figure 26.
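Ranking users by mentions received can be sketched as follows, assuming tweets are reduced to hypothetical `(author, text)` pairs. Handles are lower-cased, each handle is counted at most once per tweet, and self-mentions are ignored; these are design choices of the sketch, not necessarily those of the tool used in the talk. The handles are invented.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")


def most_mentioned(tweets, n=10):
    """Top-n handles by number of tweets mentioning them (self-mentions excluded)."""
    received = Counter()
    for author, text in tweets:
        for handle in {h.lower() for h in MENTION.findall(text)}:
            if handle != author.lower():
                received[handle] += 1
    return received.most_common(n)


tweets = [
    ("user_a", "Great talk by @user_b with @org_x"),
    ("user_c", "Thanks @org_x @ORG_X!"),
    ("org_x", "Welcome @org_x alumni and @user_b"),
    ("user_b", "Agree with @org_x"),
]
print(most_mentioned(tweets, n=2))  # [('org_x', 3), ('user_b', 2)]
```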

Figure 26

We also experimented with another visual to identify the most vocal twitter users around the theme ‘Global health’ (Figure 27). This visual plots the total number of tweets (Y-axis) against time since joining twitter (X-axis). Each node depicts a twitter user account, node size depicts the frequency of mentions, and colours represent the community the user is associated with.

Figure 27
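The visual plots total tweets against account age; dividing one by the other gives a single tweets-per-day rate that brings out new but very active accounts. A minimal sketch, with a hypothetical `(created, total_tweets)` record per account and invented numbers:

```python
from datetime import date


def tweets_per_day(accounts, today):
    """Rank accounts by total tweets divided by days since the account was created.

    accounts: dict of handle -> (created: date, total_tweets: int).
    """
    rates = {}
    for handle, (created, total) in accounts.items():
        age_days = max((today - created).days, 1)  # guard against zero-day-old accounts
        rates[handle] = total / age_days
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


accounts = {
    "old_org": (date(2009, 1, 1), 30000),  # long-standing account, large total
    "new_org": (date(2017, 1, 1), 6000),   # recent account, smaller total
}
ranking = tweets_per_day(accounts, today=date(2018, 10, 1))
for handle, rate in ranking:
    print(handle, round(rate, 2))
```

Here the newer account ranks first despite having a fifth of the older account's total, which is the pattern the red circles highlight.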

Zooming in, we could see that users such as @WHO, @DrTedros and @LSHTMpress have been on twitter for some time, so we would expect them to have produced a large volume of tweets over time (red circles, Figure 28). By contrast, newer users such as @paimadhu, @womeninGH and @ughe_org have been extremely active within a short amount of time, suggesting a vocal presence in the twittersphere (red circles, Figure 28).

This analysis also shows that @LSHTMpress, @LSHTM_alumni, @WHO and @DrTedros are mentioned frequently (based on node size), which, again, is not surprising given that they are associated with well-established organizations. Nonetheless, it is interesting that @paimadhu, @womeninGH and @ughe_org are also regularly mentioned, suggesting that they have rapidly acquired a presence within the global health community on twitter despite being new to the twittersphere.

Figure 28

Finally, we recently experimented with extracting images that were strongly associated with specific communities. Besides text, images may also carry valuable information about the ideas and opinions prevalent within a community of users. By examining the websites, blogs, images or videos that users reference, we may be able to gain an idea of the broader ecosystem of actors participating in the debate, including those who may not be on twitter.

In this visualization, I’ve picked a user community that is pertinent to this talk (Figure 29). This community is centred around @ughe_org, and the visualization depicts all the twitter users frequently associated with it.

Figure 29

We scraped the four most liked images from this community and displayed them in a tile format (Figure 30). This revealed that, over the five-month period, the image most associated with the @ughe_org community accompanied a tweet about applications to the Masters in Global Health Delivery program at UGHE. Looking at other communities (data not shown here), we observed that the images used differed distinctly according to the discourse prevalent within each community (for example, a business-minded community versus a community centred around health equity).

Figure 30
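Once a community's members are known (for example from the co-user network), selecting its most liked images is a filter-and-sort. A sketch, assuming tweets are dicts with hypothetical `user`, `likes` and `media` (list of image URLs) fields, and crediting each image with the likes of the most liked tweet that carried it; all values are invented:

```python
def top_images(tweets, community, n=4):
    """Most liked image URLs posted by members of `community` (a set of handles)."""
    best = {}
    for t in tweets:
        if t["user"] not in community:
            continue
        for url in t.get("media", []):
            best[url] = max(best.get(url, 0), t["likes"])
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [url for url, _ in ranked[:n]]


community = {"org_x", "user_a", "user_b"}
tweets = [
    {"user": "org_x", "likes": 150, "media": ["img/apply.jpg"]},
    {"user": "user_a", "likes": 40, "media": ["img/campus.jpg", "img/apply.jpg"]},
    {"user": "user_b", "likes": 75, "media": ["img/class.jpg"]},
    {"user": "outsider", "likes": 999, "media": ["img/other.jpg"]},
    {"user": "user_a", "likes": 10, "media": []},
]
print(top_images(tweets, community))  # ['img/apply.jpg', 'img/class.jpg', 'img/campus.jpg']
```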

Closing

From my explorations, I found that mapping knowledge and discourse using data science techniques has so far confirmed existing knowledge, broadened my horizons by revealing new insights, supported previously intangible impressions I had, and generated new leads that could be pursued further, such as in biomarker research (Figure 31). Nonetheless, I believe it is important to have domain experts on board to validate findings and identify shortcomings in the data. This is particularly important because the cut-offs used during visualization to gain clarity may result in a loss of information. Therefore, to use these tools to their fullest, domain experts will need to work closely with data scientists to make sense of and structure the information.

Figure 31

Overall, I feel that this investigation highlights a largely untapped potential that could allow us to understand whole healthcare systems and thus reveal new directions for dealing effectively with the emerging issues of today. I believe these types of investigations will become more common and accessible as new sources of data become available and tools for their analysis are created. Targeting such investigations with directed questions will be necessary to extract the best possible insights from the information.

It is incredibly exciting to live in an era in which it is possible to make sense of complexity with few means and resources. In the near future, even a small group of individuals in a resource-poor setting may be able to match larger organizations in knowledge and intelligence, and thereby ensure that their voices are heard on the global stage.

Thank you.

Full presentation slides: http://anivation.org/slides/alyahya_ughe_2018.pdf