Article critique of “Utilizing Social Media Data for Pharmacovigilance: A Review”, by Sarker et al. (2015)
As technology has advanced and given rise to social media as a platform for the sharing of personal health information, the volume of data amassed by internet websites and applications has prompted the development of innovative methods for online data mining. Some of these methods were reviewed in the article, “Utilizing Social Media Data for Pharmacovigilance: A Review”, written by Sarker et al. (2015). Through a detailed meta-analysis of existing research, Sarker et al. provided a methodological evaluation of the various ways through which social media could be utilized as a means of contributing to research regarding the occurrence of Adverse Drug Reactions (ADRs). Sarker et al. presented a well-organized meta-analysis that thoroughly explained the various aspects of the research processes they explored. Each of these processes, from selection and abstraction to categorization and review, achieved an impressive level of clarity concerning the associations between successful data extraction methods and effective ADR monitoring, which was the stated purpose of the article. Sarker et al. concluded their review by proposing a systematic framework intended to guide future research in the area of social media pharmacovigilance.
Methods
The methods section of the article, although lengthy and perhaps overwhelming in its detail, provided a thorough understanding of the methods that were used. Sarker et al. (2015) explained their use of various databases, paired with inclusion and exclusion criteria, to search existing research related to the topics of social media and pharmacovigilance. By carefully selecting inclusion and exclusion criteria and limiting the selection of research to publications dated from the year 2008 to the year in which the meta-analysis was performed, Sarker et al. filtered through over one thousand search results. The article used a visual image of the application of filters via Medline to demonstrate the methodological approach to this process. Although the image likely would not be well understood by individuals without experience in Medline or research, this article was clearly focused on providing a foundation of information for researchers as exploration of this topic expanded.
The authors acknowledged that social media-related research and data extraction was a relatively new area of exploration; this acknowledgment seemed to aid in the process of filtering search results by ensuring the application of well-thought-out selections of specific keywords that focused on research relevance. Sarker et al. (2015) stated that thirty-nine full-text articles were collected, but through the application of specific exclusion criteria, the authors effectively narrowed their results down to twenty-two publications. Interestingly, each of the twenty-two final selections included data extraction methods centered around Natural Language Processing (NLP), which was acknowledged as an inclusion criterion. It was unclear whether the inclusion of NLP excluded results that used alternate rule-based approaches; however, it was clear that Sarker et al. focused primarily on NLP.
Sarker et al. (2015) acknowledged that the twenty-two research publications selected for use in the meta-analysis each featured an NLP-based method for collecting and analyzing ADR-related information posted by users on various social media sites. The selected publications drew on online networks such as Twitter, Facebook, MedHelp, DailyStrength, and similar social media sites, relying on NLP methods to extract user-provided ADR information (Sarker et al., 2015). Despite such sources offering a plethora of data to be used in pharmacovigilance, there exist inherent limitations regarding the use of social media for such a purpose. Although social media is a source through which a significant amount of information can be accessed, there are numerous challenges that render the process of extracting data difficult. From the information in the article, this seemed particularly true when relying on NLP-based methods.
Challenges
NLP-based methods allow for reasonable doubt regarding the authenticity, reliability, and salience of extracted data (Sarker et al., 2015). Additionally, the authors noted that when extracting data from social media sites, there was often the need to consider that many of the individuals providing the information concerning ADRs may have used incorrect terminology, misinformation, and misspellings that would complicate an NLP-based algorithm. Sarker et al. acknowledged this but offered no concrete solutions regarding how to overcome such challenges. Another recognized complication involving NLP extraction techniques, for which the authors again failed to offer solutions, was an imbalance between the total amount of information that was extracted and the much smaller portion of it that was relevant to ADR monitoring. Such an imbalance gave rise to questions regarding the accuracy and efficiency of NLP techniques; however, such questions were not directly addressed by the authors’ meta-analysis.
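The misspelling problem described above can be made concrete with a minimal Python sketch. The term list and function names below are hypothetical and are not taken from Sarker et al. (2015); the fuzzy-matching step is shown only as one possible mitigation that uses the standard-library difflib module, not as a method the reviewed article proposes.

```python
import difflib

# Hypothetical list of ADR terms; a real lexicon would be far larger.
ADR_TERMS = ["nausea", "headache", "dizziness", "insomnia"]


def exact_match(token):
    """Return True only if the token is spelled exactly like a listed term."""
    return token.lower() in ADR_TERMS


def fuzzy_match(token, cutoff=0.8):
    """Return the closest listed term above the similarity cutoff, or None.

    The cutoff value is an illustrative assumption, not a tuned parameter.
    """
    matches = difflib.get_close_matches(
        token.lower(), ADR_TERMS, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


if __name__ == "__main__":
    # A misspelled user post token defeats exact matching...
    print(exact_match("dizzyness"))
    # ...while approximate matching can still map it to a listed term.
    print(fuzzy_match("dizzyness"))
```

A looser cutoff would recover more misspellings but would also admit more false matches, which is the same trade-off between coverage and relevance that the imbalance discussed above reflects.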
Categorization
To explore methods of improving the use of NLP-based pharmacovigilance, Sarker et al. (2015) categorized the twenty-two retrieved publications into four categories: source of data, public availability of data, the presence of annotated data, and evaluation of extraction methods. This process of publication categorization reflected the scrupulous efforts made by the authors to assess both the shortcomings and successes of the various studies in each of the publications. For ease of understanding, the article provided two tables that offered visualization of how the categorization methods were applied.
Interestingly, as a result of the meticulous categorization, Sarker et al. (2015) developed multiple observations that might otherwise have been overlooked. The first observation was that studies performed earlier, toward the year 2008, investigated far fewer ADR-related drugs. A second observation made by Sarker et al. was that earlier studies included sub-domains, such as the illness that a drug was intended to treat; however, more recent studies placed far less emphasis on sub-domains and focused more on monitoring larger drug ranges. Another notable observation was that smaller studies more frequently included annotated data while the larger studies often did not. Additionally, in reviewing the provided table regarding the selected articles’ details, it should be noted that there were significant differences in the sizes of data sets. With a review of only twenty-two articles, it would seem that many of the aforementioned differences would complicate the meta-analysis by introducing confounding variables. However, Sarker et al. noted no such confounders; from observations of their categorization method, they instead concluded that the availability of annotated data made the process of NLP-based data extraction less complicated.
However, even with the presence of annotated data, NLP-based methods involved strategies that could still be regarded as complex or challenging. Among such NLP-based methods that received mention in the article were lexicon-based extraction focused on preformed lists of ADR mentions, association-based extraction used to compile data based on the use of predetermined word pairings, and pattern-based extraction that searches for indicative text patterns associated with ADRs (Sarker et al., 2015). It was made clear by the article that regardless of which NLP method was used for the purpose of extracting data from unsolicited user information found on social media sites and applications, there was no method devoid of challenges. This conclusion was clearly stated, and despite the various focuses regarding categorization of research and the methods used for the extraction of data, the process of pharmacovigilance relative to ADRs still lacked a uniform approach or evaluative algorithm. To address this, Sarker et al. (2015) stated their belief that the development of such an algorithm would stem from the inclusion of annotated data. Although only fourteen of the articles that were reviewed included annotated data, Sarker et al. emphasized the importance of such data and noted that the inclusion of annotated data in research articles would become an imperative aspect of the development of future machine learning or algorithmic data extraction methods. Unfortunately, the article lacked detailed elaboration on this point, which rendered the implied importance of annotated data a somewhat vague conclusion.
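To illustrate two of the extraction strategies named above, the following minimal Python sketch contrasts lexicon-based and pattern-based extraction on a single made-up post. The lexicon, the regular expression, the drug name, and the function names are all hypothetical assumptions chosen for illustration; they are not drawn from Sarker et al. (2015) or from any of the reviewed studies.

```python
import re

# Hypothetical mini-lexicon of ADR mentions; real lexicons are far larger.
ADR_LEXICON = {"nausea", "headache", "dizziness", "weight gain"}

# Hypothetical indicative pattern: "<drug> gave me / made me / caused <effect>".
ADR_PATTERN = re.compile(
    r"\b(?P<drug>\w+) (?:gave me|made me|caused) (?P<effect>[^.,;!?]+)",
    re.IGNORECASE,
)


def lexicon_extract(post):
    """Return the lexicon terms that appear verbatim in the post, sorted."""
    text = post.lower()
    return sorted(
        term
        for term in ADR_LEXICON
        if re.search(r"\b" + re.escape(term) + r"\b", text)
    )


def pattern_extract(post):
    """Return (drug, effect) pairs matched by the indicative pattern."""
    return [
        (m.group("drug").lower(), m.group("effect").strip().lower())
        for m in ADR_PATTERN.finditer(post)
    ]


if __name__ == "__main__":
    # "Drugx" is an invented drug name used only for this example.
    post = "Drugx gave me a terrible headache and some weight gain. Still nauseous."
    print(lexicon_extract(post))
    print(pattern_extract(post))
```

In this example the lexicon lookup finds "headache" and "weight gain" but misses "nauseous" because only "nausea" is listed, while the pattern captures the whole clause after "gave me" without separating the individual reactions. Both behaviors are consistent with the point that no single method is free of challenges.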
Proposed Framework
Navigating toward a more subjective stance, Sarker et al. (2015) concluded the article with a summative assessment of their meta-analysis. Based on their review of the selected publications, they presented a framework for future research. A relatively simple diagram was included in support of the proposed framework and was used to explain its supervised learning process. As an obvious first step, the framework of Sarker et al. stated that data collection was the first focus of pharmacovigilance. Sarker et al. gave assurance that continuing advancements in NLP would aid in addressing many of the aforementioned lexical, phonetic, and relevance issues in the future; however, this statement came across as a vague assurance that amounted to little more than speculation.
After collecting data, the next step was stated to be applying filters. Sarker et al. (2015) briefly mentioned the previously explained challenge that imbalanced data presented; however, they made the claim that future use of public corpora would allow learning algorithms to receive training that would enhance their accuracy. This was another statement that seemed to extend beyond the plausible assurances that could be made as a result of the meta-analysis. However, Sarker et al. asserted that learning algorithms would be key to more advanced techniques capable of both filtering and classifying collected data as a means of removing information that would be considered irrelevant.
The final process of the proposed framework involved statistical analyses. There were no statistical analyses performed in the meta-analysis by Sarker et al. (2015), as it was clearly stated in the article that statistical analysis had scarcely been explored in research on social media pharmacovigilance. However, Sarker et al. included this as the final process in their proposed framework with the assurance that as research in this area progressed and further developed, statistical analysis would become an imperative area of focus. As of the date of the meta-analysis, the authors mentioned the numerous limitations that existed concerning the application of statistical analyses to quantitative or qualitative pharmacovigilance data. Therefore, the assurance of Sarker et al. that statistical analysis would become an area of growth in the future could be viewed as another assurance that falls beyond the scope of the meta-analysis.
Sarker et al. (2015) concluded the article with the presentation of their proposed framework for ADR monitoring via social media sources. It was maintained in the final paragraphs that they believed that the use of social media would continue to grow as technology progressed. Therefore, Sarker et al. claimed that such growth would require a more uniform approach, or framework, such as the one they proposed, to assess the benefits of utilizing social media data for pharmacovigilance. It was made clear in the concluding sentences of the article that the authors believed that the utilization of social media data as a means of monitoring personal accounts of ADRs offered pharmacovigilance an added potential for success.
Conclusion
“Utilizing Social Media Data for Pharmacovigilance: A Review”, by Sarker et al. (2015), offered a comprehensive analysis of existing research regarding the leveraging of social media data for the purpose of gathering information on ADRs. Overall, the authors provided a thorough, easy-to-follow methodological review of research in this area. The practical significance of this article renders it of great value to other researchers moving forward in the exploration of pharmacovigilance involving social media. Through the authors’ detailed meta-analysis, the reader might be easily persuaded toward agreement with their conclusion that a uniform algorithm was still needed. Relative to the field of health informatics, this approach to pharmacovigilance has the potential to elicit methods of data mining that provide significant contributions to the field of health care through increased surveillance of the occurrence of ADRs. As social media continues to serve as a platform for sharing personal health information, the field and practice of health informatics provides methods, such as those mentioned in the article, to explore and expand upon the presented concept of social media-related pharmacovigilance.
Reference
Sarker, A., Ginn, R., Nikfarjam, A., O’Connor, K., Smith, K., Jayaraman, S., … Gonzalez, G. (2015). Utilizing social media data for pharmacovigilance: A review. Journal of Biomedical Informatics, 54, 202–212. http://doi.org/10.1016/j.jbi.2015.02.004