Research Highlights

Biomedical Text Mining

“The way to find a needle in a haystack is to sit down.” ― Beryl Markham, West with the Night

Biomedical text mining spans a range of goals, from information extraction (say, of protein interactions) to hypothesis discovery (suggesting new ideas for investigation). In early work on literature-based hypothesis discovery we built a prototype system, Manjal, and used it to replicate all or nearly all of the hypotheses proposed by the pioneering team of Swanson and Smalheiser (see JASIST, 2004 below). Since then we have worked on biomedical text mining problems such as gene document retrieval and drifts in annotation. In current collaborative work (with the University of Maryland and St. Bonaventure University), funded by NSF, we are mining the annotated biological web for information related to genes. Our role is to find sentence imprints for genes and their annotations of interest.

  • ** Anupindi, T.R., and Srinivasan, P. Disease Comorbidity Links between MEDLINE and Patient Data. IEEE International Conference on Healthcare Informatics (ICHI 2017). Short paper, Park City, Utah, August 23 - 27, 2017
  • ** Srinivasan, P., Zhang, X-N, Bouten, R., Chang, C. Ferret: A sentence-based literature scanning system. BMC Bioinformatics 2015, 16:198. (Highly Accessed).
  • ** Yang C., Bhattacharya S. and Srinivasan, P. The University of Iowa at CLEF 2014: eHealth Task 3. CLEF Labs Paper.
  • ** Sehgal, A. K., Qiu, X. Y., Srinivasan, P. Analyzing LBD Methods using a General Framework. In: Bruza, P. and Weeber, M. (eds.) Literature-Based Discovery. Springer's series on "Information Science and Knowledge Management", Vol. 15, 75-100, 2008.
  • ** Srinivasan, P. Text Mining: Generating Hypotheses from MEDLINE. Journal of the American Society for Information Science and Technology. 55(5), 396-413, March 2004.
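
Literature-based hypothesis discovery in the Swanson tradition is often described with an "ABC" pattern: a starting topic A is linked to intermediate terms B in one set of documents, and those B terms are linked to candidate topics C in another set, even though A and C never appear together. The sketch below illustrates only that general pattern. It is not Manjal's algorithm, and the tiny document collection and term names are made up for illustration rather than drawn from MEDLINE.

```python
from collections import defaultdict

def cooccurring_terms(docs, term):
    """Count how often each other term shares a document with `term`."""
    counts = defaultdict(int)
    for terms in docs:
        if term in terms:
            for other in terms:
                if other != term:
                    counts[other] += 1
    return dict(counts)

def open_discovery(docs, start):
    """Rank C terms that reach `start` (A) only through intermediate B terms."""
    b_terms = cooccurring_terms(docs, start)
    links = defaultdict(set)
    for b in b_terms:
        for c in cooccurring_terms(docs, b):
            # Skip A itself and anything already directly linked to A.
            if c != start and c not in b_terms:
                links[c].add(b)
    # More distinct B terms means a stronger candidate; ties break alphabetically.
    return sorted(
        ((c, sorted(bs)) for c, bs in links.items()),
        key=lambda pair: (-len(pair[1]), pair[0]),
    )

# Illustrative toy collection: each document is a set of index terms.
docs = [
    {"raynaud disease", "blood viscosity"},
    {"raynaud disease", "platelet aggregation"},
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"aspirin", "platelet aggregation"},
]

hypotheses = open_discovery(docs, "raynaud disease")
for candidate, via in hypotheses:
    print(candidate, "via", via)
```

In this toy collection "fish oil" ranks first because two intermediate terms connect it to the starting topic. A real system would add term weighting and semantic filtering, which this sketch omits.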

Web Mining: Politics & Elections

“There comes a time when one must take a position that is neither safe, nor politic, nor popular, but he must take it because conscience tells him it is right.” ― Martin Luther King Jr., A Testament of Hope: The Essential Writings and Speeches

We have worked on several aspects of politics and elections, starting with early sentiment analysis of data from blogs, Twitter and YouTube comments. More recent research covers mining public perceptions of personality, estimating political slant, and analyzing data related to primaries and elections. Work on political sentiment continues.

  1. ** Boynton, G. R., Le, H.T., Mejova, Y., Shafiq, M. and Srinivasan, P. What Campaigns Become as Social Media Become the Infrastructure of Political Communication. In: Glenn W. Richardson (Ed.) Social Media and Politics, Volume 1, 21 pages, 2017.
  2. ** Le, H., Boynton, B., Mejova, Y., Shafiq, Z., and Srinivasan, P. Bumps and Bruises: Mining Presidential Campaign Announcements on Twitter. Hypertext, Prague, Czech Republic, 2017.
  3. ** Le, H., Boynton, B., Mejova, Y., Shafiq, Z., and Srinivasan, P. Revisiting the American Voter. Proceedings of ACM SIG CHI (Conference on Human Factors in Computing Systems), pages 4507-4519 ACM May 2017, Colorado, USA.
  4. ** Bhattacharya, S., Yang, C., Srinivasan, P., Boynton, B. Perceptions of Presidential Candidates' Personalities in Twitter. Journal of the Association for Information Science and Technology. 67(2): 249 - 267, 2016
  5. ** Mejova, Y., Srinivasan, P., Boynton, B. GOP Primary Season on Twitter: "Popular" Political Sentiment in Social Media. Proc. of the Sixth International Conference on Web Search and Data Mining (WSDM), Rome, Italy. February 2013.
  6. ** Mejova, Y., and Srinivasan, P. Crossing Media Streams with Sentiment: Domain Adaptation in Blogs, Reviews and Twitter, Sixth International AAAI Conference on Weblogs and Social Media (ICWSM). Dublin, Ireland, June 2012.

Adverse Effects

For it is a good remedy sometimes to apply nothing at all ~ Padmini, on a Friday afternoon.

Mining new and as yet unknown effects of drugs, and specifically new adverse effects, is a major goal in biomedical text mining. There is active research on mining effects from biomedical publications and patient records. We view the web as a parallel, though noisy, source of drug effects data. As individuals take medications they offer their opinions on social media, and mining these discussions may offer new insights. This is also of use in the broader area of drug repurposing research. As part of our belief mining research we have found, for example, discussions on 'lyrica causes hair loss' [1]. Of course, it is important to follow up by judging the scientific novelty and validity of these mined ideas. A second direction we are pursuing is the extraction of adverse effects from Letters to the Editor in journals such as JAMA [2]. These letters offer some of the earliest evidence of such effects and complement the FDA's AERS system well.

Web Mining: Subjective Well-Being

“Happiness is determined more by one’s state of mind than by external events.” ― Dalai Lama XIV, The Art of Happiness: A Handbook for Living

In the last few years we developed a methodological framework (called S-to-S) for translating surveys to social media surveillance strategies (passive text analytics). The proof of concept has been demonstrated in the area of 'Life Satisfaction'. We have used the S-to-S framework to develop algorithms for finding users who are satisfied/dissatisfied with their lives.

This line of research continues in collaboration with Professor Louis Tay of Purdue University (Psychological Sciences).

  1. ** Yang, C., Srinivasan, P. From Surveys to Surveillance on Social Media: Methodological Challenges & Solutions. WebSci '14 Proceedings of the 2014 ACM conference on Web Science (WebSci), Pages 4-12, 2014.
  2. ** Yang C., Srinivasan P. Life Satisfaction and the Pursuit of Happiness on Twitter. PLoS One. 11(3):e0150881, 2016. DOI: 10.1371/journal.pone.0150881.

Web Mining

“Too many things on my mind, said Wilbur. Well, said the goose, that's not my trouble. I have nothing at all on my mind, but I've too many things under my behind.” ― E.B. White, Charlotte's Web

Web mining is a broad area of research which in essence includes almost any kind of data analytics using web data. In early years we worked intensely on the design and evaluation of web crawlers. More recent interests are represented by the following papers.

  1. ** Bhattacharya, S., Srinivasan, P., and Polgreen, P. Social Media Engagement Analysis of U.S. Federal Health Agencies on Facebook. BMC Medical Informatics and Decision Making 17(49), 2017.
  2. ** Shahid, U., Farooqi, S., Ahmad, R., Shafiq, Z., Srinivasan, P. and Zaffar, F. Accurate Detection of Automatically Spun Content via Stylometric Analysis. IEEE International Conference on Data Mining (ICDM) New Orleans, November 18 - 21, 2017.
  3. ** Le, H., Shafiq, Z., and Srinivasan, P. Scalable News Slant Measurement Using Twitter. The 11th International AAAI Conference on Web and Social Media (ICWSM), 2017. (Poster).
  4. ** Bhattacharya, S., Srinivasan, P., and Polgreen, P. Engagement of Health Agencies on Twitter. PLoS One 2014 Nov 7; 9(11):e112235. doi: 10.1371/journal.pone.0112235.
  5. ** Yang, C., Pan, S., Mahmud, J.U., Yang, H. and Srinivasan, P. Using Personal Traits For Brand Preference Prediction. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), pages 86-96, September 17-21, 2015, Lisbon, Portugal.

Besides addressing problems of general interest we have also been studying web mining in the following contexts.

Web Mining: Beliefs

"Man is a credulous animal, and must believe something; in the absence of good grounds for belief, he will be satisfied with bad ones". ~Bertrand Russell (Unpopular Essays)

Discussions of ill health and wellness form a significant portion of content on social media. Our goal is to mine these for beliefs in health-related notions using an approach that we call 'belief surveillance'. As a first step we focus on Twitter. The big challenge is to find the relevant tweets and then to correctly classify them as supporting, refuting or simply questioning a health-related notion. We take a micro-perspective by assessing specific beliefs such as 'vaccines cause autism' and 'lemon treats cancer'. With a 2012 subset of the data we have found, for example, high levels of belief in both false and debatable propositions.

  • ** Bhattacharya, S., Tran, H., and Srinivasan, P. Discovering Health Beliefs in Twitter. AAAI-2012 Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text. Washington, DC., November 2012.
  • ** Bhattacharya, S., and Srinivasan, P. A Semantic Approach to Involve Twitter in LBD Efforts. Proc. of the First International Workshop on the Role of Semantic Web in Literature-Based Discovery (SWLBD 2012), The IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2012), Philadelphia, USA. October 2012.
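
The two steps described above (finding tweets about a proposition, then labeling each as supporting, refuting or questioning it) can be sketched as follows. This is a deliberately naive keyword heuristic shown only to make the task concrete; it is not the classifier used in the papers. The cue list, the function names and the example tweets are all invented for illustration.

```python
import re

# Hypothetical refutation cues; a real system would learn these from labeled tweets.
REFUTE_CUES = re.compile(
    r"\b(?:not|no|never|myth|hoax|debunked)\b|n't\b", re.IGNORECASE
)

def mentions(tweet, keywords):
    """Step 1: keep only tweets that contain every keyword of the proposition."""
    text = tweet.lower()
    return all(k in text for k in keywords)

def classify_stance(tweet):
    """Step 2: label a relevant tweet as questioning, refuting or supporting."""
    text = tweet.strip()
    if text.endswith("?"):
        return "questioning"
    if REFUTE_CUES.search(text):
        return "refuting"
    return "supporting"

def survey(tweets, keywords):
    """Count stances among the tweets that mention the proposition."""
    counts = {"supporting": 0, "refuting": 0, "questioning": 0}
    for t in tweets:
        if mentions(t, keywords):
            counts[classify_stance(t)] += 1
    return counts

# Invented example tweets for the proposition 'vaccines cause autism'.
tweets = [
    "Vaccines cause autism, protect your kids",
    "Vaccines do not cause autism",
    "Vaccines don't cause autism, that claim was debunked",
    "Do vaccines cause autism?",
    "Lemon water every morning",
]
counts = survey(tweets, ["vaccine", "autism"])
print(counts)
```

Rules this simple misread sarcasm, quoted claims and negated refutations, which is part of why correct classification is the hard step.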

Privacy, Security and Censorship

"Like the camouflaged nightjar, to stay hidden in plain sight is the goal!". ~Padmini (on a summer day)

Recent news, such as reports about Cambridge Analytica and the risks faced by whistleblowers, underlines the need for safe spaces for public communication. It is no surprise that algorithms for privacy protection and for protection against censorship are key research areas. Recent papers include:

  • Mahmood, A., Ahmed, F., Shafiq, Z., Srinivasan, P., and Zaffar, F. A girl has no name: Automated authorship obfuscation with Mutant-X. PETS, 2019.
  • Rusert, J., Khalid, O., Hong, D., Shafiq, Z., and Srinivasan, P. No place to hide: Inadvertent location privacy leaks on Twitter. PETS, 2019.

Crowdsourcing & GWAPs (Games With A Purpose)

    “We do not stop playing because we grow old, we grow old because we stop playing!” ― Benjamin Franklin

    Crowdsourcing and games with a purpose are two key recent developments that offer new ways to incorporate human intelligence into algorithmic methods. In crowdsourcing one employs workers from a large, anonymous crowd of individuals, usually at low pay. This mode is well suited to simple, repetitive tasks such as annotating collections for entities or classifying images. A GWAP is logically similar to crowdsourcing, with the added aspect that the task is run within a game designed with a competitive edge. The idea is to attract participants through the entertainment value of the game while simultaneously achieving task goals. We are studying different applications, for example, obtaining relevance judgments and information extraction.

    • ** Harris, C., Srinivasan, P. Hybrid Crowd-Machine Methods as Alternatives to Pooling & Expert Judgments. Asia Information Retrieval Society Conference 2014 (AIRS 2014).
    • ** Eickhoff, C., Harris, C., de Vries, A.P., Srinivasan, P. Quality through Flow and Immersion: Gamifying Crowdsourced Relevance Assessments. Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR 2012), pages 871-880. Portland, Oregon, August 2012.
    • ** Harris, C., Srinivasan, P. Applying Human Computation Mechanisms to Information Retrieval. ASIST Annual Conference. Maryland, October 2012.
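
When several crowd workers judge the same item, their labels have to be combined into one judgment. The sketch below uses simple majority voting, a common baseline, to aggregate relevance labels. It is an assumption for illustration and not the hybrid crowd-machine method of the papers above; the tuple layout and the label names are hypothetical.

```python
from collections import Counter

def majority_vote(judgments):
    """Aggregate (item_id, worker_id, label) tuples into one label per item.

    Returns None for an item whose top two labels are tied.
    """
    by_item = {}
    for item, _worker, label in judgments:
        by_item.setdefault(item, []).append(label)

    result = {}
    for item, labels in by_item.items():
        (top, top_n), *rest = Counter(labels).most_common()
        tied = bool(rest) and rest[0][1] == top_n
        result[item] = None if tied else top
    return result

# Hypothetical relevance judgments from three anonymous workers.
judgments = [
    ("doc1", "w1", "relevant"),
    ("doc1", "w2", "relevant"),
    ("doc1", "w3", "not relevant"),
    ("doc2", "w1", "relevant"),
    ("doc2", "w2", "not relevant"),
]
labels = majority_vote(judgments)
print(labels)
```

Items that come back as None could be sent to additional workers or to an expert, rather than being forced to a label.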


      1. Collaborators
        1. Bob Boynton
        2. Phil Polgreen
        3. Zubair Shafiq
        4. Louiqa Raschid, University of Maryland
        5. Louis Tay, Purdue University
        6. Xiao-Ning Zhang, St. Bonaventure University
        7. Muhammad F. Zaffar, Lahore University of Management Sciences
        8. Arjen P. de Vries, University of Nijmegen

      1. Students
        1. Ingroj Shrestha
        2. Asad Mahmood
        3. Jonathan Rusert
        4. Momina Syeda Tabish
        5. Osama Khalid
        6. Dat Hong
        7. Sanmitra Bhattacharya
        8. Yelena Mejova
        9. Chao Yang
        10. Chris Harris
        11. C. Eickhoff
        12. Huyen Le
        13. Usman Shahid
        14. Faizan Ahmed
        15. Shehroze Farooqi
        16. Reza Ahmed
        17. Tejaswi Rohit Anupindi