Dr. Avisha Das
Department of Computer Science and Engineering
Educational Qualifications
- 2014-2020 – Ph.D. in Computer Science – University of Houston, Houston, USA
- 2010-2014 – B.Tech Electronics and Communication Engineering – West Bengal University of Technology, India
Dr. Avisha Das joined the Department of Computer Science and Engineering at Shiv Nadar University Chennai in December 2025. She holds a Ph.D. in Computer Science from the University of Houston, USA, and a B.Tech. in Electronics and Communication Engineering from the West Bengal University of Technology, India.
Her research spans natural language processing, biomedical AI, clinical informatics, and AI security, with a focus on clinical informatics and security analytics. Her current work centers on risk analysis and on developing privacy-preserving, reliable large language model (LLM)-based knowledge mining frameworks.
Before joining SNU Chennai, she served as a Research Associate at the Arizona Advanced AI & Innovation (A3I) Hub, Mayo Clinic Arizona, and previously worked as a Postdoctoral Research Fellow at the School of Biomedical Informatics, UTHealth Houston.
Work Experience
- Research Associate, Mayo Clinic, Phoenix, AZ (Nov 2023 – Oct 2025)
- Postdoctoral Fellow, University of Texas Health Science Center, Houston, TX (Apr 2021 – Nov 2023)
- Graduate Research Assistant, University of Houston, Houston, TX (Aug 2014 – Dec 2020)
Publications
Recent list of publications: https://scholar.google.com/citations?user=snaeo_oAAAAJ
Journal Papers
- Talati, I., Manuel, J., Das, A., Banerjee, I. and Rubin, D. (2025). Out-of-the-Box Large Language Models for Detecting and Classifying Critical Findings in Radiology Reports Using Various Prompt American Journal of Roentgenology. [IF: 6.1]
- Das, A., Talati, I., Manuel, J., Rubin, D., and Banerjee, I. (2025). Weakly Supervised Language Models for Automated Extraction of Critical Findings from Radiology Reports. npj Digital Medicine. [IF: 15.2]
- Li, , Wei, Q., Huang, L.C., Li, J., Hu, Y., Chuang, Y.S., He, J., Das, A., Keloth VK, Yang Y, and Diala CS. (2024). Ensemble pretrained language models to extract biomedical knowledge from literature. Journal of the American Medical Informatics Association (JAMIA) [IF: 4.7].
- Yang, Y., Zuo, X., Das, A., Xu, H., and Zheng, W. Jim (2024). Representation Learning of Biological Concepts: A Systematic Review. Current Bioinformatics [IF: 4].
- Das, A. and Verma, R. (2020). Can Machines Tell Stories? A Comprehensive Comparison of Pre-Trained and Fine-Tuned Deep Neural Language Models. IEEE Access [IF: 4].
- El Aassal, A., Baki, S., Das, A., and Verma, R. (2020). An In-Depth Benchmarking and Evaluation of Phishing Detection Research for Security IEEE Access [IF: 3.4].
- Das, A., Baki, S., El Aassal, A., Verma, R., and Dunbar, A. (2019). SoK: A Comprehensive Reexamination of Phishing Research from the Security IEEE Communications Surveys & Tutorials [IF: 35.6].
- Karimi, S., Moraes, L., Das, A., Shakery, A., and Verma, R. (2018). Citance-based retrieval and summarization using IR and machine Scientometrics [IF: 3.8].
Conference and Workshop Papers
- Das, A., Diala, CS., Chen, G., Li, Z., Li, R., Anjum, O., and Zheng, W. (2025). Efficient Training Corpus Retrieval for Large Language Model Fine Tuning: A Case Study in 20th World Congress on Medical and Health Informatics (MedINFO).
- Joshi, V., Correa, R., Das, A., and Banerjee, I. (2025). Multi-factor debiasing for correlating confounders for ‘fair’ diagnostic model. SPIE Medical Imaging.
- Das, A., Tariq, A., Batalini, F., Dhara, B. and Banerjee, I. (2024). Exposing Vulnerabilities in Clinical LLMs Through Data Poisoning Attacks: Case Study in Breast Cancer. AMIA Annual
- Das, A., Li, Z., Wei, Q., Li, J., Huang, L.C., Hu, Y., Li, R., Zheng, W. and Xu, H. (2023). Extracting Drug-Protein Relation from Literature using Ensembles of Biomedical Transformers. 19th World Congress on Medical and Health Informatics (MedINFO).
- Das, A., Selek, S., Warner, A., Zuo, X., Hu, Y., Keloth, V., Li, J., Zheng, W., and Xu, H. (2022). Conversational Bots for Psychotherapy: A Study of Generative Transformer Models Using Domain-specific Dialogue. ACL Workshop on Biomedical Natural Language Processing (BioNLP).
- Das, A., Li, Z., Wei, Q., Li, J., Huang, L. C., Hu, Y., Li, R., Zheng, W., and Xu, H. (2021). UTHealth@ BioCreativeVII: domain-specific transformer models for drug-protein relation extraction. Workshop on BioCreative VII Challenge Evaluation.
- Zeng, V., El Aassal, A., Baki, S., Verma, R., Moraes, L. and Das, A. (2020). Diverse Datasets and a Customizable Benchmarking Framework for Phishing. ACM CODASPY International Workshop on Security and Privacy Analytics (IWSPA).
- Das, A., and Verma, R. (2019). Automated email Generation for Targeted Attacks using Natural Language. Language Resources and Evaluation-LREC Workshop on Text Analytics for Cybersecurity and Online Safety (TA-COS).
- El Aassal, , Moraes, L., Baki, S., Das, A., and Verma, R. (2018). Anti-Phishing Pilot at ACM IWSPA 2018: Evaluating Performance with New Metrics for Unbalanced Datasets. Conference on Data and Application Security and Privacy (CODASPY) Anti-Phishing Shared Task Pilot.
- Verma, R., and Das, A. (2017, March). What’s in a URL: Fast feature extraction and malicious URL ACM CODASPY International Workshop on Security and Privacy Analytics (IWSPA).
- De Moraes, L. F., Das, A., Karimi, S., and Verma, R. (2018). University of Houston@ CL-SciSumm SIGIR Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL).
- Karimi, S., Moraes, L. F., Das, A., and Verma, R. (2017). University of Houston@ CL-SciSumm 2017: Positional language Models, Structural Correspondence Learning and Textual SIGIR Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL).
Posters and Abstracts
- Tariq, A., Luo, M., Urooj, A., Das, A., Jeong, J., Trivedi, S., Patel, B., and Banerjee, I. (2024). Domain-specific LLM Development and Evaluation–A Case-study for Prostate Cancer. AMIA Annual Symposium.
- Das, A., Anjum, O., Chen, G., Zheng, W., and Li, Rongbin (2024). Efficient Training Corpus Retrieval for Large Language Model Fine Tuning AMIA Informatics Summit.
- Das, A., Anjum, O., Zheng, W., and Diala, C. (2023). A Multi-faceted Mining Tool for Knowledge and Data Discovery for Cancer International Conference on Intelligent Biology and Medicine (ICIBM).
- Das, A. (2019) AskAna: Retrieval Based Virtual Assistant for Digital Operations and Field Development. Rice Data Science Conference.
- Das, A., and Verma, R. (2017). What’s in a URL: Fast Feature Extraction and Detection of Malicious URLs. Women in CyberSecurity (WiCyS) Conference.
- Das, A., and Verma, R. (2016). Analyzing Phishing URLs. Poster at Grace Hopper Conference for Celebration of Women.
- Das, A., and Verma, R. (2016). Are Legit and Phishing URLs similar? Hell No! – Lexical characterization and Analysis of URLs. Women in CyberSecurity (WiCyS) Conference.
- Das, A., and Verma, R. (2016). Studying Phishing URLs the NLP way. Computing Research Association (CRA-W) Grad Cohort Workshop.
Book Chapters
- Tariq, A., Luo, M., Urooj, A., Das, A., Jeong, J., Trivedi, S., Abdul-Muhsin, H., Ghaffar, U., Yu, N., Patel, B., and Banerjee, I. (2024). Development Of LLM For Prostate Cancer – The Need for Domain-Tailored Cancer Detection and Diagnosis (pp. 397-406). CRC Press.
Preprints/Under Review
- Talati, I., Das, A., Manuel, J., Rubin, D., and Banerjee, I. (2025). Detection and Classification of Critical Findings in Radiology Reports Using Large Language Models. Under Review at Lancet Digital
- Tariq, A., Luo, M., Urooj, A., Das, A., Jeong, J., Trivedi, S., Patel, B. and Banerjee, I. (2024). Domain-specific LLM Development and Evaluation–A Case-study for Prostate Cancer. medRxiv preprint.
- Das, A., Tariq, A., Batalini, F., Dhara, B., and Banerjee, I. (2024). Framework for Exposing Vulnerabilities of Clinical Large Language Model: A Case Study in Breast Cancer. Under Review at npj Precision Oncology
- Das, A., Anjum, O., Chen, G., and Zheng, W. Jim (2023). Efficient Training Corpus Retrieval for Large Language Model Fine Tuning. Under Review.
- Das, A., Jin, K., Keloth, V., Selek, S., and Xu, H. (2023). A Methodological Review of Deep Learning-based Virtual Assistants for Healthcare. Under Review.
- Das, A., and Verma, R. (2020). Modeling Coherency in Generated Emails by Leveraging Deep Neural Learners. ArXiv preprint.
Grants Awarded
- Cancer Prevention and Research Institute of Texas (CPRIT)-McWilliams School of Biomedical Informatics at UTHealth Houston, Genomics and Translational Cancer Research Training Program (BIG-TCR) Postdoctoral Trainee Grant. Title: “Building an Automated Tool for Knowledge and Data Discovery for Cancer Research: A Multi-Faceted Approach by Biomedical Literature Mining.” 2022-2024.
Area of Research
- Natural Language Processing (NLP)
- Natural Language Understanding and Generation (LLMs, VLMs, MLMs)
- Data Mining & Knowledge Retrieval
- Trustworthy & Secure AI (Adversarial NLP, Privacy, and Safety of LLMs)
- Healthcare & Clinical Informatics
Awards
- CPRIT BIG-TCR Postdoctoral Training Program Fellowship, 2022-2024. Cancer Prevention and Research Institute of Texas, UTHealth Houston.
- Second place, Litcoin NLP Challenge, March 2022. National Center for Advancing Translational Sciences (NCATS), UTHealth Houston.
- Cullen Graduate Success Fellowship, Fall UH Alumni Association, University of Houston.
- Govt. of India Merit-based Scholarship for Undergraduate Education, 2010-2014. Ministry of Human Resources-India (MHRD), India.
Links
Website: https://dasavisha.github.io/
Google Scholar: https://scholar.google.com/citations?user=snaeo_oAAAAJ