
Health-LLM: The Potential Of Large-Scale Linguistic Models To Change The Future Of Health Care


Large Language Models

3 main points
✔️ Proposal of a new framework, Health-LLM: expanding the potential of large-scale language models in healthcare by leveraging multimodal data from wearable sensors.
✔️ Improved performance through prompt design and fine-tuning: significant gains confirmed on health prediction tasks.
✔️ Ethical issues and future research directions: the need to resolve issues such as privacy protection, bias elimination, and accountability.

Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data
written by Yubin Kim, Xuhai Xu, Daniel McDuff, Cynthia Breazeal, Hae Won Park
(Submitted on 12 Jan 2024)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)


The images used in this article are from the paper, the introductory slides, or were created based on them.


In recent years, large-scale language models (LLMs) have shown tremendous potential in applications ranging from text generation to knowledge retrieval, as demonstrated in many studies, including work by OpenAI and by Singhal et al. However, the potential of large-scale language models in the sensitive field of healthcare has been little explored, especially in the context of leveraging diverse multimodal data from wearable sensors. The complexity and time-series nature of such data pose a challenging problem for large-scale language models.

In this paper, we propose a new framework called "Health-LLM" to fill this gap. It aims to improve the capabilities of large-scale language models specific to the healthcare domain. We evaluate state-of-the-art large-scale language models on six publicly available health datasets and validate their performance through 13 health prediction tasks.

In the process, four comprehensive experiments are also conducted: zero-shot prompting; few-shot prompting combined with chain-of-thought (CoT) and self-consistency (SC) prompting; instructional fine-tuning; and an ablation study with contextual enhancement.

Experimental results show that large-scale language models perform well on tasks in the healthcare domain, especially with few-shot prompting and instructional fine-tuning. In particular, the Health-Alpaca model achieves remarkable results on several tasks despite its much smaller scale. The results also show that contextual enhancement contributes to the performance improvement of large-scale language models.

This study provides strategies for the potential use and implementation of large-scale language models in healthcare. It provides important insights into how large-scale language models can enable more sophisticated health prediction and analysis, and how this can be applied to clinical practice and health care.


The first is zero-shot prompting. Its purpose is to assess how well the pre-trained knowledge of large-scale language models transfers to health prediction tasks. To this end, a basic prompt setting (bs) is first designed, consisting of a paragraph-style summary of the wearable sensor data, which is then enriched with the following contexts:

  1. User context (uc) supplies user-specific information such as age, gender, weight, and height, adding background that shapes the interpretation of health data.
  2. Health context (hc) supplies definitions and equations related to the specific health target, infusing new health knowledge into the large-scale language model.
  3. Time context (tc) tests the importance of the temporal aspects of time-series data: instead of aggregated statistics, the raw time-series sequence is used. Among the different time-context representations tried, natural-language strings were empirically observed to perform best.
  4. All (all) combines all of the above contexts in a single prompt.
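The four prompt settings above can be sketched as a simple prompt builder. This is a minimal illustration, not the paper's exact templates; the sensor summary, context strings, and question text are hypothetical examples.

```python
# Minimal sketch of the bs / uc / hc / tc / all prompt settings.
# All strings below are illustrative, not the paper's actual prompts.

def build_prompt(sensor_summary, user_context=None, health_context=None,
                 time_series=None, question="Predict the user's stress level (1-5)."):
    """Assemble a zero-shot prompt from the basic summary plus optional contexts."""
    parts = [sensor_summary]                      # (bs) paragraph-style sensor summary
    if user_context:                              # (uc) age, gender, weight, height
        parts.append(f"User profile: {user_context}")
    if health_context:                            # (hc) definitions/equations for the target
        parts.append(f"Health knowledge: {health_context}")
    if time_series:                               # (tc) raw sequence as a natural-language string
        parts.append("Heart-rate sequence: " + ", ".join(str(x) for x in time_series))
    parts.append(question)
    return "\n".join(parts)

# Passing every context corresponds to the "all" setting.
prompt = build_prompt(
    "Yesterday the user slept 6.2 hours and took 4,300 steps.",
    user_context="age 29, female, 63 kg, 168 cm",
    health_context="Stress is rated on a 1-5 scale; poor sleep tends to raise it.",
    time_series=[72, 75, 81, 79, 70],
)
print(prompt)
```

Omitting all optional arguments reproduces the basic setting (bs); each keyword argument toggles one context on.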

The second is few-shot prompting, which uses a limited number of demonstration examples within the prompt to facilitate in-context learning. This paper employs a three-shot setup; the demonstrations appear only in the prompt, and the model parameters remain unchanged.

This approach is similar to providing the model with a handful of case studies, helping it capture and apply knowledge in the healthcare domain. On top of few-shot prompting, chain-of-thought (CoT) and self-consistency (SC) prompting techniques are integrated.

The introduction of CoT prompting facilitates a more coherent and contextually nuanced understanding, allowing the model to connect ideas step by step. At the same time, SC prompting refines the model's responses by sampling multiple reasoning paths and favoring the answer they agree on, promoting internal coherence and logical consistency.

The third is instruction tuning. Instruction tuning is a technique in which all parameters of a pre-trained model are further trained (fine-tuned) on a target task. This process allows the model to adapt its pre-trained knowledge to the specifics of the new task and optimize its performance. In the context of health prediction, fine-tuning gives models a deeper understanding of physiological terms, mechanisms, and contexts, enhancing their ability to generate accurate and contextually relevant responses.
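On the data side, instruction tuning of an Alpaca-style model typically starts by rendering each (instruction, input, output) triple into a single training string. The template below is a common Alpaca-style format shown for illustration; it is not confirmed to be the exact template used for Health-Alpaca.

```python
# Illustrative Alpaca-style instruction formatting. The template wording is
# an assumption, not the paper's exact prompt for Health-Alpaca.

TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

def format_example(instruction, input_text, output):
    """Render one (instruction, input, output) triple into a training string."""
    return TEMPLATE.format(instruction=instruction, input=input_text, output=output)

sample = format_example(
    "Estimate the user's sleep quality on a 1-5 scale.",
    "Slept 7.5 hours with 2 awakenings; resting heart rate 58 bpm.",
    "4",
)
print(sample)
```

During full instruction tuning, every parameter of the pre-trained model receives gradients from strings like this, which is what lets the model absorb domain-specific terminology rather than merely imitate in-context examples.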

The fourth is parameter-efficient fine-tuning (PEFT). Instead of fine-tuning all parameters, methods such as LoRA train only a small number of parameters by injecting a trainable low-rank matrix into each layer of a pre-trained model. In the context of Health-LLM, these PEFT techniques allow models to adapt to healthcare tasks while maintaining computational efficiency.
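The LoRA idea can be shown in a few lines of NumPy: the frozen weight W is augmented with a trainable low-rank product B·A, so only 2·d·r parameters are trained instead of d·d. The dimensions and rank here are illustrative, not the paper's configuration.

```python
import numpy as np

# Conceptual LoRA sketch (illustrative dimensions, not the paper's setup):
# the frozen weight W gets a trainable low-rank update B @ A.

d, r = 1024, 8                            # layer width and LoRA rank
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))           # frozen pre-trained weight (never updated)
A = rng.standard_normal((r, d)) * 0.01    # trainable down-projection (r x d)
B = np.zeros((d, r))                      # trainable up-projection (d x r), zero-initialized
                                          # so training starts exactly at the pre-trained model

def forward(x):
    """Linear layer with the LoRA update folded in: x @ (W + B A)^T."""
    return x @ (W + B @ A).T

full_params = d * d                       # parameters updated by full fine-tuning
lora_params = 2 * d * r                   # parameters in A and B combined
print(f"full fine-tuning updates {full_params:,} params; "
      f"LoRA trains {lora_params:,} ({100 * lora_params / full_params:.2f}%)")
```

Because B starts at zero, the adapted layer initially computes exactly what the pre-trained layer did; training then moves only A and B, which is why memory and compute costs stay low.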


The table below shows the results of a comprehensive performance evaluation of large-scale language models on the health prediction tasks.

STRS: stress, READ: readiness, FATG: fatigue, SQ: sleep quality, SR: stress tolerance, SD: sleep disturbance, ANX: anxiety, DEP: depression, ACT: activity, CAL: calories, A_FIB: atrial fibrillation, SINUS_B: sinus bradycardia, SINUS_T: sinus tachycardia. "-" marks cases that failed due to token-size limits or an irrational response; "N/A" marks cases where no prediction was reported or could not be performed.

In each column (task), the best result is shown in bold and the second-best result is underlined; CoT indicates chain-of-thought and SC indicates self-consistency prompting. In each task, the arrow in parentheses indicates the desired direction of improvement: ↑ means higher is better (accuracy), and ↓ means lower is better (mean absolute error).
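The two metric conventions in the table are easy to pin down concretely. The toy predictions and labels below are made up for illustration.

```python
# Tiny illustration of the table's two metrics: accuracy (↑, higher is
# better) for classification tasks and mean absolute error (↓, lower is
# better) for regression tasks. All values here are invented.

def accuracy(preds, labels):
    """Fraction of predictions that exactly match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def mean_absolute_error(preds, labels):
    """Average absolute difference between predictions and labels."""
    return sum(abs(p - y) for p, y in zip(preds, labels)) / len(labels)

print(accuracy(["high", "low", "low"], ["high", "low", "high"]))   # 2 of 3 correct
print(mean_absolute_error([3.0, 4.5], [2.0, 4.0]))                 # (1.0 + 0.5) / 2
```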


This paper extensively evaluates the potential of large-scale language models (LLMs) in consumer health prediction tasks and draws new insights from the results. Several large-scale language models, including the newly introduced Health-Alpaca, are compared across 13 different health prediction tasks, showing that prompting and model fine-tuning in particular contribute to improved performance.

However, important ethical issues remain to be resolved before this technology can be applied, such as protecting privacy, eliminating bias, and ensuring accountability and explainability (XAI). The paper emphasizes that further research is needed to address these challenges before the technology can be deployed in actual medical practice.

Other limitations of the current study include the quality of the data set used and the lack of a detailed evaluation of the inferential capabilities of the model. Future research is expected to develop specific methodologies to address these issues.

Ultimately, this research highlights the enormous potential of health prediction using large-scale language models and the challenges that must be overcome to translate them into real-world medical applications. The proposed directions, including the adoption of privacy-preserving techniques and improved accuracy of model inference, are a step toward providing reliable medical services.

I have worked as a Project Manager/Product Manager and Researcher at internet advertising companies (DSP, DMP, etc.) and machine learning startups. Currently, I am a Product Manager for new business at an IT company. I also plan services utilizing data and machine learning, and conduct seminars related to machine learning and mathematics.
