-
Mashup Score: 0
An AI-powered Public Health Automated Kiosk System for... - 22 day(s) ago
Background: The HERMES Kiosk (Healthcare Enhanced Recommendations through Artificial Intelligence & Expertise System) is designed to provide personalized Over-the-Counter (OTC) medication…
Source: arXiv.org
Categories: General Medicine News
-
Mashup Score: 1
The Hallucination Tax of Reinforcement Finetuning - 24 day(s) ago
Reinforcement finetuning (RFT) has become a standard approach for enhancing the reasoning capabilities of large language models (LLMs). However, its impact on model trustworthiness remains…
Source: arXiv.org
Categories: General Medicine News
-
Mashup Score: 1
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions - 24 day(s) ago
For Large Language Models (LLMs) to be reliably deployed in both everyday and high-stakes domains, knowing when not to answer is as critical as answering correctly. Real-world user queries,…
Source: arXiv.org
Categories: General Medicine News
-
Mashup Score: 8
Utility Engineering: Analyzing and Controlling Emergent Value... - 26 day(s) ago
As AIs rapidly advance and become more agentic, the risk they pose is governed not only by their capabilities but increasingly by their propensities, including goals and values. Tracking the…
Source: arXiv.org
Categories: General Medicine News, Future of Medicine
-
Mashup Score: 4
When Two LLMs Debate, Both Think They'll Win - 1 month(s) ago
Can LLMs accurately adjust their confidence when facing opposition? Building on previous studies measuring calibration on static fact-based question-answering tasks, we evaluate Large Language…
Source: arXiv.org
Categories: General Medicine News
-
Mashup Score: 4
Medical vision-language models often struggle with generating accurate quantitative measurements in radiology reports, leading to hallucinations that undermine clinical reliability. We introduce…
Source: arXiv.org
Categories: General Medicine News
-
Mashup Score: 1
Reasoning or Overthinking: Evaluating Large Language Models on... - 1 month(s) ago
We investigate the effectiveness of large language models (LLMs), including reasoning-based and non-reasoning models, in performing zero-shot financial sentiment analysis. Using the Financial…
Source: arXiv.org
Categories: General Medicine News
-
Mashup Score: 4
Google Scholar is manipulatable - 1 month(s) ago
Citations are widely considered in scientists’ evaluation. As such, scientists may be incentivized to inflate their citation counts. While previous literature has examined self-citations and…
Source: arXiv.org
Categories: General Medicine News
-
Mashup Score: 0
How much do language models memorize? - 1 month(s) ago
We propose a new method for estimating how much a model “knows” about a datapoint and use it to measure the capacity of modern language models. Prior studies of language model memorization have…
Source: arXiv.org
Categories: General Medicine News
AI-powered HERMES kiosk pilots in public places: auto-guided OTC drug recommendations making pharmacists optional https://t.co/9UZedXxwBu https://t.co/GwXA7Ijq6A