arXiv.org – MashupMD

Mashup Score: 0

An AI-powered Public Health Automated Kiosk System for... - 22 day(s) ago

Background: The HERMES Kiosk (Healthcare Enhanced Recommendations through Artificial Intelligence & Expertise System) is designed to provide personalized Over-the-Counter (OTC) medication…

Source: arXiv.org

Categories: General Medicine News

Tweets with this article
- jselanikio
  
  AI-powered HERMES kiosk pilots in public places—auto-guided OTC drug recommendations making pharmacists optional https://t.co/9UZedXxwBu https://t.co/GwXA7Ijq6A

Mashup Score: 1

The Hallucination Tax of Reinforcement Finetuning - 24 day(s) ago

Reinforcement finetuning (RFT) has become a standard approach for enhancing the reasoning capabilities of large language models (LLMs). However, its impact on model trustworthiness remains…

Source: arXiv.org

Categories: General Medicine News

Tweets with this article
- pash22
  
  The Hallucination Tax of Reinforcement Finetuning https://t.co/e339UnqMQ7 via @linxins2 et al https://t.co/Hpo9DbUs5w

Mashup Score: 1

AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions - 24 day(s) ago

For Large Language Models (LLMs) to be reliably deployed in both everyday and high-stakes domains, knowing when not to answer is equally critical as answering correctly. Real-world user queries,…

Source: arXiv.org

Categories: General Medicine News

Tweets with this article
- pash22
  
  AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions https://t.co/DeBTNqJi2M via @polkirichenko et al https://t.co/JFYAxarD9B

Mashup Score: 8

Utility Engineering: Analyzing and Controlling Emergent Value... - 26 day(s) ago

As AIs rapidly advance and become more agentic, the risk they pose is governed not only by their capabilities but increasingly by their propensities, including goals and values. Tracking the…

Source: arXiv.org

Categories: General Medicine News, Future of Medicine

Tweets with this article
- JohnNosta
  
  Utility Engineering: Analyzing and Controlling Emergent VALUE Systems in AIs https://t.co/lMoZa0uRCH

Mashup Score: 4

When Two LLMs Debate, Both Think They'll Win - 1 month(s) ago

Can LLMs accurately adjust their confidence when facing opposition? Building on previous studies measuring calibration on static fact-based question-answering tasks, we evaluate Large Language…

Source: arXiv.org

Categories: General Medicine News

Tweets with this article
- pash22
  
  When Two LLMs Debate, Both Think They'll Win https://t.co/Q3nGJ0QjtB via @PradyuPrasad et al https://t.co/uy4yiOQY21

Mashup Score: 4

FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray... - 1 month(s) ago

Medical vision-language models often struggle with generating accurate quantitative measurements in radiology reports, leading to hallucinations that undermine clinical reliability. We introduce…

Source: arXiv.org

Categories: General Medicine News

Tweets with this article
- pranavrajpurkar
  
  🔗 Paper: https://t.co/xGqtgwxVBk 🔗 CVPR: https://t.co/7IwdgIkPMY 🔗 Youtube: https://t.co/j95FpTgRkW 🔗 GitHub: https://t.co/h1UmSLj9c7

Mashup Score: 1

Reasoning or Overthinking: Evaluating Large Language Models on... - 1 month(s) ago

We investigate the effectiveness of large language models (LLMs), including reasoning-based and non-reasoning models, in performing zero-shot financial sentiment analysis. Using the Financial…

Source: arXiv.org

Categories: General Medicine News

Tweets with this article
- pash22
  
  Reasoning or Overthinking: Evaluating Large Language Models on Financial Sentiment Analysis https://t.co/SqMuXCkVDZ https://t.co/9ka5PuC5uq

Mashup Score: 4

Google Scholar is manipulatable - 1 month(s) ago

Citations are widely considered in scientists’ evaluation. As such, scientists may be incentivized to inflate their citation counts. While previous literature has examined self-citations and…

Source: arXiv.org

Categories: General Medicine News

Tweets with this article
- pash22
  
  Google Scholar is manipulatable https://t.co/8sbfN5isfr via @fengyuanliu et al https://t.co/rG0LYSdAzX

Mashup Score: 0

How much do language models memorize? - 1 month(s) ago

We propose a new method for estimating how much a model “knows” about a datapoint and use it to measure the capacity of modern language models. Prior studies of language model memorization have…

Source: arXiv.org

Categories: General Medicine News

Tweets with this article
- pash22
  
  How much do language models memorize? https://t.co/KsFof4rqyN via @csitawarin et al https://t.co/XFiEAaEVHs