  • Mashup Score: 0

    Background: The HERMES Kiosk (Healthcare Enhanced Recommendations through Artificial Intelligence & Expertise System) is designed to provide personalized Over-the-Counter (OTC) medication…

    Tweets with this article
    • AI-powered HERMES kiosk pilots in public places: auto-guided OTC drug recommendations making pharmacists optional https://t.co/9UZedXxwBu https://t.co/GwXA7Ijq6A

  • Mashup Score: 1

    Reinforcement finetuning (RFT) has become a standard approach for enhancing the reasoning capabilities of large language models (LLMs). However, its impact on model trustworthiness remains…

    Tweets with this article
    • The Hallucination Tax of Reinforcement Finetuning https://t.co/e339UnqMQ7 via @linxins2 et al https://t.co/Hpo9DbUs5w

  • Mashup Score: 1

    For Large Language Models (LLMs) to be reliably deployed in both everyday and high-stakes domains, knowing when not to answer is equally critical as answering correctly. Real-world user queries,…

    Tweets with this article
    • AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions https://t.co/DeBTNqJi2M via @polkirichenko et al https://t.co/JFYAxarD9B

  • Mashup Score: 4

    Can LLMs accurately adjust their confidence when facing opposition? Building on previous studies measuring calibration on static fact-based question-answering tasks, we evaluate Large Language…

    Tweets with this article
    • When Two LLMs Debate, Both Think They'll Win https://t.co/Q3nGJ0QjtB via @PradyuPrasad et al https://t.co/uy4yiOQY21

  • Mashup Score: 1

    We investigate the effectiveness of large language models (LLMs), including reasoning-based and non-reasoning models, in performing zero-shot financial sentiment analysis. Using the Financial…

    Tweets with this article
    • Reasoning or Overthinking: Evaluating Large Language Models on Financial Sentiment Analysis https://t.co/SqMuXCkVDZ https://t.co/9ka5PuC5uq

  • Mashup Score: 4

    Citations are widely considered in scientists’ evaluation. As such, scientists may be incentivized to inflate their citation counts. While previous literature has examined self-citations and…

    Tweets with this article
    • Google Scholar is manipulatable https://t.co/8sbfN5isfr via @fengyuanliu et al https://t.co/rG0LYSdAzX

  • Mashup Score: 0

    We propose a new method for estimating how much a model “knows” about a datapoint and use it to measure the capacity of modern language models. Prior studies of language model memorization have…

    Tweets with this article
    • How much do language models memorize? https://t.co/KsFof4rqyN via @csitawarin et al https://t.co/XFiEAaEVHs