  • Mashup Score: 4

    Can LLMs accurately adjust their confidence when facing opposition? Building on previous studies measuring calibration on static fact-based question-answering tasks, we evaluate Large Language…

    Tweets with this article
    • When Two LLMs Debate, Both Think They'll Win https://t.co/Q3nGJ0QjtB via @PradyuPrasad et al https://t.co/uy4yiOQY21

  • Mashup Score: 1

    We investigate the effectiveness of large language models (LLMs), including reasoning-based and non-reasoning models, in performing zero-shot financial sentiment analysis. Using the Financial…

    Tweets with this article
    • Reasoning or Overthinking: Evaluating Large Language Models on Financial Sentiment Analysis https://t.co/SqMuXCkVDZ https://t.co/9ka5PuC5uq

  • Mashup Score: 4

    Citations are widely considered in scientists’ evaluation. As such, scientists may be incentivized to inflate their citation counts. While previous literature has examined self-citations and…

    Tweets with this article
    • Google Scholar is manipulatable https://t.co/8sbfN5isfr via @fengyuanliu et al https://t.co/rG0LYSdAzX

  • Mashup Score: 0

    We propose a new method for estimating how much a model “knows” about a datapoint and use it to measure the capacity of modern language models. Prior studies of language model memorization have…

    Tweets with this article
    • How much do language models memorize? https://t.co/KsFof4rqyN via @csitawarin et al https://t.co/XFiEAaEVHs

  • Mashup Score: 3

    We investigate the potential implications of large language models (LLMs), such as Generative Pre-trained Transformers (GPTs), on the U.S. labor market, focusing on the increased capabilities…

    Tweets with this article
    • GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models https://t.co/vISEekzNEl via @sj_manning et al https://t.co/plnTLoe3bK

  • Mashup Score: 15

    We analyse the migration of 300,000 academic users from Twitter/X to Bluesky between 2023 and early 2025, combining rich bibliometric data, longitudinal social-media activity, and a novel…

    Tweets with this article
    • Why Academics Are Leaving @X for @bluesky https://t.co/fYSvphTuGx via @dorian_quelle et al https://t.co/gQiFkDXXbc

  • Mashup Score: 20

    Today’s AI systems have human-designed, fixed architectures and cannot autonomously and continuously improve themselves. The advance of AI could itself be automated. If done safely, that would…

    Tweets with this article
    • RT @SakanaAILabs: Read our paper: “Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents” https://t.co/Ej2KIkEXdh We believ…

  • Mashup Score: 27

    As artificial intelligence (AI) improves, traditional alignment strategies may falter in the face of unpredictable self-improvement, hidden subgoals, and the sheer complexity of intelligent…

    Tweets with this article
    • RT @bimedotcom: Contemplative Wisdom for Superalignment https://t.co/3Jut97xyh6 ✍️ @RubenLaukkonen @fionn_inglis @shamilch @hohwy @lars_san…