LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

10 Mar 2025 · Yingzhe Peng, Gongrui Zhang, Miaosen Zhang, Zhiyuan You, Jie Liu, Qipeng Zhu, Kai Yang, Xingzhong Xu, Xin Geng, Xu Yang

Enhancing reasoning in Large Multimodal Models (#LMMs) faces unique challenges from the complex interplay between visual perception and logical reasoning, particularly in compact 3B-parameter architectures where architectural constraints limit reasoning capacity and modality alignment. While rule-based reinforcement learning (RL) excels in text-only domains, its multimodal extension confronts two critical barriers: (1) data limitations due to ambiguous answers and scarce complex reasoning examples, and (2) degraded foundational reasoning induced by multimodal pretraining. To address these challenges, we propose LMM-R1, a two-stage framework adapting rule-based RL for multimodal reasoning through Foundational Reasoning Enhancement (FRE) followed by Multimodal Generalization Training (MGT). The FRE stage first strengthens reasoning abilities using text-only data with rule-based RL, then the MGT stage generalizes these reasoning capabilities to multimodal domains. Experiments on Qwen2.5-VL-Instruct-3B demonstrate that LMM-R1 achieves 4.83% and 4.5% average improvements over baselines in multimodal and text-only benchmarks, respectively, with a 3.63% gain in complex Football Game tasks. These results validate that text-based reasoning enhancement enables effective multimodal generalization, offering a data-efficient paradigm that bypasses costly high-quality multimodal training data.
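Rule-based RL here means scoring model outputs with deterministic checks instead of a learned reward model. A minimal sketch of such a reward, assuming an R1-style template with `<think>`/`<answer>` tags (function names and the 0.5 format weight are illustrative, not from the paper's code):

```python
import re

def format_reward(response: str) -> float:
    """1.0 if the response follows the <think>...</think><answer>...</answer> template."""
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>$"
    return 1.0 if re.match(pattern, response.strip(), re.DOTALL) else 0.0

def accuracy_reward(response: str, gold: str) -> float:
    """1.0 if the extracted final answer exactly matches the ground truth."""
    m = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    return 1.0 if m and m.group(1).strip() == gold.strip() else 0.0

def rule_based_reward(response: str, gold: str) -> float:
    # Weighted sum of format and accuracy checks (weights are illustrative).
    return 0.5 * format_reward(response) + accuracy_reward(response, gold)
```

Because the reward is a fixed rule, it sidesteps reward-model training entirely; the data-limitation barrier above is exactly that multimodal answers are often too ambiguous for such exact-match checks.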

Paper: https://arxiv.org/pdf/2503.07536v1.pdf

code: https://github.com/tidedra/lmm-r1

@Machine_learn
📃 Biological Multi-Layer and Single Cell Network-Based Multiomics Models - a Review

📎 Study the paper


@Machine_learn
The multimodal wound classification paper that we published in one of the best Elsevier journals.

Multi-modal wound classification using wound image and location by Swin Transformer and Transformer

Accepted

Authors: Ramin Mousa, Behnaz Rezaei, Laya Mahmoudi, Jafar Abdollahi

IF: 7.5

Journal: https://www.sciencedirect.com/journal/expert-systems-with-applications


Paper: Link

@Machine_learn
Jointly announcing EAGLE-3 with SGLang: Setting a new record in LLM inference acceleration!

- 5x 🚀 faster than vanilla (on HF)
- 1.4x 🚀 faster than EAGLE-2 (on HF)
- A record of ~400 TPS on Llama 3.1 8B with a single H100 (on SGLang)
- 1.65x 🚀 lower latency even for large bs=64 (on SGLang)
- A new scaling law: more training data, better speedup
- Apache 2.0
- A new scaling law: more training data, better speedup
- Apache 2.0
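The speedups above come from speculative decoding: a cheap draft model proposes several tokens, and the target model verifies them in one pass. A toy greedy-only sketch of the draft-then-verify loop (EAGLE actually drafts at the feature level and verifies a token tree; this shows only the generic acceptance logic, and all names are illustrative):

```python
from typing import Callable, List

def speculative_decode_step(
    context: List[int],
    draft: Callable[[List[int]], int],   # cheap model: next-token guess
    target: Callable[[List[int]], int],  # expensive model: ground truth
    k: int = 4,
) -> List[int]:
    """Draft k tokens cheaply, then check them against the target model.

    A drafted token is accepted if the target model would have produced
    the same token; on the first mismatch we keep the target's token and
    stop. Real systems verify all k drafts in ONE batched target forward
    pass -- that is where the speedup comes from.
    """
    drafted, ctx = [], list(context)
    for _ in range(k):
        t = draft(ctx)
        drafted.append(t)
        ctx.append(t)

    accepted, ctx = [], list(context)
    for t in drafted:
        gold = target(ctx)
        if gold == t:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(gold)  # correction from the target model
            break
    return accepted
```

The output always matches what the target model alone would have generated; the draft model only changes how many target forward passes are needed per emitted token.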

Paper: https://arxiv.org/abs/2503.01840
Code: https://github.com/SafeAILab/EAGLE
SGLang version: https://github.com/sgl-project/sglang/pull/4247

@Machine_learn
Executable Code Actions Elicit Better LLM Agents

1 Feb 2024 · Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji

Large Language Model (LLM) agents, capable of performing a broad range of actions, such as invoking tools and controlling robots, show great potential in tackling real-world challenges. LLM agents are typically prompted to produce actions by generating #JSON or text in a pre-defined format, which is usually limited by constrained action space (e.g., the scope of pre-defined tools) and restricted flexibility (e.g., inability to compose multiple tools). This work proposes to use executable Python code to consolidate LLM agents' actions into a unified action space (CodeAct). Integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations through multi-turn interactions. Our extensive analysis of 17 LLMs on API-Bank and a newly curated benchmark shows that CodeAct outperforms widely used alternatives (up to 20% higher success rate). The encouraging performance of CodeAct motivates us to build an open-source #LLM agent that interacts with environments by executing interpretable code and collaborates with users using natural language. To this end, we collect an instruction-tuning dataset CodeActInstruct that consists of 7k multi-turn interactions using CodeAct. We show that it can be used with existing data to improve models in agent-oriented tasks without compromising their general capability. CodeActAgent, finetuned from Llama2 and Mistral, is integrated with #Python interpreter and uniquely tailored to perform sophisticated tasks (e.g., model training) using existing libraries and autonomously self-debug.
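The core mechanism is an execute-observe loop: the agent emits Python code, an interpreter runs it in a persistent namespace, and the output (or error) is fed back as the next observation. A minimal sketch of that loop, not the authors' implementation (function names are illustrative):

```python
import contextlib
import io

def execute_code_action(code: str, namespace: dict) -> str:
    """Run a code action in a persistent namespace and return its output.

    The namespace persists across turns, so later actions can call or
    revise earlier definitions -- the property CodeAct relies on for
    multi-turn interaction and self-debugging.
    """
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, namespace)
    except Exception as e:
        # The error text becomes the observation, enabling self-debugging.
        return f"{type(e).__name__}: {e}"
    return buf.getvalue()

# Turn 1: the agent defines a helper; Turn 2: it calls and inspects it.
ns = {}
execute_code_action("def area(r):\n    return 3.14159 * r * r", ns)
obs = execute_code_action("print(area(2))", ns)
```

Composing tools is just ordinary Python in this setting, which is exactly the flexibility that JSON-formatted action spaces lack.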

Paper: https://arxiv.org/pdf/2402.01030v4.pdf

Codes:
https://github.com/epfllm/megatron-llm
https://github.com/xingyaoww/code-act

Datasets: MMLU - GSM8K - HumanEval - MATH

@Machine_learn
PiEEG Kit - a home bioscience lab for your brain and body

🖥 Github: https://github.com/pieeg-club/PiEEG_Kit

📕 Paper: https://arxiv.org/abs/2503.13482

🌟 Methods: https://paperswithcode.com/task/eeg-1
@Machine_learn
Introduction to Graph Neural Networks: A Starting Point for Machine Learning Engineers

📓 Paper


@Machine_learn
Forwarded from Papers
Greetings,
As a continuation of our research work, we are going to write a review paper in the field of pathology. Those interested can join this topic as authors 2 through 5.

Approximate start date: 20 Farvardin.

Journal: scientific reports https://www.nature.com/srep/

Price:
2: 400$
3: 300$
4: 200$
5: 150$
I will personally provide full guidance on the details and on how to write each section.

@Raminmousa
@Machine_learn
@Paper4money
Forwarded from Papers
Greetings,
As a continuation of our research work, we are going to write a review paper in the field of pathology. Those interested can join this topic as authors 2 and 3.

Approximate start date: 20 Farvardin.

Journal: scientific reports https://www.nature.com/srep/

Price:
2: 400$
3: 300$

I will personally provide full guidance on the details and on how to write each section.

@Raminmousa
@Machine_learn
@Paper4money
Graph Theory and Additive Combinatorics
Exploring Structure and Randomness

📚 link


@Machine_learn
🔥 Transformers Laid Out

📌 Guide


@Machine_learn
Bias-Variance Trade-Off in Statistics at MIT OpenCourseWare

📚 Book



@Machine_learn