AI Learning Roadmap

wendellhua/AI-Learning-Roadmap

Future Trends Overview

AI is moving beyond text. The next wave is multimodal models, generative video, and embodied AI.

1. Multimodal AI (LMMs)

Large Multimodal Models (LMMs) accept, and increasingly generate, several modalities within a single model: text, images, audio, and video.

Current State: GPT-4o (OpenAI), Gemini 1.5 Pro (Google).
Trend: "Any-to-any" models that map any input modality to any output modality (for example, speech to code or video to music).
Impact: More natural human-computer interaction. Instead of typing, users can simply show and speak.

2. Video Generation

From static images to dynamic worlds.

Key Players: Sora (OpenAI), Kling (Kuaishou), Runway Gen-3, Luma Dream Machine.
Physics Simulation: Video models are increasingly framed as learning implicit "world models": approximations of gravity, collisions, and object permanence picked up from video data rather than from hand-written physics rules.
Challenge: Temporal consistency (frame-to-frame flickering) and controllability (steering the output precisely).
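
To make the flickering challenge concrete, the sketch below computes a crude flicker proxy: the mean absolute pixel difference between consecutive frames. This is an illustrative assumption, not a metric the models above are known to report. The names `Frame`, `mean_abs_frame_diff`, and `flicker_score` are hypothetical, and frames are modeled as small grayscale grids with intensities in [0, 1].

```python
from typing import List

Frame = List[List[float]]  # grayscale frame: rows of pixel intensities in [0, 1]


def mean_abs_frame_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two same-sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def flicker_score(frames: List[Frame]) -> float:
    """Mean frame-to-frame difference across a clip; higher suggests more flicker."""
    if len(frames) < 2:
        return 0.0
    diffs = [mean_abs_frame_diff(f0, f1) for f0, f1 in zip(frames, frames[1:])]
    return sum(diffs) / len(diffs)


# A perfectly steady clip versus one that alternates between black and white.
steady = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(4)]
flickering = [[[v, v], [v, v]] for v in (0.0, 1.0, 0.0, 1.0)]
print(flicker_score(steady), flicker_score(flickering))  # prints: 0.0 1.0
```

A raw frame difference cannot tell genuine motion apart from flicker, so a practical measure would likely compensate for motion first. The sketch only shows what "consistency" means numerically.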

3. Embodied AI (Robotics)

Putting the "brain" (an LLM) into a "body" (a robot). See the Deep Dive for more details.

Concept: Traditional robots are largely hard-coded for specific tasks. Embodied AI uses a vision-language model (VLM) to "see" the environment and plan actions.
Example: Figure 02 (Figure AI), Tesla Optimus Gen 2, Unitree G1.
Future: General-purpose humanoid robots in homes and factories.
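
The "see, then plan actions" idea can be sketched as a perceive-plan-act loop. Everything here is a simplified assumption for illustration: `Observation`, `stub_vlm_planner`, and `run_episode` are hypothetical names, the "camera" is just a list of object labels, and the planner is a hard-coded stand-in for a real VLM call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Observation:
    """Stand-in for a camera frame: here just a list of detected object names."""
    objects: List[str]


# A planner maps (observation, goal) to a list of action strings.
Planner = Callable[[Observation, str], List[str]]


def stub_vlm_planner(obs: Observation, goal: str) -> List[str]:
    """Hypothetical placeholder for a VLM call; a real system would query a model."""
    if goal in obs.objects:
        return [f"move_to({goal})", f"grasp({goal})"]
    return ["look_around()"]


def run_episode(scenes: List[Observation], goal: str, planner: Planner) -> List[str]:
    """Perceive each scene, ask the planner for actions, and record what is executed."""
    executed: List[str] = []
    for obs in scenes:                 # perceive
        plan = planner(obs, goal)      # plan
        executed.extend(plan)          # act (here: only record the action)
        if plan and plan[-1].startswith("grasp"):
            break                      # goal reached, stop early
    return executed


scenes = [Observation(["table"]), Observation(["table", "cup"])]
print(run_episode(scenes, "cup", stub_vlm_planner))
# prints: ['look_around()', 'move_to(cup)', 'grasp(cup)']
```

The contrast with a hard-coded robot is that the action sequence is not fixed in advance: it depends on what the planner reports seeing at each step.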

4. Small Language Models (SLMs)

Not everyone needs 1T (one trillion) parameters.

Trend: Compact models that perform well directly on edge devices (phones, laptops).
Examples: Phi-3 (Microsoft), Gemma (Google), Qwen-1.5B (Alibaba).
Goal: Privacy, low latency, and offline capability.
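
A back-of-the-envelope calculation shows why parameter count matters on edge devices. The sketch below estimates weight storage only, as parameters times bytes per parameter. It ignores activations, the KV cache, and runtime overhead, so real memory use would be higher. The `BYTES_PER_PARAM` table and `weight_memory_gb` helper are hypothetical names introduced for illustration.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Compare a 1.5B-parameter model with a 1T-parameter model.
for name, params in [("1.5B model", 1.5e9), ("1T model", 1e12)]:
    sizes = {p: weight_memory_gb(params, p) for p in ("fp16", "int4")}
    print(name, sizes)
```

Under these assumptions a 1.5B-parameter model needs about 3 GB of weights in fp16 and about 0.75 GB at 4 bits, which is within reach of a phone or laptop. A 1T-parameter model needs roughly 2,000 GB in fp16, which is not.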