# vcli

**Repository Path**: lvhaodeyeye/vcli

## Basic Information

- **Project Name**: vcli
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-03-30
- **Last Updated**: 2026-03-31

## README

# vcli — Multimodal AI CLI

A command-line tool that wraps the **Qwen3.5-Omni-Plus** multimodal model (Alibaba Bailian / DashScope) via its OpenAI-compatible API. Built with an extensible provider design to support Gemini and other models in the future.

## Model: Qwen3.5-Omni-Plus

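| Capability | Specification |
|------------|---------------|
| Video | Up to 400 s at 720p (1 fps), with joint audio-video understanding |
| Audio | Long audio over 10 hours; 60+ input languages; speech output in 30+ languages |
| Image | Multi-image understanding |
| Text | Structured understanding, creative writing, Q&A |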

## Features

- Ask questions with optional video, image, or audio input
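- **Audio auto-conversion**: DashScope does not support `input_audio`, so audio is automatically converted to a black-screen video and sent through the `video_url` channel
- **Auto-compression**: files larger than 4 MB are automatically compressed (lower resolution/bitrate) to fit the API's 6 MB body limit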
- Streaming responses (required by the Qwen API)
- Media input from local files (base64-encoded) or remote URLs
- Rotating file log at `~/.video_cli/logs/vcli.log`
- Config stored at `~/.video_cli/config.yaml` (auto-created on first run)
- Gemini provider stub (ready for future implementation)
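
One plausible reading of the 4 MB threshold is that base64 encoding inflates a payload by roughly 4/3, so a 4 MB file still fits under the 6 MB body limit once encoded. The sketch below works through that arithmetic. It assumes 1 MB = 1024 × 1024 bytes, and `needs_compression` is a hypothetical helper, not a function confirmed to exist in vcli.

```python
COMPRESS_THRESHOLD = 4 * 1024 * 1024  # 4 MB threshold from the feature list (binary MB assumed)
BODY_LIMIT = 6 * 1024 * 1024          # 6 MB API body limit from the feature list


def base64_size(raw_bytes: int) -> int:
    """Length in bytes of the padded base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def needs_compression(raw_bytes: int) -> bool:
    """Hypothetical helper: True when a file is larger than the 4 MB threshold."""
    return raw_bytes > COMPRESS_THRESHOLD


# A file exactly at the threshold encodes to 5,592,408 bytes,
# which leaves 699,048 bytes of headroom for the rest of the JSON body.
encoded = base64_size(COMPRESS_THRESHOLD)
headroom = BODY_LIMIT - encoded
```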

## Tech Stack

- **Python 3.11+** with [uv](https://github.com/astral-sh/uv) for environment management
- **typer** — CLI framework
- **openai** SDK — OpenAI-compatible HTTP client
- **pyyaml** — config file serialization
- **httpx** — HTTP transport
- **ffmpeg** — audio-to-video conversion and video compression

## Setup

```bash
# 1. Create virtual environment
uv venv

# 2. Install in editable mode
uv pip install -e .

# 3. Set your DashScope API key
.venv/bin/vcli config set api_key <YOUR_DASHSCOPE_KEY>
```

Or activate the venv first:

```bash
source .venv/bin/activate
vcli config set api_key <YOUR_DASHSCOPE_KEY>
```

### Dependencies

- **ffmpeg** — required for audio processing and video compression

## Usage

### Ask a text question

```bash
vcli ask "What is the capital of France?"
```
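
Because responses are streamed, the answer arrives as a sequence of chunks whose text deltas have to be joined. The following is a minimal sketch of that accumulation step, not vcli's actual code. With the `openai` SDK the chunks would come from `client.chat.completions.create(..., stream=True)`; here `fake_chunk` is a hypothetical stand-in with the same `choices[0].delta.content` shape so the example runs without a network call.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def collect_stream(chunks: Iterable) -> str:
    """Concatenate the text deltas of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # some chunks (for example usage-only ones) carry no choices
            continue
        content = getattr(chunk.choices[0].delta, "content", None)
        if content:
            parts.append(content)
    return "".join(parts)


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    """Hypothetical stand-in for one streamed chunk (assumption, for illustration only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


answer = collect_stream([fake_chunk("Par"), fake_chunk(None), fake_chunk("is")])
```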

### Ask about a video (local file)

```bash
vcli ask "Describe what happens in this video" --video /path/to/video.mp4
```

### Ask about a video (URL)

```bash
vcli ask "Summarize this video" --video https://example.com/video.mp4
```
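
Both forms of `--video` can end up in the same message part: a remote URL is passed through, while a local file is base64-encoded into a data URL. The sketch below illustrates that idea using the `video_url` content type named in the feature list. The helper names `is_url` and `to_video_part` are assumptions, and the exact data-URL form DashScope expects should be checked against its documentation.

```python
import base64
import mimetypes
from pathlib import Path
from urllib.parse import urlparse


def is_url(source: str) -> bool:
    """True for http(s) URLs; anything else is treated as a local path."""
    return urlparse(source).scheme in ("http", "https")


def to_video_part(source: str) -> dict:
    """Build a `video_url` content part from a URL or a local file (hypothetical helper)."""
    if is_url(source):
        url = source
    else:
        path = Path(source)
        # Fall back to a generic MIME type when the extension is unknown.
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        url = f"data:{mime};base64,{payload}"
    return {"type": "video_url", "video_url": {"url": url}}
```

The returned dict would sit next to a `{"type": "text", "text": ...}` part in the user message's `content` list.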

### Ask about an image

```bash
vcli ask "What objects are in this image?" --image /path/to/photo.jpg
```

### Transcribe or analyze audio

```bash
vcli ask "Transcribe this audio" --audio /path/to/recording.mp3
vcli ask "Summarize this lecture" --audio /path/to/lecture.mp3
```

> Audio files are automatically converted to black-screen video internally
> and sent via the video channel, as DashScope's OpenAI-compatible API
> does not support the `input_audio` format.
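
The conversion described above can be sketched with a single ffmpeg invocation that pairs the audio track with a generated black frame source. This is an illustration under assumptions, not the command vcli actually runs: the frame size, frame rate, codecs, and the function names `black_video_command` and `audio_to_black_video` are all chosen for the example.

```python
import shutil
import subprocess
from pathlib import Path


def black_video_command(audio: Path, output: Path, size: str = "320x240", fps: int = 1) -> list:
    """Return an ffmpeg command that muxes `audio` with a black video track."""
    return [
        "ffmpeg", "-y",
        # Input 0: an endless black frame source from ffmpeg's lavfi `color` filter.
        "-f", "lavfi", "-i", f"color=c=black:s={size}:r={fps}",
        # Input 1: the audio file.
        "-i", str(audio),
        # Stop when the shortest input (the audio) ends.
        "-shortest",
        "-c:v", "libx264", "-tune", "stillimage", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        str(output),
    ]


def audio_to_black_video(audio: Path, output: Path) -> Path:
    """Run the conversion; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(black_video_command(audio, output), check=True, capture_output=True)
    return output
```

A small frame size and a low frame rate keep the video track negligible, so the output size is dominated by the audio.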

### Use a specific provider

```bash
vcli ask "hello" --provider qwen
```

### Configuration

```bash
# Show current config (API keys are masked)
vcli config show

# Set API key for a provider
vcli config set api_key <VALUE> --provider qwen

# List providers and their status
vcli providers
```
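
As an illustration of what masking in `vcli config show` could look like, the sketch below hides all but the last few characters of a key. The exact format vcli prints is not documented here, so `mask_key` and its output format are assumptions.

```python
def mask_key(key: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of an API key (hypothetical format)."""
    if not key:
        return "(not set)"
    if len(key) <= visible:
        # Too short to reveal anything safely.
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```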

## Project Structure

```
video_reader/
├── pyproject.toml           # Project metadata and dependencies
├── README.md
└── src/
    └── vcli/
        ├── __init__.py
        ├── main.py          # CLI entry point (typer app)
        ├── config.py        # Config load/save from ~/.video_cli/config.yaml
        ├── logger.py        # Rotating file logger to ~/.video_cli/logs/
        ├── utils.py         # Base64 encode, MIME detection, URL check
        └── providers/
            ├── __init__.py
            ├── base.py      # Abstract BaseProvider, MediaInput, ProviderResponse
            ├── qwen.py      # QwenProvider — qwen3.5-omni-plus (full implementation)
            └── gemini.py    # GeminiProvider stub (NotImplementedError)
```

## Config File

Located at `~/.video_cli/config.yaml`, auto-created on first run:

```yaml
default_provider: qwen
providers:
  qwen:
    api_key: ""
    base_url: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    model: "qwen3.5-omni-plus"
  gemini:
    api_key: ""
    base_url: ""
    model: "gemini-2.5-pro"
```
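
To show how this file relates to the `--provider` flag, here is a rough sketch of provider resolution over the parsed config. It assumes the YAML has already been loaded into a plain dict (for example with `yaml.safe_load`); `resolve_provider` and `is_configured` are hypothetical names, not confirmed parts of `config.py`.

```python
from typing import Optional, Tuple

# The config file above, as it would look after parsing.
CONFIG = {
    "default_provider": "qwen",
    "providers": {
        "qwen": {
            "api_key": "",
            "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
            "model": "qwen3.5-omni-plus",
        },
        "gemini": {"api_key": "", "base_url": "", "model": "gemini-2.5-pro"},
    },
}


def resolve_provider(config: dict, override: Optional[str] = None) -> Tuple[str, dict]:
    """Pick the provider from --provider if given, else from default_provider."""
    name = override or config["default_provider"]
    try:
        return name, config["providers"][name]
    except KeyError:
        raise ValueError(f"unknown provider: {name!r}") from None


def is_configured(settings: dict) -> bool:
    """A provider counts as configured once it has a non-empty API key (assumption)."""
    return bool(settings.get("api_key"))
```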

## Logs

Logs are written to `~/.video_cli/logs/vcli.log`:
- **DEBUG** level and above in the file (rotating, max 10 MB, 5 backups)
- **ERROR** level only on the console
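
The setup described above maps directly onto the standard library's `logging` module. The sketch below is one way to wire it, assuming a function named `build_logger` and a log format chosen for illustration; only the levels, the 10 MB size, and the 5 backups come from this README.

```python
import logging
import sys
from logging.handlers import RotatingFileHandler
from pathlib import Path


def build_logger(log_dir: Path, name: str = "vcli") -> logging.Logger:
    """DEBUG and above to a rotating file, ERROR and above to the console."""
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    if logger.handlers:  # already configured, avoid duplicate handlers
        return logger

    file_handler = RotatingFileHandler(
        log_dir / "vcli.log",
        maxBytes=10 * 1024 * 1024,  # rotate at 10 MB
        backupCount=5,              # keep 5 old files
        encoding="utf-8",
    )
    file_handler.setLevel(logging.DEBUG)

    console = logging.StreamHandler(sys.stderr)
    console.setLevel(logging.ERROR)

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    for handler in (file_handler, console):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

In vcli the directory would be `Path.home() / ".video_cli" / "logs"`.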

## Adding a New Provider

1. Create `src/vcli/providers/<name>.py` implementing `BaseProvider`
2. Register it in the `PROVIDERS` dict in `main.py`
3. Add its config block to `DEFAULT_CONFIG` in `config.py`
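
The three steps above can be sketched in one self-contained file. The class names `BaseProvider`, `MediaInput`, and `ProviderResponse` come from the project structure, but their fields, the `ask` method, and the constructor signature are assumptions made for illustration; check `providers/base.py` for the real interface. `EchoProvider` is a hypothetical provider that makes no network calls.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MediaInput:
    kind: str    # "video", "image", or "audio" (assumed field)
    source: str  # local path or URL (assumed field)


@dataclass
class ProviderResponse:
    text: str    # assumed field


class BaseProvider(ABC):
    def __init__(self, api_key: str, base_url: str, model: str) -> None:
        self.api_key = api_key
        self.base_url = base_url
        self.model = model

    @abstractmethod
    def ask(self, prompt: str, media: Optional[List[MediaInput]] = None) -> ProviderResponse:
        """Send a prompt with optional media and return the answer."""


# Step 1: implement the abstract interface.
class EchoProvider(BaseProvider):
    def ask(self, prompt: str, media: Optional[List[MediaInput]] = None) -> ProviderResponse:
        count = len(media or [])
        return ProviderResponse(text=f"[{self.model}] {prompt} ({count} media)")


# Step 2: register it by name.
PROVIDERS = {"echo": EchoProvider}

# Step 3: give it a default config block.
DEFAULT_CONFIG = {
    "default_provider": "echo",
    "providers": {"echo": {"api_key": "", "base_url": "", "model": "echo-1"}},
}

provider = PROVIDERS["echo"](**DEFAULT_CONFIG["providers"]["echo"])
reply = provider.ask("hi", [MediaInput(kind="image", source="a.jpg")])
```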