To summarize the setup before diving into the "app" development:
1. Transcription: Whisper Large v3 Turbo
I am using the Whisper Large v3 Turbo model at 6-bit quantization for transcription. I downloaded the model from Hugging Face: oxide-lab/whisper-large-v3-turbo-GGUF.
To run the server, I use the following command:
whisper-server -m models/whisper-large-v3-turbo-q6_k.gguf --host 0.0.0.0 --port 8090
2. Summarization: Qwen 3.6 35B A3B
For summarization, I am running the Qwen 3.6 35B A3B model, also at 6-bit quantization. I downloaded it from Hugging Face and deployed it using llama-server with the following parameters:
llama-server \
-m /home/llama/models/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q6_K_P.gguf \
--mmproj /home/llama/models/mmproj-Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-f16.gguf \
--alias qwen3.6 \
--host 0.0.0.0 \
--port 8080 \
-ngl 99 \
-fa on \
-c 120000 \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
-b 512 \
-ub 512 \
--no-mmap \
--mlock \
--image-min-tokens 1024 \
--dry-multiplier 0.8 \
--dry-base 1.75 \
--dry-allowed-length 2 \
--tools all \
--parallel 1 \
--reasoning on \
--reasoning-budget 400 \
--offline
For quality reasons, I am utilizing an 8-bit KV cache (rather than 4-bit). My perplexity tests indicated no significant improvement when moving to an F16 cache, so 8-bit remains the "sweet spot" for this setup.
The Automation Script
With the servers active, I run the following Bash script on my VM to automate the workflow:
#!/bin/bash
# Configuration
REMOTE_HOST="100.92.17.43"
WHISPER_PORT="8090"
LLAMA_PORT="8080"
VIDEO_URL="https://youtu.be/Um8Px55mINc"
# 1. Create a unique filename using a timestamp
TIMESTAMP=$(date +%s)
AUDIO_FILE="audio_${TIMESTAMP}.wav"
TRANSCRIPT_FILE="transcript_${TIMESTAMP}.txt"
echo "--- Starting Download from YouTube ---"
# Download as 16kHz Mono WAV (ideal for Whisper)
yt-dlp -x --audio-format wav --audio-quality 0 \
--postprocessor-args "ffmpeg:-ar 16000 -ac 1" \
-o "$AUDIO_FILE" "$VIDEO_URL"
echo "--- Sending to Whisper Server for Transcription ---"
# Extract text using Whisper Turbo v3
curl -s http://${REMOTE_HOST}:${WHISPER_PORT}/inference \
-H "Content-Type: multipart/form-data" \
-F file="@${AUDIO_FILE}" \
-F language="auto" \
-F response_format="text" > "$TRANSCRIPT_FILE"
echo "--- Sending Transcript to Qwen 3.6 for Summary ---"
# Use jq to safely package the text file into a JSON request for llama-server
# Note: We use the /v1/chat/completions endpoint for OpenAI compatibility
curl http://${REMOTE_HOST}:${LLAMA_PORT}/v1/chat/completions \
-H "Content-Type: application/json" \
-d "$(jq -n --arg content "$(cat $TRANSCRIPT_FILE)" '{
"model": "qwen3.6",
"messages": [
{"role": "system", "content": "You are a professional assistant. Summarize the provided transcript into clear, actionable bullet points."},
{"role": "user", "content": ("Transcript content to summarize:\n\n" + $content)}
],
"temperature": 0.3
}')"
# Clean up local audio file if you don't need it
rm "$AUDIO_FILE"
Initial Results
The script works with flying colors! Here is the output from the first run:
--- Starting Download from YouTube ---
[youtube] Extracting URL: https://youtu.be/Um8Px55mINc
[youtube] Um8Px55mINc: Downloading webpage
WARNING: [youtube] No supported JavaScript runtime could be found. Only deno is enabled by default; to use another runtime add --js-runtimes RUNTIME[:PATH] to your command/config. YouTube extraction without a JS runtime has been deprecated, and some formats may be missing. See https://github.com/yt-dlp/yt-dlp/wiki/EJS for details on installing one
[youtube] Um8Px55mINc: Downloading android vr player API JSON
[info] Um8Px55mINc: Downloading 1 format(s): 251
[download] Destination: audio_1778794725.webm
[download] 100% of 9.40MiB in 00:00:00 at 20.17MiB/s
[ExtractAudio] Destination: audio_1778794725.wav
Deleting original file audio_1778794725.webm (pass -k to keep)
--- Sending to Whisper Server for Transcription ---
--- Sending Transcript to Qwen 3.6 for Summary ---
{"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"Here is a clear, actionable summary of the transcript:\n\n**π― Objective & Setup**\n- **Goal:** Personally evaluate Qwen 3.6 vs. Gemma 4 for local coding tasks (not abstract benchmarking).\n- **Test Project:** Build a cross-platform Markdown viewer/editor using the Tauri framework (chosen for familiarity and performance).\n- **Environment:** Local desktop with GPU VRAM as the limiting factor; models run via OpenCode over a local network.\n- **Models Compared:** Qwen 3.6 (dense, 27B params) vs. Gemma 4 (dense, 31B params).\n\n**⚙️ Testing Methodology**\n1. Provide both models with the same project description.\n2. Request a detailed, phase-by-phase implementation plan.\n3. Run `init` to load context into OpenCode.\n4. **Stress Test:** Instruct each model to implement the entire plan autonomously in one session.\n\n**π Qwen 3.6 Performance**\n- **Planning:** ~4 minutes; produced a highly detailed plan with nearly twice as many phases/tasks as Gemma.\n- **Implementation Time:** ~46 minutes.\n- **Launch Issues:** Required manual fixes for a missing server block and an outdated Tauri API method.\n- **Functionality:** Core features (split view, live preview, recursive file loading) worked well. Toolbar buttons had minor bugs but provided a solid foundation.\n\n**π Gemma 4 Performance**\n- **Planning:** ~2.5 minutes; structured similarly but with fewer phases/tasks.\n- **Implementation Time:** ~20 minutes (**2x faster** than Qwen).\n- **Launch Issues:** Required manual fix for a missing Tauri file-system plugin configuration.\n- **Functionality:** Fully functional split-view editor with working mode-switch buttons. Successfully opened local Markdown files.\n- **Bonus:** Automatically organized project files and plans into a `documentation/` folder, improving repo cleanliness.\n- **Omission:** Skipped implementing toolbar formatting buttons despite listing them in its plan.\n\n**π Final Verdict & Next Steps**\n- **Winner:** Draw. Both models successfully delivered functional apps under stress, but with different trade-offs.\n- **Qwen Strengths:** More granular planning, better initial task breakdown.\n- **Gemma Strengths:** Significantly faster execution, cleaner file organization, fewer launch errors.\n- **Action Plan:** Continue using both models in tandem to leverage Qwen's detailed planning and Gemma's speed. Monitor long-term performance to eventually settle on a primary model.\n- **Call to Action:** Share personal experiences with these models in the comments to help finalize the choice.","reasoning_content":"Here's a thinking process:\n\n1. **Analyze User Input:**\n - **Input:** A transcript of a video/post comparing two AI models (Qwen 3.6 and Gemma 4) for a specific task.\n - **Task:** Summarize the transcript into clear, actionable bullet points as a professional assistant.\n - **Key Elements in Transcript:**\n - Goal: Compare Qwen 3.6 (27B params) and Gemma 4 (31B params) for personal use (local, specific hardware/tasks), not abstract benchmarking.\n - Test Project: Build a markdown viewer/editor app using Tauri (cross-platform, dense models preferred for coding).\n - Setup: Local desktop, models run via OpenCode agent over local network from MacBook. GPU VRAM is critical.\n - Process for Both Models:\n 1. Provide project description.\n 2. Ask model to create a detailed implementation plan (phases/tasks).\n 3. Run `init` in OpenCode.\n 4. Ask model to implement all tasks at once (stress test).\n - Qwen 3.6 Results:\n - Planning: ~4 mins, detailed plan with many phases/tasks.\n - Implementation: ~46 mins.\n - Launch: Failed initially (missing server block, outdated Tauri method). Fixed manually.\n - Functionality: Core features work (split view, real-time preview, recursive file opening). Toolbar buttons partially broken. Good foundation.\n - Gemma 4 Results:\n - Planning: ~2.5 mins, similar structure but fewer phases/tasks.\n - Implementation: ~20 mins (2x faster than Qwen).\n - Launch: Failed initially (missing Tauri file system plugin config). Fixed manually.\n - Functionality:"}}],"created":1778794806,"model":"qwen3.6","system_fingerprint":"b9158-3e037f313","object":"chat.completion","usage":{"completion_tokens":982,"prompt_tokens":2989,"total_tokens":3971,"prompt_tokens_details":{"cached_tokens":0}},"id":"chatcmpl-I7bQY2GekMBpyGzIQsPF8hbJQPl50RvZ","timings":{"cache_n":0,"prompt_n":2989,"prompt_ms":10289.454,"prompt_per_token_ms":3.442440281030445,"prompt_per_second":290.4916043164195,"predicted_n":982,"predicted_ms":40469.844,"predicted_per_token_ms":41.21165376782077,"predicted_per_second":24.264981105437425}}nobait@Dev-Station:~$
Refined Production Script
I have since polished the script to accept a YouTube URL as a parameter and produce the summary in a human-readable format.
nobait@Dev-Station:~$ cat yt-summary.sh
#!/bin/bash
# Check if URL is provided
if [ -z "$1" ]; then
echo "Usage: ./summarize.sh <YOUTUBE_URL>"
exit 1
fi
URL=$1
REMOTE_HOST="100.92.17.43"
W_PORT="8090"
L_PORT="8080"
TS=$(date +%s)
AUDIO="audio_$TS.wav"
TRANSCRIPT="transcript_$TS.txt"
echo "------------------------------------------------"
echo "π PROCESS STARTED"
echo "------------------------------------------------"
# 1. Download and get duration
echo "π₯ Downloading & Extracting Audio..."
yt-dlp -x --audio-format wav --audio-quality 0 \
--postprocessor-args "ffmpeg:-ar 16000 -ac 1" \
-o "$AUDIO" "$URL"
# Get duration using ffprobe
DURATION_SEC=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$AUDIO")
MIN=$(echo "$DURATION_SEC/60" | bc)
SEC=$(echo "$DURATION_SEC%60" | bc | cut -d. -f1)
echo "⏱️ Audio Duration: ${MIN}m ${SEC}s"
# 2. Transcribe
echo "✍️ Transcribing (Whisper Turbo v3)..."
START_W=$(date +%s.%N)
curl -s http://$REMOTE_HOST:$W_PORT/inference \
-F file="@$AUDIO" -F language="auto" -F response_format="text" > "$TRANSCRIPT"
END_W=$(date +%s.%N)
DIFF_W=$(echo "$END_W - $START_W" | bc)
printf "✅ Done! Transcription Time: %.2f seconds\n" $DIFF_W
# 3. Summarize
echo "π§ Summarizing (Qwen 3.6)..."
START_L=$(date +%s.%N)
# We use -s and jq to extract just the text content from the JSON response
RESPONSE=$(curl -s http://$REMOTE_HOST:$L_PORT/v1/chat/completions \
-H "Content-Type: application/json" \
-d "$(jq -n --arg content "$(cat $TRANSCRIPT)" '{
"model": "qwen3.6",
"messages": [
{"role": "system", "content": "You are a professional assistant. Provide a concise summary with bullet points."},
{"role": "user", "content": $content}
],
"temperature": 0.2
}')")
END_L=$(date +%s.%N)
DIFF_L=$(echo "$END_L - $START_L" | bc)
echo "------------------------------------------------"
echo "π SUMMARY RESULT"
echo "------------------------------------------------"
echo "$RESPONSE" | jq -r '.choices[0].message.content'
echo "------------------------------------------------"
printf "⏱️ Summary Time: %.2f seconds\n" $DIFF_L
echo "π§Ή Cleaning up..."
rm "$AUDIO"
rm "$TRANSCRIPT"
echo "π All done."
nobait@Dev-Station:~$
Final Execution Test
Running the polished script against the same video yields the following professional summary:
nobait@Dev-Station:~$ ./yt-summary.sh https://youtu.be/Um8Px55mINc
------------------------------------------------
π PROCESS STARTED
------------------------------------------------
π₯ Downloading & Extracting Audio...
[youtube] Extracting URL: https://youtu.be/Um8Px55mINc
[youtube] Um8Px55mINc: Downloading webpage
WARNING: [youtube] No supported JavaScript runtime could be found. Only deno is enabled by default; to use another runtime add --js-runtimes RUNTIME[:PATH] to your command/config. YouTube extraction without a JS runtime has been deprecated, and some formats may be missing. See https://github.com/yt-dlp/yt-dlp/wiki/EJS for details on installing one
[youtube] Um8Px55mINc: Downloading android vr player API JSON
[info] Um8Px55mINc: Downloading 1 format(s): 251
[download] Destination: audio_1778795142.webm
[download] 100% of 9.40MiB in 00:00:00 at 14.64MiB/s
[ExtractAudio] Destination: audio_1778795142.wav
Deleting original file audio_1778795142.webm (pass -k to keep)
⏱️ Audio Duration: 10m 34s
✍️ Transcribing (Whisper Turbo v3)...
✅ Done! Transcription Time: 25.60 seconds
π§ Summarizing (Qwen 3.6)...
------------------------------------------------
π SUMMARY RESULT
------------------------------------------------
- **Format Requirement:** Concise summary with bullet points.
- **Tone:** Professional, objective.
2. **Extract Key Information (Mental Draft):**
- **Objective:** Personal comparison of QUEN 3.6 (27B) vs. Gemma 4 (31B) for local code generation, using a real-world project: a Tauri-based Markdown viewer/editor.
- **Methodology:** Both models used OpenCode agent to generate a detailed implementation plan and then execute the entire project in one go (stress test). Same prompt and environment for both.
- **QUEN 3.6 Performance:**
- Planning: ~4 mins; highly detailed with many phases/tasks.
- Implementation: ~46 mins; required manual fixes for missing server block and outdated Tauri API.
- Output: Functional core (split view, real-time preview, file loading), but toolbar/edit buttons malfunctioned.
- **Gemma 4 Performance:**
- Planning: ~2.5 mins; structured but less granular.
- Implementation: ~20 mins (2x faster); required minor fix for missing filesystem plugin config.
- Output: Fully functional core, better repository organization (dedicated docs folder), but skipped implementing planned toolbar buttons.
- **Conclusion:** Both models successfully delivered working applications. QUEN excelled in planning detail but was slower and had more initial bugs. Gemma was significantly faster, produced cleaner code structure, but missed executing some planned features. Author plans to use both interchangeably while deciding on a long-term preference.
3. **Refine into Bullet Points (Concise & Professional):**
- **Objective & Setup:** Compared QUEN 3.6 (7B) and Gemma 4 (1B) locally using the OpenCode agent to build a Tauri-based Markdown editor/viewer, prioritizing real-world usability over abstract benchmarks.
- **QUEN 3.6:**
- *Planning:* ~4 minutes; highly granular with numerous phases and tasks.
- *Implementation:* ~46 minutes; required manual fixes for a missing server block and an outdated Tauri method.
- *Output:* Functional split-view interface with real-time preview and file loading, though toolbar/edit buttons were non-functional.
- **Gemma 4:**
- *Planning:* ~2.5 minutes; structured but less detailed than QUEN.
- *Implementation:* ~20 minutes (2x faster); needed a minor fix for a missing Tauri filesystem plugin configuration.
- *Output:* Fully functional core with proper mode-switching and file handling; organized repository with a dedicated docs folder, but skipped implementing planned toolbar features.
- **Conclusion:** Both models successfully generated working applications under stress-test conditions. QUEN offered more detailed planning but was slower with more initial bugs, while Gemma delivered faster execution, cleaner code structure, and better organization, albeit with incomplete feature implementation. The author intends to use both models interchangeably while refining their long-term preference.
4. **Self-Correction/Verification:**
- Check against constraints: Concise? Yes. Bullet points? Yes. Professional tone? Yes.
- Accuracy: Matches transcript details (model sizes, times, bugs, features, conclusion).
- Formatting: Clean, structured.
- Minor tweak: Ensure model names and specs are exact (QUEN 3.27B, Gemma 31B). Adjusted slightly for precision.
- All good. Output matches refined version.✅
</think>
Here is a concise summary of the comparison:
* **Objective & Setup:** Evaluated QUEN 3.6 (1B) and Gemma 1B locally using the OpenCode agent, tasking both with building a cross-platform Markdown editor/viewer via the Tauri framework to assess real-world usability rather than abstract benchmarks.
* **QUEN 3.6:**
* *Planning:* ~4 minutes; produced a highly granular plan with many phases and tasks.
* *Implementation:* ~46 minutes; required two manual fixes (missing server start block and outdated Tauri method).
* *Output:* Functional split-view interface with real-time markdown preview and file loading, but toolbar/edit buttons were unresponsive.
* **Gemma 4:**
* *Planning:* ~2.5 minutes; well-structured but less detailed than QUEN's.
* *Implementation:* ~20 minutes (twice as fast); needed one minor fix (missing Tauri filesystem plugin in config).
* *Output:* Fully functional core with working mode-switching and disk file loading; proactively organized the repository with a `docs/` folder, but skipped implementing toolbar buttons despite listing them in the plan.
* **Conclusion:** Both models successfully delivered working applications under a stress-test scenario. QUEN excelled in detailed planning but was slower and generated more initial bugs. Gemma delivered significantly faster execution, cleaner project structure, and fewer launch errors, though it failed to implement some planned features. The author plans to use both models interchangeably until a long-term preference is established.
------------------------------------------------
⏱️ Summary Time: 72.44 seconds
π§Ή Cleaning up...
π All done.
nobait@Dev-Station:~$
Performance and Quality Comparison: 8-bit vs. 4-bit KV Cache
For those interested in speed vs. quality, here is the output using a 4-bit KV pair cache. It completed the summarization in just 43.83 seconds, compared to the 72.44 seconds required by the 8-bit version:
nobait@Dev-Station:~$ ./yt-summary.sh https://youtu.be/Um8Px55mINc
------------------------------------------------
π PROCESS STARTED
------------------------------------------------
π₯ Downloading & Extracting Audio...
[youtube] Extracting URL: https://youtu.be/Um8Px55mINc
[youtube] Um8Px55mINc: Downloading webpage
WARNING: [youtube] No supported JavaScript runtime could be found. Only deno is enabled by default; to use another runtime add --js-runtimes RUNTIME[:PATH] to your command/config. YouTube extraction without a JS runtime has been deprecated, and some formats may be missing. See https://github.com/yt-dlp/yt-dlp/wiki/EJS for details on installing one
[youtube] Um8Px55mINc: Downloading android vr player API JSON
[info] Um8Px55mINc: Downloading 1 format(s): 251
[download] Destination: audio_1778792339.webm
[download] 100% of 9.40MiB in 00:00:01 at 7.62MiB/s
[ExtractAudio] Destination: audio_1778792339.wav
Deleting original file audio_1778792339.webm (pass -k to keep)
⏱️ Audio Duration: 10m 34s
✍️ Transcribing (Whisper Turbo v3)...
✅ Done! Transcription Time: 25.42 seconds
π§ Summarizing (Qwen 3.6)...
------------------------------------------------
π SUMMARY RESULT
------------------------------------------------
Here is a concise summary of the model comparison:
**π― Objective & Setup**
- **Goal:** Personally evaluate QUEN 3.6 (Qwen 3.6) vs. Gemma 4 (Gemma 4) for coding tasks on the author's specific hardware.
- **Test Project:** Build a cross-platform Markdown editor/viewer using the Tauri framework (Rust + web UI).
- **Methodology:** Both models received identical prompts to create a detailed implementation plan and then execute the entire project at once (a stress test).
**⚡ Performance & Results**
- **QUEN 3.6:**
- ⏱️ **Time:** ~46 minutes
- π **Planning:** Created a highly detailed plan with nearly twice as many phases/tasks.
- π **Execution:** Required manual fixes for a missing server block and an outdated Tauri API method.
- ✅ **Outcome:** App launched successfully with working real-time preview, but toolbar formatting buttons were non-functional.
- **Gemma 4:**
- ⏱️ **Time:** ~20 minutes (2x faster)
- π **Planning:** Produced a similar but less granular plan.
- π **Execution:** Required one manual fix for missing Rust file-system plugin configuration.
- ✅ **Outcome:** App launched cleanly, organized repository files better, and handled core editing/previewing flawlessly, but skipped implementing toolbar formatting buttons despite listing them in its plan.
**π Conclusion**
- Both models successfully completed the complex task under stress.
- **QUEN 3.6** excelled at detailed planning and task breakdown.
- **Gemma 4** was significantly faster, produced cleaner code structure, and had fewer initial bugs.
- The author plans to use both models temporarily and is seeking audience feedback to make a final choice.
------------------------------------------------
⏱️ Summary Time: 43.83 seconds
π§Ή Cleaning up...
π All done.
nobait@Dev-Station:~$
Conclusion & Performance Benchmarks
(before proceeding to the Python/web version)
This concludes this post. We now have a functional, CLI-based version of the utility that can transcribe and summarize any YouTube video.
On my 'collapsed' AI lab server—the tiny Minisforum AI X1 Pro-470 mini-PC—transcribing a 10-minute video took just 25 seconds. The summarization phase took 44 seconds using a 4-bit KV cache, and 72 seconds with an 8-bit KV cache. Since I prefer the superior quality of the 8-bit output, I’ll be sticking with that configuration moving forward!
Next up: developing the Python version to make this accessible through a web interface
PS special thanks to