Monday, 18 May 2026

The code and a business opportunity

So, I asked OpenCode to make a couple of changes:

At first, the program OpenCode created was downloading the complete video and then extracting the audio to process it, which was taking up space and time. So, I asked OpenCode to download and process the audio directly.
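Here is a minimal sketch of what "audio only" might look like with the yt-dlp Python module. It is not the code OpenCode produced. The function names and the `out_dir` and `stem` parameters are made up for illustration, and the options are my translation of the flags used in yt-summary.sh.

```python
from pathlib import Path


def build_audio_options(out_dir: str, stem: str) -> dict:
    """Return yt-dlp options that fetch only the audio as 16 kHz mono WAV."""
    return {
        # Ask for an audio-only stream; fall back to the best combined one.
        "format": "bestaudio/best",
        # yt-dlp fills in %(ext)s; the postprocessor then writes <stem>.wav.
        "outtmpl": str(Path(out_dir) / f"{stem}.%(ext)s"),
        # Counterpart of `-x --audio-format wav` on the command line.
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "wav"}
        ],
        # Counterpart of `--postprocessor-args "ffmpeg:-ar 16000 -ac 1"`.
        "postprocessor_args": {"ffmpeg": ["-ar", "16000", "-ac", "1"]},
        "quiet": True,
    }


def download_audio(url: str, out_dir: str, stem: str) -> Path:
    """Download the audio track and return the path of the WAV file."""
    import yt_dlp  # third-party: pip install yt-dlp (ffmpeg must be on PATH)

    with yt_dlp.YoutubeDL(build_audio_options(out_dir, stem)) as ydl:
        ydl.download([url])
    return Path(out_dir) / f"{stem}.wav"
```

Leaving the extension out of the output template matters here: the template names the downloaded stream, and the extraction step is what produces the final .wav file.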

Then, I asked for a white background and for the times to be displayed.

So, now the app shows the total time taken, what phase it is in, and how much time each phase took.
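For the curious, per-phase timing like this can be done with very little code. The sketch below is my own illustration, not what OpenCode wrote; the `PhaseTimer` class and the phase names are hypothetical.

```python
import time
from contextlib import contextmanager
from typing import Dict, Optional


class PhaseTimer:
    """Records how long each named phase takes, plus the running total."""

    def __init__(self) -> None:
        self.durations: Dict[str, float] = {}
        self.current: Optional[str] = None

    @contextmanager
    def phase(self, name: str):
        """Time the body of a `with` block and store it under `name`."""
        self.current = name
        start = time.perf_counter()
        try:
            yield
        finally:
            self.durations[name] = time.perf_counter() - start
            self.current = None

    @property
    def total(self) -> float:
        return sum(self.durations.values())

    def report(self) -> dict:
        """Snapshot suitable for returning to the frontend as JSON."""
        return {
            "phase": self.current,
            "phases": dict(self.durations),
            "total": self.total,
        }


timer = PhaseTimer()
with timer.phase("download"):
    pass  # the download step would go here
with timer.phase("transcribe"):
    pass  # the transcription step would go here
print(timer.report())
```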


The outcome is this:

and when the user clicks the "Display Extensive Summary" button, they can read the full summary:

All code has been uploaded to the GitHub repository at 
There you have it!
A time-saving Applied AI app, with a handwritten minimal version in a Bash shell and a full-blown web-based version written by OpenCode!

And now for the business opportunity:
I think that many people would love to have such a service—getting an evaluation of the content before wasting time watching it.
So, it may make sense to set up such a service and get revenue from advertisements. Maybe, I don't know... one has to calculate costs and things, but there could be an opportunity there!

Anyway, that was it. See you soon in the next post!

Software Robot writes the Anti-ClickBait app: Moment of Truth

So... The system wrote the app. Let's see how it did.

To start with, OpenCode wrote the app "blindly". It never ran it to verify that it works, so all the work was done "mentally". (Oooo, mental!)


It has created the directory yt-watcher and the files, but we still have to create the virtual environment, download the dependencies, and then run app.py. According to the instructions it gave, that is.
So:

nobait@Dev-Station:~$ cd yt-watcher/

nobait@Dev-Station:~/yt-watcher$ ls

app.py  requirements.txt  static  templates

nobait@Dev-Station:~/yt-watcher$ python3 -m venv venv

nobait@Dev-Station:~/yt-watcher$ source venv/bin/activate

(venv) nobait@Dev-Station:~/yt-watcher$ pip install -r requirements.txt 

.

.

.

.

Successfully installed Flask-3.1.3 blinker-1.9.0 certifi-2026.4.22 charset_normalizer-3.4.7 click-8.4.0 idna-3.15 itsdangerous-2.2.0 jinja2-3.1.6 markupsafe-3.0.3 requests-2.34.2 urllib3-2.7.0 werkzeug-3.1.8 yt-dlp-2026.3.17

(venv) nobait@Dev-Station:~/yt-watcher$ python app.py 

 * Serving Flask app 'app'

 * Debug mode: on

WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.

 * Running on all addresses (0.0.0.0)

 * Running on http://127.0.0.1:5000

 * Running on http://172.17.212.246:5000

Press CTRL+C to quit

 * Restarting with stat

 * Debugger is active!

 * Debugger PIN: 965-698-609


The app runs—this is a good first sign.

We can connect to the app, and we see the web page.

It works for a few seconds, but after that, it produces an error.


The backend has this log:

172.17.212.79 - - [18/May/2026 08:34:26] "GET / HTTP/1.1" 200 -

172.17.212.79 - - [18/May/2026 08:34:27] "GET /static/style.css HTTP/1.1" 304 -

[youtube] Extracting URL: https://youtu.be/SaiyMBwW0Hc

[youtube] SaiyMBwW0Hc: Downloading webpage

WARNING: [youtube] No supported JavaScript runtime could be found. Only deno is enabled by default; to use another runtime add  --js-runtimes RUNTIME[:PATH]  to your command/config. YouTube extraction without a JS runtime has been deprecated, and some formats may be missing. See  https://github.com/yt-dlp/yt-dlp/wiki/EJS  for details on installing one

[youtube] SaiyMBwW0Hc: Downloading android vr player API JSON

[info] SaiyMBwW0Hc: Downloading 1 format(s): 401+251

[download] Destination: /home/nobait/yt-watcher/uploads/audio_f09b68dbb578454589c75825642826f7.wav.f401.mp4

[download] 100% of  282.16MiB in 00:00:14 at 19.59MiB/s

[download] Destination: /home/nobait/yt-watcher/uploads/audio_f09b68dbb578454589c75825642826f7.wav.f251.webm

[download] 100% of    8.93MiB in 00:00:00 at 17.79MiB/s

[Merger] Merging formats into "/home/nobait/yt-watcher/uploads/audio_f09b68dbb578454589c75825642826f7.wav.webm"

Deleting original file /home/nobait/yt-watcher/uploads/audio_f09b68dbb578454589c75825642826f7.wav.f251.webm (pass -k to keep)

Deleting original file /home/nobait/yt-watcher/uploads/audio_f09b68dbb578454589c75825642826f7.wav.f401.mp4 (pass -k to keep)

172.17.212.79 - - [18/May/2026 08:35:29] "POST /summarize HTTP/1.1" 500 -


That indicates that the transcription phase was completed, but the summarization could not start.

Now, we could of course do a bit of debugging ourselves, BUT this is a test, and we want the robot to work alone...

Or I could restart from scratch using another model to see if it would handle the request better, or I could start from scratch with a prompt that would also ask the system to verify operation before declaring success.

Instead, I decided to tell OpenCode about the situation and ask it to fix it.

OpenCode went back to the drawing board and used my RTX 5090 fully for a few minutes.


(For the record, I have capped the power on the RTX to 550W instead of the default 600W. The loss of performance is almost unmeasurable, but this way my air-cooled system works at a max of 85°C even for hours.)

A couple of minutes later, the system identified the issue and corrected it:



and it worked!

WOW!



We can see the clickbait verdict and we can read the summary if we click on the "Display Extensive Summary" button.



Fine-tuning and code in the next post...

Sunday, 17 May 2026

The Ghost writes web-based youtube-Anti-Click-Bait for us

 Welcome to this Brave New World (pun intended), where we don't need to write software—all we need to do is ask a robot to write it for us.

We just have to describe it...

Let's see the process:

We log into the Linux VM:

We install OpenCode on that machine by running the command:
curl -fsSL https://opencode.ai/install | bash

A few seconds later, OpenCode salutes us:

nobait@Dev-Station:~$ curl -fsSL https://opencode.ai/install | bash

Installed version: 1.15.0.


Installing opencode version: 1.15.4

■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 100%


                                      

█▀▀█ █▀▀█ █▀▀█ █▀▀▄ █▀▀▀ █▀▀█ █▀▀█ █▀▀█

█░░█ █░░█ █▀▀▀ █░░█ █░░░ █░░█ █░░█ █▀▀▀

▀▀▀▀ █▀▀▀ ▀▀▀▀ ▀  ▀▀▀▀ ▀▀▀▀ ▀▀▀▀ ▀▀▀▀



OpenCode includes free models, to start:


cd <project>  # Open directory

opencode      # Run command


For more information visit https://opencode.ai/docs



nobait@Dev-Station:~$ 


We now need to install the config file (~/.config/opencode/opencode.json) that will let us use our local model for development, as I mentioned in the previous post.

Actually, this is not a necessary step. Currently, OpenCode gives free access to some very capable and fast models like DeepSeekv4Flash, so we don't have to use a local model.

But, since I have a local model that is needed for the youtube-noBait app, I will go one step further and use this model to write the app too.

Now, this is important and shows the real potential of OpenCode and other tools that allow for local models: The code the system writes for you never leaves your network. Your design and algorithms remain in a safe area.

Of course, this is not much of a worry in this case since I provide the code here and on GitHub, but for a real project, it can be important.

Before we start, let's make sure the environment (models, etc.) is up and running.

nobait@Dev-Station:~$ ./yt-summary.sh https://youtu.be/NYmXYF8A3Q4

------------------------------------------------

🚀 PROCESS STARTED

------------------------------------------------

📥 Downloading & Extracting Audio...

[youtube] Extracting URL: https://youtu.be/NYmXYF8A3Q4

[youtube] NYmXYF8A3Q4: Downloading webpage

WARNING: [youtube] No supported JavaScript runtime could be found. Only deno is enabled by default; to use another runtime add  --js-runtimes RUNTIME[:PATH]  to your command/config. YouTube extraction without a JS runtime has been deprecated, and some formats may be missing. See  https://github.com/yt-dlp/yt-dlp/wiki/EJS  for details on installing one

[youtube] NYmXYF8A3Q4: Downloading android vr player API JSON

[info] NYmXYF8A3Q4: Downloading 1 format(s): 251

[download] Destination: audio_1779081875.webm

[download] 100% of   12.18MiB in 00:00:00 at 18.41MiB/s

[ExtractAudio] Destination: audio_1779081875.wav

Deleting original file audio_1779081875.webm (pass -k to keep)

⏱️  Audio Duration: 15m 11s

✍️  Transcribing (Whisper Turbo v3)...

✅ Done! Transcription Time: 33.83 seconds

🧠 Summarizing (Qwen 3.6)...

------------------------------------------------

πŸ“ SUMMARY RESULT

------------------------------------------------

Here is a summary of the video transcript, broken down by key themes and arguments:


[I removed the long summary for brevity...]

------------------------------------------------

⏱️  Summary Time: 20.33 seconds

🧹 Cleaning up...

🏁 All done.

nobait@Dev-Station:~$ 


OK, we have verified that the CLI version works.

Now we will tell OpenCode to write the same with a web front end plus a few bells and whistles...

Actually, I have created a prompt and saved it as a text file (~/oc-instructions.txt), and all I will tell OpenCode is:

Please read @~/oc-instructions.txt and build the application exactly as described

so here it goes:

It took it exactly 1 minute.
It took me 10 minutes just to describe what I wanted, and much longer to write this post.
We are toast...

The presentation of the actual application OpenCode created is in the next post...

The prompt I gave OpenCode to write the app was:

Act as an expert Full-Stack Python Developer. Examine the file `~/yt-summary.sh` and use the tools, endpoints (IPs/ports), models, and mechanisms this bash script uses in order to build the following web-based Python application.

### Step 1: Project Setup
Before writing the code, please provide the exact terminal commands to:
1. Create a new project folder named `yt-watcher`.
2. Navigate into that folder.
3. Create a Python virtual environment (`venv`).
4. Activate the virtual environment.
5. Create a `requirements.txt` file containing the necessary dependencies (e.g., `Flask` or `FastAPI`, the `yt-dlp` python module, and `requests`).

### Step 2: The Application UI (Frontend)
Create a minimalist, clean HTML/CSS frontend reminiscent of the classic Google search page. 
- **Centerpiece:** A prominent text block with exactly this text: 
  > "Don't waste your time watching bullshit YouTube videos, let me watch it first and I will tell you if it is worth watching, if it is not a click bait, and if it is not, I will also give you an extensive summary, so that even if it is worth watching, you may save time by reading the summary."
- **Input Area:** A text entry field for the YouTube URL.
- **Action Button:** Directly to the right of the text field, place a button labeled: "Watch the video for me".
- **Loading State:** When the button is clicked, show a loading indicator so the user knows the background processes are running.
- **Results Display & Toggling Logic:**
  - The backend will return a short judgment, a value rating, and the extensive summary all at once.
  - Initially, display ONLY the 2-3 line clickbait judgment and the "Value Rating".
  - Below this judgment, include a button labeled: "Display Extensive Summary".
  - Render the extensive summary in the HTML directly below this button, but keep it hidden by default (e.g., using CSS `display: none;`).
  - When the user clicks "Display Extensive Summary", use vanilla JavaScript to toggle the visibility of the hidden summary `div` to show it.

### Step 3: Application Logic (Backend)
Use a lightweight Python web framework. The backend must replicate the exact workflow of the `~/yt-summary.sh` script, extracting the remote host IPs, ports, and models directly from that file.

**1. Download & Extract Audio:**
Replicate the `yt-dlp` logic from the bash script using the `yt-dlp` Python library (do not use subprocess to call the CLI). Extract the same audio format and post-processing arguments (sample rate, channels) defined in the script.

**2. Transcribe (Whisper):**
Send the downloaded audio file via a POST request to the transcription server. Extract the exact IP, port, and form data arguments directly from the bash script.

**3. Analyze & Summarize (Qwen):**
Take the resulting text transcript and send it to the LLM server via a POST request. Extract the IP, port, model name, and temperature from the bash script.
- **Crucial System Prompt Update:** You must update the LLM system prompt found in the bash script to instruct the model to output its response in strict JSON format. This allows the backend/frontend to easily separate the judgment from the summary. Change the system prompt to something like this:
  *"You are a professional assistant evaluating YouTube transcripts. Output your response in strict JSON format with three keys: 'clickbait_judgment' (2-3 lines evaluating if the title/premise was clickbait), 'value_rating' (a score out of 10 based on information density), and 'extensive_summary' (a detailed bulleted summary of the video)."*

**4. Cleanup:**
Ensure the backend deletes the temporary audio file and clears variables from memory after processing is complete, mirroring the cleanup in the bash script.

Please provide the complete Python backend code, the HTML/JS/CSS templates, and the exact directory structure I should use.
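Step 3 of the prompt asks the model for strict JSON with three keys. Models do not always obey, so the backend should check the reply before trusting it. Below is a minimal sketch of such a check. It is my own illustration, not part of the generated app, and `parse_model_reply` is a hypothetical name.

```python
import json

REQUIRED_KEYS = ("clickbait_judgment", "value_rating", "extensive_summary")


def parse_model_reply(text: str) -> dict:
    """Pull the JSON object out of the model's reply and check its keys."""
    # Models sometimes wrap the JSON in prose or code fences, so parse only
    # the span from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(text[start : end + 1])
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"model reply is missing keys: {missing}")
    return data


reply = (
    'Here is the result: {"clickbait_judgment": "Not clickbait.", '
    '"value_rating": 7, "extensive_summary": "- First point"}'
)
print(parse_model_reply(reply)["clickbait_judgment"])
```

If the reply cannot be parsed, the backend can return a clear error message to the browser instead of a bare HTTP 500.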

But don't let this long description discourage you!


As I said before, we can use a chatbot like ChatGPT or Gemini, or even our own LLM, talk to it about what we want, and let it build the detailed OpenCode prompt for us...


It is getting worse, no???

Friday, 15 May 2026

The Ghost in the machine: AI writing software for us

Applied Science Fiction

This is science fiction stuff! I call it applied science fiction.

I don’t know if it’s ultimately good or bad; for sure, there will be issues. It reminds me of when the internet first appeared. Some of us—poor souls—thought that people would finally wake up. We thought that because they could read anything, and because the excuse of "not having an encyclopedia handy" was gone, the info would be there for all to see and a clear picture of reality would finally rise...

Yeah, right.

The result? People now have supercomputers in their pockets—created by the pinnacle of human ingenuity, from the chips themselves to the ASML machines that build them—only to be zombified by meaningless videos and to post "evidence" proving the earth is flat.

So much for progress.

So, I’m very reserved about this AI thing. It is a genuine breakthrough, a revolution. But if we stop thinking for ourselves, we will simply become idiots.

On a more practical note, an AI system is already a better developer than most juniors. This means today’s junior developers may never gain the experience needed to grow into the seniors of tomorrow. We are facing a real grassroots issue here.

BUT, after that rant, I have to admit: AI is the most exciting technology I have encountered in my 40+ years in the industry. I remember reading about Perceptrons in the mid-80s, the Lisp machines, and all the promises that never materialized during the AI winters that followed.

But it seems the time has finally come. Massive advances in chip technology and relentless research led us to the Transformer... Self-attention is the key, they say. So simple, so majestic.

And it’s here, right now.

So, let’s jump on the science fiction bandwagon. Let’s witness the rise of a new age, where systems write the software that will eventually become part of their own existence.

Back to earth. Here, in front of a Linux prompt, we run the following command:

curl -fsSL https://opencode.ai/install | bash

This downloads and installs OpenCode—a software-writing robot.

Next, we configure the connection to our local brain by creating this configuration file:

$ cat ~/.config/opencode/opencode.json
{
  "$schema": "https://opencode.ai/config.json",
  "model": "rtx5090/qwen3.6-mtp",
  "provider": {
    "rtx5090": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://100.126.179.69:8080/v1",
        "apiKey": "not-needed"
      },
      "models": {
        "qwen3.6-mtp": {
          "name": "Qwen 3.6 27B Local",
          "limit": {
            "context": 180000,
            "output": 8192
          },
          "supportsImages": false,
          "supportsTools": true
        }
      }
    }
  }
}
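A typo in this file is easy to make, so here is a small sanity check. It assumes, based only on the file above, that the top-level "model" value has the form "provider/model" and must match an entry under "provider". The helper name `resolve_default_model` is hypothetical, and the embedded config is a trimmed copy with just the fields the check reads.

```python
import json

# Trimmed copy of the config shown above (only the fields used below).
CONFIG_TEXT = """
{
  "model": "rtx5090/qwen3.6-mtp",
  "provider": {
    "rtx5090": {
      "options": {"baseURL": "http://100.126.179.69:8080/v1"},
      "models": {"qwen3.6-mtp": {"name": "Qwen 3.6 27B Local"}}
    }
  }
}
"""


def resolve_default_model(config: dict) -> tuple:
    """Return (provider id, model id, base URL) for the default model."""
    provider_id, sep, model_id = config["model"].partition("/")
    if not sep:
        raise ValueError('"model" should look like "provider/model"')
    provider = config.get("provider", {}).get(provider_id)
    if provider is None:
        raise KeyError(f"provider {provider_id!r} is not defined")
    if model_id not in provider.get("models", {}):
        raise KeyError(f"model {model_id!r} is not listed under {provider_id!r}")
    return provider_id, model_id, provider["options"]["baseURL"]


print(resolve_default_model(json.loads(CONFIG_TEXT)))
```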

And then, all it takes is a single command to wake it up:

nobait@Dev-Station:~$ opencode

The ghost in the machine has started.

It might not look like much on the surface, but this is applied science fiction. You tell it what you want, and it builds it. It might take minutes, it might take hours—but as long as you can describe the path correctly, it will walk it.

Well... more or less, to tell you the truth. But mostly more.

To make sure it works, I will ask the system to write a program that finds where the International Space Station is.

The prompt that makes OpenCode do that is this:

The OpenCode Prompt

Task: Initialize a workspace and build a live satellite tracking tool.

Steps to execute:

  1. Environment: Create a directory named space_tracker. Inside it, set up a Python virtual environment (venv) and install requests.

  2. The Script: Write a Python script named iss_now.py that does the following:

    • Reach out to the Open Notify API (http://api.open-notify.org/iss-now.json).

    • Fetch the current latitude and longitude of the International Space Station.

    • Use a reverse-geocoding trick (or a simple coordinate check) to tell me if the ISS is currently over "Water" or "Land."

    • Display the coordinates in the console with a cool, retro-scrolling ASCII effect.

  3. Execution: Once the code is written and the venv is ready, execute the script using the venv's python interpreter so I can see the result right here in my prompt.

Make the console output feel like a transmission from a 1980s Lisp machine. Go.
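To give an idea of what the prompt is asking for, here is a minimal hand-written sketch of the core of such a script. It is not the program OpenCode generated. It covers only the fetch, the parsing, and a character-by-character "scrolling" print; the Water/Land check is left out, and all function names are my own.

```python
import json
import sys
import time
from urllib.request import urlopen

ISS_URL = "http://api.open-notify.org/iss-now.json"


def parse_position(payload: dict) -> tuple:
    """Return (latitude, longitude) as floats from an iss-now.json payload."""
    position = payload["iss_position"]  # the API sends both values as strings
    return float(position["latitude"]), float(position["longitude"])


def fetch_position(url: str = ISS_URL, timeout: float = 10.0) -> tuple:
    """Query the Open Notify API and return the current ISS coordinates."""
    with urlopen(url, timeout=timeout) as response:
        return parse_position(json.load(response))


def format_position(lat: float, lon: float) -> str:
    """Render the coordinates with hemisphere letters instead of signs."""
    ns = "N" if lat >= 0 else "S"
    ew = "E" if lon >= 0 else "W"
    return f"ISS POSITION: {abs(lat):.4f} {ns}, {abs(lon):.4f} {ew}"


def scroll(text: str, delay: float = 0.03, out=sys.stdout) -> None:
    """Print one character at a time for a retro terminal feel."""
    for char in text:
        out.write(char)
        out.flush()
        time.sleep(delay)
    out.write("\n")
```

To try it, call `scroll(format_position(*fetch_position()))` from a machine with internet access.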


The actual cherry on top is that you can use a chatbot just to help you craft the perfect prompt for the machine.

Do you truly feel the weight of that action?

We have reached a point where a human uses one machine to generate the message for the next. We are no longer writing code; we are managing a relay race of silicon intelligence. It’s a strange, circular evolution: using the 'ghost' to help us better command the 'robot.'

The system takes the command and starts working:

The system writes the program we asked it to build. It makes a few mistakes, pauses to make its own corrections, and finally runs the code it wrote—all while maintaining the aesthetics of an 80s LISP machine.


It gives us the answer:


So, there it is. The ISS is somewhere over the Pacific/Antarctic region. Now we know.

I’ve just shown you the tool that will write the YouTube Anti-Clickbait app for me. I’ve presented the "ghost" that will do the heavy lifting in the next post.

But as I look at the console output, I’m left with a bitter taste in the triumph.

Are we happy?

Should we be celebrating the fact that we’ve caught the ball, only to realize it’s a hand grenade? We are standing among the last generation of true programmers. While the machine hums along perfectly, the juniors—the seniors of tomorrow—are being daisy-cut at the roots before they ever get a chance to grow.

I’ve waited forty years for this science fiction to become real. Now that it’s here, I’m just not sure if we’ve reached the summit or the edge of a cliff.

But for now, the prompt is waiting. See you in the next post.






From Terminal to Browser: Giving the Anti-Clickbait Tool a Face

Designing the YouTube Anti-Clickbait Experience

It’s time to design the actual YouTube Anti-Clickbait app!

While the interface is intentionally minimal, a clear plan is essential to ensure a seamless user experience. We want the focus to be on saving time, not navigating menus.

The Landing Page

Taking inspiration from the classic, distraction-free Google homepage, the user is greeted by a clean white screen. The central focus is our mission statement:

Stop Wasting Your Time on Empty Promises.

Fed up with videos that promise the world but deliver nothing? Let us watch them so you don’t have to. We’ll analyze the content, provide a concise summary, and deliver a verdict: Is it clickbait?

If the video is actually worth its salt, we’ll provide an elaborate breakdown so you can get the value you need in seconds, not minutes.

The Interface

Directly below this message sits a sleek, dark input field for the YouTube URL, paired with a prominent button: [ Watch it for me ]


The "Processing" Phase

Once the user submits a link, we provide real-time feedback. Instead of a boring loading bar, the app shows the system’s "thought process" through a series of status updates:

  • πŸ” Fetching audio track...

  • 🎧 Analyzing content...

  • πŸ“ Generating summary...

  • ⚖️ Passing judgment...

The Final Verdict

After the processing is complete, the app reveals the results in a clean, high-contrast layout:

  1. The Verdict: A clear "Clickbait" or "Not Clickbait" badge.

  2. The Quick Take: A 1-2 sentence "TL;DR" of the video.

  3. The Deep Dive: (If the video isn't clickbait) A detailed summary of the key points, allowing the user to skip the video entirely if they choose.
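The three-part result maps naturally onto a small data structure. The sketch below is one possible shape for it, purely as an illustration of the design above; the `Verdict` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Verdict:
    is_clickbait: bool
    quick_take: str                  # the 1-2 sentence TL;DR
    deep_dive: Optional[str] = None  # detailed summary, if there is one

    def sections(self) -> List[Tuple[str, str]]:
        """Return the (title, text) pairs the results page should render."""
        badge = "Clickbait" if self.is_clickbait else "Not Clickbait"
        parts = [("The Verdict", badge), ("The Quick Take", self.quick_take)]
        # The Deep Dive is shown only when the video is not clickbait.
        if not self.is_clickbait and self.deep_dive:
            parts.append(("The Deep Dive", self.deep_dive))
        return parts


result = Verdict(False, "A solid comparison of two models.", "- Point one")
for title, text in result.sections():
    print(f"{title}: {text}")
```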


So... writing the actual Python script is... let's just say, 'straightforward.' We’ve completed the investigation and the design is ready to go.

But why do the heavy lifting ourselves? What if we ask the summary model to write the script for us?

That’s the mission for the next post!

Thursday, 14 May 2026

Blog Post: Local Video Summarization Pipeline: CLI Implementation

To summarize the setup before diving into the "app" development:

1. Transcription: Whisper Large v3 Turbo

I am using the Whisper Large v3 Turbo model at 6-bit quantization for transcription. I downloaded the model from Hugging Face: oxide-lab/whisper-large-v3-turbo-GGUF.

To run the server, I use the following command:

whisper-server -m models/whisper-large-v3-turbo-q6_k.gguf --host 0.0.0.0 --port 8090

2. Summarization: Qwen 3.6 35B A3B

For summarization, I am running the Qwen 3.6 35B A3B model, also at 6-bit quantization. I downloaded it from Hugging Face and deployed it using llama-server with the following parameters:


llama-server \
  -m /home/llama/models/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q6_K_P.gguf \
  --mmproj /home/llama/models/mmproj-Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-f16.gguf \
  --alias qwen3.6 \
  --host 0.0.0.0 \
  --port 8080 \
  -ngl 99 \
  -fa on \
  -c 120000 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  -b 512 \
  -ub 512 \
  --no-mmap \
  --mlock \
  --image-min-tokens 1024 \
  --dry-multiplier 0.8 \
  --dry-base 1.75 \
  --dry-allowed-length 2 \
  --tools all \
  --parallel 1 \
  --reasoning on \
  --reasoning-budget 400 \
  --offline

For quality reasons, I am utilizing an 8-bit KV cache (rather than 4-bit). My perplexity tests indicated no significant improvement when moving to an F16 cache, so 8-bit remains the "sweet spot" for this setup.
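To see why the cache type matters at a 120,000-token context, here is a back-of-the-envelope calculation. It assumes a plain full-attention transformer, and the layer count, KV head count, and head dimension below are placeholders, not the real figures for this model. Only the bytes-per-element values come from the llama.cpp block formats (q8_0 stores 32 values in 34 bytes, q4_0 stores 32 values in 18 bytes).

```python
# Bytes per cached element for each cache type.
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, cache_type: str) -> int:
    """Size of the K and V caches together for a full-attention model."""
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # 2 = K and V
    return int(elements * BYTES_PER_ELEMENT[cache_type])


# Placeholder shape (NOT taken from the model card); n_ctx matches -c 120000.
LAYERS, KV_HEADS, HEAD_DIM, N_CTX = 40, 8, 128, 120_000

for cache_type in ("f16", "q8_0", "q4_0"):
    size = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, N_CTX, cache_type)
    print(f"{cache_type:5s} {size / 2**30:6.2f} GiB")
```

Whatever the real shape is, the ratio holds: an 8-bit cache needs roughly half the memory of F16, which is what makes such a long context affordable on a single GPU.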

The Automation Script

With the servers active, I run the following Bash script on my VM to automate the workflow:


#!/bin/bash

# Configuration
REMOTE_HOST="100.92.17.43"
WHISPER_PORT="8090"
LLAMA_PORT="8080"
VIDEO_URL="https://youtu.be/Um8Px55mINc"

# 1. Create a unique filename using a timestamp
TIMESTAMP=$(date +%s)
AUDIO_FILE="audio_${TIMESTAMP}.wav"
TRANSCRIPT_FILE="transcript_${TIMESTAMP}.txt"

echo "--- Starting Download from YouTube ---"
# Download as 16kHz Mono WAV (ideal for Whisper)
yt-dlp -x --audio-format wav --audio-quality 0 \
  --postprocessor-args "ffmpeg:-ar 16000 -ac 1" \
  -o "$AUDIO_FILE" "$VIDEO_URL"

echo "--- Sending to Whisper Server for Transcription ---"
# Extract text using Whisper Turbo v3
curl -s http://${REMOTE_HOST}:${WHISPER_PORT}/inference \
  -H "Content-Type: multipart/form-data" \
  -F file="@${AUDIO_FILE}" \
  -F language="auto" \
  -F response_format="text" > "$TRANSCRIPT_FILE"

echo "--- Sending Transcript to Qwen 3.6 for Summary ---"
# Use jq to safely package the text file into a JSON request for llama-server
# Note: We use the /v1/chat/completions endpoint for OpenAI compatibility
curl http://${REMOTE_HOST}:${LLAMA_PORT}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg content "$(cat $TRANSCRIPT_FILE)" '{
    "model": "qwen3.6",
    "messages": [
      {"role": "system", "content": "You are a professional assistant. Summarize the provided transcript into clear, actionable bullet points."},
      {"role": "user", "content": ("Transcript content to summarize:\n\n" + $content)}
    ],
    "temperature": 0.3
  }')"

# Clean up local audio file if you don't need it
rm "$AUDIO_FILE"
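The jq call in the script exists to escape the transcript safely (quotes, newlines) before it goes into the JSON body. For comparison, here is a rough Python counterpart of that one step, where `json.dumps` does the escaping. The function names are hypothetical, and `post_summary_request` assumes the same OpenAI-compatible endpoint the script talks to.

```python
import json
from urllib.request import Request, urlopen

SYSTEM_PROMPT = (
    "You are a professional assistant. Summarize the provided transcript "
    "into clear, actionable bullet points."
)


def build_summary_request(transcript: str, model: str = "qwen3.6",
                          temperature: float = 0.3) -> bytes:
    """Build the JSON body that the script assembles with jq."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user",
             "content": "Transcript content to summarize:\n\n" + transcript},
        ],
        "temperature": temperature,
    }
    return json.dumps(payload).encode("utf-8")


def post_summary_request(base_url: str, body: bytes,
                         timeout: float = 300.0) -> dict:
    """POST the body to <base_url>/v1/chat/completions and decode the reply."""
    request = Request(
        base_url + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```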

Initial Results

The script works with flying colors! Here is the output from the first run:


--- Starting Download from YouTube ---

[youtube] Extracting URL: https://youtu.be/Um8Px55mINc

[youtube] Um8Px55mINc: Downloading webpage

WARNING: [youtube] No supported JavaScript runtime could be found. Only deno is enabled by default; to use another runtime add  --js-runtimes RUNTIME[:PATH]  to your command/config. YouTube extraction without a JS runtime has been deprecated, and some formats may be missing. See  https://github.com/yt-dlp/yt-dlp/wiki/EJS  for details on installing one

[youtube] Um8Px55mINc: Downloading android vr player API JSON

[info] Um8Px55mINc: Downloading 1 format(s): 251

[download] Destination: audio_1778794725.webm

[download] 100% of    9.40MiB in 00:00:00 at 20.17MiB/s

[ExtractAudio] Destination: audio_1778794725.wav

Deleting original file audio_1778794725.webm (pass -k to keep)

--- Sending to Whisper Server for Transcription ---

--- Sending Transcript to Qwen 3.6 for Summary ---

{"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"Here is a clear, actionable summary of the transcript:\n\n**🎯 Objective & Setup**\n- **Goal:** Personally evaluate Qwen 3.6 vs. Gemma 4 for local coding tasks (not abstract benchmarking).\n- **Test Project:** Build a cross-platform Markdown viewer/editor using the Tauri framework (chosen for familiarity and performance).\n- **Environment:** Local desktop with GPU VRAM as the limiting factor; models run via OpenCode over a local network.\n- **Models Compared:** Qwen 3.6 (dense, 27B params) vs. Gemma 4 (dense, 31B params).\n\n**⚙️ Testing Methodology**\n1. Provide both models with the same project description.\n2. Request a detailed, phase-by-phase implementation plan.\n3. Run `init` to load context into OpenCode.\n4. **Stress Test:** Instruct each model to implement the entire plan autonomously in one session.\n\n**πŸ“Š Qwen 3.6 Performance**\n- **Planning:** ~4 minutes; produced a highly detailed plan with nearly twice as many phases/tasks as Gemma.\n- **Implementation Time:** ~46 minutes.\n- **Launch Issues:** Required manual fixes for a missing server block and an outdated Tauri API method.\n- **Functionality:** Core features (split view, live preview, recursive file loading) worked well. Toolbar buttons had minor bugs but provided a solid foundation.\n\n**πŸ“Š Gemma 4 Performance**\n- **Planning:** ~2.5 minutes; structured similarly but with fewer phases/tasks.\n- **Implementation Time:** ~20 minutes (**2x faster** than Qwen).\n- **Launch Issues:** Required manual fix for a missing Tauri file-system plugin configuration.\n- **Functionality:** Fully functional split-view editor with working mode-switch buttons. 
Successfully opened local Markdown files.\n- **Bonus:** Automatically organized project files and plans into a `documentation/` folder, improving repo cleanliness.\n- **Omission:** Skipped implementing toolbar formatting buttons despite listing them in its plan.\n\n**🏁 Final Verdict & Next Steps**\n- **Winner:** Draw. Both models successfully delivered functional apps under stress, but with different trade-offs.\n- **Qwen Strengths:** More granular planning, better initial task breakdown.\n- **Gemma Strengths:** Significantly faster execution, cleaner file organization, fewer launch errors.\n- **Action Plan:** Continue using both models in tandem to leverage Qwen's detailed planning and Gemma's speed. Monitor long-term performance to eventually settle on a primary model.\n- **Call to Action:** Share personal experiences with these models in the comments to help finalize the choice.","reasoning_content":"Here's a thinking process:\n\n1.  **Analyze User Input:**\n   - **Input:** A transcript of a video/post comparing two AI models (Qwen 3.6 and Gemma 4) for a specific task.\n   - **Task:** Summarize the transcript into clear, actionable bullet points as a professional assistant.\n   - **Key Elements in Transcript:**\n     - Goal: Compare Qwen 3.6 (27B params) and Gemma 4 (31B params) for personal use (local, specific hardware/tasks), not abstract benchmarking.\n     - Test Project: Build a markdown viewer/editor app using Tauri (cross-platform, dense models preferred for coding).\n     - Setup: Local desktop, models run via OpenCode agent over local network from MacBook. GPU VRAM is critical.\n     - Process for Both Models:\n       1. Provide project description.\n       2. Ask model to create a detailed implementation plan (phases/tasks).\n       3. Run `init` in OpenCode.\n       4. 
Ask model to implement all tasks at once (stress test).\n     - Qwen 3.6 Results:\n       - Planning: ~4 mins, detailed plan with many phases/tasks.\n       - Implementation: ~46 mins.\n       - Launch: Failed initially (missing server block, outdated Tauri method). Fixed manually.\n       - Functionality: Core features work (split view, real-time preview, recursive file opening). Toolbar buttons partially broken. Good foundation.\n     - Gemma 4 Results:\n       - Planning: ~2.5 mins, similar structure but fewer phases/tasks.\n       - Implementation: ~20 mins (2x faster than Qwen).\n       - Launch: Failed initially (missing Tauri file system plugin config). Fixed manually.\n       - Functionality:"}}],"created":1778794806,"model":"qwen3.6","system_fingerprint":"b9158-3e037f313","object":"chat.completion","usage":{"completion_tokens":982,"prompt_tokens":2989,"total_tokens":3971,"prompt_tokens_details":{"cached_tokens":0}},"id":"chatcmpl-I7bQY2GekMBpyGzIQsPF8hbJQPl50RvZ","timings":{"cache_n":0,"prompt_n":2989,"prompt_ms":10289.454,"prompt_per_token_ms":3.442440281030445,"prompt_per_second":290.4916043164195,"predicted_n":982,"predicted_ms":40469.844,"predicted_per_token_ms":41.21165376782077,"predicted_per_second":24.264981105437425}}nobait@Dev-Station:~$ 

Refined Production Script

I have since polished the script to accept a YouTube URL as a parameter and produce the summary in a human-readable format.

nobait@Dev-Station:~$ cat yt-summary.sh 

#!/bin/bash

# Check if URL is provided
if [ -z "$1" ]; then
  echo "Usage: ./yt-summary.sh <YOUTUBE_URL>"
  exit 1
fi

URL=$1
REMOTE_HOST="100.92.17.43"
W_PORT="8090"
L_PORT="8080"
TS=$(date +%s)
AUDIO="audio_$TS.wav"
TRANSCRIPT="transcript_$TS.txt"

echo "------------------------------------------------"
echo "🚀 PROCESS STARTED"
echo "------------------------------------------------"

# 1. Download and get duration
echo "📥 Downloading & Extracting Audio..."
yt-dlp -x --audio-format wav --audio-quality 0 \
  --postprocessor-args "ffmpeg:-ar 16000 -ac 1" \
  -o "$AUDIO" "$URL"

# Get duration using ffprobe
DURATION_SEC=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$AUDIO")
MIN=$(echo "$DURATION_SEC/60" | bc)
SEC=$(echo "$DURATION_SEC%60" | bc | cut -d. -f1)
echo "⏱️ Audio Duration: ${MIN}m ${SEC}s"

# 2. Transcribe
echo "✍️ Transcribing (Whisper Turbo v3)..."
START_W=$(date +%s.%N)
curl -s http://$REMOTE_HOST:$W_PORT/inference \
  -F file="@$AUDIO" -F language="auto" -F response_format="text" > "$TRANSCRIPT"
END_W=$(date +%s.%N)
DIFF_W=$(echo "$END_W - $START_W" | bc)
printf "✅ Done! Transcription Time: %.2f seconds\n" $DIFF_W

# 3. Summarize
echo "🧠 Summarizing (Qwen 3.6)..."
START_L=$(date +%s.%N)
# We use -s and jq to extract just the text content from the JSON response
RESPONSE=$(curl -s http://$REMOTE_HOST:$L_PORT/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg content "$(cat $TRANSCRIPT)" '{
    "model": "qwen3.6",
    "messages": [
      {"role": "system", "content": "You are a professional assistant. Provide a concise summary with bullet points."},
      {"role": "user", "content": $content}
    ],
    "temperature": 0.2
  }')")
END_L=$(date +%s.%N)
DIFF_L=$(echo "$END_L - $START_L" | bc)

echo "------------------------------------------------"
echo "📝 SUMMARY RESULT"
echo "------------------------------------------------"
echo "$RESPONSE" | jq -r '.choices[0].message.content'
echo "------------------------------------------------"
printf "⏱️ Summary Time: %.2f seconds\n" $DIFF_L

echo "🧹 Cleaning up..."
rm "$AUDIO"
rm "$TRANSCRIPT"
echo "🏁 All done."

nobait@Dev-Station:~$ 
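The script above deletes its temporary files only when it reaches the last lines, so a failed download or a failed curl call would leave the audio and transcript files behind. Below is a minimal sketch of one way to guard against that, using an EXIT trap. The wrapper name with_cleanup is hypothetical and is not part of yt-summary.sh.

```bash
#!/bin/bash
# Sketch only: with_cleanup is a hypothetical helper, not part of yt-summary.sh.
# Usage: with_cleanup <audio-file> <transcript-file> <command> [args...]
with_cleanup() (
    # The parentheses make the function body a subshell, so the EXIT trap
    # below fires when the function finishes, whether the command succeeded
    # or failed.
    AUDIO="$1"
    TRANSCRIPT="$2"
    shift 2
    trap 'rm -f "$AUDIO" "$TRANSCRIPT"' EXIT

    # Run the pipeline steps; the subshell returns this command's status.
    "$@"
)
```

The same trap line could also be placed near the top of the script itself, right after AUDIO and TRANSCRIPT are set, which would make the two rm commands at the end unnecessary.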

Final Execution Test

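Running the polished script against the same video yields the following summary. Note that the tail of the model's reasoning (everything up to the </think> marker) is printed ahead of the final bullet list: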



nobait@Dev-Station:~$ ./yt-summary.sh https://youtu.be/Um8Px55mINc

------------------------------------------------

🚀 PROCESS STARTED

------------------------------------------------

📥 Downloading & Extracting Audio...

[youtube] Extracting URL: https://youtu.be/Um8Px55mINc

[youtube] Um8Px55mINc: Downloading webpage

WARNING: [youtube] No supported JavaScript runtime could be found. Only deno is enabled by default; to use another runtime add  --js-runtimes RUNTIME[:PATH]  to your command/config. YouTube extraction without a JS runtime has been deprecated, and some formats may be missing. See  https://github.com/yt-dlp/yt-dlp/wiki/EJS  for details on installing one

[youtube] Um8Px55mINc: Downloading android vr player API JSON

[info] Um8Px55mINc: Downloading 1 format(s): 251

[download] Destination: audio_1778795142.webm

[download] 100% of    9.40MiB in 00:00:00 at 14.64MiB/s

[ExtractAudio] Destination: audio_1778795142.wav

Deleting original file audio_1778795142.webm (pass -k to keep)

⏱️  Audio Duration: 10m 34s

✍️  Transcribing (Whisper Turbo v3)...

✅ Done! Transcription Time: 25.60 seconds

🧠 Summarizing (Qwen 3.6)...

------------------------------------------------

📝 SUMMARY RESULT

------------------------------------------------

- **Format Requirement:** Concise summary with bullet points.

- **Tone:** Professional, objective.


2.  **Extract Key Information (Mental Draft):**

   - **Objective:** Personal comparison of QUEN 3.6 (27B) vs. Gemma 4 (31B) for local code generation, using a real-world project: a Tauri-based Markdown viewer/editor.

   - **Methodology:** Both models used OpenCode agent to generate a detailed implementation plan and then execute the entire project in one go (stress test). Same prompt and environment for both.

   - **QUEN 3.6 Performance:**

     - Planning: ~4 mins; highly detailed with many phases/tasks.

     - Implementation: ~46 mins; required manual fixes for missing server block and outdated Tauri API.

     - Output: Functional core (split view, real-time preview, file loading), but toolbar/edit buttons malfunctioned.

   - **Gemma 4 Performance:**

     - Planning: ~2.5 mins; structured but less granular.

     - Implementation: ~20 mins (2x faster); required minor fix for missing filesystem plugin config.

     - Output: Fully functional core, better repository organization (dedicated docs folder), but skipped implementing planned toolbar buttons.

   - **Conclusion:** Both models successfully delivered working applications. QUEN excelled in planning detail but was slower and had more initial bugs. Gemma was significantly faster, produced cleaner code structure, but missed executing some planned features. Author plans to use both interchangeably while deciding on a long-term preference.


3.  **Refine into Bullet Points (Concise & Professional):**

   - **Objective & Setup:** Compared QUEN 3.6 (7B) and Gemma 4 (1B) locally using the OpenCode agent to build a Tauri-based Markdown editor/viewer, prioritizing real-world usability over abstract benchmarks.

   - **QUEN 3.6:**

     - *Planning:* ~4 minutes; highly granular with numerous phases and tasks.

     - *Implementation:* ~46 minutes; required manual fixes for a missing server block and an outdated Tauri method.

     - *Output:* Functional split-view interface with real-time preview and file loading, though toolbar/edit buttons were non-functional.

   - **Gemma 4:**

     - *Planning:* ~2.5 minutes; structured but less detailed than QUEN.

     - *Implementation:* ~20 minutes (2x faster); needed a minor fix for a missing Tauri filesystem plugin configuration.

     - *Output:* Fully functional core with proper mode-switching and file handling; organized repository with a dedicated docs folder, but skipped implementing planned toolbar features.

   - **Conclusion:** Both models successfully generated working applications under stress-test conditions. QUEN offered more detailed planning but was slower with more initial bugs, while Gemma delivered faster execution, cleaner code structure, and better organization, albeit with incomplete feature implementation. The author intends to use both models interchangeably while refining their long-term preference.


4.  **Self-Correction/Verification:**

   - Check against constraints: Concise? Yes. Bullet points? Yes. Professional tone? Yes.

   - Accuracy: Matches transcript details (model sizes, times, bugs, features, conclusion).

   - Formatting: Clean, structured.

   - Minor tweak: Ensure model names and specs are exact (QUEN 3.27B, Gemma 31B). Adjusted slightly for precision.

   - All good. Output matches refined version.✅

</think>


Here is a concise summary of the comparison:


* **Objective & Setup:** Evaluated QUEN 3.6 (1B) and Gemma 1B locally using the OpenCode agent, tasking both with building a cross-platform Markdown editor/viewer via the Tauri framework to assess real-world usability rather than abstract benchmarks.

* **QUEN 3.6:**

  * *Planning:* ~4 minutes; produced a highly granular plan with many phases and tasks.

  * *Implementation:* ~46 minutes; required two manual fixes (missing server start block and outdated Tauri method).

  * *Output:* Functional split-view interface with real-time markdown preview and file loading, but toolbar/edit buttons were unresponsive.

* **Gemma 4:**

  * *Planning:* ~2.5 minutes; well-structured but less detailed than QUEN's.

  * *Implementation:* ~20 minutes (twice as fast); needed one minor fix (missing Tauri filesystem plugin in config).

  * *Output:* Fully functional core with working mode-switching and disk file loading; proactively organized the repository with a `docs/` folder, but skipped implementing toolbar buttons despite listing them in the plan.

* **Conclusion:** Both models successfully delivered working applications under a stress-test scenario. QUEN excelled in detailed planning but was slower and generated more initial bugs. Gemma delivered significantly faster execution, cleaner project structure, and fewer launch errors, though it failed to implement some planned features. The author plans to use both models interchangeably until a long-term preference is established.

------------------------------------------------

⏱️  Summary Time: 72.44 seconds

🧹 Cleaning up...

🏁 All done.

nobait@Dev-Station:~$ 
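In the run above, the printed summary still carries the end of the model's reasoning, closed by a </think> line, before the actual bullet list. Here is a minimal sketch of a filter that keeps only the text after the last </think> line. The function name strip_think is hypothetical, and the sketch assumes the marker sits on a line of its own, as it does in the output above. If no marker is present, the text passes through unchanged.

```bash
#!/bin/bash
# Sketch only: strip_think is a hypothetical helper, not part of yt-summary.sh.
# Reads text on stdin and prints what follows the last line containing
# </think>, skipping blank lines right after it.
strip_think() {
    awk '
        { lines[NR] = $0 }            # remember every input line
        /<\/think>/ { last = NR }     # track the last marker line
        END {
            start = last + 1          # last is 0 when no marker was seen
            while (start <= NR && lines[start] ~ /^[ \t]*$/) start++
            for (i = start; i <= NR; i++) print lines[i]
        }
    '
}
```

In the script, this could be appended to the line that prints the result, for example: echo "$RESPONSE" | jq -r '.choices[0].message.content' | strip_think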



Performance and Quality Comparison: 8-bit vs. 4-bit KV Cache

For those interested in speed vs. quality, here is the output using a 4-bit KV cache. It completed the summarization in just 43.83 seconds, compared to the 72.44 seconds required by the 8-bit version:



nobait@Dev-Station:~$ ./yt-summary.sh https://youtu.be/Um8Px55mINc

------------------------------------------------

🚀 PROCESS STARTED

------------------------------------------------

📥 Downloading & Extracting Audio...

[youtube] Extracting URL: https://youtu.be/Um8Px55mINc

[youtube] Um8Px55mINc: Downloading webpage

WARNING: [youtube] No supported JavaScript runtime could be found. Only deno is enabled by default; to use another runtime add  --js-runtimes RUNTIME[:PATH]  to your command/config. YouTube extraction without a JS runtime has been deprecated, and some formats may be missing. See  https://github.com/yt-dlp/yt-dlp/wiki/EJS  for details on installing one

[youtube] Um8Px55mINc: Downloading android vr player API JSON

[info] Um8Px55mINc: Downloading 1 format(s): 251

[download] Destination: audio_1778792339.webm

[download] 100% of    9.40MiB in 00:00:01 at 7.62MiB/s

[ExtractAudio] Destination: audio_1778792339.wav

Deleting original file audio_1778792339.webm (pass -k to keep)

⏱️  Audio Duration: 10m 34s

✍️  Transcribing (Whisper Turbo v3)...

✅ Done! Transcription Time: 25.42 seconds

🧠 Summarizing (Qwen 3.6)...

------------------------------------------------

📝 SUMMARY RESULT

------------------------------------------------

Here is a concise summary of the model comparison:


**🎯 Objective & Setup**

- **Goal:** Personally evaluate QUEN 3.6 (Qwen 3.6) vs. Gemma 4 (Gemma 4) for coding tasks on the author's specific hardware.

- **Test Project:** Build a cross-platform Markdown editor/viewer using the Tauri framework (Rust + web UI).

- **Methodology:** Both models received identical prompts to create a detailed implementation plan and then execute the entire project at once (a stress test).


**⚡ Performance & Results**

- **QUEN 3.6:**

  - ⏱️ **Time:** ~46 minutes

  - 📝 **Planning:** Created a highly detailed plan with nearly twice as many phases/tasks.

  - 🐛 **Execution:** Required manual fixes for a missing server block and an outdated Tauri API method.

  - ✅ **Outcome:** App launched successfully with working real-time preview, but toolbar formatting buttons were non-functional.

- **Gemma 4:**

  - ⏱️ **Time:** ~20 minutes (2x faster)

  - 📝 **Planning:** Produced a similar but less granular plan.

  - 🐛 **Execution:** Required one manual fix for missing Rust file-system plugin configuration.

  - ✅ **Outcome:** App launched cleanly, organized repository files better, and handled core editing/previewing flawlessly, but skipped implementing toolbar formatting buttons despite listing them in its plan.


**🏁 Conclusion**

- Both models successfully completed the complex task under stress.

- **QUEN 3.6** excelled at detailed planning and task breakdown.

- **Gemma 4** was significantly faster, produced cleaner code structure, and had fewer initial bugs.

- The author plans to use both models temporarily and is seeking audience feedback to make a final choice.

------------------------------------------------

⏱️  Summary Time: 43.83 seconds

🧹 Cleaning up...

🏁 All done.

nobait@Dev-Station:~$
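The post does not say how the KV cache precision was switched between the two runs. The sketch below assumes the summarization endpoint is served by llama.cpp's llama-server, which is only a guess based on the timings and system_fingerprint fields in the raw response. In that case the cache types are selected with the --cache-type-k and --cache-type-v options. The helper name kv_cache_args and the model path in the usage comment are hypothetical.

```bash
#!/bin/bash
# Sketch only, assuming llama.cpp's llama-server; kv_cache_args is a
# hypothetical helper that maps a bit width to the matching server options.
kv_cache_args() {
    case "$1" in
        8) echo "--cache-type-k q8_0 --cache-type-v q8_0" ;;
        4) echo "--cache-type-k q4_0 --cache-type-v q4_0" ;;
        *) echo "unsupported KV cache width: $1" >&2; return 1 ;;
    esac
}

# Hypothetical usage (the model path is a placeholder):
#   llama-server -m /path/to/model.gguf --port 8080 $(kv_cache_args 8)
```

Depending on the build, a quantized V cache may also require flash attention to be enabled on the server.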



Conclusion & Performance Benchmarks 

(before proceeding to the Python/web version)

This concludes this post. We now have a functional, CLI-based version of the utility that can transcribe and summarize any YouTube video.

On my 'collapsed' AI lab server—the tiny Minisforum AI X1 Pro-470 mini-PC—transcribing a 10-minute video took just 25 seconds. The summarization phase took 44 seconds using a 4-bit KV cache, and 72 seconds with an 8-bit KV cache. Since I prefer the superior quality of the 8-bit output, I’ll be sticking with that configuration moving forward!
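To put those numbers in proportion, here is a small sketch that computes two ratios from the unrounded figures in the runs above: 634 seconds of audio (10m 34s) against 25.60 seconds of transcription, and 72.44 seconds against 43.83 seconds of summarization. The function name ratio is hypothetical.

```bash
#!/bin/bash
# Sketch only: ratio is a hypothetical helper that divides two numbers
# and prints the result with two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 634 25.60     # transcription speed relative to real time → 24.77
ratio 72.44 43.83   # 8-bit summary time relative to 4-bit     → 1.65
```

So transcription runs at roughly 25 times real time, and the 8-bit KV cache costs about 1.65 times the summarization time of the 4-bit one.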

Next up: developing the Python version to make this accessible through a web interface.



PS special thanks to