Industrial AI Notes

Turning Real Objects into CAD Models

Yuki Furuta — Mon, 15 Jun 2026 14:19:22 GMT

When I design something in CAD, even a simple bracket, it is always easier if I already have a 3D model of the machine, device, or equipment around it.

With a 3D model, I can quickly check how the new part looks, whether it interferes with anything, and how it fits into the overall system. It also makes documentation much easier. A screenshot from CAD explains the idea much better than a long paragraph.

But in the real world, we often do not have clean CAD data for existing machines or equipment.

Of course, if the manufacturer provides the CAD model, that is ideal. But if not, we need to create the model ourselves. And honestly, building a 3D CAD model from scratch is painful. Unless it is absolutely necessary, it is the kind of task I tend to postpone.

Recently, though, 3D reconstruction tools have become much more practical. We can now reconstruct objects from videos, multiple photos, or even a single image, convert them into meshes, and bring them into CAD.

So I wanted to test a practical workflow:

Can I take a real object, reconstruct it as a 3D mesh, and use it inside CAD for layout checks and interference checks?

This time, I used a small network camera as the test object.

It is a common industrial-style network camera: a white metal housing, a camera unit on one side, and a rear box where cables such as LAN are connected. I chose it because it is small and easy to test quickly.

Photogrammetry with Meshroom

There are many possible methods, but I started with a classic and very well-known tool: Meshroom.

Meshroom is an open-source photogrammetry tool based on AliceVision. In simple terms, photogrammetry tries to reconstruct a 3D model from multiple 2D images.

The rough idea is:

Detect feature points in each image
Match the same points across multiple images
Estimate the camera positions
Create a sparse point cloud
Create a dense point cloud
Generate a mesh
Apply texture
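The "estimate the 3D shape" part rests on triangulation. Once the camera positions are known, each matched feature point defines one viewing ray per image, and its 3D position is where those rays (nearly) intersect. The sketch below shows that step using the simple midpoint method in plain Python. It assumes the camera centers and ray directions are already known. Meshroom's real pipeline uses more robust multi-view estimation and refinement, so treat this only as an illustration of the geometry.

```python
"""Midpoint triangulation: recover one 3D point from two camera rays.

Assumption: camera centers and ray directions are already estimated
(in a photogrammetry pipeline they come from the camera-estimation step).
"""

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangulate_midpoint(c1: Vec, d1: Vec, c2: Vec, d2: Vec) -> Vec:
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w0 = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        # Parallel rays: no baseline, so depth cannot be recovered.
        raise ValueError("rays are parallel; cannot triangulate")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = tuple(c1[i] + s * d1[i] for i in range(3))
    p2 = tuple(c2[i] + t * d2[i] for i in range(3))
    return tuple((p1[i] + p2[i]) / 2 for i in range(3))
```

With noise-free matches the two rays meet exactly. With real, noisy matches they pass near each other, and the midpoint is a simple compromise.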

So instead of modeling the object manually, we let the software estimate the 3D shape from many photos taken from different angles.

For this test, I captured around 30 seconds of video with a smartphone. Then I extracted frames using FFmpeg:

mkdir -p frames
ffmpeg -i input.mp4 \
  -vf "fps=2" \
  frames/frame_%05d.jpg

This gave me around 60 images.

In a proper photogrammetry workflow, it would be better to remove blurry images before processing. Even better, instead of recording video, I should take high-quality still images one by one from many angles.

But for this first test, I wanted to see how far I could get with a quick video capture.
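Removing blurry frames, as mentioned above, can be partly automated. One common generic heuristic is the variance of the Laplacian. This is not a Meshroom feature. Sharp images have strong local intensity changes, and blurry ones do not. The dependency-free sketch below assumes the frames are already loaded as grayscale 2D lists. The threshold is a hypothetical value that would need tuning for each camera and scene.

```python
"""Filter video frames by sharpness using the variance of the Laplacian.

Frames are plain 2D lists of grayscale values (0-255) so the sketch has no
dependencies; real frames would be loaded with an image library.
"""
from statistics import pvariance


def laplacian_variance(img: list[list[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels (higher = sharper)."""
    h, w = len(img), len(img[0])
    responses = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def keep_sharp(frames: dict[str, list[list[float]]], threshold: float) -> list[str]:
    """Names of frames whose sharpness score reaches the (hypothetical) threshold."""
    return sorted(name for name, img in frames.items() if laplacian_variance(img) >= threshold)
```

A mostly white, low-texture object also scores low on this measure. The threshold is therefore best calibrated on a few frames you already know are sharp.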

Then I opened Meshroom, selected the Photogrammetry pipeline, and loaded the images.

After loading the images, they appeared in the Image Gallery. The Graph Editor also automatically generated the photogrammetry pipeline.

One small but important note: if nothing appears in the Graph Editor, the pipeline is not ready. In that case, the reconstruction will not run correctly.

After that, I clicked the Start button at the top, and Meshroom started the full 3D reconstruction process automatically.

When the process finished, the result appeared in the 3D Viewer on the right side. I could see both the reconstructed object and the estimated camera positions.

Then I exported the result as an OBJ file and opened it in Blender to check the shape.

The result was usable as a rough mesh, but honestly, the camera shape was not very good. There were unwanted parts such as the desk, and the main object surface was rough and noisy.

Of course, I could delete the unnecessary parts in Blender. But as a model of the camera itself, the quality was not quite enough.
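The desk cleanup can also be scripted. The minimal sketch below drops every face below a horizontal cutting plane in an OBJ file. It assumes the mesh has already been rotated so that +Z is up and the desk top sits near z = z_cut. Both are assumptions, because photogrammetry output usually comes in an arbitrary orientation. Texture coordinates and normals are discarded to keep the code short.

```python
"""Drop mesh faces below a cutting plane, e.g. a reconstructed desk surface.

Assumptions: +Z is up and the desk top is near z = z_cut. Only 'v' and 'f'
records are handled; negative (relative) OBJ indices are not supported.
"""


def crop_obj_above(obj_text: str, z_cut: float) -> str:
    verts: list[tuple[float, float, float]] = []
    faces: list[list[int]] = []
    for line in obj_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            x, y, z = map(float, parts[1:4])
            verts.append((x, y, z))
        elif parts[0] == "f":
            # 'f 1/1/1 2/2/2 3/3/3' -> 0-based vertex indices [0, 1, 2]
            faces.append([int(tok.split("/")[0]) - 1 for tok in parts[1:]])

    # Keep a face only if all of its vertices lie above the cut.
    kept = [f for f in faces if all(verts[i][2] > z_cut for i in f)]

    # Re-index so vertices used only by removed faces are dropped too.
    remap: dict[int, int] = {}
    out_v: list[str] = []
    out_f: list[str] = []
    for f in kept:
        for i in f:
            if i not in remap:
                remap[i] = len(remap)
                out_v.append("v {} {} {}".format(*verts[i]))
        out_f.append("f " + " ".join(str(remap[i] + 1) for i in f))
    return "\n".join(out_v + out_f) + "\n"
```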

Photogrammetry with RealityScan

Next, I tried RealityScan from Epic Games.

Many people may know the smartphone app version, but there is also a desktop/Linux workflow, so I tested that as well.

The overall workflow is similar to Meshroom: load multiple images, align them, and generate a 3D model.

However, the operation flow is slightly different.
In Meshroom, pressing Start runs the whole pipeline automatically, from feature extraction to meshing.
In RealityScan, I first ran Align Images. This step performs feature extraction, image matching, camera estimation, and point cloud generation.

After confirming that the alignment looked reasonable, I ran Calculate Model to generate the 3D mesh.

The result was also not as clean as I hoped.

Again, this does not mean RealityScan is weak. RealityScan is a very powerful photogrammetry tool. The issue was probably my input data.

I think the main problems were:

I extracted still images from a handheld video, so some frames were blurry or motion-blurred.
The target object was mostly white, so it did not have many easy-to-detect visual features.
I did not capture enough angles carefully. I just moved around the object quickly with a phone.
The desk and background were also included, which made the reconstruction harder.

If I improved these points, I think both Meshroom and RealityScan could create much better models.

Still, for this kind of small, white, smooth industrial object, photogrammetry can easily create a bumpy mesh. Even if the overall shape is reconstructed, the surface may not be clean enough for CAD-style visualization.

So I decided to try another approach: AI-based single-image 3D generation.

Hunyuan3D-2.1

The first AI-based tool I tested was Tencent Hunyuan3D-2.1.

Hunyuan3D-2.1 can generate a 3D asset from an image. It is probably more commonly used for games, digital content, and 3D assets rather than mechanical CAD.

I will skip the full setup here because the GitHub repository already explains it, but the model is quite large and requires a reasonably powerful machine.

My environment was:
OS: Ubuntu 22.04
RAM: 64 GB
GPU: NVIDIA RTX 4090

The easiest way for me was to launch the official Gradio app:

python3 gradio_app.py --model_path tencent/Hunyuan3D-2.1 --subfolder hunyuan3d-dit-v2-1 --texgen_model_path tencent/Hunyuan3D-2.1 --low_vram_mode

Then I uploaded a single image of the camera.

The result was generated quickly on my local machine.

Compared with the photogrammetry results, the surface was much smoother. The overall shape was also close to what I expected.

This was impressive because it used only one image.

The tool can export several model formats. For bringing the model into CAD, STL is usually a convenient choice, although OBJ can also be useful depending on the workflow.

Of course, this is not a dimensionally accurate engineering model. It is not a replacement for proper measurement or reverse engineering.

But for visual checks, rough layout studies, and early-stage design discussion, it is already quite useful.

TripoSG

Next, I tested TripoSG, another image-to-3D model.

I used the same single input image as before.

After setting up the environment following the official GitHub repository, I ran the inference script and generated a 3D mesh.

Before importing it into CAD, I opened the output in Blender.

This result was excellent.

The overall shape looked better than the previous results in this test. The surface was smooth, the camera body was recognizable, and unwanted parts such as the desk were automatically removed.

For this specific object and this specific input image, TripoSG gave me the best-looking model.

So I decided to use the TripoSG output for the CAD test.

In Blender, I imported the generated OBJ file and exported it as an STL file.
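It helps to know what this export actually produces. STL stores only a list of triangles, each with a facet normal. It carries no texture, color, or unit information. The small ASCII STL writer below is for illustration only and does not reproduce Blender's exporter. The solid name and the triangle input are hypothetical.

```python
"""Write triangles as ASCII STL (triangles + facet normals, nothing else)."""

Vec = tuple[float, float, float]


def facet_normal(a: Vec, b: Vec, c: Vec) -> Vec:
    # Cross product of two edges; counter-clockwise order gives the outward side.
    u = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    v = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
    n = (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])
    length = (n[0] ** 2 + n[1] ** 2 + n[2] ** 2) ** 0.5
    if length == 0:
        return (0.0, 0.0, 0.0)  # degenerate triangle
    return (n[0] / length, n[1] / length, n[2] / length)


def to_ascii_stl(name: str, triangles: list[tuple[Vec, Vec, Vec]]) -> str:
    lines = [f"solid {name}"]
    for a, b, c in triangles:
        nx, ny, nz = facet_normal(a, b, c)
        lines.append(f"  facet normal {nx:e} {ny:e} {nz:e}")
        lines.append("    outer loop")
        for x, y, z in (a, b, c):
            lines.append(f"      vertex {x:e} {y:e} {z:e}")
        lines.append("    endloop")
        lines.append("  endfacet")
    lines.append(f"endsolid {name}")
    return "\n".join(lines) + "\n"
```

Because STL has no units, the importing application has to assume or ask for one. This is one reason to verify scale after importing.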

Importing the mesh into CAD

Finally, I opened the STL file in Autodesk Fusion.

The mesh loaded without any major issue.

There are some important limitations.
First, the scale needs to be corrected. AI-generated models and photogrammetry models are often not created at the exact real-world size unless we add scale references or manually adjust them.
Second, this is still a mesh. It is not the same as a clean parametric CAD model. I cannot simply grab a surface and extend it like I would with a normal CAD body.
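The scale can be corrected with a single real measurement. The minimal sketch below assumes you measured one dimension of the real camera, such as the housing width with calipers, and know which mesh axis it corresponds to. The axis and value are placeholders. A uniform factor fixes overall size only. If a single-image AI model has distorted proportions, one measurement cannot correct them.

```python
"""Rescale a mesh so one bounding-box dimension matches a real measurement."""

Vec = tuple[float, float, float]


def scale_to_measurement(verts: list[Vec], axis: int, measured_mm: float) -> list[Vec]:
    coords = [v[axis] for v in verts]
    extent = max(coords) - min(coords)
    if extent <= 0:
        raise ValueError("mesh has zero extent along this axis")
    k = measured_mm / extent  # one uniform factor keeps the proportions intact
    return [(x * k, y * k, z * k) for x, y, z in verts]
```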

But for my purpose, it was useful.

I could place the camera model near other equipment, check the approximate appearance, and think about the installation layout.

For interference checks, visual confirmation, and documentation, this workflow feels very promising.

Practical Takeaways

This test was not about creating a perfect engineering CAD model.

The real question was more practical:

How far can we go if we just want to bring a real-world object into CAD quickly enough for design support?

For that purpose, this workflow is already useful.

The most important point is to separate two different goals.

If you need accurate dimensions, clean surfaces, and editable CAD features, neither quick photogrammetry nor single-image AI generation is enough by itself. You still need proper measurement, reverse engineering, or manual CAD modeling.

But if you need a model for visual checks, layout studies, interference checks, installation planning, or documentation, these tools can save a lot of time.

From this test, I would use the tools differently depending on the purpose.

Photogrammetry tools like Meshroom and RealityScan are powerful when you can prepare good input images. They are especially useful when you can take many sharp photos from many angles under good lighting. However, the result depends heavily on image quality. Smooth, white, reflective, or low-texture objects can be difficult. Blurry frames from video can also make the result much worse.

AI-based tools like Hunyuan3D-2.1 and TripoSG are different. They may not reproduce the exact dimensions, but they can create a clean and recognizable mesh very quickly from just one image. For early-stage CAD visualization, this can be more useful than a noisy photogrammetry mesh.

So my current practical rule is:

Use photogrammetry when geometry accuracy from multiple real images matters. Use AI-based image-to-3D when speed and visual quality matter more.

For this specific test, TripoSG gave the most practical result. The model was clean, the object shape was easy to recognize, and unnecessary background parts were mostly removed automatically. After exporting it through Blender as an STL file, I could import it into Fusion and use it as a reference object.

This does not replace real CAD data.

But it creates a useful middle ground between “I have no model at all” and “I need to manually model everything from scratch.”

For industrial automation, robotics, equipment layout, and field engineering work, that middle ground is valuable. Many times, we do not need a perfect model at the beginning. We just need something close enough to support discussion, check space, explain an idea, or avoid obvious design mistakes.

In future tests, I want to compare more workflows, including better photogrammetry capture, Gaussian Splatting, multi-image AI reconstruction, and hybrid pipelines.

The bigger direction is clear:

The gap between real-world objects and CAD workflows is getting smaller.

And that could make practical engineering design much faster.

Same Prompt, Very Different UI: Comparing Codex With and Without `ui-ux-pro-max-skill`

Yuki Furuta — Sat, 11 Apr 2026 05:29:06 GMT

When you ask AI to build a UI, the difference does not come only from the model itself. The output also changes a lot depending on what prior knowledge you give it and what design criteria you make it operate with.

For this experiment, I gave Codex the exact same prompt and asked it to build two versions of a browser app called Virtual Factory, a dashboard for factory 3D Gaussian Splatting (3DGS) scans. One version was generated by standard Codex. The other was generated by Codex with ui-ux-pro-max-skill enabled.

According to its README, ui-ux-pro-max-skill can be installed for Codex CLI with uipro init --ai codex, and it is designed to auto-trigger on UI/UX-related requests. It is not just a styling pack. It includes a Design System Generator, stack-specific guidance, persistent design rules via --persist, and support for multiple frontend stacks including React and Next.js.

Here is the prompt I used:

Create a browser-based web app called "Virtual Factory".

It’s a SaaS-style dashboard for factory 3DGS:

upload scans

view them

add notes

link documents

Use a modern frontend stack (prefer React / Next.js). Mock complex parts if needed.

Important:

it should run locally in a development server

I should be able to open it in a browser and see a working UI

prioritize frontend demo quality over production completeness

Deliver a runnable project with setup instructions.

Note: This comparison is based on the attached source code and screenshots. In virtual-factory-2, app/page.js references FactoryDashboard, but that component itself was not included in the attached source bundle I reviewed. So the code-level comparison below is limited to what could actually be verified from the files provided.

The Main Takeaway

The most interesting difference was not whether one version looked flashier.

Standard Codex was very good at quickly producing a strong, immediately demoable UI with a working flow and visual punch. The version generated with ui-ux-pro-max-skill, on the other hand, looked more like it was trying to build a product with information architecture, not just assemble attractive components.

In other words, the real difference showed up less in decoration and more in design thinking.

`virtual-factory-1`: A Strong Demo You Can Show Right Away

Figure 1. Full-screen capture of virtual-factory-1. The hero area, metrics, viewer, scan list, notes, and documents all connect within a single screen.

virtual-factory-1 makes a strong first impression. It leans hard into the kind of UI that feels great as a SaaS demo: dark tones, glassy panels, glowing accents, a hero section, and metric cards.

But it is not just visual polish. The upload flow, scan switching, search, note creation, document linking, viewer mode changes, and layer toggles are all wired together in one screen. Even the complex 3DGS part is handled smartly: instead of trying to solve everything for real, it uses a canvas-based mock viewer. For a prompt that explicitly said “mock complex parts if needed,” that is a very effective response.

The source structure is easy to read too. The center of gravity is src/App.jsx, where state, handlers, the viewer, and note/document interactions are mostly gathered into a single file. Architecturally that is fairly monolithic, but it also explains why the result feels so complete so quickly: it is optimized to ship a working demo fast.

If the goal is a sales demo, an internal proof of concept, or something you can open in a browser and show immediately, virtual-factory-1 is genuinely strong. Its value is obvious at first glance.

`virtual-factory-2`: Closer to a Product With Information Architecture

Figure 2. Full-screen capture of virtual-factory-2, showing the left sidebar, center viewer, and right-side rail together.

virtual-factory-2 points in a noticeably different direction, even from the files that were visible. Its entry structure uses the Next.js App Router, with app/layout.js and app/page.js. In the CSS, you can already see vocabulary that suggests a more structured screen model: sidebar, hero-grid, workspace-grid, viewer-stage, viewer-hud, rail-card, and rail-item.

The visual tone is different too. Instead of a dark, “showy SaaS” feel, this version leans toward a softer gray operational-console aesthetic—something closer to a factory, blueprint, or monitoring tool. The background grid, HUD-like layers, and three-column structure with a sidebar and right rail make it feel calmer and more like a system people would use every day.

That is the important point. The difference is not simply whether it is more or less flashy. virtual-factory-2 appears to decide the placement of navigation, viewing, monitoring, and supporting information first. It feels less like “make one great-looking screen” and more like “design how this product should be used.”

That lines up closely with the philosophy of ui-ux-pro-max-skill. Its SKILL.md explicitly frames the skill around things like dashboard design, navigation structure, information hierarchy, brand expression, and UX quality control. In other words, it is trying to get the AI to think beyond “place nice-looking components” and toward “organize the product as a system.”

Side by Side, They Optimize for Different Things

Lens	`virtual-factory-1`	`virtual-factory-2`
First impression	Strong hero section and dark SaaS energy	Calm, factory-like operational UI
Center of gravity	Make one screen feel impressive and complete	Establish role separation across the interface
Implementation feel	High completion through a single-file core	Layout vocabulary and product structure come first
Best fit	Fast PoC / sales demo	UI exploration with future expansion in mind

What this comparison reveals is not which one “wins.” It reveals what the AI is optimizing for.

Standard Codex optimizes for the shortest path to output: connect the required elements quickly, package them into a convincing screen, and make something that is ready to show.

The version with ui-ux-pro-max-skill seems to optimize differently. It tries to identify the product type, impose layout order, decide where information belongs, and then move toward implementation. The README reinforces that interpretation: the skill is built around a Design System Generator and encourages persistent design rules via --persist, with a structure based on shared design guidance and page-level overrides. That is a very different mindset from simply generating prettier UI code.
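The "shared guidance plus page-level overrides" structure can be pictured as a simple precedence rule. Page rules win over shared rules, and anything a page does not specify falls back to the shared set. The sketch below is hypothetical. Its keys and values are invented for illustration and are not taken from ui-ux-pro-max-skill's files or its --persist output.

```python
"""Resolve design rules from a shared master set plus per-page overrides.

Hypothetical keys and values, for illustration only.
"""

MASTER = {
    "palette": "neutral-gray",
    "density": "comfortable",
    "layout": "sidebar + main + right rail",
}

PAGE_OVERRIDES = {
    "viewer": {"density": "compact"},
    "documents": {"layout": "sidebar + table"},
}


def resolve_rules(page: str) -> dict[str, str]:
    # Start from the master rules, then let the page override specific keys.
    return {**MASTER, **PAGE_OVERRIDES.get(page, {})}
```

This design keeps consistency as the default and makes every deviation explicit and local to one page.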

What Exactly Makes `ui-ux-pro-max-skill` Effective?

What stood out to me in this comparison was not just that the skill can make CSS look better.

Its real value is that it changes the AI’s default questions from:

“How do I make this look impressive?”
“How do I style this?”

into questions like:

“How is this supposed to be used?”
“What information hierarchy should this product have?”
“Where should each function live to support everyday workflows?”

The README describes a workflow where the skill automatically generates a design system for UI/UX tasks, recommends styles, colors, and typography based on product type, and then checks for UI/UX anti-patterns at the end. That maps very well to the feeling I got from virtual-factory-2: it looks like design structure was considered before surface polish.

Final Thoughts

The clearest lesson from this experiment is that even with the exact same prompt, the quality and character of the output can change significantly depending on the prior knowledge and design framework you give the AI.

Standard Codex can absolutely generate a strong UI demo on its own. virtual-factory-1 is a good example of that. It is fast, visually compelling, and immediately usable as a frontend demo.

But with ui-ux-pro-max-skill, the AI seems to think over a longer time horizon. Instead of simply decorating a screen, it starts trying to organize a product. The gap I saw here was less about visual taste and more about the point of view behind the design.

That is the part I find most valuable. It does not just improve the UI output. It upgrades the way the AI thinks about UI in the first place. And that, to me, is where ui-ux-pro-max-skill feels genuinely impressive.

oh-my-claudecode is a Game Changer: Experiencing Local AI Swarm Orchestration

Yuki Furuta — Sat, 04 Apr 2026 14:50:23 GMT

While the official Claude Code CLI has been making waves recently, I stumbled upon a tool that pushes its potential to the absolute limit: oh-my-claudecode (OMC).

More than just a coding assistant, OMC operates on the concept of local swarm orchestration for AI agents. It’s been featured in various articles and repos, but after spinning it up locally, I can confidently say this is a paradigm shift in the developer experience.

Here is my hands-on review and why I think it’s worth adding to your stack.

Why is oh-my-claudecode so powerful?

If the standard Claude Code is like having a brilliant junior developer sitting next to you, OMC is like hiring an entire elite engineering team.

Instead of relying on a single AI to handle everything sequentially, OMC leverages multiple specialized agents working in parallel. What’s even more fascinating is its multi-model support: you aren't locked into Claude. You can integrate Gemini or Codex as Workers. This allows for highly optimized, multi-model team compositions—for instance, assigning frontend UI generation specifically to a Gemini worker because of its distinct strengths.

Before diving into the code, here is a quick matrix to help you choose the right OMC mode based on your task scale and preferred approach:

oh-my-claudecode Mode Selection Matrix

Approach \ Task Scale	🟢 Small (Q&A, Minor Fixes)	🟡 Medium (Few Files, Features, Refactors)	🔴 Large (Multi-file, Complex Architecture)
Hands-off Autonomous (Set it and forget it)	Native Claude Code	Autopilot (End-to-end, minimal ceremony)	-
Guaranteed Completion (No silent partial stops)	-	Ralph (Persistent verify/fix loops)	-
Burst Parallelism (Maximum speed)	-	Ultrawork (Burst parallel execution)	-
Phased & Robust (Plan & review focused)	-	Pipeline (Strict sequential ordering)	Team (★ Recommended) (Plan → PRD → Exec → Verify)
Multi-Model Collaboration (Codex / Gemini)	-	-	ccg (Claude synthesizes AI inputs); omc team (Standalone CLI workers)

Taking `team 3:executor` for a Spin

To test the waters, I built a prototype app using OMC’s built-in team 3:executor command. The verdict? It is absurdly fast.

It’s not just about the raw speed of code generation; the velocity of the entire development lifecycle is on another level.

1. Seamless Collaboration and Parallel Execution

When you hit enter, it doesn't just linearly spit out code. Multiple agents spin up to handle high-level planning, actual coding, and peer-reviewing in parallel. Because the agents actively verify and review each other’s work, the output quality is exceptionally high right out of the gate. You barely need to touch the keyboard.
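The parallel part can be pictured as a fan-out/fan-in pattern. Independent subtasks go to several workers at once, and a reviewer then checks the combined results. The sketch below is hypothetical. run_agent and review are placeholders for real agent calls, and this is not how OMC is implemented internally. The worker count of 3 is chosen only to echo the team 3:executor example.

```python
"""Fan out independent subtasks to parallel workers, then review the results.

Hypothetical sketch: run_agent and review stand in for real agent calls.
"""
from concurrent.futures import ThreadPoolExecutor


def run_agent(task: str) -> str:
    # Placeholder for a coding-agent call; here it just reports completion.
    return f"done: {task}"


def review(results: list[str]) -> bool:
    # Placeholder reviewer: accept only if every subtask reports completion.
    return all(r.startswith("done:") for r in results)


def team_execute(tasks: list[str], workers: int = 3) -> tuple[list[str], bool]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_agent, tasks))  # map preserves input order
    return results, review(results)
```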

2. The Orchestrator’s "Check-ins"

You might worry that a swarm of AIs will go rogue and wreck your codebase. OMC handles this beautifully. An "Orchestrator" acts as the tech lead. At the end of every major phase, it pauses and prompts you: "Here is our progress so far. Do we have permission to proceed to the next phase?" You essentially become the engineering manager, reviewing the report and giving the "LGTM" to proceed. It’s the perfect balance of massive automation and human-in-the-loop control.
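The check-in behavior is essentially an approval gate between phases. The hypothetical sketch below uses the Plan → PRD → Exec → Verify phases listed for Team mode in the matrix above. approve() stands in for the orchestrator's "may we proceed?" prompt and is not an OMC API.

```python
"""Run phases in order, pausing for human approval before each next phase.

Hypothetical sketch of a human-in-the-loop gate, not OMC's implementation.
"""
from typing import Callable

PHASES = ["plan", "prd", "exec", "verify"]


def run_with_gates(approve: Callable[[str, str], bool]) -> list[str]:
    completed: list[str] = []
    for phase in PHASES:
        report = f"{phase} finished"  # placeholder for the phase's real work
        completed.append(phase)
        is_last = phase == PHASES[-1]
        if not is_last and not approve(phase, report):
            break  # the human declined; stop before the next phase starts
    return completed
```

For example, an approver that rejects after the PRD phase stops the run there: `run_with_gates(lambda p, r: p != "prd")` returns `["plan", "prd"]`.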

3. The `tmux` Spectacle

As an engineer, the coolest part is arguably the visual feedback. OMC integrates natively with tmux. When executed, your terminal automatically splits into multiple panes.

Multiple AI agents working concurrently in separate panes, while the orchestrator summarizes progress.

Watching different AI agents stream logs simultaneously in their own panes while collaborating to build a system is, frankly, spectacular. It feels like a scene straight out of a hacker movie.

⚖️ OMC vs. Anthropic Official Agent Teams: Which should you use?

The elephant in the room: "Anthropic just released official Agent Teams. Why bother with a third-party wrapper?"

It boils down to Official Stability vs. OMC's Extreme Flexibility and Speed.

Feature	🛠️ oh-my-claudecode (OMC)	🏢 Anthropic Official Agent Teams
Core Concept	Maximum flexibility & speed	Predictability & stability
Agent Pool	19+ agents (Custom additions supported)	Limited, pre-defined setups
Model Routing	Smart, automatic routing	Manual user configuration
Skill Learning	Automatically learns project quirks	None (Requires repeated context)
Support/Stability	OSS (Fast updates, potential breaking changes)	Official support, highly stable

3 Reasons OMC is Hard to Give Up

If the table isn't convincing enough, here are three specific pain points OMC completely solves:

Escaping the Single-Agent Bottleneck (Parallelism): Official tools often force sequential execution. OMC’s Team Mode and Ultrawork execute tasks concurrently. If you are doing a massive multi-file refactor, the speed difference is staggering.
Saving Your API Budget (Smart Routing): Running Opus for every minor file read will burn through your tokens in hours. OMC intelligently routes tasks: Haiku for quick searches, Sonnet for heavy coding, and Opus for complex architectural decisions. It saves money automatically.
The "Don't Repeat Yourself" Memory (Skill Learning): OMC learns the specific patterns, rules, and context of your project and remembers them across sessions. You no longer have to paste the same architectural guidelines into the prompt every single day.
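At its simplest, the routing idea is a lookup from task type to model tier. The sketch below is hypothetical. The task categories, tier names, and fallback are assumptions based on the description above, not OMC's actual routing logic.

```python
"""Route tasks to model tiers by task type (hypothetical mapping)."""

ROUTES = {
    "search": "haiku",
    "read": "haiku",
    "implement": "sonnet",
    "refactor": "sonnet",
    "architecture": "opus",
}


def route(task_kind: str, default: str = "sonnet") -> str:
    # Unknown task kinds fall back to the mid-tier model.
    return ROUTES.get(task_kind, default)
```

The savings come from the fact that most requests in a session are cheap lookups, so sending only the rare architectural questions to the largest model keeps the total cost down.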

The Verdict: If you cannot tolerate a single bug or breaking change in your tooling, stick to the Official Agent Teams. But if you want to push the boundaries of development speed, slash your API costs, and experience the bleeding edge of AI orchestration, OMC is the clear winner.

💻 GUI Alternative: Using Cursor

While OMC truly shines in the terminal (especially for the tmux parallel execution views), not everyone loves living in the CLI.

If you prefer a GUI, you can achieve a similar setup within Cursor. By installing the Claude Code extension and adding OMC as a plugin, you can tap into this swarm intelligence directly from your favorite AI code editor.

Final Thoughts

oh-my-claudecode bridges the gap between simple AI autocomplete and a fully autonomous AI engineering team.

If you want to ship applications at lightning speed...
If you want to see AIs collaborate in real-time...
If you want to optimize your token usage...
