Visualizing Ideas with GPT-3 and Diagram-as-Code


Cover Photo by Christina @ wocintechchat.com on Unsplash

Summary

Point-and-click diagramming tools (Visio, draw.io, Lucidchart) have provided value over the years, but none are particularly git-friendly. Code-based diagramming tools are not as ubiquitous as their UI-based counterparts, but several offerings show promise. Paired with AI tools like GitHub Copilot and ChatGPT, stream-of-consciousness diagramming (i.e. say what you’re thinking, see it on screen) is on the horizon. This article evaluates three diagram-as-code (DaC) tools for this purpose: Mermaid, Diagrams (a Python library), and PlantUML.

The right tool depends on the use case, and this article is not intended to be a comprehensive evaluation. Rather, it documents my own research and experience combining AI and DaC tools. Both GitHub Copilot and ChatGPT were used to test each framework. My summary of the pros and cons is as follows:

Mermaid
+ Best for simple diagrams, natively supported by markdown
- Limited support for ordering, complex diagrams

Diagrams (Python Library)
+ Python-based, flexible, well-supported
- Multiple dependencies, SVG format issues

PlantUML
+ Used by AWS, well-supported, very flexible
- Domain-specific language, not supported natively in markdown

Introduction

This is my first post on this site, and I’m still trying to figure out if I should be writing in first, second, or third person. It’s very likely that this page will bounce around between the three. That aside, the purpose of this article is to find a code-based diagramming solution that meets the following criteria:

Low-medium learning curve
Open-source
Easily integrated into a git-based workflow
Supports complex diagrams and nesting
Allows for custom styling and icons
Is well-supported and has a large community
Supported by AI tools like GitHub Copilot and ChatGPT

Based on my (limited) research, I’ve narrowed the field down to three options. This is not a comprehensive list, but I’m hoping that this article helps others with a similar goal.

Mermaid

Mermaid is a JavaScript library that is natively supported by most markdown implementations, including GitHub and Azure DevOps. This site’s framework (Hugo) also supports it; however, it takes some additional configuration that I’ve not yet bothered with. For now, I render everything through mermaid.live, export as SVG, and upload as resources in Hugo. It’s a great option for simple diagrams, and certainly has the lowest barrier to entry. I had great success using the GitHub Copilot plugin for VSCode to generate reference architecture diagrams:

<!-- Copilot Prompt  -->

mermaid diagram of a basic kubernetes cluster

<!-- Copilot Output -->

graph LR
  subgraph "Kubernetes Cluster"
    subgraph "Kubernetes Master"
      master["Kubernetes Master"]
      subgraph "Kubernetes API Server"
        api["Kubernetes API Server"]
      end
      subgraph "Kubernetes Controller Manager"
        controller["Kubernetes Controller Manager"]
      end
      subgraph "Kubernetes Scheduler"
        scheduler["Kubernetes Scheduler"]
      end
    end
    subgraph "Kubernetes Worker Nodes"
      worker["Kubernetes Worker Nodes"]
      subgraph "Kubernetes Kubelet"
        kubelet["Kubernetes Kubelet"]
      end
      subgraph "Kubernetes Container Runtime"
        runtime["Kubernetes Container Runtime"]
      end
      subgraph "Kubernetes Kube Proxy"
        proxy["Kubernetes Kube Proxy"]
      end
    end
  end
  master --> api
  master --> controller
  master --> scheduler
  worker --> kubelet
  worker --> runtime
  worker --> proxy

The rendered result:

Mermaid diagram of a basic kubernetes cluster

This is as much a testament to the power of Copilot as it is to the power of Mermaid. Five seconds of typing, and I had a presentation-worthy model of a relatively complex topic. That being said, Mermaid is not without its limitations. In another example, I attempted to build a model of the relationships between email addresses, email servers, DNS servers, and cloud providers. I have an [ir]rational fear of locking myself out of an AWS account by setting the root email to an address whose DNS is controlled by the account itself. Lose access to the root account email and AWS account at the same time? If you don’t have another backdoor in, you could be permanently locked out of controlling your own domain. An unlikely scenario, but Murphy’s law keeps me up at night. I digress; this is a topic for another post, and is arguably how I ended up writing this one first. I’ve still not come close to finishing the diagram for it, but the bit I’ve done so far has highlighted some of the limitations of Mermaid.

At the time of writing, GitHub does not support something called subgraph linking. This feature was added to Mermaid at some point around version 8.5.1, but it appears GitHub is not running a version newer than this.

<!-- Linked Subgraph Example -->

graph TD
  subgraph Gmail
    email["Personal Email Address<br/>personal@gmail.com"]
  end

  subgraph Exchange
    email2["Work Email Address<br/>work@custom.com"]
  end

  subgraph Domain Management
    subgraph Namecheap
      domain[custom.com domain]
    end

    subgraph AWS[AWS Route 53]
      hostedzone -->|Contains| records
      records -->|NS| NS[ns-1.awsdns-02.co.uk.<br/>ns-3.awsdns-04.com.<br/>ns-5.awsdns-06.org.<br/>ns-7.awsdns-08.net.]
      records -->|MX| mxrecord[O365]
      records -->|A| arecord
      records -->|TXT| txtrecord
      records -->|SOA| soarecord
    end
  end

  subgraph O365
    o365 --> user["user@custom.com"]
    o365 --> aws["aws@custom.com"]
    aws --> accountname["accountname+aws@custom.com"]
  end

  user -->|alias| aws
  Gmail --> Namecheap
  Namecheap -->|NS Servers| AWS
  Exchange --> Namecheap
  AWS ------> O365
  O365 --> Exchange

Locally rendered result:

Mermaid diagram of email address relationships

GitHub render error:

Github error

Note: I recently found a workaround for this issue, although I am skeptical as the official docs say it is still unsupported. Changing the type from graph to flowchart allows the subgraph linking to work. It could be just a fluke with my example. I’ve not tested it with anything more complex.
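
If the keyword swap holds up, it is easy to script for diagrams that need to render on GitHub. The following is a minimal sketch, assuming the definition opens with `graph` followed by one of the standard direction codes. The function name `use_flowchart_keyword` is hypothetical and not part of Mermaid:

```python
import re

# Matches the opening "graph <direction>" line of a Mermaid definition.
# "subgraph" lines are not matched because "graph" must follow only
# leading spaces or tabs at the start of a line.
GRAPH_HEADER = re.compile(r"^([ \t]*)graph(\s+(?:TB|TD|BT|LR|RL)\b)", re.MULTILINE)


def use_flowchart_keyword(mermaid_source: str) -> str:
    """Swap the leading 'graph' keyword for 'flowchart', leaving the rest as-is."""
    return GRAPH_HEADER.sub(r"\1flowchart\2", mermaid_source, count=1)


example = "graph TD\n  subgraph Gmail\n    email\n  end\n"
print(use_flowchart_keyword(example))
```

Definitions that already start with `flowchart` pass through unchanged, so the helper is safe to run more than once.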

As annoying as this limitation is within GitHub, it’s not an outright dealbreaker. I can think of different ways to represent the data, or render locally and store the result as an SVG in the repo. It’s not ideal, but it works.
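
For the render-locally route, one option is the Mermaid CLI (`mmdc`), which reads an input file given with `-i` and writes to the path given with `-o`. The sketch below assumes `mmdc` is installed and on the PATH; the helper names and file names are placeholders of my own:

```python
import shutil
import subprocess
from pathlib import Path


def build_mmdc_command(source: Path, output: Path, executable: str = "mmdc") -> list[str]:
    """Build the mermaid-cli invocation; the output extension picks the format."""
    return [executable, "-i", str(source), "-o", str(output)]


def render_diagram(source: Path, output: Path) -> bool:
    """Render the diagram, returning False when mmdc cannot be found."""
    executable = shutil.which("mmdc")
    if executable is None:
        return False
    # check=True raises CalledProcessError if the CLI exits with an error
    subprocess.run(build_mmdc_command(source, output, executable), check=True)
    return True
```

Calling `render_diagram(Path("email.mmd"), Path("email.svg"))` from a pre-commit hook or CI step would keep the committed SVG in sync with the source.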

The next limitation, however, I have not found a workaround for. In the case of my previous diagram, I wanted to place the Gmail and Exchange nodes at the top of the diagram. Mermaid supports four different orientations for flowcharts:

TB or TD (top to bottom)
BT (bottom to top)
LR (left to right)
RL (right to left)

These are sufficient for changing the orientation as a whole, but do not provide control beyond that, and I could not find a simple way to rearrange individual nodes. A long-standing issue lives on GitHub, and highlights many other instances in which this is a showstopper. In some cases, just adding a single node to the diagram can cause the entire layout to change. The lack of control in this regard led me to look at other options.

Diagrams (Python Library)

The next tool I looked at came from my bias towards Python. Admittedly, I rarely write more than a few basic scripts per year, but the syntax is familiar to me and the community is massive. Diagrams is a library that pipes graph declarations to an application called Graphviz, which then renders the diagrams in the desired format. This has the advantage of leveraging all of Python’s constructs, and (mostly) negates the need to learn a DSL. Despite these benefits, I spent the least amount of time testing this library. The setup was a bit more complex than I would have liked. Installing Graphviz was easy with Chocolatey, but the Python library required a few more steps. I hit a snag: error: Microsoft Visual C++ 14.0 is required. The articles referencing this error seemed more relevant to pip or Python installs in general though, so I won’t pin this on the library itself. I installed a distribution of Visual C++, rebooted, and everything worked after that. If I were to use this daily, I’d containerize everything.
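
Since the library depends on both a Python package and an external Graphviz binary, a quick preflight check can save some head-scratching. This is only a sketch: it assumes the Graphviz executable is named `dot` and the package imports as `diagrams`, and the function name is my own invention:

```python
import importlib.util
import shutil


def missing_prerequisites(executable: str = "dot", module: str = "diagrams") -> list[str]:
    """Return human-readable problems; an empty list means ready to render."""
    problems = []
    # Graphviz must be reachable on PATH for rendering to work
    if shutil.which(executable) is None:
        problems.append(f"Graphviz executable '{executable}' not found on PATH")
    # find_spec returns None when a top-level package is not installed
    if importlib.util.find_spec(module) is None:
        problems.append(f"Python package '{module}' is not installed")
    return problems


for problem in missing_prerequisites():
    print(problem)
```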

With the setup out of the way, I ran through the same test as with Mermaid.

<!-- Copilot Prompt  -->

python diagrams diagram of a basic kubernetes cluster

# Copilot Output

from diagrams import Cluster, Diagram
from diagrams.k8s.compute import Pod
from diagrams.k8s.network import Service
from diagrams.k8s.storage import PV, PVC

with Diagram("Kubernetes", show=False):
    with Cluster("Namespace"):
        with Cluster("Pods"):
            pod1 = Pod("pod1")
            pod2 = Pod("pod2")

        with Cluster("Services"):
            svc1 = Service("svc1")
            svc2 = Service("svc2")

        with Cluster("Persistent Volumes"):
            pv1 = PV("pv1")
            pv2 = PV("pv2")

        with Cluster("Persistent Volume Claims"):
            pvc1 = PVC("pvc1")
            pvc2 = PVC("pvc2")

        pod1 >> svc1 >> pv1 >> pvc1
        pod2 >> svc2 >> pv2 >> pvc2

The rendered result:

Diagrams library output

Again, hats off to Copilot for generating this from so little information. The addition of Kubernetes icons is a great touch, and the result is very visually appealing. The default output format is PNG, so an additional parameter changes it to SVG:

with Diagram("Kubernetes", show=False, outformat="svg"):

Unfortunately, the SVG displays with broken references to the K8s icons.

Example Broken SVG:

Broken SVG

<!-- Reference to a local png file within the SVG XML -->

<g id="node5" class="node">
<title>d66c6169d1144024b733c69424e7f319</title>
<image xlink:href="C:\Python311\Lib\site-packages\resources/k8s/compute\pod.png" width="101px" height="101px" preserveAspectRatio="xMinYMin meet" x="24" y="-168.5"/>
<text text-anchor="middle" x="74.5" y="-53.1" font-family="Sans-Serif" font-size="13.00" fill="#2d3436">pod2</text>
</g>

At first this seemed like a limitation of the SVG format; however, there are supported element attributes for embedding rasterized images in SVGs:

<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink">

    ...
    <image
        width="100" height="100"
        xlink:href="data:image/png;base64,IMAGE_DATA"
        />
    ...
</svg>

After perusing the GitHub issues, I found that others have experienced the same problem, and there is no out-of-the-box solution, only postprocessing scripts that can cover the gap in functionality. Bummer. PNG it is then for this library 🙃
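
For the curious, a postprocessing step along those lines might look like the sketch below. To be clear, this is not one of the scripts from the issue tracker; it is my own minimal illustration, and it assumes every icon is referenced through an `href` or `xlink:href` attribute that points at a `.png` file readable from the machine running the script:

```python
import base64
import re
from pathlib import Path

# Matches href="...png" and xlink:href="...png" attributes.
# Existing data: URIs do not end in .png, so they are left alone.
HREF_PATTERN = re.compile(r'(?P<attr>(?:xlink:)?href)="(?P<path>[^"]+\.png)"')


def embed_png_references(svg_text: str) -> str:
    """Replace local PNG file references with inline base64 data URIs."""

    def to_data_uri(match: re.Match) -> str:
        path = Path(match.group("path"))
        if not path.is_file():
            # Leave references we cannot resolve untouched
            return match.group(0)
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        return f'{match.group("attr")}="data:image/png;base64,{encoded}"'

    return HREF_PATTERN.sub(to_data_uri, svg_text)
```

Reading the generated file, passing its text through `embed_png_references`, and writing the result back should give an SVG that no longer depends on the local `site-packages` paths.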

Moving beyond this limitation, I tried again to model my domain / cloud problem with the library. I employed ChatGPT for this, as I found that Copilot struggled with some of the nuance I was trying to communicate. Additionally, I find the conversational nature of ChatGPT to be better for making changes to larger blocks of existing code. After a few prompts, the output looked promising; however, it failed to execute, as GPT was making assumptions about properties and methods of the Diagram class that were not present. This was actually pretty surprising, as I’ve been able to write other Python scripts for scraping sites using the requests and bs4 libraries without issue. I suppose there may not be as many examples of the diagrams module for GPT to go off of. After lobbing back and forth for an hour or so, I concluded that GPT wasn’t going to be as helpful as I had hoped, and getting meaningful results would take some time investment in the docs. This is more of an RTFM error on my part, but I decided to move to the next tool to see if I could get better results.

PlantUML

PlantUML was the first utility that I found for this purpose; however, I was less than enthused to give it a shot after seeing the wall of ads that fill the homepage:

PlantUML Homepage

For a moment I thought I’d hit a crappy mirror site; but sure enough, that is the official page. I understand the sentiment of the ads, but there has to be a better way to raise awareness. I’m reblogging them, so I suppose they’re getting further reach this way ¯\_(ツ)_/¯

While annoying, the ads don’t detract from the power of the tool. Fortunately, there is a powerful online compiler on the site, similar to the Mermaid Live Editor. It supports exporting files as PNGs, SVGs, and even ASCII art (though the ASCII renderings require that no external dependencies or images are linked). I was surprised to see that the compiler even supports importing resources from GitHub. On top of that, it encodes these resources as base64 within the SVG files, which addresses one of my main complaints with the Python Diagrams library.
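
A quick way to confirm that an exported SVG really is self-contained is to scan it for references that are neither inline data URIs nor in-document fragments. The following is a rough sketch with a function name of my own choosing; note that it would also flag ordinary hyperlinks, which may or may not be what you want:

```python
import xml.etree.ElementTree as ET

# Attribute name ElementTree uses for xlink:href once namespaces are resolved
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def find_external_references(svg_text: str) -> list[str]:
    """List href targets that point outside the SVG document itself."""
    root = ET.fromstring(svg_text)
    external = []
    for element in root.iter():
        for attribute in ("href", XLINK_HREF):
            target = element.get(attribute)
            if target is None:
                continue
            # data: URIs are embedded; '#...' points inside the same file
            if target.startswith("data:") or target.startswith("#"):
                continue
            external.append(target)
    return external
```

An empty list suggests the file can be committed or uploaded without carrying any icon files along with it.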

Speaking of external resources, in my search for examples, I found the official PlantUML repo for AWS. Support from one of the cloud giants is huge, as it means that it will (hopefully) stay relevant for the foreseeable future. Based on their examples, it is apparent that many of the AWS blog posts make use of the library for visualization. Here is one example that comes to mind. I can’t say with 100% certainty that they are using PlantUML on all posts, but the majority I have seen show a similar style to the examples in the repo. There are also other well-maintained libraries, though none seem to have the same executive backing that the AWS one does.

Starting off with a Copilot prompt as I did with the other examples, I was able to get a basic diagram up and running with PlantUML’s DSL:

<!-- Copilot Prompt -->

plantuml diagram of a basic kubernetes cluster

<!-- Copilot Output -->

@startuml
cloud "Kubernetes Cluster" {
  package "Node 1" {
    component [pod1]
    component [pod2]
  }
  package "Node 2" {
    component [pod3]
    component [pod4]
  }
  [pod1] -- [pod2]
  [pod3] -- [pod4]
}
@enduml

The result, rendered from the online compiler as SVG:

PlantUML k8s diagram

Simple, but again, it effectively communicates the idea. Now, my results from using Copilot and ChatGPT to build larger and more intricate diagrams have been mixed. I’ve found that attempting to mix assets in with the core concepts results in output that is too verbose for the GPT models to handle. This applies to all three of the diagramming tools in some capacity. It seems that the best approach is to break down each idea into primitives, and then let the AI draw the layout. From there, replacing node icons and adding labels is relatively straightforward.
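
To make the "primitives first" idea concrete, here is a small sketch that keeps a diagram as plain node and edge data and emits PlantUML text from it. The function name and the data layout are illustrative assumptions on my part, not anything PlantUML or Copilot prescribes:

```python
def to_plantuml(nodes, edges):
    """Render node names and (source, target, direction) edges as PlantUML text.

    direction is None for a plain '--' link, or a hint such as 'up'
    that produces a '-up-' link.
    """
    lines = ["@startuml"]
    lines += [f"node {name}" for name in nodes]
    lines.append("")
    for source, target, direction in edges:
        link = f"-{direction}-" if direction else "--"
        lines.append(f"{source} {link} {target}")
    lines.append("@enduml")
    return "\n".join(lines)


nodes = ["Email1", "Email2", "Namecheap"]
edges = [
    ("Email1", "Namecheap", None),
    ("Email2", "Namecheap", None),
    ("Namecheap", "Email2", "up"),
]
print(to_plantuml(nodes, edges))
```

With the structure held as data, swapping a plain `node` for an icon macro later only touches the rendering step, not the relationships.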

Domain problem again:

<!-- Copilot Prompt -->

plantuml diagram, two nodes Email1 and Email2 at the top.
both Email1 and Email 2 point down to node Namecheap.
Namecheap points down to a node called AWS.
The line between namecheap and AWS says "Route53".
AWS has four nodes below it, NS, SOA, TXT, and MX.
AWS points down to NS, SOA, TXT, and MX.
MX points down to a node called O365.
O365 points all the way back up to Email2

<!-- Copilot Output -->

@startuml
node Email1
node Email2
node Namecheap
node AWS
node NS
node SOA
node TXT
node MX
node O365

Email1 -- Namecheap
Email2 -- Namecheap
Namecheap -- AWS
AWS -- NS
AWS -- SOA
AWS -- TXT
AWS -- MX
MX -- O365
' I had to add the following line
O365 -up- Email2
@enduml

The rendered result:

PlantUML domain plain diagram

A bit of finagling to add some icons:

@startuml
!define AWSPuml https://raw.githubusercontent.com/awslabs/aws-icons-for-plantuml/v14.0/dist
!include AWSPuml/AWSSimplified.puml
!include AWSPuml/NetworkingContentDelivery/Route53.puml
!include AWSPuml/GroupIcons/Cloud.puml

!define SPRITESURL https://raw.githubusercontent.com/plantuml-stdlib/gilbarbara-plantuml-sprites/master/sprites
!includeurl SPRITESURL/namecheap.puml
!includeurl SPRITESURL/google-gmail.puml
!includeurl SPRITESURL/microsoft-windows.puml
!includeurl SPRITESURL/microsoft-azure.puml

node "<$google-gmail>\nEmail1" as Email1
node "<$microsoft-windows>\nEmail2" as Email2
node "\t<$namecheap>\n    Namecheap" as Namecheap
Cloud(AWS, "AWS", "AWS")
Route53(NS, "NS", "NS")
Route53(SOA, "SOA", "SOA")
Route53(TXT, "TXT", "TXT")
Route53(MX, "MX", "MX")
node "<$microsoft-azure>\nO365" as O365

Email1 -- Namecheap
Email2 -- Namecheap
Namecheap -- AWS
AWS -- NS
AWS -- SOA
AWS -- TXT
AWS -- MX
MX -- O365
O365 -up- Email2
@enduml

The final result:

PlantUML domain diagram with icons

Okay, don’t get me wrong, it’s still a bit ugly. I scraped the icons from a few disparate repos and half of them are monochrome sprites (and Email2 isn’t even a mail logo), but the diagram most accurately represents the idea that I had in my head. The SVG contains no external references, the nodes are ordered in the way that I envisioned, and the code is relatively easy to read. With some tweaking to my own site, it would be possible to render the diagrams client-side, which would save bandwidth and eliminate the need for pre-rendering of the diagrams.

Conclusion

Each of the tools evaluated has its own strengths and weaknesses. It could be argued that Mermaid is the best for most use cases, as it is supported on almost all VCS platforms. The primary limitation relates to ordering/ranking. Diagrams certainly has an advantage for die-hard Python fans; I just had a difficult time with the output formats. PlantUML, despite the bloated documentation site, has proven to be the most relevant for my use case. Pairing each of these with Copilot and ChatGPT has been a fun experiment, and I am very curious to see how the technology evolves in the coming years. Maybe one day codebases will document themselves :)