Semantic HTML
Sid Su
Introduction
Most of the HTML on the modern web is unreadable garbage with more <script> tags than actual HTML. Furthermore, the content of the page (the actual text, images, videos, games, etc.) is often inserted by a JavaScript library into a single line, making it hard to parse what is actually happening on the page. If you don’t believe me, go to almost any major website, right-click and select View Page Source.
What you will likely see is a jumbled mess of automatically generated HTML inserted between <script> tags and <link> tags.
This is not just a code aesthetic problem. Almost every major website fails W3C validation (Googlebot). Page weight, a measure of how much data is transferred to load a given web page, has gone up every year since tracking began (HTTP Archive: Page Weight). Finally, a lack of semantic tags (using <div> for everything) has made the web less accessible than ever before (accessiBe).
People will often argue, "Yes, modern HTML is unreadable, but it is a necessary byproduct of making web development ergonomic and feasible." I argue that this is itself an outdated assessment, especially since the majority of websites are still not very dynamic, nor do they need to be. They consist of static text, images and links. Rather than complex frameworks, we ought to be making more websites with Semantic HTML, which is ergonomic and fun to write directly.
What Is Semantic HTML?
Semantic HTML just means using tags such as <article>, <section>, <nav>, <header> and <footer> instead of <div>, and tags such as <dfn>, <address>, <strong>, <em>, <code> and <kbd> instead of <span>. These tags have the advantage of conveying meaning on their own, wherever they appear in the document, whereas before, meaning would be communicated through classes and ids.1
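A small before-and-after sketch of the idea follows. The class names and link targets are invented for illustration and are not taken from any particular site; the point is only where the meaning lives.

```html
<!-- Generic containers: the meaning lives only in the class names -->
<div class="site-header">
<div class="menu"><a href="/">Home</a> <a href="/posts">Posts</a></div>
</div>
<div class="post">
<div class="post-title">Semantic HTML</div>
<div class="text"><span class="term">Semantic HTML</span> uses tags that
describe their content.</div>
</div>

<!-- Semantic tags: the element names themselves carry the meaning -->
<header>
<nav><a href="/">Home</a> <a href="/posts">Posts</a></nav>
</header>
<article>
<h1>Semantic HTML</h1>
<p><dfn>Semantic HTML</dfn> uses tags that describe their content.</p>
</article>
```

Both versions may look identical once styled, but only the second tells a browser, a screen reader or a human reading the source that the links are navigation and the term is being defined.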
Advantages of Semantic HTML
- More Readable Structure
- An article should always be a self-contained portion of the webpage, a code element always holds a fragment of code (usually shown in monospace), and a section is always a cohesive part of the content. When this holds throughout the whole document, it is easy for someone familiar with Semantic HTML to know what is happening anywhere in the document.
- Performant
- The webpage will be about as performant as possible, because it is just plain HTML, with no framework code to download and execute before the content appears.1
- Accessible
- Semantic tags come with implicit ARIA roles2 built in, so it is mostly a matter of using the tags correctly, and the website becomes far more accessible to screen readers.
- Potential SEO Gains
- Search engine behavior changes all the time, but better-formed pages do seem to get better treatment.
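To make the accessibility point concrete: elements such as <nav> and <main> carry the implicit ARIA landmark roles navigation and main, so the two fragments below should be exposed to screen readers in much the same way. The second has to spell the roles out by hand; the link and text are placeholders.

```html
<!-- Semantic elements: the landmark roles are implicit -->
<nav><a href="/">Home</a></nav>
<main>
<p>Page content.</p>
</main>

<!-- Generic elements: the same roles must be added explicitly -->
<div role="navigation"><a href="/">Home</a></div>
<div role="main">
<p>Page content.</p>
</div>
```

The gap widens for interactive elements: a <button> is focusable and keyboard-operable by default, whereas a <div role="button"> also needs a tabindex and its own key handling before it behaves the same way.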
Caveats
While there are significant advantages, it is important to also note the caveats:
- MUCH More Typing
- There is a LOT more typing when writing raw HTML, because you need to think about which tags to use, and many of those tags are never seen by the end user. For instance, every date in this document is marked up with the <time> tag, but you don’t see that. However, it does help both robots and people assisted by robots understand that a string of characters is a date, so software can treat it as a date rather than as, say, the plain number 2025.
- Steeper Learning Curve
- Learning the tags does take some time, and if you do pick up Semantic HTML, you will inevitably go back to old documents to bring them up to standard with all of the semantic tags. I have done so many times. However, once you know the tags, your speed at writing Semantic HTML goes up, because you spend less time deciding which tags to use.
- Bad Tooling
- Tooling for writing HTML by hand is currently poor. Simple things like a spellchecker, or a linter that checks whether a document uses semantic tags where it should, are, as far as I can tell, missing right now. The closest thing is the W3C validator, but it does not check for a lack of semantic tags; it only checks that the tags already in the document are used correctly.
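For reference, this is roughly how a date is marked up with <time>. The datetime attribute carries the machine-readable form, while the text between the tags is what the reader sees; the example dates are the ones used in the Works Cited.

```html
<!-- Full date: machine-readable value in datetime, readable text inside -->
<p>Accessed <time datetime="2025-09-21">21 Sept. 2025</time>.</p>
<p>Published <time datetime="2025-06-30">30 June 2025</time>.</p>

<!-- A bare year is also a valid datetime value -->
<p>&copy; <time datetime="2025">2025</time></p>
```

None of this changes what is rendered, which is exactly why it feels like extra typing.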
I did not find these downsides to be fatal. Once you have the tags down, and a few HTML documents you can copy the boilerplate from, productivity with Semantic HTML goes up. For many, any learning curve is too much; we expect tools to be intuitive, with a skill floor deep underground. I don’t think that mindset is necessarily right, but that is a philosophical discussion for another post.
Semantic HTML Style
Here are style choices that work for me, and some justification behind each one.
- 80 Columns
- I want to write HTML directly, so it must look good in the editor. 80 columns is standard everywhere else, so why not for HTML?
- Only ASCII
- This one is a bit controversial, because UTF-8 is widely supported in browsers. However, mg, one of the editors I use, still does not support UTF-8, so I only use ASCII characters while editing.
- Prefer Named HTML Entities
- This follows directly from the previous recommendation. For instance, I could use a numeric character reference such as &#8217; to represent ’, but it is clearer in the code to type &rsquo;.
- No Indentation
- Most people online swear by HTML indentation, and it does make sense if the layout is what is emphasized in the HTML document. However, if we are focusing on content, then no indentation makes more sense. Markdown is also focused on content and does not typically indent, so neither should HTML.
- Prefer HTML Figures
- Prefer creating tables and text diagrams over immediately reaching for <img>. This is something I have been thinking about in general, related to the overuse of images in papers. Beyond that, a table or formatted text saves a lot of bandwidth compared to a full-fat image, and may even look nicer.
- Prefer Web Components For Dynamic Elements
- To avoid cluttering the HTML document, prefer Web Components for complex, dynamic elements.
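As an example of a figure built from markup instead of an <img>, the tag substitutions listed earlier in this post could be presented as a small table inside <figure>. The sketch is written without indentation, uses named entities (&lt; and &gt;) to show the tag names, and keeps every line under 80 columns.

```html
<figure>
<table>
<thead>
<tr><th>Instead of</th><th>Consider</th></tr>
</thead>
<tbody>
<tr><td><code>&lt;div&gt;</code></td>
<td><code>&lt;article&gt;</code>, <code>&lt;section&gt;</code>,
<code>&lt;nav&gt;</code>, <code>&lt;header&gt;</code>,
<code>&lt;footer&gt;</code></td></tr>
<tr><td><code>&lt;span&gt;</code></td>
<td><code>&lt;dfn&gt;</code>, <code>&lt;address&gt;</code>,
<code>&lt;strong&gt;</code>, <code>&lt;em&gt;</code>,
<code>&lt;code&gt;</code>, <code>&lt;kbd&gt;</code></td></tr>
</tbody>
</table>
<figcaption>Semantic alternatives to generic containers</figcaption>
</figure>
```

A table like this is a few hundred bytes, stays selectable and searchable, and is read out cell by cell by screen readers, none of which holds for a screenshot of the same table.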
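A minimal sketch of the Web Components approach, under clearly hypothetical names: the <copy-button> element and its data-target attribute are made up for this example. The document itself contains only one readable tag, while the behavior lives in a script that could just as well be moved to a separate file loaded at the end of the page.

```html
<!-- In the document: a single, readable (hypothetical) custom tag -->
<p>List files with <code id="snippet">ls -la</code>
<copy-button data-target="snippet">Copy</copy-button></p>

<script>
// Defines the hypothetical <copy-button> element. This script is placed
// after the markup, so the element's text ("Copy") is already parsed
// by the time connectedCallback runs.
class CopyButton extends HTMLElement {
  connectedCallback() {
    // Replace the element's text with a real <button>, so it stays
    // focusable and keyboard-operable.
    const button = document.createElement("button");
    button.type = "button";
    button.textContent = this.textContent.trim() || "Copy";
    button.addEventListener("click", () => {
      // Copy the text of the element named in data-target, if it exists.
      const target = document.getElementById(this.dataset.target);
      // navigator.clipboard is only available in secure contexts (HTTPS).
      if (target && navigator.clipboard) {
        navigator.clipboard.writeText(target.textContent);
      }
    });
    this.replaceChildren(button);
  }
}
customElements.define("copy-button", CopyButton);
</script>
```

If the script fails to load, the reader still sees the code and the word "Copy", so the page degrades to plain, readable HTML.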
Conclusion
Whether or not you choose to use Semantic HTML, I hope that you at least keep some of these principles in mind when creating your next website. I understand that some websites are very complex, and it may not be viable to make anything but a mess of JavaScript libraries. However, you should ask yourself whether what you are doing really needs JavaScript and 15 different frameworks, or whether HTML and CSS can accomplish the same thing with a much smaller payload and a much clearer codebase. Can it be done reasonably well as a static website with dynamic components? Or does it really need to be fully dynamic?
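One small example of plain HTML standing in for JavaScript: a collapsible section, which is often built with a script, can be written with the standard <details> and <summary> elements, and it keeps working with JavaScript disabled. The text inside is placeholder content.

```html
<details>
<summary>Why not just use a framework?</summary>
<p>Most pages are static text, images and links, so plain HTML is often
enough.</p>
</details>
```

The browser handles opening, closing and keyboard access on its own; CSS can restyle it, but no script is needed for it to work.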
I am genuinely surprised that there is so little talk about Semantic HTML, and that there is next to no tooling around it. It is such a useful paradigm to build websites around, yet web developers as a whole have mostly ignored it.
As a side note, I was inspired by The Motherfucking Website, The Better Motherfucking Website and The Best Motherfucking Website.1
Works Cited
accessiBe. "How Many Websites Are Actually Accessible and ADA-Compliant?" accessiBe, 30 June 2025.
Googlebot. "Why do major websites not pass W3C validation properly?" Forum post. Stack Overflow, 9 Apr. 2012, https://stackoverflow.com/q/9870780.
HTTP Archive: Page Weight. https://httparchive.org/reports/page-weight. Accessed 21 Sept. 2025.