How I built a Medium content archive website with Laravel

Follow along as I build a web application to archive my Medium content on my own domain.

This story originally appeared on Medium.

Photo by Safar Safarov on Unsplash

As a member of the Medium Partner Program, I publish almost all of my content on Medium. But recently there's been a lot of talk in my Twitter network about the Medium paywall, and a lot of writers moving away from Medium because of it. Personally, I still very much like Medium and don't mind the paywall, so I plan to continue publishing there for the foreseeable future.

But I was struck with a little sense of fear about what would happen to my content if Medium were to go away, or simply change their model to one that I do not want to participate in.

So in the interest of maintaining continuity for my content, and to give readers an alternate way to consume it off of the Medium platform, I decided to create a self-hosted archive of my Medium stories.

The end result can be seen here.

For years my personal website (dreadfullyposh.com) has been just a single static page. I haven't bothered to use a CMS to drive it, since I rarely make updates to the page, let alone add any new content. (Another case of the cobbler's children having no shoes.)

So I was starting more or less with a blank slate. Clearly I needed a tool in place to manage my content if I was going to be adding my posts to my site. Normally, I'd reach for Craft CMS as my first tool of choice, but that was definitely overkill for what I needed, and would require maintenance to keep the CMS up to date with the developers' rapid pace of releases. Plus, I really wanted to avoid using a MySQL database, since it would introduce yet another potential point of failure. I liked having a simple static site that didn't require updates, backups, or additional dependencies.

I considered Statamic as another CMS option, but it still felt like overkill. Both of these tools provide a great toolset for editing content, but I was going to be archiving existing content, not creating new content. And neither option would have made for a seamless way to pull my content down from Medium without a significant amount of work building a plugin or some sort of customization.

So I went rogue and decided to build something custom with the following requirements:

  1. The process of loading content from Medium should be automated. It doesn't need to be entirely hands-off, but it should involve a minimum number of steps.
  2. Content should be stored in static files, so they can be committed into a Git repository for safe keeping.
  3. The site should not be dependent on a MySQL database, or any other external services besides a web server running PHP.

Starting the Project

My existing site was a single HTML page, styled with Tailwind and some custom Sass styles. So obviously I needed to figure out what tools I should use to make my site dynamic. I originally reached for SlimPHP, but I found it was a bit too barebones for what I wanted to do. So I started a base project with Laravel. Having never built a project from scratch with Laravel, I decided it would be a good opportunity to gain some experience with the framework.

So, first things first, I moved my existing homepage into a simple view with a controller and then proceeded on toward the bigger task of determining how to get content out of Medium and how to store it within my site.
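For anyone unfamiliar with Laravel, moving a static page into the framework is only a few lines. This is a sketch, not my actual code; the controller and view names are illustrative:

```php
// routes/web.php — point the homepage at a controller action
use App\Http\Controllers\HomeController;
use Illuminate\Support\Facades\Route;

Route::get('/', [HomeController::class, 'index']);

// app/Http/Controllers/HomeController.php
namespace App\Http\Controllers;

class HomeController extends Controller
{
    public function index()
    {
        // Renders resources/views/home.blade.php (the old static HTML,
        // pasted into a Blade template)
        return view('home');
    }
}
```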

Getting the Content Out

Of course, with Medium being a membership-driven site, where they encourage users to subscribe, engage with content on the site (or app), and stay on the site for as long as possible, they don't expose an API to allow you to retrieve your content. They do have a bulk export option, which prepares an archive of all of your content when you request it, but that wasn't going to be very helpful for continually exporting content as it's added over time. So I went to Google and came across an NPM package which exports Medium posts to Markdown files called mediumexporter.

I installed it, and for the most part it worked pretty well. But as I worked with it more, I found it only got me about 85% of the way to what I wanted. The metadata it exported was designed for use with Jekyll and didn't include some of the information I wanted in my content archive.

I started a fork of the package to see if I could customize it to my own liking, but the deeper I got into it, the more issues I found. Specifically, the handling of embedded content, such as tweets, was lacking. And in some cases entire stories would fail to export, likely due to errors in the logic or issues with underlying package dependencies.

Since Node isn't a language I'm super familiar with, and I was trying to get this project moving rather than teach myself an entirely new toolset, I didn't really want to completely rework mediumexporter. So I went looking for other avenues, preferably in PHP, so I could work with something more familiar and potentially consolidate everything into a single application.

However, from my tinkering with mediumexporter, I did learn a bit more about Medium: I could access a JSON-encoded version of my content by simply appending ?format=json to my story URLs, and I picked up a bit about its content structure.
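One wrinkle worth knowing: Medium prefixes these JSON responses with `])}while(1);</x>` to prevent JSON hijacking, so the prefix has to be stripped before decoding. A minimal sketch (the function name is mine):

```php
<?php
// Decode the JSON Medium returns from a ?format=json story URL.
// The response body starts with an anti-hijacking prefix that must
// be removed before json_decode() will accept it.
function decodeMediumJson(string $raw): array
{
    $prefix = '])}while(1);</x>';
    if (str_starts_with($raw, $prefix)) {
        $raw = substr($raw, strlen($prefix));
    }

    return json_decode($raw, true, 512, JSON_THROW_ON_ERROR);
}

// Usage (network call omitted):
// $raw  = file_get_contents('https://medium.com/@user/story-slug?format=json');
// $data = decodeMediumJson($raw);
```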

The JSON is structured into a series of sections and paragraphs, each with an array of markups, which indicate the placement of links, text formatting, and embedded media throughout the content. It was understandable, but complex enough that I wanted to make sure there wasn't an existing solution before I started writing my own processor for it.
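To give a feel for what processing a markup looks like, here is a rough sketch of converting one paragraph to Markdown. The field names (`text`, `markups`, `start`, `end`, `type`) follow the structure described above, but the numeric markup codes (1 = bold, 2 = italic, 3 = link) are my assumptions from inspecting the export, not a documented API:

```php
<?php
// Turn one Medium paragraph into Markdown by wrapping each markup's
// character range. Markups are applied right-to-left so that earlier
// offsets stay valid as the string grows.
// (Medium counts offsets in UTF-16 code units; this sketch assumes ASCII.)
function paragraphToMarkdown(array $paragraph): string
{
    $text = $paragraph['text'];
    $markups = $paragraph['markups'] ?? [];

    usort($markups, fn ($a, $b) => $b['start'] <=> $a['start']);

    foreach ($markups as $markup) {
        $inner = substr($text, $markup['start'], $markup['end'] - $markup['start']);
        $wrapped = match ($markup['type']) {
            1 => "**{$inner}**",
            2 => "*{$inner}*",
            3 => "[{$inner}]({$markup['href']})",
            default => $inner,
        };
        $text = substr($text, 0, $markup['start']) . $wrapped . substr($text, $markup['end']);
    }

    return $text;
}
```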

I came across a Composer package, medium-json-parser, which looked like it could be useful.

In the end, I found medium-json-parser was also a bit too opinionated and non-customizable to be particularly useful for what I wanted to do. But I again learned a lot more about how to process Medium's JSON.

Neither of these options quite matched everything I was trying to do, but between mediumexporter and medium-json-parser, I had a much better idea of how Medium's data is structured, along with some good examples of how to process the JSON.

So as a last resort, I took the lessons from the two existing packages and started to build out my own Medium JSON processor inside my Laravel app. I wrapped it in an Artisan command, so I could easily run a single command in my terminal to grab the content and images from a Medium story, using only its URL.

As I worked on the processor, I added some of the functionality that had been missing from one or both of the existing packages to handle various paragraph and embed types. Medium uses a combination of Embedly and its own custom metadata and integrations to handle embedded media. So in my processor I had to add logic to look for the variety of embed types I use in my content, such as tweets, Gists, and rich links, and determine how to format them in the outputted Markdown.
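The shape of that logic is a simple dispatch on the embed's source. The URL patterns and output shapes below are my own illustrative assumptions; the real processor inspects Medium's embed metadata rather than raw URLs:

```php
<?php
// Choose Markdown/HTML output for an embedded resource based on where
// it comes from. Unrecognized embeds fall back to a plain link.
function renderEmbed(string $url): string
{
    return match (true) {
        // Tweets: emit Twitter's blockquote markup for its widget JS
        str_contains($url, 'twitter.com') =>
            "<blockquote class=\"twitter-tweet\"><a href=\"{$url}\"></a></blockquote>",
        // Gists: GitHub serves an embeddable script at <gist-url>.js
        str_contains($url, 'gist.github.com') =>
            "<script src=\"{$url}.js\"></script>",
        // Rich links and anything else: a Markdown link
        default =>
            "[{$url}]({$url})",
    };
}
```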

Storing Content

As one of my requirements, I wanted to store the content in flat files so that I could version control it and also keep my hosting setup simple.

Since the original NPM package I had found exported to Markdown, I stayed on that path as I wrote my own processor and continued to output the content into Markdown files. While the Markdown syntax is limited, it covers most of the available types of markup in a Medium article, and for those that it doesn't (such as embedded media), I could simply insert blocks of HTML into the files to handle them.
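Each exported file is just YAML frontmatter followed by the Markdown body. A minimal sketch of assembling one (the metadata fields are illustrative):

```php
<?php
// Build the contents of a story's Markdown file: a YAML frontmatter
// block for metadata, then the converted Markdown body.
function storyToMarkdownFile(array $meta, string $body): string
{
    $frontmatter = "---\n";
    foreach ($meta as $key => $value) {
        $frontmatter .= "{$key}: {$value}\n";
    }
    $frontmatter .= "---\n\n";

    return $frontmatter . $body;
}

// Usage:
// file_put_contents('my-story.md', storyToMarkdownFile(
//     ['title' => 'My Story', 'date' => '2020-01-01'],
//     $markdownBody
// ));
```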

In addition to the story content itself, I needed to also download and store the images that were used in my stories to ensure that my site could run fully independently from Medium should my content ever be removed.

Using Laravel's built-in filesystem abstraction was a joy, since I didn't have to think about paths or reading files at all in my application code. I set up a new "disk" to hold my stories in a private location, and a directory inside the preconfigured public disk to hold my downloaded images. This resolved one of my key frustrations with mediumexporter's Jekyll-style directory structure: the intermingling of content and image directories.
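In Laravel terms, that amounts to an extra entry in the disks array. This is a sketch of the idea, not my actual config; the disk name and paths are illustrative:

```php
// config/filesystems.php (excerpt)
'disks' => [

    // Private disk for the Markdown story files
    'stories' => [
        'driver' => 'local',
        'root' => storage_path('app/stories'),
    ],

    // Laravel's standard public disk; downloaded story images
    // live in a subdirectory here and are web-accessible
    'public' => [
        'driver' => 'local',
        'root' => storage_path('app/public'),
        'url' => env('APP_URL').'/storage',
        'visibility' => 'public',
    ],

],
```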

After setting these up, I pointed my processor, which I called StoryLoader, to these filesystem locations, and that portion of my project was complete.

Indexing Content

In my first pass at reading from the Markdown files, I simply used Spatie's YAML frontmatter parser to read the contents of my files and output them into views.

It worked fine, and I even used Laravel's collections to make sorting and searching easy. The interface was nice, but as I started building out the listing page for my stories, I felt uneasy about reading and parsing every single story file before displaying the output on the page, especially when I only needed the title, subtitle, and URL.

I toyed with some form of caching and eventually came across Sushi, which seemed like a really close match to what I was trying to do — essentially creating an Eloquent model for data that's not in a database.

It sounded great, and I hooked it up to my file-reading code, but eventually realized that its built-in caching mechanism didn't get used when you were using a custom method to populate it. So it really wasn't any better than what I had built on my own. I could have built a caching mechanism in, but it still felt like a hack.

But I was inspired by Sushi's use of a SQLite database. While my initial requirement was to not use a database at all, my intent was to avoid relying on a MySQL server and the associated need for backups, etc. So using a simple self-contained SQLite database in my Laravel application seemed like fair game. (And I was making up the rules anyway.)

So, much like my StoryLoader, I built an EntryLoader, again connected to an Artisan command, to ingest the Markdown files and store them in a database table. The idea is that when a code push with new content deploys to my server, the command runs: content from new Markdown files is added to the database, and existing records are updated if their files have changed.
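The core of that sync is an "insert or update" keyed on something unique to each story. The real EntryLoader goes through an Eloquent model, but the same idea can be sketched in plain PDO against SQLite (table and column names are illustrative):

```php
<?php
// Sync one parsed Markdown entry into SQLite: insert it if the slug is
// new, otherwise update the existing row in place (SQLite upsert).
function syncEntry(PDO $db, string $slug, string $title, string $body): void
{
    $db->exec(
        'CREATE TABLE IF NOT EXISTS entries (
            slug  TEXT PRIMARY KEY,
            title TEXT NOT NULL,
            body  TEXT NOT NULL
        )'
    );

    $stmt = $db->prepare(
        'INSERT INTO entries (slug, title, body) VALUES (:slug, :title, :body)
         ON CONFLICT(slug) DO UPDATE SET title = excluded.title, body = excluded.body'
    );
    $stmt->execute(['slug' => $slug, 'title' => $title, 'body' => $body]);
}
```

In Eloquent, the equivalent one-liner would be something like `Entry::updateOrCreate(['slug' => $slug], [...])`.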

Displaying the Content

With all the crunching and moving of data complete, it was time to move on to the next step. Displaying the content on my site is about as simple as it gets. Since the content now exists in a SQLite database with an Eloquent model connected to it, it's really simple to output the list of stories, or a single story, in a view and access all of its columns easily.

Of course, not being a designer, the hardest part was laying out some half-decent looking templates to format the content for viewing. 😑

The only things I needed to add were some meta tags, Open Graph tags, and JSON-LD, for which I found an excellent library, SEOTools.

I also added an XML sitemap, though the sitemap packages I came across were all too heavy-handed for what I needed, so I set up a controller and view to quickly output a simple one.
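A hand-rolled sitemap really is just a few lines of XML. A sketch of the idea (in my app the URLs come from the Eloquent model; here they're an illustrative array):

```php
<?php
// Build a minimal XML sitemap from a flat list of absolute URLs,
// following the sitemaps.org protocol.
function buildSitemap(array $urls): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($urls as $url) {
        $xml .= '  <url><loc>' . htmlspecialchars($url, ENT_XML1) . '</loc></url>' . "\n";
    }
    $xml .= '</urlset>';

    return $xml;
}
```

In the controller, the result can be returned with `response($xml)->header('Content-Type', 'application/xml')`.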

Deployment

The last thing I had to do was set up deployment. I wanted to keep things as simple and automated as possible. Bitbucket is my go-to choice for this because of their generous free tier and their excellent Pipelines for CI/CD, which is powered by Docker.

I set up my pipeline to build my frontend assets, copy them up to the server, pull down the latest code (including my Markdown content) from my Bitbucket repo, and then run a series of commands to apply any new migrations to the database and re-sync any new or updated Markdown files into it.
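The steps above map onto a Bitbucket Pipelines config roughly like this. To be clear, this is a sketch, not my actual file: the image, host, paths, and the sync command name are all illustrative:

```yaml
# bitbucket-pipelines.yml (sketch)
pipelines:
  branches:
    master:
      - step:
          name: Build and deploy
          image: node:12
          script:
            # Build frontend assets
            - npm ci && npm run production
            # Copy built assets up to the server
            - scp -r public/ deploy@example.com:/var/www/site/public
            # Pull latest code, migrate, and re-sync Markdown into SQLite
            - ssh deploy@example.com "cd /var/www/site && git pull
              && php artisan migrate --force && php artisan load-entries"
```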

That's pretty much all there is to it.

Archiving New Content

With deployment in place, all three of my requirements have been met, and the process of archiving a new Medium story to my site is super simple.

  1. Run my Artisan command to create the Markdown file and archive the images:

php artisan load-story <url-to-medium-post>

  2. Commit the files created by the Artisan command to the repo and push to Bitbucket.

… That's it. Within a few minutes, Bitbucket has deployed the new code and re-synced the database, and my content is live on the site.

Get in Touch With Jeremy

The majority of my time is spent working for Happy Cog; however, I do take on occasional consulting and speaking gigs.

Get in touch with me:

Recruiters, please do not contact me. I’m not looking for employment at this time.

©2020 Dreadfullyposh, Ltd. All rights reserved.