Is it possible to export content (posts + images) that is tagged with a specific category? I can probably get the posts from MarsEdit, but not the associated images.
I was able to extract the posts via MarsEdit, but I'm not sure how to export just the linked images.
I can’t think of any easy way. You could write a script to scan those Markdown files for links that use the micro.blog image CDN or uploads/ format and then curl them down locally, I guess. I think the Markdown + theme download already includes all the static images, though, with the image references replaced by local paths.
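A rough sketch of that idea, in case it helps (the export folder path and the uploads/ pattern are assumptions about how your posts reference images, so adjust both to match what’s actually in your files):

#!/bin/zsh
# Pull every unique uploads/ image URL out of the exported Markdown files
# and download each one into the current directory with curl.
grep -hoE 'https://[^[:space:]<>")]*uploads/[^[:space:]<>")]+\.(jpg|jpeg|png|gif)' /path/to/export/*.md \
  | sort -u \
  | while IFS= read -r url; do
      curl -sLO "$url"
    done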
Yeah, not really an easy way. One weird work-around I could think of would be to export all the posts to a WordPress blog, then create a new blog on Micro.blog, then add the WordPress category feed to the Micro.blog Sources page to import just that category and all images into the new blog on Micro.blog. Then export it. That would give you a clean export with exactly what you wanted, but the steps are so convoluted that I doubt anyone has ever tried it.
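For reference, with pretty permalinks turned on, the WordPress category feed you’d paste into the Sources page looks like this (the blog address and category slug here are made up):

https://yourblog.example.com/category/travel/feed/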
Do you mean using the import to Micro.blog after exporting posts from that category? I imagine using the Sources page will import only the 50 most recent posts, right?
It’ll import as many posts as are in the RSS feed from the WordPress blog. If you’ve only ever seen the 50 latest, that’s because the feed limits items to 50.
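For what it’s worth, that cap is just the “Syndication feeds show the most recent N items” setting under Settings → Reading in WordPress, so it can be raised before importing. A quick sketch if you’d rather do it with WP-CLI (1000 is an arbitrary example value):

# Run from the WordPress install directory; raises the number of items
# included in every feed, including category feeds.
wp option update posts_per_rss 1000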
Good point, I hadn’t considered the limit on the number of posts in the feed. If you have a thousand posts in the category, that is not going to work very well. I was trying to brainstorm other ways to slice the category data up, but what I said is probably a dead-end.
I am away from my computer, but this ChatGPT script looks good.
#!/bin/zsh

# Directory containing the plain text files
text_dir="/path/to/text/files"

# Temporary file to store unique URLs
temp_file=$(mktemp)

# Regex pattern to match image URLs (stops at whitespace, quotes, angle brackets,
# or a closing parenthesis so Markdown links don't drag extra characters along)
image_url_pattern='https://[^[:space:]<>")]+\.(jpg|jpeg|png|gif|heic|webp)'

# Loop through all text files in the directory
for file in "$text_dir"/*.txt; do
  if [[ -f "$file" ]]; then
    # Extract image URLs and append them to the temporary file
    grep -oE "$image_url_pattern" "$file" >> "$temp_file"
  fi
done

# Remove duplicate URLs
sort -u "$temp_file" -o "$temp_file"

# Create a directory to store downloaded images
download_dir="$text_dir/downloaded_images"
mkdir -p "$download_dir"

# Download each image using wget (install it via Homebrew if missing, or swap in curl)
while IFS= read -r url; do
  echo "Downloading $url..."
  wget -P "$download_dir" "$url"
done < "$temp_file"

# Clean up the temporary file
rm "$temp_file"

echo "All images downloaded to $download_dir."
This was my prompt:
Write a zsh script that searches a directory of plain text files and looks for any text that references an image at a URL (please assume all URLs start with https and have a file extension of jpg, jpeg, png, gif, heic, or webp), deduplicates the list, and then uses wget to download all of those images.
You’ll probably have to change the file glob to the right type (.md versus .txt), but it’s pretty easy to follow and adjust.
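For example, assuming the MarsEdit export came out as .md files, the only part of the script that needs to change is the glob in the loop; something like this covers both extensions (the (N) qualifier just tells zsh not to complain if one of them has no matches):

for file in "$text_dir"/*.{md,txt}(N); do
  grep -oE "$image_url_pattern" "$file" >> "$temp_file"
done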
Err… is this something I have to run at my end, or does Manton?
That’s what you can do with the posts you grabbed from MarsEdit.
Sorry for the stupid Q, but where do I run this script? Terminal? AppleScript?
That’s a zsh script. Save it as a file and edit the file path references in it so they’re correct for your setup. Then, in Terminal, type chmod +x yourfilenamehere to make it executable, and ./yourfilenamehere to run it.
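Concretely, assuming you saved it in your home folder as download_images.sh (the name is just an example), the Terminal session would look like:

cd ~
chmod +x download_images.sh    # make it executable (only needed once)
./download_images.sh           # run it; progress prints in the Terminal window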
Save it as any kind of file? I mean, .txt, .rtf?