Regex Snippet: Useful Regex Substitutions for Reformatting Copied Text Snippets
Replace single spaces after end-of-sentence periods with double spaces:
(rx (seq (group (or (seq (not (any whitespace ".")) (not upper-case))
                    (seq (any whitespace ".") (not word))
                    (seq (not (any whitespace ".")) word))
                (? ")")
                (any ".?!")
                (* (any "'’’‚‚❜❟"
                        "\"””„„❞❠"
                        "_*+/=-"
                        ")]}")))
         " "
         (group (not (any " " lower-case)))))
The same pattern as a regexp string:
\(\(?:[^.[:space:]][^[:upper:]]\|[.[:space:]][^[:word:]]\|[^.[:space:]][[:word:]]\))?[!.?][]"')-+/=_}’‚”„❜❞-❠-]*\) \([^ [:lower:]]\)
and replace with (two spaces between \1 and \2):
\1  \2
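This pattern depends on case-sensitive matching: when `case-fold-search` is non-nil, Emacs lets `[:upper:]` and `[:lower:]` match letters of either case, which defeats the check on the first character of the next sentence. A minimal sketch of applying the rule from Lisp with case folding turned off; the function name `my/double-space-sentences` is hypothetical, and the regexp is passed in as an argument rather than repeated here:

```elisp
;; Hypothetical helper, not part of the original snippet list.
(defun my/double-space-sentences (sentence-end-regexp)
  "Put two spaces after each sentence end matched by SENTENCE-END-REGEXP.
The regexp must capture the sentence end in group 1 and the first
character of the next sentence in group 2, as the pattern above does."
  (let ((case-fold-search nil))   ; keep [:upper:]/[:lower:] case-sensitive
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward sentence-end-regexp nil t)
        ;; FIXEDCASE is t so Emacs does not re-case the replacement text.
        (replace-match "\\1  \\2" t)))))
```

Evaluating the `rx` form above yields the regexp string, so the form can be passed directly as the argument.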
Replace double-hyphens with en-dashes:
\([^-]\)--\([^-]\) → \1–\2
Replace triple-hyphens with em-dashes:
\([^-]\)---\([^-]\) → \1—\2
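Both dash rules can be applied in one pass over a buffer. Each pattern requires a non-hyphen on both sides, so neither rule matches the other's input and the order is not significant. A minimal sketch; `my/apply-substitutions` and `my/dash-rules` are hypothetical names, and the helper assumes that no rule can match the empty string:

```elisp
;; Hypothetical helper: apply a list of (REGEXP . REPLACEMENT) rules.
(defun my/apply-substitutions (rules)
  "Apply each (REGEXP . REPLACEMENT) pair in RULES to the whole buffer.
Assumes no REGEXP can match the empty string."
  (let ((case-fold-search nil))
    (save-excursion
      (dolist (rule rules)
        (goto-char (point-min))
        (while (re-search-forward (car rule) nil t)
          ;; FIXEDCASE t: insert the replacement exactly as written.
          (replace-match (cdr rule) t))))))

;; The two dash rules from above, written as Lisp strings
;; (each backslash of the regexp is doubled).
(defvar my/dash-rules
  '(("\\([^-]\\)---\\([^-]\\)" . "\\1—\\2")
    ("\\([^-]\\)--\\([^-]\\)"  . "\\1–\\2")))
```

Calling `(my/apply-substitutions my/dash-rules)` in a buffer rewrites every isolated `--` and `---` in place.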
Remove superscript annotations:
\^{\[/?.*?/?\]} →
Remove trailing double-backslash and spaces:
[ ]*\\\\$ →
Fix empty body text hyperlinks:
\]\[\]\] → ]]
Remove trailing whitespace:
[ ]*$ →
Remove null links that are commonly inserted by Pandoc in place of HTML self-links:
\[\[https?://\(?:[[:word:]-]+\.\)*?[[:word:]-]+/?.*?/null\]\(?:\[\]\)?\] →
Replace YouTube tracking links with plain links:
https://www\.youtube\.com/redirect\?.*?&q=\(.*?\)&v=[a-zA-Z0-9._~%$!&'()*+,;=:@-]\{8,24\} → \1
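The captured `q` parameter is usually percent-encoded (for example `https%3A%2F%2F` for `https://`), so the plain substitution may leave an encoded URL behind. A minimal sketch that decodes the capture before inserting it, assuming the redirect links have the shape the pattern above expects; `my/youtube-redirect-regexp` and `my/unwrap-youtube-redirects` are hypothetical names. The character class is written without backslashes, since a backslash is not an escape inside an Emacs bracket expression, and the hyphen is placed last so it is taken literally:

```elisp
(require 'url-util)  ; provides `url-unhex-string'

;; Hypothetical constant: the redirect pattern above as a Lisp string.
(defconst my/youtube-redirect-regexp
  (concat "https://www\\.youtube\\.com/redirect\\?.*?&q=\\(.*?\\)"
          "&v=[a-zA-Z0-9._~%$!&'()*+,;=:@-]\\{8,24\\}")
  "Regexp matching a YouTube redirect link; group 1 is the target URL.")

;; Hypothetical helper, not part of the original snippet list.
(defun my/unwrap-youtube-redirects ()
  "Replace YouTube redirect links in the buffer with their decoded targets."
  (let ((case-fold-search nil))
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward my/youtube-redirect-regexp nil t)
        ;; Decode %XX escapes in the captured target, then insert it
        ;; literally (FIXEDCASE t, LITERAL t).
        (replace-match (url-unhex-string (match-string 1)) t t)))))
```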
Remove YouTube icon URIs in hyperlink bodies:
\[\[https://www\.gstatic\.com/youtube/img/.*?\.\(?:png\|jpeg\|jpg\|svg\)]] →
Remove Wikimedia embedded images of rendered LaTeX:
\[\[https?://wikimedia\.org/api/rest_v1/media/math/render/.*?\]\] →