Smart Scrape Any URL or Website to Markdown [Expert CPU Mode]

Enter a URL or paste HTML code directly into the text box below.

  • For URLs, the tool tries to extract the main article content with readability before converting.
  • For pasted HTML, the tool converts the entire provided HTML (after basic cleaning) without using readability's content extraction.

In both cases the tool identifies a title (the page title, falling back to the first H1) and converts the result to Markdown. Security checks and size limits apply. Use the copy icon (📋) in the output box to copy the Markdown.
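
The security checks and size limit are not spelled out on this page. As a rough sketch of what such checks could look like, the following uses only the Python standard library. The names classify_input, is_blocked_ip, validate_url, check_size and the 2 MB limit are hypothetical and chosen for illustration; the tool's actual rules may differ.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Assumed limit for illustration; the real tool's limit is not documented here.
MAX_HTML_BYTES = 2 * 1024 * 1024


def classify_input(text: str) -> str:
    """Return "url" for an http(s) URL, otherwise "html" (direct input)."""
    stripped = text.strip()
    parsed = urlparse(stripped)
    if parsed.scheme in ("http", "https") and parsed.netloc and "<" not in stripped:
        return "url"
    return "html"


def is_blocked_ip(ip_text: str) -> bool:
    """True for addresses a server-side fetcher should normally refuse."""
    ip = ipaddress.ip_address(ip_text.split("%")[0])  # drop any IPv6 scope id
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def validate_url(url: str) -> None:
    """Raise ValueError if the URL's host resolves to a blocked address."""
    hostname = urlparse(url).hostname
    if not hostname:
        raise ValueError("URL has no hostname")
    for info in socket.getaddrinfo(hostname, None):
        address = info[4][0]
        if is_blocked_ip(address):
            raise ValueError(f"Blocked address: {address}")


def check_size(html: str) -> None:
    """Raise ValueError if the HTML exceeds the assumed size limit."""
    if len(html.encode("utf-8")) > MAX_HTML_BYTES:
        raise ValueError("HTML input is too large")
```

Checking every resolved address, rather than only the hostname string, is what keeps a public-looking domain from pointing the fetcher at an internal service.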

How it works (v1.2):

  1. Input: Accepts URL or direct HTML.
  2. Fetch/Clean: Determines whether the source is a URL or direct input, gets the HTML, performs security checks (IP block, size limit), and removes non-content tags (<script>, <nav>, etc.).
  3. Content Processing:
    • If Source is URL: Attempts readability-lxml extraction (doc.summary()). Falls back to the cleaned HTML if extraction fails or returns nothing.
    • If Source is Direct Input: Skips readability-lxml extraction. Uses the cleaned HTML directly.
  4. Title Logic: Tries Readability title (if URL source). Falls back to first <h1> in cleaned HTML otherwise.
  5. Deduplication: Removes the first <h1> from the processed content if it matches the determined title.
  6. Conversion: Uses markdownify to convert the final processed HTML to Markdown.
  7. Output: Prepends title (if found) and returns Markdown or error message.
  8. Logging: Uses Python's logging module.