Smart Scrape Any URL or Website to Markdown [Expert CPU Mode]

Enter a URL or paste HTML code directly into the text box below.

For URLs, the tool attempts to extract the main article content with readability before converting.
For pasted HTML, the tool converts the entire provided HTML (after basic cleaning) and skips readability's content extraction. In both cases it identifies a title (the page title, falling back to the first H1) and converts the result to Markdown. Security checks and size limits apply. Use the copy icon (📋) in the output box to copy the Markdown.

How it works (v1.2):

Input: Accepts URL or direct HTML.
Fetch/Clean: Gets HTML, performs security checks (IP block, size limit), removes basic tags (<script>, <nav>, etc.). Determines if source is URL or Direct Input.
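
The security checks in this step could look roughly like the sketch below. It is only an illustration using the standard library: the function names, the 2 MiB limit, and the exact set of blocked ranges are assumptions, not the tool's actual values. It also only handles IP literals; a real check would resolve hostnames before testing them.

```python
import ipaddress

# Hypothetical limit; the tool's real size limit is not stated.
MAX_HTML_BYTES = 2 * 1024 * 1024


def is_blocked_ip(host: str) -> bool:
    """Return True if host is an IP literal in a non-public range."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal (e.g. a hostname); this sketch does not resolve DNS.
        return False
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )


def within_size_limit(html: str, limit: int = MAX_HTML_BYTES) -> bool:
    """Return True if the UTF-8 encoded HTML fits within the byte limit."""
    return len(html.encode("utf-8")) <= limit
```

Blocking loopback, private, and link-local ranges is a common way to keep a server-side fetcher from being pointed at internal services.
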
Content Processing:
- If Source is URL: Attempts readability-lxml extraction (doc.summary()). Falls back to cleaned HTML if extraction fails/is empty.
- If Source is Direct Input: Skips readability-lxml extraction. Uses the cleaned HTML directly.
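
The two branches above, including the fallback, can be sketched as a small selection function. The function name, the "url"/"direct" labels, and the injected extract callable are hypothetical and chosen for illustration; the tool's own code may be organized differently.

```python
from typing import Callable


def select_content(cleaned_html: str, source: str, extract: Callable[[str], str]) -> str:
    """Pick the HTML to convert, based on where the input came from.

    source is "url" or "direct" (hypothetical labels for this sketch).
    """
    if source != "url":
        # Direct input: skip extraction and use the cleaned HTML as-is.
        return cleaned_html
    try:
        summary = extract(cleaned_html)
    except Exception:
        # Extraction failed: fall back to the cleaned HTML.
        return cleaned_html
    # Extraction returned nothing useful: fall back as well.
    return summary if summary and summary.strip() else cleaned_html
```

With readability-lxml installed, extract could be something like lambda html: Document(html).summary(), where Document is imported from the readability package.
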
Title Logic: Uses the Readability title when the source is a URL. Otherwise, or if no title is found, falls back to the first <h1> in the cleaned HTML.
Deduplication: Removes the first <h1> from the processed content if it matches the determined title.
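
A minimal sketch of the title fallback and the deduplication step follows. It uses regular expressions from the standard library to stay self-contained; a real implementation would more likely use an HTML parser. The helper names and the case-insensitive, whitespace-normalized comparison are assumptions.

```python
import html as html_lib
import re
from typing import Optional

# Matches the first <h1>...</h1>; a regex is a simplification for this sketch.
H1_RE = re.compile(r"<h1\b[^>]*>(.*?)</h1\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def _text(fragment: str) -> str:
    """Strip tags, decode entities, and collapse whitespace."""
    return " ".join(html_lib.unescape(TAG_RE.sub("", fragment)).split())


def pick_title(readability_title: Optional[str], cleaned_html: str) -> Optional[str]:
    """Prefer the extractor's title; otherwise use the first <h1> text."""
    if readability_title and readability_title.strip():
        return readability_title.strip()
    match = H1_RE.search(cleaned_html)
    text = _text(match.group(1)) if match else ""
    return text or None


def drop_duplicate_h1(content_html: str, title: Optional[str]) -> str:
    """Remove the first <h1> only if its text matches the title."""
    match = H1_RE.search(content_html)
    if not title or not match:
        return content_html
    # Assumption: compare case-insensitively after whitespace normalization.
    if _text(match.group(1)).casefold() != " ".join(title.split()).casefold():
        return content_html
    return content_html[: match.start()] + content_html[match.end():]
```

For direct HTML input, readability_title would simply be None, so the first <h1> is used. Removing the matching <h1> avoids printing the title twice once it is prepended to the output.
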
Conversion: Uses markdownify to convert the final processed HTML to Markdown.
Output: Prepends title (if found) and returns Markdown or error message.
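
The final assembly might look like the sketch below. It assumes the title is rendered as a top-level "# " heading, which the description does not confirm, and build_output is a hypothetical name. The body would come from the conversion step, for example markdownify.markdownify(html, heading_style="ATX").

```python
from typing import Optional


def build_output(title: Optional[str], body_markdown: str) -> str:
    """Prepend the title as a top-level heading when one was found."""
    body = body_markdown.strip()
    if title and title.strip():
        # Assumption: the title is emitted as an ATX-style level-1 heading.
        return f"# {title.strip()}\n\n{body}"
    return body
```
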
Logging: Uses Python's standard logging module.