Smart Scrape Any URL or Website to Markdown [Expert CPU Mode]
Enter a URL or paste HTML code directly into the text box below.
- For URLs, the tool attempts to extract the main article content using `readability` before converting.
- For Pasted HTML, the tool converts the entire provided HTML (after basic cleaning) without using `readability`'s content extraction.

It identifies a title (page title or first H1 fallback) and converts to Markdown. Includes security checks and size limits. Use the copy icon (📋) in the output box to copy the code.
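The security checks and size limit mentioned above are not specified in detail, so the following is only a minimal sketch of what such checks might look like, using the Python standard library. The names `is_blocked_ip`, `check_url`, `check_size`, and the `MAX_HTML_BYTES` value are all hypothetical and are not taken from the tool itself.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical limit: the tool's real size limit is not stated.
MAX_HTML_BYTES = 2_000_000


def is_blocked_ip(address: str) -> bool:
    """Return True for IP literals a scraper would typically refuse to fetch."""
    ip = ipaddress.ip_address(address)  # raises ValueError if not an IP literal
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def check_url(url: str) -> str:
    """Validate scheme and host; return the hostname or raise ValueError."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    host = parsed.hostname
    if not host:
        raise ValueError("URL has no host")
    try:
        blocked = is_blocked_ip(host)
    except ValueError:
        # Not an IP literal. A fuller check would resolve the name
        # (for example with socket.getaddrinfo) and test every address.
        blocked = False
    if blocked:
        raise ValueError(f"blocked address: {host}")
    return host


def check_size(html: str, limit: int = MAX_HTML_BYTES) -> None:
    """Raise ValueError if the UTF-8 encoded HTML exceeds the limit."""
    if len(html.encode("utf-8")) > limit:
        raise ValueError("HTML input exceeds size limit")
```

This sketch only inspects IP literals in the URL. A hostname that resolves to an internal address would pass unless DNS resolution is added as noted in the comment.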
How it works (v1.2):
- Input: Accepts URL or direct HTML.
- Fetch/Clean: Gets HTML, performs security checks (IP block, size limit), removes basic tags (`<script>`, `<nav>`, etc.). Determines if source is URL or Direct Input.
- Content Processing:
  - If Source is URL: Attempts `readability-lxml` extraction (`doc.summary()`). Falls back to cleaned HTML if extraction fails/is empty.
  - If Source is Direct Input: Skips `readability-lxml` extraction. Uses the cleaned HTML directly.
- Title Logic: Tries Readability title (if URL source). Falls back to first `<h1>` in cleaned HTML otherwise.
- Deduplication: Removes the first `<h1>` from the processed content if it matches the determined title.
- Conversion: Uses `markdownify` to convert the final processed HTML to Markdown.
- Output: Prepends title (if found) and returns Markdown or error message.
- Logging: Uses Python's `logging`.
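The title, deduplication, and output steps above can be illustrated with a small sketch. It uses only the standard library's `html.parser` as a stand-in, since the tool's own HTML handling is not shown. The names `title_and_body` and `prepend_title`, the case-insensitive and whitespace-insensitive title comparison, and the `# Title` output format are all assumptions made for illustration.

```python
import html
from html.parser import HTMLParser
from typing import Optional, Tuple


def _norm(text: str) -> str:
    """Collapse whitespace and case so titles compare loosely (an assumption)."""
    return " ".join(text.split()).casefold()


class _FirstH1Filter(HTMLParser):
    """Re-emits HTML, buffering the first <h1> so it can be dropped on a match.

    Comments and doctype declarations are not re-emitted in this sketch.
    """

    def __init__(self, known_title: Optional[str]):
        super().__init__(convert_charrefs=True)
        self.title = known_title  # e.g. a Readability title for URL sources
        self.out = []             # re-emitted HTML pieces
        self._buf = None          # holds the first <h1> while it is open
        self._text = []           # text found inside the first <h1>
        self._seen_h1 = False

    def _emit(self, piece: str) -> None:
        (self._buf if self._buf is not None else self.out).append(piece)

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self._seen_h1:
            self._seen_h1 = True
            self._buf = []
        self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())

    def handle_data(self, data):
        if self._buf is not None:
            self._text.append(data)
        self._emit(html.escape(data, quote=False))

    def handle_endtag(self, tag):
        self._emit(f"</{tag}>")
        if tag == "h1" and self._buf is not None:
            h1_text = " ".join("".join(self._text).split())
            if not self.title:
                self.title = h1_text or None  # fallback: first <h1>
            buf, self._buf = self._buf, None
            if not (self.title and _norm(h1_text) == _norm(self.title)):
                self.out.extend(buf)  # no match: keep the heading

    def result(self) -> Tuple[Optional[str], str]:
        if self._buf is not None:  # unclosed <h1>: keep what was buffered
            self.out.extend(self._buf)
            self._buf = None
        return self.title, "".join(self.out)


def title_and_body(cleaned_html: str,
                   known_title: Optional[str] = None) -> Tuple[Optional[str], str]:
    """Return (title, html_without_duplicate_h1)."""
    parser = _FirstH1Filter(known_title)
    parser.feed(cleaned_html)
    parser.close()
    return parser.result()


def prepend_title(title: Optional[str], markdown: str) -> str:
    """Prepend the title as a top-level heading, if one was found."""
    return f"# {title}\n\n{markdown}" if title else markdown
```

In the described flow, the body returned by `title_and_body` would then be passed to `markdownify` for conversion, and `prepend_title` would be applied to the resulting Markdown.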