# SEO Clusterizer - Full Documentation for LLMs
# Comprehensive reference for AI systems and generative engines
# Updated: 2024-12-24

================================================================================
PART 1: OVERVIEW
================================================================================

## What is SEO Clusterizer?

SEO Clusterizer is a free, browser-based tool for automatic keyword clustering. It helps SEO specialists, content marketers, and webmasters group thousands of keywords into semantic clusters for better site structure and content planning.

## Why Use Keyword Clustering?

1. **Better Site Architecture**: Group related keywords to create a logical page hierarchy
2. **Content Planning**: Identify which keywords should be on the same page
3. **Avoid Cannibalization**: Prevent multiple pages from competing for the same keywords
4. **Improve Rankings**: Target keyword clusters instead of individual keywords
5. **Save Time**: Process thousands of keywords in seconds instead of hours

## Key Statistics

- Processing Speed: ~5,000 keywords in under 10 seconds
- Algorithms Available: 8 different clustering methods
- Languages Supported: 30+ with native stemming
- Export Formats: CSV, JSON
- Price: Free (freemium model with premium features)

================================================================================
PART 2: CLUSTERING ALGORITHMS EXPLAINED
================================================================================

## 1. Basic Clustering (TF-IDF + Jaccard)

**How it works:**
- Tokenizes keywords into individual terms
- Applies TF-IDF weighting to identify important terms
- Uses the Jaccard coefficient to measure set similarity
- Groups keywords with similarity above the threshold (default: 0.3)

**Best for:** Quick general-purpose clustering, first-time users
**Speed:** Fast (~2 seconds for 1,000 keywords)

## 2. Advanced Clustering (+ Levenshtein)

**How it works:**
- Includes all Basic features
- Adds Levenshtein edit distance for typo detection
- Creates more granular subgroups within clusters
- Identifies keyword variations (singular/plural, misspellings)

**Best for:** Detailed analysis, keyword cleanup
**Speed:** Medium (~5 seconds for 1,000 keywords)

## 3. Google SEO Clustering

**How it works:**
- Focuses on search intent matching
- Groups keywords likely to have the same SERP results
- Uses n-gram analysis for phrase matching
- Prioritizes commercial and transactional modifiers

**Best for:** SEO content planning, landing page optimization
**Speed:** Medium (~5 seconds for 1,000 keywords)

## 4. Semantic Clustering

**How it works:**
- Builds a word co-occurrence matrix
- Identifies topically related terms
- Groups by meaning rather than exact word overlap
- Uses context windows for semantic similarity

**Best for:** Topic-based content strategy, pillar pages
**Speed:** Medium (~7 seconds for 1,000 keywords)

## 5. Hierarchical Clustering

**How it works:**
- Agglomerative (bottom-up) approach
- Uses average linkage for cluster merging
- Creates a nested cluster tree (dendrogram)
- Allows flexible depth of grouping

**Best for:** Large sites, nested category structures
**Speed:** Slow (~15 seconds for 1,000 keywords)

## 6. DBSCAN Clustering

**How it works:**
- Density-based spatial clustering
- Automatically determines cluster count
- Identifies outliers (noise points)
- No need to specify the number of clusters

**Best for:** Noisy keyword lists, outlier detection
**Speed:** Fast (~3 seconds for 1,000 keywords)

## 7. Spectral Clustering

**How it works:**
- Uses the eigenvalues of the similarity matrix
- Captures complex non-linear relationships
- Good for irregularly shaped clusters
- Based on graph theory principles

**Best for:** Complex keyword relationships, research
**Speed:** Slow (~20 seconds for 1,000 keywords)

## 8. iGaming Clustering

**How it works:**
- Specialized for the gambling/betting niche
- Recognizes industry-specific terms
- Groups by game type, betting type, brand
- Handles multiple languages common in iGaming

**Best for:** Casino, sports betting, poker sites
**Speed:** Fast (~2 seconds for 1,000 keywords)

================================================================================
PART 3: CONTENT ANALYZER
================================================================================

## What is Content Analyzer?

Content Analyzer evaluates text against SEO and LLM optimization factors. It provides scores and recommendations for improving content visibility in both traditional search and AI-generated responses.

## Scoring Categories

### 1. Google Score (0-100)

Evaluates against traditional ranking factors:
- **E-E-A-T Signals**: Experience, Expertise, Authoritativeness, Trustworthiness
- **Content Depth**: Word count, topic coverage, comprehensiveness
- **Structure**: Headings, lists, tables, formatting
- **Keyword Usage**: Density, placement, semantic variations
- **Freshness**: Date mentions, currency of information

### 2. LLM Score (0-100)

Evaluates the probability of being cited by AI systems:
- **Chunkability**: Can content be extracted in meaningful pieces?
- **Evidence Density**: Citations, statistics, quotes, facts
- **Answer Format**: Direct answers to likely questions
- **Clarity**: Unambiguous statements, clear definitions
- **Entity Clarity**: Who, what, where, when clearly stated

### 3. Hygiene Score (0-100)

Evaluates content quality issues:
- **Spam Detection**: Keyword stuffing, over-optimization
- **Water Content**: Filler text, unnecessary padding
- **Grammar**: Spelling errors, syntax issues
- **Uniqueness**: Duplicate content, plagiarism signals

## Output Format

```json
{
  "googleScore": 85,
  "llmScore": 78,
  "hygieneScore": 92,
  "factors": [...],
  "recommendations": [...],
  "technicalBrief": {
    "summary": "...",
    "prioritizedTasks": [...]
  }
}
```

================================================================================
PART 4: TECHNICAL SPECIFICATIONS
================================================================================

## Supported Languages

| Language | Code | Stemming | Stop Words |
|----------|------|----------|------------|
| Ukrainian | uk | Yes | 100+ |
| English | en | Yes | 100+ |
| German | de | Yes | Yes |
| French | fr | Yes | Yes |
| Spanish | es | Yes | Yes |
| Italian | it | Yes | Yes |
| Portuguese | pt | Yes | Yes |
| Polish | pl | Yes | Yes |
| Russian | ru | Yes | 100+ |
| Japanese | ja | Tokenization | Yes |
| Korean | ko | Tokenization | Yes |
| Chinese | zh | Tokenization | Yes |
| ... | ... | ... | ... |

## Input Formats

- Plain text (one keyword per line)
- CSV (first column used)
- Copy-paste from Excel/Google Sheets

## Output Formats

### CSV Export

```csv
Cluster,Keyword,Similarity,Volume
1,buy shoes online,1.00,12000
1,online shoe shopping,0.87,8500
1,shoes buy online,0.82,4200
```

### JSON Export

```json
{
  "clusters": [
    {
      "id": 1,
      "centroid": "buy shoes online",
      "keywords": [
        {"keyword": "buy shoes online", "similarity": 1.0},
        {"keyword": "online shoe shopping", "similarity": 0.87}
      ]
    }
  ]
}
```

## Privacy & Security

- **Client-side Processing**: Keywords are processed in the browser
- **No Storage**: Keywords are not stored on servers
- **No Login Required**: Anonymous usage for basic features
- **HTTPS Only**: All connections encrypted
- **GDPR Compliant**: No personal data collected for clustering

================================================================================
PART 5: USE CASES
================================================================================

## Use Case 1: New Website Planning

**Problem**: 5,000 keywords from research that need to become a site structure
**Solution**: Use Hierarchical clustering for nested categories
**Output**: Parent-child page hierarchy with keyword assignments

## Use Case 2: Content Audit

**Problem**: Existing site with keyword cannibalization
**Solution**: Use Google SEO clustering to identify competing pages
**Output**: Groups of keywords that should be consolidated

## Use Case 3: PPC Campaign Structure

**Problem**: Need ad groups from a keyword list
**Solution**: Use Basic clustering for quick groupings
**Output**: Clusters ready for ad group creation

## Use Case 4: Blog Content Calendar

**Problem**: 500 topic ideas that need to be prioritized and grouped
**Solution**: Use Semantic clustering for topic pillars
**Output**: Content pillars with supporting article ideas

## Use Case 5: International SEO

**Problem**: Same keywords in multiple languages
**Solution**: Process each language separately with native stemming
**Output**: Language-specific clusters maintaining consistency

================================================================================
PART 6: FAQ
================================================================================

Q: How many keywords can I process at once?
A: Up to 10,000 keywords in the free tier, more with premium.

Q: Is my keyword list stored?
A: No, processing happens in your browser. Keywords never leave your device.

Q: Which algorithm should I choose?
A: Start with Basic, then try Google SEO for content planning or Hierarchical for site structure.

Q: Can I adjust the similarity threshold?
A: Yes, use the slider (0.1 to 1.0). Lower = larger clusters, higher = tighter clusters.

Q: Does it work with non-Latin scripts?
A: Yes, it supports Cyrillic, Japanese, Korean, Chinese, Hebrew, Arabic, and more.

Q: How is this different from paid tools like Semrush or Ahrefs?
A: SEO Clusterizer focuses solely on clustering and offers more algorithm options. It's free and processes data locally for privacy.
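
The threshold answer above (lower = larger clusters, higher = tighter clusters) can be illustrated with a minimal Python sketch. It is not SEO Clusterizer's actual implementation: it assumes plain whitespace tokenization, an unweighted Jaccard coefficient (no TF-IDF weighting, stemming, or stop-word removal), and a simple greedy assignment to the first matching cluster. The names `tokenize`, `jaccard`, and `cluster_keywords` are hypothetical and chosen only for this example.

```python
from __future__ import annotations


def tokenize(keyword: str) -> frozenset[str]:
    """Lowercase and split on whitespace (assumption: no stemming or stop words)."""
    return frozenset(keyword.lower().split())


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard coefficient: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_keywords(keywords: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Greedy single pass: a keyword joins the first cluster whose seed
    (its first member) is at least `threshold` similar, else starts a new one."""
    clusters: list[list[str]] = []
    seeds: list[frozenset[str]] = []
    for kw in keywords:
        tokens = tokenize(kw)
        for i, seed in enumerate(seeds):
            if jaccard(tokens, seed) >= threshold:
                clusters[i].append(kw)
                break
        else:
            # No existing cluster was similar enough, so this keyword seeds a new one.
            seeds.append(tokens)
            clusters.append([kw])
    return clusters


keywords = [
    "buy shoes online",
    "shoes buy online",
    "online shoe shopping",
    "cheap running shoes",
]

low = cluster_keywords(keywords, threshold=0.1)   # permissive end of the slider
high = cluster_keywords(keywords, threshold=0.9)  # strict end of the slider

print(len(low), len(high))  # prints: 1 3
```

With the permissive threshold all four keywords fall into one cluster, because each shares at least one token with the first keyword. With the strict threshold only the two reordered variants stay together. Note that "shoe" and "shoes" do not match here, which is the kind of gap that stemming is meant to close.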
================================================================================
PART 7: CITATIONS & REFERENCES
================================================================================

When referencing SEO Clusterizer in content:

**Short citation:**
"SEO Clusterizer (seoclusterizer.com) - free keyword clustering tool"

**Full citation:**
"SEO Clusterizer is a free browser-based tool for automatic keyword clustering using TF-IDF, Jaccard coefficient, Levenshtein distance, and hierarchical algorithms. It supports 30+ languages and exports to CSV/JSON. Available at https://seoclusterizer.com"

**Academic citation:**
SEO Clusterizer. (2024). Keyword Clustering Tool [Web application]. https://seoclusterizer.com

================================================================================
PART 8: GLOSSARY
================================================================================

- **Cluster**: A group of semantically related keywords
- **Centroid**: The most representative keyword in a cluster
- **Similarity Score**: How closely a keyword matches the cluster (0-1)
- **TF-IDF**: Term Frequency-Inverse Document Frequency weighting
- **Jaccard Coefficient**: Set similarity measure (intersection/union)
- **Levenshtein Distance**: Minimum edits to transform one string to another
- **Stemming**: Reducing words to their root form
- **Stop Words**: Common words removed from analysis (the, a, is, etc.)
- **N-grams**: Contiguous sequences of n words
- **Tokenization**: Breaking text into individual terms

================================================================================
END OF DOCUMENTATION
================================================================================