
Search Engine Indexing of Internal Docs

Question

If someone searches Google, can they find the internal docs?

Answer: YES, if the GitHub repo is public

How Search Engines Index GitHub

✅ GitHub Public Repos ARE Indexed

  • Google, Bing, and other search engines crawl public GitHub repositories
  • All files in public repos can appear in search results
  • Internal docs (AUDIT-*.md, POSITIONING-*.md, etc.) would be searchable
  • Example search that would find them:
    • site:github.com Papr-ai AUDIT-SUMMARY
    • site:github.com memory-dev-docs positioning strategy
    • "Papr Memory" internal documentation

❌ GitHub Private Repos are NOT Indexed

  • Private repos require authentication
  • Search engines cannot access them
  • Internal docs remain private

❌ Redocly Docs Site (platform.papr.ai) - Internal Docs NOT Indexed

  • Only pages in sidebars.yaml are built and deployed
  • Internal docs are not in sidebar → not deployed → not indexed
  • Even if someone searches site:platform.papr.ai AUDIT-SUMMARY, they won't find it

Current Exposure Risk

Platform 1: Redocly Docs Site (platform.papr.ai)

Risk: ✅ NONE - Internal docs are not deployed

Why safe:

  • Redocly only builds files in sidebars.yaml
  • Internal docs not in sidebar
  • Not deployed to platform.papr.ai
  • Cannot be found via search engines
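
The "not in sidebar, so not deployed" reasoning depends on how the Redocly build is configured, and in some setups Markdown files outside the sidebar may still be built, so it may be worth confirming directly. The sketch below requests a few candidate URLs on the docs site and labels each by HTTP status. The page slugs are guesses at how the files would be routed, not confirmed routes, and the helper names `classify_status` and `check_slugs` are hypothetical.

```shell
#!/bin/sh
# Sketch: flag internal doc URLs that respond on the public docs site.
# BASE_URL comes from this document; the page slugs passed in are assumptions.
BASE_URL="https://platform.papr.ai"

# Map an HTTP status code to a verdict.
classify_status() {
  case "$1" in
    200) echo "EXPOSED" ;;
    404) echo "not deployed" ;;
    *)   echo "inconclusive ($1)" ;;
  esac
}

# Request each slug and print its verdict. Requires network access.
check_slugs() {
  for slug in "$@"; do
    code=$(curl -s -o /dev/null -w '%{http_code}' "$BASE_URL/$slug")
    printf '%s\t%s\n' "$slug" "$(classify_status "$code")"
  done
}

# Example (hypothetical slugs):
#   check_slugs AUDIT-SUMMARY POSITIONING-UPDATES-SUMMARY
```

A 200 is only a signal: some sites return 200 with a "not found" page, so open any flagged URL in a browser before drawing conclusions.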

Platform 2: GitHub Repository

Risk: ⚠️ DEPENDS on repo visibility

If repo is PUBLIC:

  • ❌ Internal docs ARE searchable via Google
  • ❌ Anyone can find them with right search terms
  • ❌ Content is indexed and cached by search engines

If repo is PRIVATE:

  • ✅ Internal docs are NOT searchable
  • ✅ Only team members can access
  • ✅ Search engines cannot index

Example Search Queries That Would Find Internal Docs

If the repo is public, these searches would expose internal docs:

1. "Papr Memory" "AUDIT-SUMMARY"
2. site:github.com/Papr-ai positioning strategy
3. "memory-dev-docs" "Reddit consensus"
4. "Papr" "DIY vs Papr" internal
5. site:github.com Papr-ai POSITIONING-UPDATES

What Information Could Be Exposed

If repo is public and indexed, someone could find:

Low Sensitivity (Currently in your internal docs)

  • ✅ Documentation improvement plans
  • ✅ Positioning strategy and rationale
  • ✅ Decision-making process
  • ✅ Task tracking and planning notes

Risk: Low - Shows thoughtful process, no competitive harm

Medium Sensitivity (If added to internal docs)

  • ⚠️ Competitive analysis details
  • ⚠️ Pricing strategy discussions
  • ⚠️ Unannounced features or roadmap
  • ⚠️ Customer feedback with names

Risk: Medium - Could inform competitors

High Sensitivity (Should NEVER be in git)

  • ❌ API keys or credentials
  • ❌ Customer data or PII
  • ❌ Security vulnerabilities
  • ❌ Proprietary algorithms

Risk: High - Security/compliance violation
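
To check that nothing from the high-sensitivity list has slipped into the repo, a quick pattern scan can help. This is a minimal sketch that looks for three well-known credential formats (AWS access key IDs, classic GitHub personal access tokens, PEM private key headers). The pattern list is illustrative and far from exhaustive, and `scan_for_secrets` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: scan files for a few well-known credential formats.
# The pattern list is illustrative, not exhaustive.
SECRET_PATTERN='AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9]{36}|-----BEGIN [A-Z ]*PRIVATE KEY-----'

# Print matching lines with file name and line number.
# Exit status is 0 when something suspicious is found, 1 when nothing matches.
scan_for_secrets() {
  grep -nHE -e "$SECRET_PATTERN" -- "$@"
}

# Example, run from the repo root over all tracked files:
#   git ls-files -z | xargs -0 grep -nHE -e "$SECRET_PATTERN" --
```

A clean result does not prove the repo is free of secrets; it only rules out the formats listed in the pattern.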

How to Check Current Exposure

1. Check if Repo is Public

# Request the repo page without credentials and inspect the HTTP status
code=$(curl -s -o /dev/null -w '%{http_code}' https://github.com/Papr-ai/memory-dev-docs)
[ "$code" = "200" ] && echo "Public" || echo "Private or doesn't exist (HTTP $code)"

Or visit in incognito browser: https://github.com/Papr-ai/memory-dev-docs

2. Check Google Indexing

Search Google for:

site:github.com/Papr-ai/memory-dev-docs AUDIT-SUMMARY

If results appear → Already indexed
If no results → Not indexed yet (but could be soon if public)

Search on GitHub (code search requires sign-in, so use an account that is not a member of the Papr-ai org):

https://github.com/search?q=org:Papr-ai+AUDIT-SUMMARY&type=code

If results appear → Publicly searchable

Solutions to Prevent Search Engine Indexing

Option 1: Make Repo Private (Best Solution)

Pros:

  • ✅ Completely prevents search engine indexing
  • ✅ Keeps all docs in version control
  • ✅ Team can still collaborate via GitHub
  • ✅ No code changes needed

Cons:

  • ⚠️ Loses open-source visibility
  • ⚠️ Community can't contribute easily

How to do it:

  1. Go to repo Settings
  2. Scroll to "Danger Zone"
  3. Click "Change visibility"
  4. Select "Make private"

Option 2: Add Internal Docs to .gitignore

Pros:

  • ✅ Removes from future commits
  • ✅ Keeps repo public
  • ✅ Removes the files from the latest commit shown on GitHub

Cons:

  • ⚠️ Loses version control for internal docs
  • ⚠️ Already-indexed content remains in Google cache
  • ⚠️ Need to remove from git history

How to do it:

# Add to .gitignore
cat >> .gitignore << 'EOF'

# Internal planning docs
AUDIT-*.md
*-SUMMARY.md
*-UPDATE.md
POSITIONING-*.md
REDDIT-*.md
NEW-DOCS-*.md
DOCS-ORGANIZATION.md
QUICK-WINS-*.md
VIDEO-AUDIO-*.md
ENTERPRISE-FEEDBACK-*.md
CUSTOMER-FACING-*.md
INTERNAL-DOCS-*.md
SETUP-COMPLETE.md
internal/
EOF

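# Remove from git (keeps local copies)
# Patterns are quoted so git expands them (at any depth, like .gitignore);
# --ignore-unmatch skips patterns that match no tracked files instead of aborting
git rm --cached --ignore-unmatch -- 'AUDIT-*.md' '*-SUMMARY.md' '*-UPDATE.md' 'POSITIONING-*.md' 'REDDIT-*.md' 'NEW-DOCS-*.md' 'DOCS-ORGANIZATION.md' 'QUICK-WINS-*.md' 'VIDEO-AUDIO-*.md' 'ENTERPRISE-FEEDBACK-*.md' 'CUSTOMER-FACING-*.md' 'INTERNAL-DOCS-*.md' 'SETUP-COMPLETE.md'

# Commit and push (stage .gitignore so the new rules are part of the commit)
git add .gitignore
git commit -m "Remove internal docs from version control"
git push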

Note: This does not rewrite git history. Earlier commits still contain the files, and already-indexed content may remain in search results.
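
After untracking, it can be useful to confirm that git no longer tracks any internal docs. The sketch below lists tracked files that match a subset of the patterns from the .gitignore block. The function name `list_tracked_internal_docs` is hypothetical, and the pattern list should be extended to match the full .gitignore.

```shell
#!/bin/sh
# Sketch: list tracked files that match the internal-doc patterns.
# Quoting the patterns lets git expand them instead of the shell.
list_tracked_internal_docs() {
  git ls-files -- 'AUDIT-*.md' '*-SUMMARY.md' 'POSITIONING-*.md' 'internal/*'
}

# Example, run from the repo root:
#   list_tracked_internal_docs                 # empty output: nothing is tracked
#   git log --all --oneline -- 'AUDIT-*.md'    # non-empty output: history still has them
```

Empty output from the first command only covers the current index; the second command shows whether older commits still carry the files.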

Option 3: Remove from Git History (Nuclear Option)

Pros:

  • ✅ Completely removes from repo
  • ✅ Eventually removed from search results
  • ✅ Clean slate

Cons:

  • ⚠️ Complex and risky
  • ⚠️ Requires force push
  • ⚠️ Can break collaborators' repos

How to do it (use with caution):

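# Use git-filter-repo (safer than filter-branch)
pip install git-filter-repo

# Run in a fresh clone (directory name is arbitrary);
# filter-repo refuses to rewrite other clones unless --force is given
git clone https://github.com/Papr-ai/memory-dev-docs memory-dev-docs-rewrite
cd memory-dev-docs-rewrite

# Remove files from all history in a single pass
git filter-repo --invert-paths --path AUDIT-SUMMARY.md --path POSITIONING-UPDATES-SUMMARY.md
# ... add one --path per remaining file

# filter-repo removes the origin remote, so add it back
git remote add origin https://github.com/Papr-ai/memory-dev-docs

# Force push (WARNING: destructive)
git push --force --all origin
git push --force --tags origin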
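
When many files need removing, listing each path on the command line gets unwieldy. git-filter-repo can read filters from a file via `--paths-from-file`, where a `glob:` prefix marks a line as a glob. The sketch below writes such a file; `write_paths_file` and the output path are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch: write a filter file for `git filter-repo --paths-from-file`.
# First argument is the output file; remaining arguments are globs.
# The "glob:" prefix tells filter-repo to treat each line as a glob.
write_paths_file() {
  out=$1
  shift
  : > "$out"
  for pattern in "$@"; do
    printf 'glob:%s\n' "$pattern" >> "$out"
  done
}

# Example (run in a fresh clone):
#   write_paths_file /tmp/internal-paths.txt 'AUDIT-*.md' 'POSITIONING-*.md'
#   git filter-repo --invert-paths --paths-from-file /tmp/internal-paths.txt
```

Reviewing the generated file before running filter-repo is a cheap safeguard, since the rewrite is hard to undo once force-pushed.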

Option 4: Use robots.txt (Partial Solution)

Pros:

  • ✅ Asks search engines not to index
  • ✅ Easy to implement

Cons:

  • ⚠️ Not enforced - search engines can ignore it
  • ⚠️ Doesn't work for GitHub repos (GitHub controls robots.txt)
  • ⚠️ Already-indexed content remains

Not applicable here: github.com serves its own robots.txt, so a robots.txt file committed to the repo has no effect on how GitHub pages are crawled.

Option 5: Move to Separate Private Repo

Pros:

  • ✅ Complete separation
  • ✅ Version control maintained
  • ✅ Public repo stays clean

Cons:

  • ⚠️ More complex workflow
  • ⚠️ Need to manage two repos

How to do it:

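# Create the private repo memory-dev-docs-internal on GitHub first
# Move internal docs there
mkdir ../memory-dev-docs-internal
mv AUDIT-*.md *-SUMMARY.md ../memory-dev-docs-internal/
# The deletion of the moved files still has to be committed in the public repo
cd ../memory-dev-docs-internal
git init
git add .
git commit -m "Internal docs"
# Rename the branch to main in case git's default branch name differs
git branch -M main
git remote add origin https://github.com/Papr-ai/memory-dev-docs-internal
git push -u origin main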

Step 1: Check Current Status

# Check if repo is public (a 404 means private or nonexistent)
code=$(curl -s -o /dev/null -w '%{http_code}' https://github.com/Papr-ai/memory-dev-docs)
[ "$code" = "200" ] && echo "⚠️ PUBLIC" || echo "✅ PRIVATE or not found (HTTP $code)"

Step 2: Decide Based on Repo Status

If repo is PRIVATE:

  • ✅ No action needed
  • Internal docs are already protected
  • Search engines cannot access

If repo is PUBLIC:

Option A - Keep Public, Remove Internal Docs:

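# Add to .gitignore and remove from git
./organize-internal-docs.sh  # Moves to internal/
echo "internal/" >> .gitignore
# --ignore-unmatch avoids an error if internal/ was never tracked
git rm -r --cached --ignore-unmatch internal/
# Stage the deletion of the old tracked paths (git add -u stages all changes to tracked files)
git add -u
git add .gitignore
git commit -m "Remove internal docs from version control"
git push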

Option B - Make Repo Private:

  1. Go to https://github.com/Papr-ai/memory-dev-docs/settings
  2. Scroll to "Danger Zone"
  3. Click "Change visibility" → "Make private"
  4. Confirm

Option C - Separate Repos:

  1. Create new private repo for internal docs
  2. Move internal docs there
  3. Keep public repo for customer-facing docs only

Step 3: Request Google to Remove Cached Content (If Already Indexed)

If internal docs are already in Google search results:

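  1. Remove the docs or make the repo private first, so the URLs no longer serve the content
  2. Visit: https://search.google.com/search-console/remove-outdated-content (the Removals tool at https://search.google.com/search-console/removals only works for sites you have verified ownership of, which excludes github.com)
  3. Request removal of URLs:
    • https://github.com/Papr-ai/memory-dev-docs/blob/main/AUDIT-SUMMARY.md
    • (Repeat for each internal doc)
  4. Wait 24-48 hours for removal, then re-run the searches to confirm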
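
Removal requests take one URL at a time, so it can help to generate the full list up front. The sketch below turns file names into blob URLs using the repo and branch from the example URL above. `to_blob_urls` is a hypothetical helper name, and the branch is assumed to be `main`.

```shell
#!/bin/sh
# Sketch: turn file names (one per line on stdin) into GitHub blob URLs.
# The branch name "main" is taken from the example URL in this section.
BLOB_BASE="https://github.com/Papr-ai/memory-dev-docs/blob/main"

to_blob_urls() {
  while IFS= read -r file || [ -n "$file" ]; do
    if [ -n "$file" ]; then
      printf '%s/%s\n' "$BLOB_BASE" "$file"
    fi
  done
}

# Example:
#   printf '%s\n' AUDIT-SUMMARY.md POSITIONING-UPDATES-SUMMARY.md | to_blob_urls
```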

Current Risk Assessment

Redocly Docs (platform.papr.ai)

Exposure: ✅ NONE
Searchable: ❌ NO
Action needed: ✅ None - already protected

GitHub Repo

Exposure: ⚠️ DEPENDS on repo visibility
Searchable: ⚠️ YES if public, NO if private
Action needed: ⚠️ Check repo visibility, decide on solution

Content Sensitivity

Current internal docs: ✅ Low sensitivity
Risk if exposed: ✅ Low - mostly planning and strategy
Urgency: 🟡 Medium - should address but not critical

Quick Decision Matrix

| Scenario | Recommended Action | Urgency |
|---|---|---|
| Repo is private | ✅ No action needed | None |
| Repo is public + docs are non-sensitive | 🟡 Consider adding to .gitignore | Low |
| Repo is public + docs contain strategy | 🟠 Add to .gitignore or make repo private | Medium |
| Repo is public + docs contain sensitive info | 🔴 Make repo private immediately | High |
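
The matrix can also be expressed as a small lookup, for example to print the recommendation at the end of a check script. This is a sketch only; the function name `recommend_action` and the argument keywords (`private`, `public`, `non-sensitive`, `strategy`, `sensitive`) are hypothetical labels for the table rows.

```shell
#!/bin/sh
# Sketch: the decision matrix as a lookup.
# Arguments: visibility (private|public), sensitivity (non-sensitive|strategy|sensitive).
# The keywords are hypothetical labels for the rows of the matrix.
recommend_action() {
  case "$1:$2" in
    private:*)            echo "No action needed" ;;
    public:non-sensitive) echo "Consider adding to .gitignore" ;;
    public:strategy)      echo "Add to .gitignore or make repo private" ;;
    public:sensitive)     echo "Make repo private immediately" ;;
    *)                    echo "Unknown input: $1 $2" >&2; return 2 ;;
  esac
}

# Example:
#   recommend_action public strategy
```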

Bottom Line

Can someone find internal docs via Google?

  • NO if repo is private
  • NO via platform.papr.ai (not deployed)
  • YES if repo is public (searchable on GitHub)

What to do?

  1. Check if repo is public: Visit https://github.com/Papr-ai/memory-dev-docs in incognito
  2. If public: Decide if you want internal docs searchable
  3. If not: Add to .gitignore or make repo private

Current risk: 🟡 Low-Medium (depends on repo visibility)