Zero-Credential Twitter Search: Connecting Two Public Services
DuckDuckGo indexes public tweets but can't read them. fxtwitter reads tweets but can't search them. I connected them into a zero-credential Twitter search pipeline.
The Insight
DuckDuckGo indexes public tweets. A site:x.com query returns real tweet URLs, timestamped and ranked. But DuckDuckGo can't read the tweets — you just get URLs and snippets.
fxtwitter has a public JSON API. It's an open-source project (3,500+ stars on GitHub) built to fix Twitter embeds in Discord, and it exposes a clean API at api.fxtwitter.com that returns full structured tweet data: author, text, likes, retweets, reply count, timestamps. No auth, no API key, no account required.
I wanted to pull tweet data without paying for API access. The URLs DuckDuckGo returns contain the exact tweet IDs that fxtwitter's API expects. One regex and an HTTP call turn a search result into structured data.
How It Works
For each query, prepend site:x.com and hit DuckDuckGo with a 24-hour time filter:
```python
from ddgs import DDGS

full_query = f"site:x.com {query}"
results = DDGS().text(full_query, max_results=20, timelimit="d")
```

Extract the tweet ID and username from each URL via regex, then call the fxtwitter API:
```python
api_url = f"https://api.fxtwitter.com/{username}/status/{tweet_id}"
```

Deduplicate by tweet ID, filter by minimum engagement, sort by likes + retweets. Top results get their thread content fetched as well. Output goes to stdout as JSON: no database, no web UI, no account system. Just a Unix-style pipe.
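The glue between the two services can be sketched in a few small functions. The regex, the dict field names, and the `rank` helper below are illustrative assumptions, not the tool's exact internals:

```python
import re

# Matches x.com or twitter.com status URLs (a sketch, not the tool's exact regex)
TWEET_URL_RE = re.compile(r"(?:x|twitter)\.com/([A-Za-z0-9_]+)/status/(\d+)")

def parse_result_url(url):
    """Pull (username, tweet_id) out of a search-result URL, or None."""
    m = TWEET_URL_RE.search(url)
    return m.groups() if m else None

def fxtwitter_url(username, tweet_id):
    """Build the fxtwitter API endpoint for one tweet."""
    return f"https://api.fxtwitter.com/{username}/status/{tweet_id}"

def rank(tweets, min_engagement=0):
    """Deduplicate by id, drop low-engagement tweets, sort by likes + retweets."""
    seen = {}
    for t in tweets:
        seen.setdefault(t["id"], t)  # first occurrence wins
    kept = [t for t in seen.values()
            if t["likes"] + t["retweets"] >= min_engagement]
    return sorted(kept, key=lambda t: t["likes"] + t["retweets"], reverse=True)
```

Everything after the search call is pure data transformation, which is why the whole thing fits in one file.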
What I Chose Not to Build
Zero credentials by design. Not just "no API key required" — the tool is architecturally incapable of requiring one. Both upstream services are public. There's no auth flow to break, no token to expire, no rate limit tier to manage. This makes it meaningfully more robust than tools that depend on scraped session cookies or unofficial auth. The cost is that you're entirely at the mercy of those upstream services — there's no authenticated fallback if the public endpoints degrade.
Single file, minimal dependencies. The entire pipeline is one Python file. Dependencies are requests, ddgs, and pyyaml. No framework, no ORM, no async runtime. This isn't minimalism for its own sake — it's a deliberate choice for a tool that depends on external services that could change. When something upstream breaks, you want to be able to read the entire codebase in five minutes and patch it. The tradeoff is that there's no abstraction layer to swap in a different search provider or tweet fetcher — you'd be rewriting, not reconfiguring.
DuckDuckGo over other search engines. Google heavily filters site: results for social media. Bing works but requires an API key. DuckDuckGo returns clean results, has a maintained Python wrapper, and doesn't require authentication. Its index isn't as deep — you get recent popular tweets, not exhaustive historical coverage. For monitoring real-time discourse, that's actually what you want.
What I Ran Into
Thread content is self-replies only. Early on, my code labeled thread content as "replies," which was wrong. fxtwitter returns the author's own self-replies, not replies from other users. I caught this when results looked suspiciously one-sided and dug into what the API was actually returning. The fix was renaming the field and updating the docs — a good reminder that naming things after your assumptions instead of after the data is a fast way to mislead yourself.
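The corrected behavior amounts to keeping only follow-up tweets whose author matches the original tweet's author. The nested `author`/`screen_name` fields here are assumptions about the response shape, not fxtwitter's documented schema:

```python
def self_thread(tweet, replies):
    """Keep only the author's own follow-ups (the actual thread),
    discarding replies from other users. Field names are hypothetical."""
    author = tweet["author"]["screen_name"].lower()
    return [r for r in replies
            if r["author"]["screen_name"].lower() == author]
```

Naming the output `self_thread` instead of `replies` encodes what the data actually is, which is the whole lesson from the bug.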
Fragility is the core tradeoff. If fxtwitter returns a 500, the tweet is silently skipped. If DuckDuckGo stops indexing X, or fxtwitter shuts down its API, the tool breaks completely. There's no fallback. I built it knowing that. A tool with a shelf life is still more useful than no tool at all — and the architecture is simple enough that swapping in replacements is an afternoon of work, not a rewrite.
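The silent-skip behavior can be sketched as a fetch wrapper that collapses every failure mode into `None`. This is a sketch under assumed response shapes (a top-level `"tweet"` key), not the tool's exact code:

```python
import requests

def parse_response(status_code, payload):
    """Return the tweet dict on success, None otherwise (silent skip)."""
    if status_code != 200 or "tweet" not in payload:
        return None
    return payload["tweet"]

def fetch_tweet(username, tweet_id, timeout=10):
    """Fetch one tweet from fxtwitter; any network or parse error
    skips the tweet rather than crashing the pipeline."""
    url = f"https://api.fxtwitter.com/{username}/status/{tweet_id}"
    try:
        resp = requests.get(url, timeout=timeout)
        return parse_response(resp.status_code, resp.json())
    except (requests.RequestException, ValueError):
        return None
```

Swallowing errors is usually a smell, but for a best-effort aggregator over services you don't control, partial results beat a stack trace.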
The Bigger Pattern
The pattern here isn't specific to Twitter. Reddit's API priced out most third-party tools, but Google still indexes Reddit posts and old.reddit.com still serves public HTML — same gap, same opportunity. Anywhere a platform locks down its API, structured data still leaks out through search engine indexes, embed services, RSS feeds, Open Graph tags. Find a service that indexes content and a service that enriches it, then connect them. The gap between "public but unstructured" and "structured but authenticated" is where tools like this live. They're fragile and they have a shelf life — but they're also the only option when the official door is closed, and simple enough to rebuild when the landscape shifts.
```shell
pip install trawlx
trawlx --query "claude code" --mode json
```

GitHub: github.com/timstarkk/trawlx