The Problem with Content Channels
Telegram channels are one of the best distribution mechanisms on the internet. No algorithm deciding who sees your content. No pay-to-reach-your-own-audience. Every subscriber gets every message. But running a content channel manually — finding articles, writing summaries, formatting posts, publishing on schedule — burns hours every day. You become the bottleneck.
The alternative most people reach for is worse: dump raw RSS titles into a channel and call it automation. Nobody subscribes to that. It reads like a broken news ticker. No context, no analysis, no reason to stay.
There is a middle path. Automate the aggregation and scheduling. Let AI handle the enrichment. Keep the editorial quality high without the manual labor. This is the architecture behind WordPress Pulse, Automation News, and WP Jobs — three channels that run continuously with zero daily intervention.
Architecture Overview
The pipeline has four stages. Each one is independent, runs on its own schedule, and can fail without bringing down the others.
Stage 1: Aggregation. Poll RSS feeds on configurable intervals. Parse entries, extract metadata, store raw articles in SQLite with a pending status.
Stage 2: Deduplication. Before any article enters the enrichment queue, check it against every article already in the database. URL match is the first pass. Title similarity is the second. Duplicates get marked skipped and never processed.
Stage 3: Enrichment. Send pending articles to Claude for summary and analysis. Receive structured output. Store the enriched version. Mark the article ready.
Stage 4: Publishing. A cron job picks ready articles in batches, formats them as Telegram HTML, and posts them to the channel with randomized intervals between messages.
The entire system is a single TypeScript process running in Docker, triggered by cron. No long-running event loops. No WebSocket connections. The bot wakes up, processes a batch, and exits. This matters on a shared server — a bot that idles at 200MB of RAM for 23 hours a day is a bot that needs its own infrastructure.
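The wake-process-exit model can be sketched as a tiny entrypoint that dispatches on a command argument. This is an illustrative sketch, not the production code — the stage names and the argv-based dispatch are assumptions about how the npm scripts map to stages:

```typescript
// Sketch of a one-shot entrypoint: cron invokes one stage, the process
// does one batch of work and exits. Stage bodies are placeholders.
const stages: Record<string, () => Promise<void>> = {
  poll: async () => { /* poll all enabled sources */ },
  enrich: async () => { /* enrich one pending batch */ },
  publish: async () => { /* publish one ready batch */ },
};

async function main(stage: string): Promise<void> {
  const run = stages[stage];
  if (!run) throw new Error(`Unknown stage: ${stage}`);
  await run(); // one batch of work...
  // ...then fall off the end: no timers, no sockets, a clean exit
}

// cron runs e.g. `npm run poll`, which might map to `node dist/main.js poll`
if (process.argv[2]) {
  main(process.argv[2]).catch((err) => {
    console.error(err);
    process.exit(1);
  });
}
```

Because nothing registers a long-lived handle, Node exits on its own once the batch completes — no explicit process.exit needed on the happy path.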
Source Management
Sources are RSS feed URLs stored in a configuration table. Each source has a name, URL, category, polling interval, and an enabled flag. The poller iterates through enabled sources, fetches their feeds, and inserts new entries.
import Parser from "rss-parser";
interface FeedSource {
id: number;
name: string;
url: string;
category: string;
pollIntervalMinutes: number;
enabled: boolean;
lastPolledAt: string | null;
}
const parser = new Parser({
timeout: 10_000,
headers: {
"User-Agent": "GlacierPhonk-Bot/1.0",
},
});
async function pollSource(
db: Database,
source: FeedSource,
): Promise<number> {
const feed = await parser.parseURL(source.url);
let inserted = 0;
for (const item of feed.items) {
if (!item.link || !item.title) continue;
const exists = db
.prepare("SELECT 1 FROM articles WHERE url = ?")
.get(item.link);
if (exists) continue;
db.prepare(`
INSERT INTO articles (url, title, content, source_id, status, created_at)
VALUES (?, ?, ?, ?, 'pending', datetime('now'))
`).run(item.link, item.title, item.contentSnippet ?? "", source.id);
inserted++;
}
db.prepare("UPDATE sources SET last_polled_at = datetime('now') WHERE id = ?")
.run(source.id);
return inserted;
}
The rss-parser library handles Atom, RSS 2.0, and most nonstandard feed formats. Set a timeout — some feeds are hosted on servers that hang for 30 seconds before responding. The User-Agent header matters: some WordPress sites block requests without one. Wrap each pollSource call in a try/catch so a single dead feed does not abort the whole polling run.
Polling intervals vary by source. A high-volume feed like Hacker News might poll every 15 minutes. A company blog that publishes twice a week can poll every 6 hours. Over-polling wastes bandwidth and risks IP-based rate limiting. Under-polling misses time-sensitive content. Match the interval to the source’s actual publishing frequency.
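With a 30-minute cron tick, the poller can still honor per-source intervals by checking whether each source is due. A minimal sketch, using the pollIntervalMinutes and lastPolledAt fields from the FeedSource interface above:

```typescript
// Sketch: decide whether a source is due for polling. Pure function,
// so the cron tick just filters enabled sources through it.
interface PollState {
  pollIntervalMinutes: number;
  lastPolledAt: string | null; // ISO timestamp, or null if never polled
}

function isDueForPoll(source: PollState, now: Date = new Date()): boolean {
  if (source.lastPolledAt === null) return true; // never polled yet
  const elapsedMs = now.getTime() - new Date(source.lastPolledAt).getTime();
  return elapsedMs >= source.pollIntervalMinutes * 60_000;
}
```

A 6-hour source then gets skipped on most ticks, while a 15-minute source is polled nearly every run.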
Content Deduplication
Duplicate content kills a channel. Subscribers see the same article posted twice and assume the bot is broken. Deduplication happens at two levels.
URL-Based Deduplication
The simplest check: normalize the URL and query the database. This catches exact matches — the same article appearing in multiple RSS feeds, or the same feed entry appearing across polling cycles.
function normalizeUrl(url: string): string {
const parsed = new URL(url);
// Strip tracking parameters
const trackingParams = ["utm_source", "utm_medium", "utm_campaign", "ref", "source"];
trackingParams.forEach((p) => parsed.searchParams.delete(p));
// Remove trailing slash
parsed.pathname = parsed.pathname.replace(/\/+$/, "");
return parsed.toString();
}
UTM parameters are the main offender: the same article with ?utm_source=rss and with ?utm_source=newsletter is still the same article. Strip the parameters before comparison.
Title Similarity
URL deduplication misses cases where the same story is published at different URLs — syndicated content, press releases picked up by multiple outlets, or articles that get republished under a new slug. Title similarity catches these.
function similarity(a: string, b: string): number {
const normalize = (s: string) =>
s.toLowerCase().replace(/[^a-z0-9\s]/g, "").trim();
const wordsA = new Set(normalize(a).split(/\s+/));
const wordsB = new Set(normalize(b).split(/\s+/));
const intersection = new Set([...wordsA].filter((w) => wordsB.has(w)));
const union = new Set([...wordsA, ...wordsB]);
return intersection.size / union.size; // Jaccard index
}
function isDuplicate(
db: Database,
title: string,
threshold = 0.7,
): boolean {
const recent = db
.prepare(
"SELECT title FROM articles WHERE created_at > datetime('now', '-7 days')"
)
.all() as { title: string }[];
return recent.some((row) => similarity(row.title, title) >= threshold);
}
The Jaccard index works well for headlines. A threshold of 0.7 catches “WordPress 6.8 Released with New Block Editor” and “WordPress 6.8 Released — New Block Editor Features” as duplicates, while allowing legitimately different articles through. Only compare against the last 7 days of articles — scanning the entire history is wasteful and produces false positives on generic titles.
AI Enrichment Pipeline
This is where the content channel separates itself from an RSS aggregator. Every article gets processed through Claude before it can be published. The AI generates a summary, extracts key points, and provides analysis that the raw article metadata cannot.
import Anthropic from "@anthropic-ai/sdk";
const anthropic = new Anthropic();
interface EnrichedArticle {
summary: string;
keyPoints: string[];
category: string;
relevanceScore: number;
}
async function enrichArticle(
title: string,
content: string,
channelContext: string,
): Promise<EnrichedArticle> {
const response = await anthropic.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 1024,
messages: [
{
role: "user",
content: `You are a content analyst for a Telegram channel about ${channelContext}.
Analyze this article and return a JSON object with:
- summary: 2-3 sentence summary for a Telegram post (max 280 chars)
- keyPoints: array of 2-4 key takeaways (each max 100 chars)
- category: one of [release, tutorial, opinion, news, tool]
- relevanceScore: 1-10 how relevant this is to the channel audience
Article title: ${title}
Article content: ${content}
Return only valid JSON, no markdown fences.`,
},
],
});
const text =
response.content[0].type === "text" ? response.content[0].text : "";
return JSON.parse(text) as EnrichedArticle;
}
The prompt is specific to the channel’s niche. WordPress Pulse gets a WordPress ecosystem context. Automation News gets an automation platforms context. The same enrichment function serves every channel — only the context string changes.
Structured output matters. The response must parse as JSON or the enrichment fails. Claude is reliable here, but you still wrap the JSON.parse in a try-catch. If it throws, the article stays pending and gets retried on the next cycle.
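That defensive parse can be pushed into a small helper. The fence-stripping here is an assumption — despite the "no markdown fences" instruction, a model response occasionally arrives fenced anyway, and stripping the fence avoids an unnecessary retry cycle:

```typescript
// Sketch: defensive JSON extraction for model output. Returns null on
// failure so the caller can leave the article 'pending' for retry.
function parseEnrichment<T>(raw: string): T | null {
  const text = raw
    .trim()
    .replace(/^```(?:json)?\s*/i, "") // tolerate an accidental opening fence
    .replace(/\s*```$/, "");          // ...and its closing fence
  try {
    return JSON.parse(text) as T;
  } catch {
    return null;
  }
}
```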
The No-Fallback Principle
This is the most important architectural decision in the entire system: if AI enrichment fails, the article stays pending. Nothing posts without AI processing.
No fallback to raw RSS data. No “post the title and link if Claude is down.” No degraded mode. The channel either publishes enriched content or it publishes nothing.
async function processEnrichmentBatch(db: Database): Promise<void> {
const pending = db
.prepare(
"SELECT * FROM articles WHERE status = 'pending' ORDER BY created_at ASC LIMIT 5"
)
.all() as Article[];
for (const article of pending) {
try {
const enriched = await enrichArticle(
article.title,
article.content,
"WordPress ecosystem news and development",
);
db.prepare(`
UPDATE articles
SET summary = ?, key_points = ?, category = ?,
relevance_score = ?, status = 'ready',
enriched_at = datetime('now')
WHERE id = ?
`).run(
enriched.summary,
JSON.stringify(enriched.keyPoints),
enriched.category,
enriched.relevanceScore,
article.id,
);
} catch (err) {
console.error(`Enrichment failed for article ${article.id}:`, err);
// Article stays 'pending' — will be retried next cycle
// Do NOT fall back to posting raw content
}
}
}
Why this matters: your channel’s value proposition is curated, analyzed content. The moment you post a raw RSS title with a bare link, you break that promise. Subscribers joined for the enrichment. A few hours of silence during an API outage is better than a stream of low-effort posts that train subscribers to ignore you.
The batch limit of 5 is deliberate. On a shared server, churning through 50 articles in a single run means a long-lived process, sustained memory pressure, and a wider window for API timeouts. Five at a time, triggered by cron every 10 minutes, keeps the load predictable.
Scheduling and Publishing
Cron drives the entire system. No persistent process. No event loop waiting for timers. The bot starts, does its work, and exits.
# Poll RSS feeds every 30 minutes
*/30 * * * * cd /opt/glacierphonk/daemon-bot && docker compose run --rm bot npm run poll
# Enrich pending articles every 10 minutes
*/10 * * * * cd /opt/glacierphonk/daemon-bot && docker compose run --rm bot npm run enrich
# Publish ready articles every 15 minutes
*/15 * * * * cd /opt/glacierphonk/daemon-bot && docker compose run --rm bot npm run publish
Each cron entry runs a different script. They never overlap because each one processes a small batch and exits quickly. The polling cycle takes 5–10 seconds for a dozen feeds. Enrichment takes 10–30 seconds for 5 articles. Publishing takes under 5 seconds for a batch.
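If a run ever does hang (a stuck feed, a slow API), a flock wrapper guarantees the next tick skips instead of stacking processes. This is an optional safeguard, not part of the original setup; the lock path is an assumption:

```shell
# Sketch: -n makes flock fail fast if the previous run still holds the lock,
# so overlapping ticks are skipped rather than queued.
*/30 * * * * flock -n /var/lock/bot-poll.lock -c 'cd /opt/glacierphonk/daemon-bot && docker compose run --rm bot npm run poll'
```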
Randomized Intervals
Posting at exact intervals looks robotic. A message at :00, :15, :30, :45 every hour screams automation. Randomize the delay between posts within a batch to make the publishing cadence feel organic.
function randomDelay(minMs: number, maxMs: number): Promise<void> {
const delay = Math.floor(Math.random() * (maxMs - minMs + 1)) + minMs;
return new Promise((resolve) => setTimeout(resolve, delay));
}
async function publishBatch(
bot: Bot,
db: Database,
channelId: string,
): Promise<void> {
const ready = db
.prepare(
"SELECT * FROM articles WHERE status = 'ready' ORDER BY relevance_score DESC LIMIT 5"
)
.all() as Article[];
  for (const [i, article] of ready.entries()) {
    const message = formatTelegramMessage(article);
    try {
      await bot.api.sendMessage(channelId, message, {
        parse_mode: "HTML",
        // Bot API 7.0 renamed disable_web_page_preview to link_preview_options
        link_preview_options: { is_disabled: false },
      });
      db.prepare("UPDATE articles SET status = 'published', published_at = datetime('now') WHERE id = ?")
        .run(article.id);
      // Wait 30-90 seconds between posts
      if (i < ready.length - 1) {
        await randomDelay(30_000, 90_000);
      }
} catch (err) {
      await handlePublishError(err, article, db);
}
}
}
Higher-relevance articles publish first within each batch. The scoring from the AI enrichment step drives priority — a major WordPress release (relevance 9) publishes before a minor plugin update (relevance 4).
Message Formatting
Telegram supports a subset of HTML in messages: <b>, <i>, <a>, <code>, <pre>, and a few others. No <div>, no <p>, no CSS. Line breaks are literal \n characters. The character limit for a single message is 4096.
function formatTelegramMessage(article: Article): string {
const keyPoints = JSON.parse(article.key_points) as string[];
const pointsList = keyPoints.map((p) => `• ${p}`).join("\n");
const message = [
`<b>${escapeHtml(article.title)}</b>`,
"",
escapeHtml(article.summary),
"",
pointsList,
"",
`<a href="${article.url}">Read full article</a>`,
].join("\n");
  // Telegram's hard limit is 4096 characters per message. Truncating the
  // assembled string is a blunt fallback; a cleaner fix is shortening the
  // summary upstream so the cut never lands inside a tag or entity.
  if (message.length > 4096) {
    return message.slice(0, 4090) + "...";
  }
  return message;
}
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}
HTML escaping is mandatory. Article titles frequently contain &, <, and > characters. An unescaped < in a title breaks Telegram’s HTML parser and the entire message fails to send. The error from Telegram is unhelpful: “Bad Request: can’t parse entities.” Escape everything that is not an intentional tag.
Link previews are controlled per message (via link_preview_options in current Bot API versions; older code used the disable_web_page_preview parameter). Keep them enabled — the preview image and description from the source article add visual weight to the post. A text-only channel post gets scrolled past. A post with a preview image gets read.
Rate Limiting and Flood Control
Telegram enforces strict rate limits on bot API calls. The global limit is approximately 30 messages per second across all chats. For a single channel, the practical limit is about 20 messages per minute. Exceed it and the API returns a 429 status with a retry_after field indicating how many seconds to wait.
import { GrammyError } from "grammy";
async function handlePublishError(
err: unknown,
article: Article,
db: Database,
): Promise<void> {
if (err instanceof GrammyError) {
if (err.error_code === 429) {
// Rate limited — extract retry_after from the error
const retryAfter = (err.parameters?.retry_after ?? 30) * 1000;
console.warn(`Rate limited. Waiting ${retryAfter}ms`);
await new Promise((resolve) => setTimeout(resolve, retryAfter));
// Article stays 'ready' — will be retried next cycle
return;
}
if (err.error_code === 400) {
// Bad request — likely a formatting issue
console.error(`Formatting error for article ${article.id}:`, err.description);
db.prepare("UPDATE articles SET status = 'error', error = ? WHERE id = ?")
.run(err.description, article.id);
return;
}
}
console.error(`Unexpected publish error for article ${article.id}:`, err);
}
The 429 handler respects the server’s retry_after value exactly. Do not guess. Do not use a fixed backoff. Telegram tells you precisely how long to wait. Ignoring it escalates to longer cooldowns and eventually temporary bans.
grammY’s auto-retry transformer plugin handles this automatically for most cases. Install it once and it intercepts 429 responses, waits the specified duration, and retries the request.
import { autoRetry } from "@grammyjs/auto-retry";
bot.api.config.use(autoRetry({
maxRetryAttempts: 3,
maxDelaySeconds: 60,
}));
Real Channels in Production
This architecture powers three live channels at GlacierPhonk™, each targeting a different niche.
WP Jobs aggregates WordPress job listings from multiple job boards. The AI enrichment extracts salary ranges, required skills, and remote/onsite status into a standardized format. Subscribers get clean, scannable job posts instead of raw job board listings with inconsistent formatting.
WordPress Pulse tracks the WordPress ecosystem — core releases, plugin updates, security advisories, community news. The enrichment pipeline adds context: is this a breaking change? Which sites are affected? What action should site owners take? The channel turns scattered RSS entries into actionable intelligence.
Automation News covers automation platforms — Make, Zapier, n8n, IFTTT, Pabbly, and others. Same architecture, different sources and AI context. The enrichment focuses on pricing changes, new integrations, and platform comparisons.
All three run on the same EC2 instance, share the same codebase pattern, and cost nothing beyond the server and API calls. The channels grow through Telegram’s native discovery and cross-promotion. No paid ads. No social media marketing. The content does the work.
Scaling to Multiple Channels
The architecture isolates each channel behind configuration, not code. Adding a new channel means adding a new config block — not forking a repository.
interface ChannelConfig {
id: string;
name: string;
telegramChannelId: string;
aiContext: string;
sources: FeedSource[];
publishIntervalMinutes: number;
maxBatchSize: number;
relevanceThreshold: number;
}
const channels: ChannelConfig[] = [
{
id: "wp-pulse",
name: "WordPress Pulse",
telegramChannelId: "@WordPressPulse",
aiContext: "WordPress ecosystem news, core development, plugins, themes, and security",
sources: [/* ... */],
publishIntervalMinutes: 15,
maxBatchSize: 5,
relevanceThreshold: 4,
},
{
id: "automation-news",
name: "Automation News",
telegramChannelId: "@AutomationNewsCh",
aiContext: "Automation platforms including Make, Zapier, n8n, IFTTT, Pabbly Connect",
sources: [/* ... */],
publishIntervalMinutes: 20,
maxBatchSize: 3,
relevanceThreshold: 5,
},
];
Each channel has its own relevance threshold. A WordPress security advisory at relevance 3 might not meet Automation News’s threshold of 5, but it clears WordPress Pulse’s threshold of 4. The AI scores relative to the channel context, and the threshold filters out noise without manual review.
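The threshold-then-rank selection can be sketched as a pure function over the ready queue. The ScoredArticle shape is a narrowed stand-in for the Article rows used elsewhere:

```typescript
// Sketch: apply a channel's relevance threshold, then rank by score,
// then cap at the channel's batch size.
interface ScoredArticle {
  id: number;
  relevanceScore: number;
}

function selectBatch(
  ready: ScoredArticle[],
  relevanceThreshold: number,
  maxBatchSize: number,
): ScoredArticle[] {
  return ready
    .filter((a) => a.relevanceScore >= relevanceThreshold)
    .sort((a, b) => b.relevanceScore - a.relevanceScore)
    .slice(0, maxBatchSize);
}
```

In practice this collapses into the publish query itself: WHERE status = 'ready' AND relevance_score >= ? ORDER BY relevance_score DESC LIMIT ?.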
The maxBatchSize parameter is per-channel, per-cycle. Three channels each publishing 5 articles per cycle means 15 API calls to Claude and 15 messages to Telegram. On a t3.medium instance, this completes in under two minutes. Scale further by staggering cron schedules across channels so they do not all fire simultaneously.
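Staggering can be as simple as minute offsets in the crontab. A sketch, assuming a hypothetical --channel flag that selects a ChannelConfig:

```shell
# Sketch: offset each channel's publish cycle so no two fire in the same
# minute. The --channel flag is an assumption, not in the original scripts.
0,15,30,45 * * * * cd /opt/glacierphonk/daemon-bot && docker compose run --rm bot npm run publish -- --channel wp-pulse
7,27,47 * * * * cd /opt/glacierphonk/daemon-bot && docker compose run --rm bot npm run publish -- --channel automation-news
```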
The Database Schema
SQLite handles everything. No PostgreSQL. No Redis. A single .sqlite file, persisted in a Docker volume, backed up daily to S3. For a content bot processing hundreds of articles per day, SQLite is not a bottleneck — it is an advantage. Zero configuration, zero maintenance, instant backups via file copy.
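Two pragmas are worth setting when the connection opens, since separate cron runs can briefly overlap on the same file. A sketch of sensible defaults, not the original configuration:

```sql
-- WAL lets a publish run read while an enrich run writes;
-- busy_timeout makes a brief overlap wait instead of failing with SQLITE_BUSY.
PRAGMA journal_mode = WAL;
PRAGMA busy_timeout = 5000;
```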
CREATE TABLE sources (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
url TEXT NOT NULL UNIQUE,
category TEXT NOT NULL,
channel_id TEXT NOT NULL,
poll_interval_minutes INTEGER DEFAULT 30,
enabled INTEGER DEFAULT 1,
last_polled_at TEXT
);
CREATE TABLE articles (
id INTEGER PRIMARY KEY AUTOINCREMENT,
url TEXT NOT NULL,
title TEXT NOT NULL,
content TEXT,
source_id INTEGER REFERENCES sources(id),
status TEXT DEFAULT 'pending', -- pending, ready, published, skipped, error
summary TEXT,
key_points TEXT, -- JSON array
category TEXT,
relevance_score INTEGER,
error TEXT,
created_at TEXT NOT NULL,
enriched_at TEXT,
published_at TEXT
);
CREATE UNIQUE INDEX idx_articles_url ON articles(url);
CREATE INDEX idx_articles_status ON articles(status);
CREATE INDEX idx_articles_created ON articles(created_at);
The status field drives the entire pipeline. Articles move through pending → ready → published, or branch to skipped (duplicate) or error (formatting failure, permanent API rejection). Each stage queries only the rows it needs. The indexes on status and created_at keep those queries fast even with tens of thousands of articles.
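The state machine implied by that comment can be made explicit, which is handy as a guard in tests or before an UPDATE. The transition map below is my reading of the flow described above (skipped branches from pending at dedup time, error from ready at publish time):

```typescript
// Sketch: the article status state machine, made explicit.
type Status = "pending" | "ready" | "published" | "skipped" | "error";

const transitions: Record<Status, Status[]> = {
  pending: ["ready", "skipped"],   // enriched, or dropped as a duplicate
  ready: ["published", "error"],   // posted, or permanently rejected
  published: [],                   // terminal
  skipped: [],                     // terminal
  error: [],                       // terminal
};

function canTransition(from: Status, to: Status): boolean {
  return transitions[from].includes(to);
}
```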
What This Costs
The running cost is minimal. A t3.medium EC2 instance (shared with other services) runs around $30/month. Claude API calls for enrichment depend on volume — processing 100 articles per day with short prompts costs roughly $2–5/month. RSS polling is free. Telegram’s Bot API is free. Domain and DNS are negligible.
Total cost to run three automated content channels: under $40/month. Compare that to hiring a content curator for even one channel. The economics are not close.
Building Your Own
The stack: TypeScript, grammY, rss-parser, the Anthropic SDK, better-sqlite3, Docker, cron. No framework. No ORM. No message queue. Each component is a function that reads from the database, does its work, and writes back. Test each stage independently. Deploy as a single container.
Start with one channel, three RSS sources, and a 30-minute polling interval. Get the pipeline working end to end before optimizing. Add sources incrementally. Tune the relevance threshold after you see what scores the AI assigns to your content. Resist the urge to over-engineer the configuration system before you have a single post in your channel.
If you want this built for your niche — or need help adapting the architecture to your use case — reach out through the GlacierPhonk™ inquiry bot.