work record · Jan 2026 – Present

Boomerang Inc.

Software Engineering Intern · New York, NY

stack

Kotlin · PyTorch · FastAPI · pgvector · Redis · Claude

role

Software Engineering Intern

01 · context

What Boomerang is

Boomerang is alumni search for recruiters. Think LinkedIn Recruiter, except the graph is your company's alumni and their second-degree connections. My job over the internship was pretty open-ended: make recruiters faster across the whole loop, from finding people to reaching out to scoring matches, and help our own team ship fast enough to keep up. I ended up working on all four.

02 · search

NLP search over 2M alumni

The old search was a form with 300+ filters. To find people who had just left FAANG, you had to know to set "previous company = Meta" and "tenure 2+ years" and "departure year = 2023." Most recruiters never figured that out.

So I embedded all 2M alumni profiles as one dense vector each (title and company history, location, education, bio) and stored them in Postgres with pgvector^[1] behind an HNSW index. A small rewriting step turns a query like "founders who left FAANG in 2023" into hard filters (role: founder, prior company in FAANG, year: 2023) plus a leftover semantic vector. The filters run in SQL, the vector drives the approximate nearest-neighbor search, and a reranker orders the top 200 hits into the final list.
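Here is a minimal sketch of how a rewritten query could become a single SQL statement. It assumes the rewrite step has already produced structured filters; the RewrittenQuery shape is hypothetical. It also assumes a hypothetical alumni table with an embedding column. Hard filters become WHERE clauses, and the residual vector orders rows through pgvector's cosine-distance operator. The 200 rows that come back would then go to the reranker.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RewrittenQuery:
    """Hypothetical output of the query-rewrite step."""
    role: Optional[str] = None
    prior_companies: List[str] = field(default_factory=list)
    departure_year: Optional[int] = None
    residual_vec: List[float] = field(default_factory=list)


def build_search_sql(q: RewrittenQuery, limit: int = 200) -> Tuple[str, list]:
    """Hard filters become a WHERE clause; the residual vector drives ANN order.

    Assumes a hypothetical table alumni(id, role text, prior_companies text[],
    departure_year int, embedding vector(1536)) with an HNSW index on embedding.
    Placeholders use the psycopg %s style.
    """
    where, params = [], []
    if q.role is not None:
        where.append("role = %s")
        params.append(q.role)
    if q.prior_companies:
        where.append("prior_companies && %s")  # Postgres array overlap
        params.append(q.prior_companies)
    if q.departure_year is not None:
        where.append("departure_year = %s")
        params.append(q.departure_year)
    # pgvector parses a '[x,y,...]' text literal when cast to ::vector.
    vec_literal = "[" + ",".join(str(x) for x in q.residual_vec) + "]"
    sql = "SELECT id FROM alumni"
    if where:
        sql += " WHERE " + " AND ".join(where)
    # <=> is pgvector's cosine-distance operator: smaller means more similar.
    sql += " ORDER BY embedding <=> %s::vector LIMIT %s"
    params += [vec_literal, limit]
    return sql, params
```

For the cosine operator to use the index, the HNSW index would be built with vector_cosine_ops, for example: CREATE INDEX ON alumni USING hnsw (embedding vector_cosine_ops). pgvector applies WHERE filters after the approximate index scan. As a result, very selective filters can return fewer than 200 rows. The pgvector README covers raising hnsw.ef_search for that case.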

fig 01 · natural language → audience: Plain-English query → filters + vector → a real audience. In the example, the query rewrite produces hard SQL filters (prior_company ∈ {Stripe, Square}, seniority ≥ Staff, location = New York, NY, departure_year = 2023) and a residual 1536-dim vector (~ "fintech founder, building now"). pgvector HNSW returns the ANN top 200 for reranking, narrowing 2,041,883 alumni to 312 matches.

The quality came from training on misses. When a recruiter ran a search, skipped the top hits, and clicked someone on page 3, that click became a positive and every result ranked above it a negative. Two months of that loop closed most of the gap with the old hand-tuned filters, and outreach kept climbing.
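The skip-above labeling rule can be sketched as a small function. This is an illustration, not the production pipeline: the function name and row shape are hypothetical. Dropping results below the click (which the recruiter may never have looked at) is an assumption layered on top of the rule described above.

```python
from typing import List, Tuple


def mine_skip_above_pairs(query: str, ranked_ids: List[str],
                          clicked_id: str) -> List[Tuple[str, str, int]]:
    """Turn one search session into (query, profile_id, label) training rows.

    The clicked profile is a positive (label 1). Every profile ranked above it,
    which the recruiter scrolled past, is a negative (label 0). Results below
    the click are left out because they may never have been seen.
    """
    if clicked_id not in ranked_ids:
        return []
    pos = ranked_ids.index(clicked_id)
    rows = [(query, pid, 0) for pid in ranked_ids[:pos]]
    rows.append((query, clicked_id, 1))
    return rows
```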

fig 02 · outreach per seat per month: Recruiter outreach per seat after the NLP search rollout.

03 · ranker

Two-stage neural retriever

The "candidates for this role" feed ran on a gradient-boosted ranker over hand-built features like title match, tenure, and school tier. It worked fine, but acceptance, meaning the recruiter actually reached out, was stuck around 8%.

I replaced it with a two-stage neural retriever trained on 12M past candidate-job pairs (accepted = positive, dismissed in under 5 seconds = hard negative). Stage one is a bi-encoder^[2] that maps candidates and jobs into the same embedding space and is fast enough to score the whole pool. Stage two is a cross-encoder that re-ranks the top 200 by reading the job and the profile together. Both stages are trained in PyTorch and served behind FastAPI with batched inference.
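Here is a sketch of the pair-labeling rule for the training data. It assumes a hypothetical impression log with accept/dismiss flags and a view time. Treating slow dismissals and untouched impressions as unlabeled (rather than negatives) is an assumption made here, not something stated above.

```python
from dataclasses import dataclass
from typing import Optional

HARD_NEGATIVE_SECONDS = 5.0  # "dismissed in under 5 seconds"


@dataclass
class ImpressionEvent:
    """Hypothetical log row: one candidate shown against one job."""
    candidate_id: str
    job_id: str
    accepted: bool         # recruiter reached out
    dismissed: bool        # recruiter dismissed the card
    seconds_viewed: float  # time between render and the action


def label(event: ImpressionEvent) -> Optional[int]:
    """Return 1 for a positive, 0 for a hard negative, None to drop the pair."""
    if event.accepted:
        return 1
    if event.dismissed and event.seconds_viewed < HARD_NEGATIVE_SECONDS:
        return 0
    # Slow dismissals and untouched impressions are ambiguous, so skip them.
    return None
```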

fig 03 · 12M pairs, top-200 cut: Bi-encoder scores the pool, cross-encoder re-ranks the top 200.

python

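import numpy as np

def score_candidates(job, pool, k=200):
    # Stage 1: the bi-encoder embeds the job and every candidate into one space.
    job_vec = bi_encoder.encode_job(job)            # (d,)
    cand_vecs = bi_encoder.encode_candidates(pool)  # (N, d)
    coarse = cand_vecs @ job_vec                    # (N,) dot-product scores
    k = min(k, len(pool))                           # argpartition needs kth < N
    top_idx = np.argpartition(-coarse, k - 1)[:k]   # unordered top-k indices
    top = [pool[i] for i in top_idx]
    # Stage 2: the cross-encoder reads job + profile together, top k only.
    fine = np.asarray(cross_encoder.score_pairs(job, top))  # (k,)
    return [top[i] for i in np.argsort(-fine)]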

Two-stage scoring, as in fig 03

Acceptance went from about 8% on the old ranker to 31% with the bi-encoder plus cross-encoder, which worked out to roughly 4x more good hires per search. The cross-encoder is the expensive part, so the 200 cap earns its keep. Bumping it to 500 added less than a point of acceptance for 2.5x the latency, so I left it at 200.
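The 2.5x latency at a cap of 500 is what you'd expect if the cross-encoder dominates serving time. With batched inference and a roughly constant cost per batch, latency grows with the number of batches, and 500 / 200 = 2.5. Below is a minimal sketch of that back-of-envelope model. It assumes a fixed per-batch cost and ignores the bi-encoder stage; the batch sizes are hypothetical.

```python
import math


def cross_encoder_batches(k: int, batch_size: int) -> int:
    """Forward passes needed to re-rank k candidates in fixed-size batches."""
    return math.ceil(k / batch_size)


def rerank_latency_ratio(k_new: int, k_old: int, batch_size: int) -> float:
    """Latency at k_new relative to k_old, assuming a constant cost per batch."""
    return cross_encoder_batches(k_new, batch_size) / cross_encoder_batches(k_old, batch_size)
```

When the batch size doesn't divide both caps, rounding shifts the ratio slightly. For example, a batch of 32 gives 16 / 7, about 2.29.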

04 · dev velocity · ops

The software factory

Before I shipped any product, I rebuilt how we shipped product. Every dev step (triage, branch setup, scaffolding, review, PR descriptions, tests) got a Claude^[3] entrypoint in our internal CLI. Tickets carry real context: linked Notion docs, Granola call transcripts, past PRs on the same files. The factory hands that context to Claude with the right prompt for each step. Closed tickets per cycle went from about 14 to 31, and review acceptance didn't drop. Most of the win wasn't faster code generation. It was context plumbing: Claude works best when you can hand it the smallest slice of repo and docs that is still useful.
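One way to read "context plumbing" is as a packing problem: rank the ticket's linked context by relevance and keep what fits a token budget. Here is a minimal sketch. It assumes each snippet already carries a relevance score and a token estimate. Greedy relevance-first packing is a technique swapped in for illustration, and the Snippet shape and prompt template are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snippet:
    """One piece of ticket context (hypothetical shape)."""
    source: str       # e.g. "notion", "granola", "past_pr"
    text: str
    relevance: float  # higher means more useful for this ticket
    tokens: int       # estimated token count


def pack_context(snippets: List[Snippet], budget: int) -> List[Snippet]:
    """Greedy packing: take the most relevant snippets that still fit the budget."""
    chosen, used = [], 0
    for snip in sorted(snippets, key=lambda x: x.relevance, reverse=True):
        if used + snip.tokens <= budget:
            chosen.append(snip)
            used += snip.tokens
    return chosen


def build_prompt(step: str, ticket: str, context: List[Snippet]) -> str:
    """Assemble a per-step prompt; the template is illustrative only."""
    parts = [f"Step: {step}", f"Ticket: {ticket}", "Context:"]
    parts += [f"[{s.source}] {s.text}" for s in context]
    return "\n".join(parts)
```

Greedy packing can skip a large, relevant snippet in favor of smaller, less relevant ones. That trade-off is usually acceptable when the goal is a small, focused context rather than an optimal one.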

The clearest place to watch the factory run is the in-app bug widget. A recruiter hits a bug, clicks the little widget in the corner, and types one sentence. Context attaches itself (URL, last 5 API calls, user role, feature flags), and the pipeline takes over: a triage classifier sorts it, an agent writes the fix, tests and QA run, and it lands in the on-call channel for an engineer to confirm before it merges.
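Here is a sketch of the automatic context capture behind the widget. It assumes the client keeps a small in-memory log of API calls. A bounded deque provides the "last 5 API calls" directly. BugReport, ApiCallRecorder, and make_report are hypothetical names used only for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BugReport:
    """What the widget sends with a one-sentence report (hypothetical shape)."""
    sentence: str
    url: str
    user_role: str
    feature_flags: Dict[str, bool]
    recent_api_calls: List[str] = field(default_factory=list)


class ApiCallRecorder:
    """Keeps only the most recent n API calls, dropping the oldest first."""

    def __init__(self, n: int = 5):
        self._calls = deque(maxlen=n)  # a full deque discards from the left

    def record(self, method: str, path: str, status: int) -> None:
        self._calls.append(f"{method} {path} -> {status}")

    def snapshot(self) -> List[str]:
        return list(self._calls)


def make_report(sentence: str, url: str, user_role: str,
                flags: Dict[str, bool], recorder: ApiCallRecorder) -> BugReport:
    """Attach context automatically so the reporter only types the sentence."""
    return BugReport(sentence, url, user_role, dict(flags), recorder.snapshot())
```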

fig 04 · report → triage → fix → QA → merge: One bug report, triaged and fixed mostly without an engineer.

70% of bugs now close with no engineer work beyond confirming the merge, and on-call pages are down about 80%. The failure I worry about is a confident wrong patch on a bug that looks familiar but isn't. It happened twice in review, so I added a novelty score against the embedding index of past bugs to force human triage on anything new.
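The novelty gate can be sketched as distance to the nearest past bug. If a new report isn't close to anything the pipeline has fixed before, it goes to a human. This minimal sketch assumes report embeddings are already computed. It uses brute-force cosine similarity where production would query the embedding index. The 0.35 threshold is a hypothetical value, not a tuned one.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def novelty(report_vec: Sequence[float], past_bug_vecs: List[Sequence[float]]) -> float:
    """1 minus similarity to the nearest past bug; 1.0 when there is no history."""
    if not past_bug_vecs:
        return 1.0
    return 1.0 - max(cosine(report_vec, v) for v in past_bug_vecs)


def needs_human_triage(report_vec: Sequence[float],
                       past_bug_vecs: List[Sequence[float]],
                       threshold: float = 0.35) -> bool:
    """Route to a human when the report isn't close to any bug seen before."""
    return novelty(report_vec, past_bug_vecs) > threshold
```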

05 · infra

Cost and perf wins

HR sync, 52 min to 3 min. Our biggest customers push 200K-employee snapshots every night, and the old job re-fetched and re-enriched every single record. I added a Redis dirty-set keyed on (employee_id, source_etag) so we only touch rows that changed. The full sync dropped from 52 minutes to about 3, a 17x speedup, and we stopped tripping the source API's rate limits on Mondays.
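Here is a minimal sketch of the dirty-set check, assuming each snapshot row arrives with its employee_id and the source's etag. A plain Python set stands in for the Redis set. In Redis, membership could be checked in bulk with SMISMEMBER. The next set could be built under a temporary key with SADD and swapped in with RENAME only after the dirty rows are enriched, so a failed run doesn't mark rows as done. The function names and row shape are hypothetical.

```python
from typing import Any, Iterable, List, Set, Tuple


def member(employee_id: str, etag: str) -> str:
    """Set member encoding the (employee_id, source_etag) pair."""
    return f"{employee_id}:{etag}"


def plan_sync(snapshot: Iterable[Tuple[str, str, Any]],
              last_seen: Set[str]) -> Tuple[List[Tuple[str, Any]], Set[str]]:
    """Return the rows to re-enrich and the seen-set for the next run.

    last_seen stands in for the Redis set written by the previous successful
    sync. A row is dirty when its (employee_id, etag) pair is not in that set,
    i.e. the employee is new or the source's etag changed.
    """
    dirty: List[Tuple[str, Any]] = []
    next_seen: Set[str] = set()
    for employee_id, etag, row in snapshot:
        key = member(employee_id, etag)
        next_seen.add(key)
        if key not in last_seen:
            dirty.append((employee_id, row))
    return dirty, next_seen
```

Rebuilding the set from each full snapshot keeps it the same size as the customer's roster. Old etags for the same employee don't pile up.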

fig 05 · 200K rows, nightly: Dirty-set sync processes only the rows that changed.

OpenAI bill down 60%. Two things. Field normalization ("Sr. SWE II" to "Senior Software Engineer," "MSFT" to "Microsoft") went from one LLM call per row to a few-shot prompt doing 50 rows at a time. And the nightly enrichment now sends only deltas against the last run, with cache hits served from Redis. Same accuracy on our eval set, way smaller invoice.
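Here is a sketch of the two cost cuts, assuming values to normalize arrive as strings. Skip anything already normalized on a previous run, with a dict standing in for the Redis cache. Send the rest 50 at a time in one few-shot prompt. The prompt wording and few-shot examples are illustrative; only the batch size of 50 comes from the text.

```python
import json
from typing import Dict, Iterable, Iterator, List

BATCH_SIZE = 50  # rows per prompt, from the text

# Illustrative few-shot pairs; a real prompt would use curated examples.
FEW_SHOT = [
    ("Sr. SWE II", "Senior Software Engineer"),
    ("MSFT", "Microsoft"),
]


def batches(items: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    for i in range(0, len(items), size):
        yield items[i:i + size]


def build_batch_prompt(raw_values: List[str]) -> str:
    """One prompt that normalizes many values instead of one call per row."""
    shots = "\n".join(f"{raw} => {norm}" for raw, norm in FEW_SHOT)
    return (
        "Normalize each value. Examples:\n"
        f"{shots}\n"
        f"Return a JSON list in the same order for: {json.dumps(raw_values)}"
    )


def prompts_for_run(values: Iterable[str], cache: Dict[str, str]) -> List[str]:
    """Send only values not normalized on a previous run, deduped and batched."""
    todo = sorted({v for v in values if v not in cache})
    return [build_batch_prompt(b) for b in batches(todo)]
```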

fig 06 · monthly OpenAI cost (baseline = 100): Where the 60% OpenAI savings come from.

06 · open problems

What I'd do differently

The bug widget shipped before I had a good way to measure bad auto-PRs in prod. I was tracking merged vs rejected, not merged-then-reverted-within-30-days, which is the number that actually matters. We added it, just later than I wanted. The search reranker is also still one model per locale, and I think per-customer adapters would beat it, but I ran out of time to prove it. And I leaned on Claude for code review more than I should have. It approves subtly wrong refactors in test files more often than you'd expect, so humans still need to read test diffs closely.
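The metric that matters, merged-then-reverted-within-30-days, can be computed as in the sketch below. It assumes each merged auto-PR has a merge time and an optional revert time. Counting only PRs merged at least 30 days ago is an added assumption, so that recent merges don't make the rate look better than it is.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

WINDOW = timedelta(days=30)


def revert_rate(prs: List[Tuple[datetime, Optional[datetime]]], now: datetime) -> float:
    """Share of merged auto-PRs reverted within 30 days of merging.

    prs holds (merged_at, reverted_at) pairs; reverted_at is None if the change
    still stands. PRs merged less than 30 days before `now` are excluded
    because their window hasn't closed yet.
    """
    matured = [(m, r) for m, r in prs if now - m >= WINDOW]
    if not matured:
        return 0.0
    bad = sum(1 for m, r in matured if r is not None and r - m <= WINDOW)
    return bad / len(matured)
```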

/ footnotes

[1] pgvector, open-source vector similarity search for Postgres. github.com/pgvector/pgvector
[2] Karpukhin et al., Dense Passage Retrieval for Open-Domain Question Answering (2020), the dense dual-encoder (bi-encoder) retrieval approach used in stage one. arxiv.org/abs/2004.04906
[3] Anthropic, Claude API documentation. docs.anthropic.com