
work record · Jan 2024 – Apr 2024

Transpire Technologies

Software Engineering Intern · Toronto, ON

stack: Python · XGBoost · Next.js · Playwright · Notion API
role: Software Engineering Intern
01 · context

Tiny startup, ship-or-die

Transpire was a ~7-person startup doing event-marketing analytics for B2B. The product told you which industry events were worth sponsoring based on your ICP, and tracked attribution back to closed revenue. Two co-founders, a designer, a couple of GTM folks, one full-time engineer. I was the second engineer and the first intern.

My charter, written on a sticky note above the founder's monitor, was "grow the pipeline." That split into two problems they kept losing sleep over: top of funnel (sales was scraping Eventbrite by hand and cold-emailing organizers on vibes) and middle of funnel (once a lead said yes, onboarding took two weeks, and most of the drop-off happened after the contract was signed). I owned both. No PM, no design review, no Jira. Slack and a shared Notion. Ship-or-die.

02 · lead ranker

XGBoost over scraped events

First thing I built was a Playwright scraper that ran at 6 am and pulled the day's Toronto events from Eventbrite, Meetup, Lu.ma, and a few industry listings (DevTO, TechTO, FoundersBeta). Usually 20 to 30 events. Each one got normalized into a row: title, description, organizer, expected attendance, ticket price, venue, and a description embedding from sentence-transformers/all-MiniLM-L6-v2[2].
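The normalization step is what lets four differently shaped sources feed one model. Below is a minimal sketch of that step, assuming each scraper hands back a raw dict per listing. The per-source key names in `FIELD_MAPS`, the `EventRow` class, and the `normalize` helper are hypothetical illustrations, not the actual Transpire code; the embedding function is passed in so the sketch runs without downloading a model.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical per-source key names: each site labels the same facts differently,
# so the scraper maps them onto one shared schema.
FIELD_MAPS = {
    "eventbrite": {"title": "name", "description": "summary", "organizer": "organizer_name",
                   "attendance": "capacity", "price": "min_price", "venue": "venue_name"},
    "meetup": {"title": "title", "description": "description", "organizer": "group_name",
               "attendance": "rsvp_count", "price": "fee", "venue": "venue"},
}


@dataclass
class EventRow:
    """One scraped event in the shared schema the ranker consumes."""
    source: str
    title: str
    description: str
    organizer: str
    expected_attendance: Optional[int]  # None when the listing doesn't say
    ticket_price: float                 # 0.0 for free events
    venue: str
    embedding: Sequence[float]


def _text(raw: dict, key: str) -> str:
    """Missing or null text fields become empty strings, trimmed of whitespace."""
    return (raw.get(key) or "").strip()


def normalize(source: str, raw: dict, embed: Callable[[str], Sequence[float]]) -> EventRow:
    """Map one raw listing onto EventRow and embed its description."""
    keys = FIELD_MAPS[source]
    description = _text(raw, keys["description"])
    attendance = raw.get(keys["attendance"])
    return EventRow(
        source=source,
        title=_text(raw, keys["title"]),
        description=description,
        organizer=_text(raw, keys["organizer"]),
        expected_attendance=int(attendance) if attendance is not None else None,
        ticket_price=float(raw.get(keys["price"]) or 0.0),
        venue=_text(raw, keys["venue"]),
        embedding=embed(description),
    )
```

In a real run, `embed` could be the `encode` method of `SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")` from the sentence-transformers package; injecting it keeps the schema logic testable on its own.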

Then the ranker. We had ~18 months of historical conversion data: every event a sales rep had pursued, tagged with whether it ever produced a closed deal. That label set is the only reason any of this worked. I trained an XGBoost classifier[1] on a few features:

Event size (expected attendees, log-scaled). Organizer history: how many events they'd run, and how many had landed in our pipeline. Topic similarity: cosine distance from the description embedding to the centroid of converted historical events. Day of week, lead time, and ticket-price bucket. The boring features mattered more than I expected.
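Here is a minimal sketch of how those features could be computed for one event, assuming the organizer counts are already tallied from the historical data and the converted-event centroid is precomputed. Using `log1p` for the log scaling and the dollar edges in `price_bucket` are assumptions for illustration; the function names are hypothetical.

```python
import math
from datetime import date
from typing import Optional, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> list[float]:
    """Element-wise mean of the embeddings of events that converted historically."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0  # treat a zero vector as maximally distant


def price_bucket(price: float) -> int:
    """Hypothetical bucket edges: free, under $50, under $200, $200 and up."""
    if price <= 0:
        return 0
    if price < 50:
        return 1
    if price < 200:
        return 2
    return 3


def event_features(
    expected_attendance: Optional[int],
    organizer_events_run: int,
    organizer_events_in_pipeline: int,
    embedding: Sequence[float],
    converted_centroid: Sequence[float],
    event_date: date,
    scraped_on: date,
    ticket_price: float,
) -> dict[str, float]:
    """One feature row for the classifier."""
    return {
        "log_attendance": math.log1p(expected_attendance or 0),  # log-scaled size
        "organizer_events_run": organizer_events_run,
        "organizer_events_in_pipeline": organizer_events_in_pipeline,
        "topic_distance": cosine_distance(embedding, converted_centroid),
        "day_of_week": event_date.weekday(),  # 0 = Monday
        "lead_time_days": (event_date - scraped_on).days,
        "price_bucket": price_bucket(ticket_price),
    }
```

Rows like these, stacked into a matrix with the closed-deal labels as `y`, are the kind of input `xgboost.XGBClassifier().fit(X, y)` expects; a gradient-boosted tree model is also indifferent to the features living on very different scales.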

Output was a 0-to-1 score that sales sorted on every morning. Events scoring 0.7 or higher got same-day outreach; 0.3 or lower got ignored. The middle band was the interesting one: sales triaged those by hand, and their picks fed the next retrain.
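A minimal sketch of the morning bucketing, using the thresholds above with inclusive boundaries as in the chart (score ≥ 0.70, score ≤ 0.30). The function names and bucket labels are illustrative, not taken from the original code.

```python
from typing import Dict, List

SAME_DAY_THRESHOLD = 0.70  # score >= this: same-day outreach
IGNORE_THRESHOLD = 0.30    # score <= this: ignored; anything between goes to triage


def bucket(score: float) -> str:
    """Map a ranker score in [0, 1] to the action sales takes on it."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= SAME_DAY_THRESHOLD:
        return "same-day outreach"
    if score <= IGNORE_THRESHOLD:
        return "ignored"
    return "triage"


def morning_queue(scores: Dict[str, float]) -> Dict[str, List[str]]:
    """Group event ids by bucket, highest score first within each bucket."""
    queue: Dict[str, List[str]] = {"same-day outreach": [], "triage": [], "ignored": []}
    for event_id, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
        queue[bucket(score)].append(event_id)
    return queue
```

Sorting before grouping means each bucket is already in priority order, which is what a rep working down the list in the morning wants.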

fig 01 · scrape → score → bucket: a 6 am Playwright scraper (~25 events/day) feeds an XGBoost classifier that scores each event from 0 to 1; score ≥ 0.70 goes to same-day outreach, 0.30 to 0.70 to the triage queue, and ≤ 0.30 is ignored. Scrape, normalize, embed, score, bucket. Sales sorts on the score every morning.
fig 02 · new client signups by quarter: Q2 '23: 6 · Q3 '23: 8 · Q4 '23: 9 · Q1 '24: 22. Ranker launched mid-Jan 2024; Q1 '24 has ~10 weeks of post-launch activity.

I won't claim full credit for the Q1 jump. Sales also hired a new BDR in February. But that BDR walked in every morning to a sorted list instead of a blank Eventbrite tab, and that compounded.

03 · onboarding

Killing the 14-day onboarding

Onboarding was broken in the way things stay broken at small startups: nobody had time to fix it.

When a client signed, the CEO ran a 90-minute "infra interview" to figure out their CRM, where their event data lived, which Slack channels mattered, and what their attribution model looked like. Then he hand-built a Notion workspace for them: copying from a master template, renaming fields, wiring integrations. End to end, it took about 14 days from contract to a usable workspace, because the CEO was also doing everything else a CEO does.

I broke it into three pieces:

1. A Next.js self-serve site at onboard.transpire.io that the new client landed on right after signing.
2. A Typeform intake on that site with the 23 questions from the infra interview, branching so most clients only saw 8 to 12.
3. A Python script, triggered by the Typeform webhook, that did the infra assessment and used the Notion API[3] to clone the master template and auto-fill ~60 fields.
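The webhook end of step 3 mostly comes down to verifying the request and flattening the answers. A minimal sketch follows, assuming Typeform's `form_response` payload shape (an `answers` list where each answer's value sits under a key named after its `type`) and its `Typeform-Signature` header scheme (`sha256=` plus a base64 HMAC-SHA256 of the raw body); both are worth checking against Typeform's current webhook docs. The field refs such as `company` and `crm` are hypothetical, chosen to line up with the keys the provisioning excerpt reads.

```python
import base64
import hashlib
import hmac
import json
from typing import Any, Dict


def verify_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Compare the Typeform-Signature header against our own HMAC of the raw body."""
    digest = hmac.new(secret.encode(), raw_body, hashlib.sha256).digest()
    expected = "sha256=" + base64.b64encode(digest).decode()
    return hmac.compare_digest(expected, signature_header)  # constant-time compare


def answer_value(answer: Dict[str, Any]) -> Any:
    """Pull the value out of one answer; it sits under a key named after its type."""
    kind = answer["type"]
    if kind == "choice":
        choice = answer["choice"]
        return choice.get("label") or choice.get("other")
    if kind == "choices":
        return answer["choices"].get("labels", [])
    return answer[kind]  # e.g. "text", "email", "number", "boolean", "date", "url"


def parse_intake(raw_body: bytes) -> Dict[str, Any]:
    """Turn a form_response webhook body into {field ref: value}.

    With branching, skipped questions simply don't appear, so downstream code
    should read optional fields with .get().
    """
    answers = json.loads(raw_body)["form_response"]["answers"]
    return {answer["field"]["ref"]: answer_value(answer) for answer in answers}
```

Keying the intake by field `ref` rather than question text means the questions can be reworded in Typeform without breaking the script.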

fig 03 · contract → live workspace, 14 days manual vs 4 days self-serve (shared 0 to 14 day scale). Before: kickoff, manual Notion build by the CEO, handoff. After: Typeform → Python → Notion (form, auto-fill, live), leaving only a short human review; 10 days saved, pipeline conversion ~3×. 14-day manual build, 4-day self-serve flow.
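```python
# excerpt: create the client's page, then fill it from the master template
import os

from notion_client import Client  # official Python SDK: pip install notion-client

notion = Client(auth=os.environ["NOTION_TOKEN"])
CLIENTS_DB = os.environ["NOTION_CLIENTS_DB_ID"]      # env var names are illustrative
TEMPLATE_ID = os.environ["NOTION_TEMPLATE_PAGE_ID"]

def provision_workspace(intake: dict) -> str:
    """Create a page in the clients database from the intake, return its URL."""
    page = notion.pages.create(
        parent={"database_id": CLIENTS_DB},
        properties={
            "Name": {"title": [{"text": {"content": intake["company"]}}]},
            "CRM":  {"select": {"name": intake["crm"]}},
            "ICP":  {"rich_text": [{"text": {"content": intake["icp"]}}]},
        },
    )
    # clone_template_blocks is defined elsewhere in the script (not part of this excerpt).
    clone_template_blocks(source=TEMPLATE_ID, target=page["id"], vars=intake)
    return page["url"]
```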
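`clone_template_blocks` isn't shown. One plausible core for it is a pure function that takes a block as returned by the Notion API and produces a write-ready copy with placeholders filled. The sketch below assumes a hypothetical `{{field_ref}}` placeholder convention in the template text; only plain-text rich-text spans are touched, and nested children are not handled.

```python
import copy
import re
from typing import Any, Dict

# Hypothetical convention: template text contains {{field_ref}} placeholders.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_text(text: str, values: Dict[str, Any]) -> str:
    """Substitute {{name}} placeholders; unknown names stay visible for human review."""
    return PLACEHOLDER.sub(lambda m: str(values.get(m.group(1), m.group(0))), text)


def fill_block(block: Dict[str, Any], values: Dict[str, Any]) -> Dict[str, Any]:
    """Return a write-ready copy of a retrieved block with placeholders filled.

    Keeps only {"type": ..., <type>: {...}} and drops response-only metadata
    such as id and timestamps. The input block is not modified.
    """
    kind = block["type"]
    body = copy.deepcopy(block[kind])
    for span in body.get("rich_text", []):
        if span.get("type") == "text":
            span["text"]["content"] = fill_text(span["text"]["content"], values)
            span.pop("plain_text", None)  # derived from content; not needed when writing
            span.pop("href", None)
    return {"type": kind, kind: body}
```

Inside the helper, the template's blocks would come from `notion.blocks.children.list(block_id=source)` (paginated) and the filled copies would go out through `notion.blocks.children.append(block_id=target, children=[...])`. Leaving unknown placeholders intact is deliberate: a stray `{{...}}` is easy to catch in the short human review.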

The Notion auto-fill was the satisfying part.

Contract-to-workspace time dropped from 14 days to 4. Pipeline conversion roughly tripled, because most of the drop-off had been people losing momentum during the wait.

04 · reflection

What I learned

Two things stuck. One: at a tiny startup you don't pick which problem to work on; you pick the one that's bleeding worst. Both my projects existed because someone was visibly suffering, not because they were the most interesting. Two: the ML model got the headline number, but the boring Typeform-plus-template-cloner moved more revenue. Plumbing beats cleverness more often than I expected.

/ footnotes

  [1] Chen, T. & Guestrin, C. XGBoost: A Scalable Tree Boosting System. xgboost.readthedocs.io.
  [2] Reimers, N. & Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. sbert.net.
  [3] Notion API reference. developers.notion.com.