What this does
Google Cloud Logging captures every request that hits your HTTPS Load Balancer or Cloud CDN. You’ll route those logs to a Pub/Sub topic, push them through a small Cloud Run relay inside your project, and forward them to Searchable. We classify AI bots at our edge and drop everything else.
Everything runs inside your GCP project. The relay is stateless, idempotent, and only forwards payloads; it doesn’t read or store them.
Prerequisites
- A GCP project running an HTTPS Load Balancer or Cloud CDN
- roles/pubsub.admin, roles/logging.admin, and roles/run.admin on that project (or equivalent)
- gcloud CLI installed and authenticated, or access to Cloud Shell
- A Searchable project with your domain confirmed
Setup
Generate an integration token in Searchable
- Open your Searchable dashboard
- Go to LLM Analytics → Setup
- Pick Google Cloud Platform as your crawler source
- Click Generate token
The token starts with sa_… and won’t be shown again. You can always generate a new one if you lose it.
Create the Pub/Sub topic
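A minimal sketch of the create command for this step, assuming the topic is named searchable-ai-traffic (the name used in the troubleshooting commands) and that gcloud already points at the target project:

```shell
# Dedicated topic for log entries on their way to Searchable.
gcloud pubsub topics create searchable-ai-traffic
```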
Run this step in Cloud Shell (or any terminal with gcloud authenticated to the target project). This is a dedicated topic that will hold log entries en route to Searchable. Keeping it separate from any other logging pipeline you have makes troubleshooting and removal trivial.
Deploy the Cloud Run relay
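The two relay files for this step might look like the sketch below. It is an illustration under stated assumptions, not Searchable's published relay: ENDPOINT is a placeholder because the real ingest URL is not given here, and the directory name and the choice to mirror the upstream status code are assumptions. The X-Searchable-Token header and the SEARCHABLE_TOKEN variable come from this guide. The handler uses only Node's built-in http module and global fetch, so Node 18 or later is assumed.

```shell
# Work in a fresh directory so the deploy uploads only these two files.
mkdir -p searchable-relay && cd searchable-relay

cat > package.json <<'EOF'
{
  "name": "searchable-relay",
  "version": "1.0.0",
  "private": true,
  "main": "index.js",
  "scripts": { "start": "node index.js" },
  "engines": { "node": ">=18" }
}
EOF

cat > index.js <<'EOF'
const http = require('http');

// Placeholder: replace with the ingest URL Searchable gives you.
const ENDPOINT = 'https://example.invalid/ingest';

const server = http.createServer((req, res) => {
  if (req.method !== 'POST') {
    res.writeHead(405).end();
    return;
  }
  const chunks = [];
  req.on('data', (chunk) => chunks.push(chunk));
  req.on('end', async () => {
    try {
      // Forward the Pub/Sub push body unchanged, adding only the token header.
      const upstream = await fetch(ENDPOINT, {
        method: 'POST',
        headers: {
          'Content-Type': req.headers['content-type'] || 'application/json',
          'X-Searchable-Token': process.env.SEARCHABLE_TOKEN || '',
        },
        body: Buffer.concat(chunks),
      });
      // Mirror the upstream status so Pub/Sub retries when forwarding fails.
      res.writeHead(upstream.status).end();
    } catch (err) {
      console.error('forward failed:', err.message);
      res.writeHead(502).end();
    }
  });
});

// Cloud Run supplies the port to listen on through PORT.
server.listen(process.env.PORT || 8080);
EOF
```

Because the relay keeps no state, a retried push simply forwards the same body again.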
Create an empty directory and add two files, package.json and index.js. The relay is a single HTTP handler that adds the X-Searchable-Token header and forwards the Pub/Sub push body to Searchable unchanged.
Create a dedicated service account for Pub/Sub to invoke the relay as, deploy with --no-allow-unauthenticated, then grant the service account roles/run.invoker. When deploying, replace <TOKEN> with the token from step 1.
Cloud Run will reject any invocation that isn’t signed by searchable-pubsub-invoker, so the relay can’t be called by anyone who happens to find the URL.
Copy the Service URL printed by gcloud run deploy; you’ll use it in the next step.
Create the push subscription
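A sketch of the subscription command, assuming the names used elsewhere in this guide (searchable-ai-traffic, searchable-ai-traffic-sub, searchable-relay in us-central1, searchable-pubsub-invoker). PROJECT_ID and RELAY_URL are helper variables introduced here for illustration:

```shell
PROJECT_ID="$(gcloud config get-value project)"

# Service URL of the deployed relay (the value gcloud run deploy printed).
RELAY_URL="$(gcloud run services describe searchable-relay \
  --region=us-central1 --format='value(status.url)')"

# Push subscription that authenticates each delivery as the invoker account.
gcloud pubsub subscriptions create searchable-ai-traffic-sub \
  --topic=searchable-ai-traffic \
  --push-endpoint="${RELAY_URL}" \
  --push-auth-service-account="searchable-pubsub-invoker@${PROJECT_ID}.iam.gserviceaccount.com"
```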
Use the Service URL from the previous step as the push endpoint (<your Cloud Run URL>). The subscription mints an OIDC token from searchable-pubsub-invoker for every push. The Searchable integration token is injected by the relay; the OIDC service-account auth here is what gates access to the relay itself.
Create the Log Sink
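A sketch of the sink setup, assuming the sink name searchable-ai-traffic-sink and the filter quoted in the troubleshooting section. PROJECT_ID and WRITER are helper variables introduced here for illustration:

```shell
PROJECT_ID="$(gcloud config get-value project)"

# Route only load balancer and CDN request logs into the topic.
gcloud logging sinks create searchable-ai-traffic-sink \
  "pubsub.googleapis.com/projects/${PROJECT_ID}/topics/searchable-ai-traffic" \
  --log-filter='resource.type="http_load_balancer" OR resource.type="cloud_cdn"'

# writerIdentity already carries the "serviceAccount:" prefix.
WRITER="$(gcloud logging sinks describe searchable-ai-traffic-sink \
  --format='value(writerIdentity)')"

# Let the sink publish to the topic.
gcloud pubsub topics add-iam-policy-binding searchable-ai-traffic \
  --member="${WRITER}" \
  --role=roles/pubsub.publisher
```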
Create a sink that routes HTTPS Load Balancer and Cloud CDN logs into the topic.
GCP prints a service-account email (the sink's writer identity) after the sink is created; grant it roles/pubsub.publisher on the topic. The command output reminds you to do this; alternatively, in the Console you’ll see a yellow banner asking you to authorize the sink.
Verifying the connection
In Searchable:
- Go to LLM Analytics → Setup
- Look at the Google Cloud Platform card status
- Click Check if it still shows “Waiting for first event”
| Status | What it means |
|---|---|
| Waiting for first event | The subscription is configured but no AI bot has hit your site yet. Typical wait is a few hours for sites that are already indexed. |
| Connected | Events are arriving. The card shows the count from the last 24 hours. |
Geo enrichment caveat
GCP HTTPS Load Balancer logs do not include geographic enrichment by default: country, region, and city will arrive empty in Searchable. The integration is otherwise fully functional. If you need geo, the simplest workaround is to also instrument your site with the Searchable Beacon (s.js), which derives geo from the visitor’s request.
Troubleshooting
Searchable shows 401 errors or the card stays 'Not connected'
The relay isn’t sending a valid X-Searchable-Token header: either it’s misconfigured or the token has been revoked.
- Confirm the Cloud Run service has SEARCHABLE_TOKEN set. List env var names only (not values) with gcloud run services describe searchable-relay --region=us-central1 --format='value(spec.template.spec.containers[0].env[].name)'. If you used Secret Manager, the entry shows as SEARCHABLE_TOKEN and the value lives in the secret
- If you’ve recently revoked the token in Searchable, generate a new one and redeploy the relay with the new value
Pub/Sub reports delivery errors or messages pile up unacked
Pub/Sub can’t reach the relay, or the log filter is sending non-HTTP logs.
- Verify the subscription’s push endpoint in the Console (Pub/Sub → Subscriptions → searchable-ai-traffic-sub → Edit) matches the deployed Cloud Run URL
- Check Cloud Run logs (gcloud run services logs read searchable-relay --region=us-central1 --limit=50) for non-2xx responses; those become Pub/Sub retries
- Verify the sink filter is exactly resource.type="http_load_balancer" OR resource.type="cloud_cdn". Broader filters can send non-HTTP logs, which we reject with 204 (no retry) but which still waste your Pub/Sub quota
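The first and last checks can also be done from the command line. A sketch, assuming the resource names used in this guide; the --format projections read standard fields of each resource:

```shell
# Push endpoint configured on the subscription.
gcloud pubsub subscriptions describe searchable-ai-traffic-sub \
  --format='value(pushConfig.pushEndpoint)'

# URL the relay is actually served at; the two values should match.
gcloud run services describe searchable-relay \
  --region=us-central1 --format='value(status.url)'

# Filter currently set on the sink.
gcloud logging sinks describe searchable-ai-traffic-sink \
  --format='value(filter)'
```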
The sink isn't routing logs to the topic
GCP requires the sink’s service account to have roles/pubsub.publisher on the topic. Without it the sink silently drops messages.
- Run gcloud logging sinks describe searchable-ai-traffic-sink to find the writer service account
- Grant it publisher on the topic: gcloud pubsub topics add-iam-policy-binding searchable-ai-traffic --member=serviceAccount:<writer-sa> --role=roles/pubsub.publisher
`gcloud run deploy` fails with an org-policy or ingress error
Some GCP orgs enforce policies that further restrict Cloud Run. The default setup already deploys with --no-allow-unauthenticated, which satisfies most org policies, so these errors usually point at one of the following constraints:
- constraints/run.allowedIngress forces internal-only: redeploy with --ingress=internal-and-cloud-load-balancing and place a Pub/Sub-VPC connector in front, or run the relay on Cloud Functions instead
- constraints/iam.disableServiceAccountCreation blocks the gcloud iam service-accounts create command: use an existing service account your org already trusts and reuse its email in both add-iam-policy-binding and --push-auth-service-account
- constraints/iam.allowedPolicyMemberDomains blocks the invoker binding: have an admin add the project’s service-agent domain to the allow-list, then re-run the binding command
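For reference when adapting to one of these constraints, the default sequence (service account, deploy, invoker binding) can be sketched as follows. It assumes you run it from the directory holding package.json and index.js, and that TOKEN holds the integration token from step 1; PROJECT_ID, INVOKER_SA, and TOKEN are variable names introduced here for illustration:

```shell
PROJECT_ID="$(gcloud config get-value project)"
INVOKER_SA="searchable-pubsub-invoker@${PROJECT_ID}.iam.gserviceaccount.com"

# Dedicated identity that Pub/Sub uses to call the relay.
gcloud iam service-accounts create searchable-pubsub-invoker \
  --display-name="Searchable Pub/Sub invoker"

# Deploy from source; unauthenticated requests are rejected.
gcloud run deploy searchable-relay \
  --source=. \
  --region=us-central1 \
  --no-allow-unauthenticated \
  --set-env-vars="SEARCHABLE_TOKEN=${TOKEN}"

# Allow only the invoker service account to call the service.
gcloud run services add-iam-policy-binding searchable-relay \
  --region=us-central1 \
  --member="serviceAccount:${INVOKER_SA}" \
  --role=roles/run.invoker
```

If service account creation is blocked, skip the first command and point INVOKER_SA at the existing account's email.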
Status stays on 'Waiting for first event' for more than 24 hours
A few possible causes:
- The Log Sink filter doesn’t match your service: confirm your HTTPS Load Balancer logs actually populate Cloud Logging (View → Logs Explorer → filter on resource.type="http_load_balancer" and confirm you see entries)
- Your domain in Searchable doesn’t match the site served by GCP (check LLM Analytics → Setup → Confirm your domain)
- No AI bot has visited yet: try visiting your site with a known AI user agent (e.g. Mozilla/5.0 (compatible; GPTBot/1.0)) to trigger a test event
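The first and last checks can be run from a terminal. A sketch, assuming gcloud points at the target project; SITE_URL is a placeholder for any URL served through your load balancer:

```shell
# Confirm load balancer entries are reaching Cloud Logging at all.
gcloud logging read 'resource.type="http_load_balancer"' \
  --limit=5 --freshness=1h \
  --format='value(timestamp,httpRequest.userAgent,httpRequest.requestUrl)'

# Send one request with an AI-bot user agent and print the HTTP status.
SITE_URL="https://www.example.com/"
curl -sS -o /dev/null -w '%{http_code}\n' \
  -A 'Mozilla/5.0 (compatible; GPTBot/1.0)' "${SITE_URL}"
```

After the request, rerunning the first command should show an entry with the GPTBot user agent if logging is enabled on the backend.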
Removing the integration
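To stop sending traffic to Searchable and tear down the GCP-side resources, delete the log sink, the push subscription, the topic, the Cloud Run relay, and the invoker service account. If you revoke the token in Searchable before the pipeline is removed, in-flight messages get a 401, which Pub/Sub will retry until the messages expire.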
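A teardown sketch, assuming the resource names used throughout this guide. Each command asks for confirmation unless you pass --quiet:

```shell
PROJECT_ID="$(gcloud config get-value project)"

# Stop new log entries first, then remove the delivery path.
gcloud logging sinks delete searchable-ai-traffic-sink
gcloud pubsub subscriptions delete searchable-ai-traffic-sub
gcloud pubsub topics delete searchable-ai-traffic

# Remove the relay and the identity Pub/Sub used to call it.
gcloud run services delete searchable-relay --region=us-central1
gcloud iam service-accounts delete \
  "searchable-pubsub-invoker@${PROJECT_ID}.iam.gserviceaccount.com"
```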
Next steps
See the data
Open LLM Analytics to see which assistants are crawling your site.
Add Search Console
Correlate AI crawls with search demand.