Speaker Identification

When you review a meeting transcription in Copera, each spoken segment is labeled with the speaker who said it. This is powered by a combination of real-time active-speaker tracking during the meeting and AI-based diarization during post-processing.

How it works

Speaker identification happens in two stages:

1. Real-time tracking during the meeting

While the meeting is in progress and transcription is active, the frontend tracks which participant is speaking at any given moment using LiveKit's active-speaker detection. This data -- a timeline of who was talking when -- is sent to the backend and stored alongside the audio recording.
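The timeline described above can be sketched as a small recorder that turns speaker-change events into time intervals. This is a minimal, hypothetical sketch (the names `ActiveSpeakerTimeline`, `onActiveSpeakerChanged`, and `SpeakerInterval` are illustrative, not Copera's actual code); in the real frontend the events would come from LiveKit's active-speaker detection.

```typescript
// Records who was talking when, as a list of intervals.
interface SpeakerInterval {
  participantId: string;
  startMs: number; // interval start, relative to recording start
  endMs: number;
}

class ActiveSpeakerTimeline {
  private intervals: SpeakerInterval[] = [];
  private current: { participantId: string; startMs: number } | null = null;

  // Called whenever the active speaker changes (null = silence).
  onActiveSpeakerChanged(participantId: string | null, nowMs: number): void {
    if (this.current) {
      // Close the previous speaker's interval.
      this.intervals.push({ ...this.current, endMs: nowMs });
    }
    this.current = participantId ? { participantId, startMs: nowMs } : null;
  }

  // Close any open interval and return the recorded timeline.
  finish(nowMs: number): SpeakerInterval[] {
    this.onActiveSpeakerChanged(null, nowMs);
    return this.intervals;
  }
}
```

The resulting interval list is what would be stored alongside the audio recording for use in post-processing.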

2. Post-processing with AI diarization

After the recording stops, the audio is sent to the speech-to-text service, which performs its own speaker diarization -- splitting the audio into segments and labeling each segment with a generic speaker label (Speaker A, Speaker B, etc.).
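A diarization result of this kind might look like the following. The field names here are hypothetical; real speech-to-text providers each use their own response format.

```typescript
// Hypothetical shape of one diarized segment.
interface DiarSegment {
  label: string;   // generic label: "Speaker A", "Speaker B", ...
  startMs: number; // segment start, relative to recording start
  endMs: number;
  text: string;    // transcribed words for this segment
}

const example: DiarSegment[] = [
  { label: "Speaker A", startMs: 0, endMs: 1800, text: "Good morning, everyone." },
  { label: "Speaker B", startMs: 1900, endMs: 3200, text: "Morning! Shall we start?" },
];
```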

The system then runs a matching algorithm that compares the real-time active-speaker data from step 1 with the diarization output from step 2. By finding time overlaps, it automatically maps the generic labels to actual participant names.
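The overlap-based matching can be sketched as follows: for each generic label, sum the overlap in milliseconds with each participant's active-speaker intervals, then pick the participant with the largest total. This is an illustrative sketch of the idea, not Copera's actual algorithm; all type and function names are hypothetical.

```typescript
interface DiarSegment { label: string; startMs: number; endMs: number }
interface SpeakerInterval { participantId: string; startMs: number; endMs: number }

// Maps each generic diarization label to the participant whose
// active-speaker intervals overlap it the most.
function matchLabels(
  segments: DiarSegment[],
  timeline: SpeakerInterval[],
): Map<string, string> {
  // overlap: label -> (participantId -> total overlapping milliseconds)
  const overlap = new Map<string, Map<string, number>>();
  for (const seg of segments) {
    for (const iv of timeline) {
      const ms = Math.min(seg.endMs, iv.endMs) - Math.max(seg.startMs, iv.startMs);
      if (ms <= 0) continue; // no time overlap
      const perLabel = overlap.get(seg.label) ?? new Map<string, number>();
      perLabel.set(iv.participantId, (perLabel.get(iv.participantId) ?? 0) + ms);
      overlap.set(seg.label, perLabel);
    }
  }
  // For each label, pick the participant with the largest total overlap.
  const mapping = new Map<string, string>();
  for (const [label, perLabel] of overlap) {
    let best = "";
    let bestMs = -1;
    for (const [pid, ms] of perLabel) {
      if (ms > bestMs) { best = pid; bestMs = ms; }
    }
    mapping.set(label, best);
  }
  return mapping;
}
```

A label with no overlapping intervals (for example, a speaker who only talked while active-speaker tracking was off) keeps its generic name, which is one reason manual correction (below) exists.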

Viewing speakers in the transcript

When you open a transcript in the session viewer, the Speakers panel on the left shows all identified speakers. Each speaker is represented by their avatar (if matched to a workspace member) and is assigned a unique color. Utterances in the transcript are color-coded to match, making it easy to scan through a long transcript and see who said what.

You can click on a speaker in the Speakers panel to filter the transcript to only show utterances from that person. Click again to clear the filter and show all utterances.

Correcting speaker assignments

The automatic matching is not always accurate -- especially in meetings where many participants speak in quick succession. You can manually reassign any speaker label:

  1. In the Speakers panel, click the avatar next to a speaker label.
  2. A dropdown appears showing all meeting participants.
  3. Select the correct participant, and all utterances from that speaker are updated immediately.
  4. You can also add a custom speaker (by name and email) for external participants who were not workspace members.
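Conceptually, step 3 is a bulk remap: every utterance carrying the old speaker label is rewritten to the newly selected participant. A minimal sketch, with hypothetical names:

```typescript
interface Utterance { speakerId: string; text: string }

// Returns a copy of the transcript with one speaker reassigned.
function reassignSpeaker(
  utterances: Utterance[],
  fromSpeakerId: string,
  toSpeakerId: string,
): Utterance[] {
  return utterances.map((u) =>
    u.speakerId === fromSpeakerId ? { ...u, speakerId: toSpeakerId } : u,
  );
}
```

Because the remap touches every matching utterance at once, a single correction fixes the whole transcript rather than one line at a time.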

When you open a speaker selection dropdown, the transcript automatically filters to that speaker and plays their first utterance so you can hear their voice and confirm the correct assignment.

Tips for better accuracy

  • Avoid crosstalk -- when multiple people talk at the same time, the system has a harder time distinguishing speakers.
  • Use individual microphones -- if participants share a single microphone (e.g., in a conference room), the system may group them as one speaker.
  • Longer recordings help -- the more audio the system has, the more accurately it can identify distinct voice patterns.