Medallia Speech enables fast transcriptions and powerful analytics of voice recordings to surface customer pain points. By analyzing every call, you can understand the financial impact, improve processes, and train agents better. Since it’s part of the Medallia Experience Cloud, voice insights are combined with other channels for a complete, rich review of your customer’s journey.

Use of Medallia Speech is a two-part interaction:

  1. Stage your recording files to Medallia Media File Transfer; and then
  2. Publish the recording metadata to the Medallia Speech API to begin processing.

In this guide, we review how to use both to import data into Medallia Speech.

Audio Recordings

Great Speech analytics start with quality audio recordings. Medallia recommends the following guidelines for getting the most out of your call data.

Property: Guideline

File Format: Lossless compression formats such as FLAC are best, as they optimize for network bandwidth to transfer files to Medallia.

  Lossy compression formats such as Ogg Vorbis or AAC or even MP3 can be used, but only if the compression has been tuned to not disrupt the audio recording more than ~2%.

  Raw formats such as WAV are acceptable as a fallback.

Bitrate: Higher is better. 96 kb/s is the minimum ideal for voice, but higher such as 128 kb/s or even 256 kb/s is better.

Sample rate: Higher is better. 8 kHz is the minimum for voice, but higher greatly assists with parsing nuances in accents.

Channels: Having separate audio channels for each speaker is best. This is often known as "stereo". (Some applications refer to this as "dual-channel"; while there are some technical differences between that and stereo, they do not matter to Medallia Speech.)

  The audio for the agent should go in one audio channel; the audio for the customer should go in the other. It is best practice to make that audio channel mapping consistent, regardless of whether it is an inbound or outbound call. For example, agents are always "channel 1" and customers are always "channel 0".

  Single channel ("mono") audio files are acceptable, but may require use of diarization to attempt to disambiguate the combined audio signals. This process often works well, but is not as reliable as splitting the audio into separate channels from the start.
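
As a rough illustration, the sample rate and channel guidelines above can be checked in code before a file is staged. The sketch below uses the JDK's javax.sound.sampled.AudioFormat; the class name RecordingGuidelineCheck and the warning strings are hypothetical, and the check is not part of any Medallia tooling.

```java
import javax.sound.sampled.AudioFormat;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: flags recordings that fall short of the guidelines above.
final class RecordingGuidelineCheck {

    // 8 kHz is the stated minimum sample rate for voice.
    static final float MIN_SAMPLE_RATE_HZ = 8_000f;

    // Returns one human-readable warning per guideline the format misses.
    static List<String> warnings(AudioFormat format) {
        List<String> out = new ArrayList<>();
        if (format.getSampleRate() < MIN_SAMPLE_RATE_HZ) {
            out.add("sample rate is below 8 kHz");
        }
        if (format.getChannels() < 2) {
            out.add("mono audio: diarization may be needed to separate speakers");
        }
        return out;
    }
}
```

For a WAV file, the format can be read with AudioSystem.getAudioFileFormat(file).getFormat(). The JDK does not decode FLAC or MP3 out of the box, so those formats would need a separate library.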

Medallia Media File Transfer

Medallia Media File Transfer (MMFT) is a highly scalable, S3-based storage mechanism hosted inside Medallia’s data centers. It is used to stage recordings prior to processing.

While it is S3-based, there are a few notable differences from other S3 implementations:

  • The S3 endpoint URL must be specified. For Medallia, this is https://filestash.[dc].medallia.[tld], where [dc] is the data center abbreviation of the Medallia instance (e.g., sc4, sea1, syd1) and [tld] is the top-level domain of the Medallia instance (e.g., ca, com, com.au, eu); and
  • The S3 region is always us-east-1, regardless of the Medallia data center.
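
The two rules above can be captured in a small helper. This is a minimal sketch: the class name MmftEndpoint and its members are hypothetical, and only the URL pattern and the fixed region come from this guide.

```java
import java.net.URI;

// Hypothetical helper that assembles the MMFT endpoint from its two variable parts.
final class MmftEndpoint {

    // MMFT always uses this region, whichever data center hosts the instance.
    static final String REGION = "us-east-1";

    // dataCenter: e.g. "sc4"; topLevelDomain: e.g. "com" or "com.au".
    static URI forInstance(String dataCenter, String topLevelDomain) {
        if (dataCenter == null || dataCenter.isBlank()
                || topLevelDomain == null || topLevelDomain.isBlank()) {
            throw new IllegalArgumentException("data center and top-level domain are required");
        }
        return URI.create("https://filestash." + dataCenter + ".medallia." + topLevelDomain);
    }
}
```

For example, MmftEndpoint.forInstance("sc4", "com") produces the endpoint used in the shell examples in this guide.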

For the purposes of this guide, we will use the AWS SDK for Java 2.x, an open-source library published by Amazon Web Services as an industry-standard reference. (Medallia has no affiliation with Amazon Web Services, and use of their AWS SDK should be made under its published guidelines and rules.)

📘

MMFT access credentials are shared when Speech is provisioned on your instance. Please contact Support or your Medallia representative if you need assistance.

Transferring the Recording File

We start by instantiating the S3 client with the MMFT-specific settings applied. Next, the recording file is transferred with a standard S3 PutObject request. The following examples show this sequence using the AWS CLI, the AWS SDK for Java 2.x, AWS Tools for PowerShell, and curl with a manually signed request:

# Set your S3 credentials provided by Medallia
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."

# Use this region for all data centers
export AWS_DEFAULT_REGION="us-east-1"

# Set the Medallia Media File Transfer endpoint
export MMFT_URL="https://filestash.sc4.medallia.com"

# Set the MMFT bucket name
export MMFT_BUCKET="..."

aws \
    --endpoint-url "${MMFT_URL}" \
    s3 cp \
        recording.wav \
        "s3://${MMFT_BUCKET}/recording.wav"
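
// Java (AWS SDK for Java 2.x)
import java.net.URI;
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.S3Configuration;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

// options, key, and data (a byte[]) are supplied by the calling code.
final S3Client s3 = S3Client.builder()
    // MMFT always uses us-east-1, regardless of data center
    .region(Region.US_EAST_1)
    .credentialsProvider(StaticCredentialsProvider.create(
        AwsBasicCredentials.create(
            options.getAccessKey(),
            options.getSecretKey()
        )
    ))
    // Point the client at the MMFT endpoint instead of AWS
    .endpointOverride(URI.create(options.getEndpoint()))
    .serviceConfiguration(S3Configuration.builder()
        .pathStyleAccessEnabled(true)
        .build()
    )
    .build();

// Upload the recording under the chosen key
s3.putObject(
    PutObjectRequest.builder()
        .bucket(options.getBucket())
        .key(key)
        .build(),
    RequestBody.fromBytes(data)
);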

# PowerShell (AWS Tools for PowerShell)
$credentials = New-Object `
    -TypeName Amazon.Runtime.BasicAWSCredentials `
    -ArgumentList $accessKey, $secretKey

Write-S3Object `
    -Region 'us-east-1' `
    -EndpointUrl $endpoint `
    -Credential $credentials `
    -ForcePathStyleAddressing $true `
    -BucketName $bucket `
    -Key $key `
    -File $sourceFile

#!/bin/bash
# curl with a manually computed AWS-style request signature

export MMFT_URL="https://filestash.sc4.medallia.com"
export MMFT_BUCKET="..."
export MMFT_ACCESS_KEY="..."
export MMFT_SECRET_KEY="..."

export AUDIO_FILE="audio_recording.mp4"
export CONTENT_TYPE="application/octet-stream"
export NOW="$(date -R)"

export METHOD="PUT"
export RESOURCE_PATH="/${MMFT_BUCKET}/${AUDIO_FILE}"

# String to sign: method, empty Content-MD5, content type, date, resource path
export PLAINTEXT="${METHOD}\n\n${CONTENT_TYPE}\n${NOW}\n${RESOURCE_PATH}"
export SIGNATURE=$(echo -en "${PLAINTEXT}" | openssl sha1 -hmac "${MMFT_SECRET_KEY}" -binary | base64)

# RESOURCE_PATH already starts with "/", so it is appended directly to the URL.
# The Date header must carry the same value that was signed.
curl \
    "${MMFT_URL}${RESOURCE_PATH}" \
    -X "${METHOD}" \
    -T "${AUDIO_FILE}" \
    -H "Date: ${NOW}" \
    -H "Content-Type: ${CONTENT_TYPE}" \
    -H "Authorization: AWS ${MMFT_ACCESS_KEY}:${SIGNATURE}"

The key used with the S3 PutObject call should also be used for the speech_file_name parameter in the metadata in a later step.
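
For readers who want the manual signing from the curl example in Java, the following is a minimal sketch using only the JDK. It mirrors the shell steps (an HMAC-SHA1 over the newline-joined request fields, then Base64). The class name MmftRequestSigner and its method names are hypothetical; in most cases the AWS SDK handles signing for you and this is not needed.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;

// Hypothetical helper reproducing the signature built in the curl example.
final class MmftRequestSigner {

    // Fields in order: method, empty Content-MD5, content type, date, resource path.
    static String stringToSign(String method, String contentType, String date, String resourcePath) {
        return method + "\n\n" + contentType + "\n" + date + "\n" + resourcePath;
    }

    // HMAC-SHA1 of the string to sign, keyed by the secret key, Base64-encoded.
    static String sign(String secretKey, String stringToSign) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] digest = mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 is unavailable", e);
        }
    }

    // Value for the Authorization header, as in the curl example.
    static String authorizationHeader(String accessKey, String signature) {
        return "AWS " + accessKey + ":" + signature;
    }
}
```

The date passed to stringToSign must be the exact string sent in the Date header, otherwise the signature will not match.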

🚧

Organize files in your MMFT bucket with future expansion in mind by adding prefixes to your keys. Consider natural organization levels, such as recording capture source, contact center, or division/group. Multiple levels may be combined.

Example: division/contact-center/source/recording.flac
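
One way to keep key prefixes consistent is to build keys in a single place. The sketch below is illustrative only; the class name RecordingKeys and its method are hypothetical, and the choice of prefix levels is up to you.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that joins prefix levels and a file name into an S3 key.
final class RecordingKeys {

    // Blank levels are skipped; leading and trailing slashes on a level are removed.
    static String build(String fileName, String... prefixLevels) {
        List<String> parts = new ArrayList<>();
        for (String level : prefixLevels) {
            String trimmed = level.strip().replaceAll("^/+|/+$", "");
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        parts.add(fileName);
        return String.join("/", parts);
    }
}
```

The resulting key is the value to pass to PutObject and, later, to the speech_file_name metadata parameter.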

❗️

Amazon recommends using a multipart upload for transferring large files (~100+ MB) to an S3 endpoint. At the time of this writing, the AWS SDK for Java v1 contained a TransferManager to assist with this; however, the AWS SDK for Java v2 has not yet implemented this functionality. If you notice upload problems, we recommend either downgrading to the v1 library or implementing multipart uploads yourself using the v2 library.
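
If you implement multipart uploads yourself, the first step is to split the file into byte ranges. The sketch below only plans the parts. Each range would then be sent with the SDK's createMultipartUpload, uploadPart, and completeMultipartUpload calls, which are not shown. Amazon S3 requires every part except the last to be at least 5 MiB and allows at most 10,000 parts per upload; whether MMFT enforces the same limits is an assumption here. The class name MultipartPlan is hypothetical.

```java
// Hypothetical helper that splits a file into (offset, length) ranges for multipart upload.
final class MultipartPlan {

    // Amazon S3 limits; assumed, not confirmed, to apply to MMFT as well.
    static final long MIN_PART_BYTES = 5L * 1024 * 1024;
    static final int MAX_PARTS = 10_000;

    // Returns one {offset, length} pair per part, in upload order.
    static long[][] parts(long fileSize, long partSize) {
        if (fileSize <= 0) {
            throw new IllegalArgumentException("file size must be positive");
        }
        if (partSize < MIN_PART_BYTES) {
            throw new IllegalArgumentException("part size must be at least 5 MiB");
        }
        long count = (fileSize + partSize - 1) / partSize; // ceiling division
        if (count > MAX_PARTS) {
            throw new IllegalArgumentException("too many parts; increase the part size");
        }
        long[][] out = new long[(int) count][2];
        for (int i = 0; i < count; i++) {
            long offset = i * partSize;
            out[i][0] = offset;
            out[i][1] = Math.min(partSize, fileSize - offset); // last part may be shorter
        }
        return out;
    }
}
```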

Medallia Speech API

The Medallia Speech API ingests metadata associated with the recording files uploaded to MMFT and initiates the processing of those files through Medallia’s transcription and analytics engines.

Multiple records may be published in a single request. The processing is asynchronous, but the initial job status is returned by the API. A job is in one of three states: ACCEPTED, PARTIALLY_ACCEPTED, or REJECTED. An accepted or rejected status indicates that the status applies to all records; a partially accepted status returns a list of the individual records from the request along with further details.
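
The three job states can be modeled as an enum so that calling code handles each case explicitly. This is a sketch under assumptions: the guide does not show the response body, so the code takes the status as a plain string, and the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical model of the job states returned by the Speech API.
final class SpeechJobStatus {

    enum Status { ACCEPTED, PARTIALLY_ACCEPTED, REJECTED }

    // Parses the status text; throws IllegalArgumentException for unknown values.
    static Status parse(String raw) {
        return Status.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    // Only a partial acceptance carries per-record details that need review.
    static boolean needsPerRecordReview(Status status) {
        return status == Status.PARTIALLY_ACCEPTED;
    }

    // A rejected job means none of the records in the request were accepted.
    static boolean anyRecordAccepted(Status status) {
        return status != Status.REJECTED;
    }
}
```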

There are four required metadata parameters and many other optional ones. The following example shows the required parameters:

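import com.fasterxml.jackson.annotation.JsonProperty;

// Maps the required Speech API metadata fields to their JSON property names.
public class SpeechRecordMetadata {

    @JsonProperty("call_identifier")
    private String callIdentifier;

    // Must match the key used when the recording was uploaded to MMFT
    @JsonProperty("speech_file_name")
    private String speechFileName;

    @JsonProperty("unit_identifier")
    private String unitIdentifier;

    @JsonProperty("call_date_and_time")
    private String callDateAndTime;

    // ... (optional parameters omitted)

}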

Full details, including the optional parameters, can be found at the Speech API reference page.
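
Because a record missing a required parameter may be rejected, it can help to check records locally before publishing. The sketch below does that for the four required parameter names listed above; the class name RequiredMetadata is hypothetical, and representing a record as a Map is a simplification for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check for the four required metadata parameters.
final class RequiredMetadata {

    static final List<String> REQUIRED = List.of(
        "call_identifier",
        "speech_file_name",
        "unit_identifier",
        "call_date_and_time"
    );

    // Returns the required parameter names that are absent or blank, in REQUIRED order.
    static List<String> missing(Map<String, String> record) {
        List<String> out = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = record.get(name);
            if (value == null || value.isBlank()) {
                out.add(name);
            }
        }
        return out;
    }
}
```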

Publishing to the Speech API is then a simple HTTP POST:

public SpeechPublishResults publish(
        final List<SpeechRecordMetadata> page,
        final MecApiOptions mecApi
) {
    final WebClient webClient = ...;

    return webClient
        // Build the request
        .post()
        .uri(mecApi.getApiEndpointUrl())
        .accept(MediaType.APPLICATION_JSON)
        .contentType(MediaType.APPLICATION_JSON)
        .bodyValue(page)
        // Get the results from the Medallia Speech API
        .retrieve()
        .bodyToMono(SpeechPublishResults.class)
        // Retry if needed
        // Retry.backoff takes a java.time.Duration as its minimum backoff
        .retryWhen(Retry.backoff(
            RETRY_MAX_ATTEMPTS,
            Duration.ofMillis(RETRY_BACKOFF_MSECS)
        ))
        // Block until the call is done
        .block();
}

# PowerShell
# Note: The example below is written for PowerShell 7.1.

# Earlier versions of PowerShell may not convert $speechMetadata
# to JSON correctly. Users are encouraged to upgrade to
# PowerShell 7.1.

[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

$clientId = "..."
$clientSecret = "..."
$tokenUrl = "https://instance.medallia.com/oauth/tenant/token"

$apiGatewayUrl = "https://instance-tenant.apis.medallia.com"

$speechMetadata = @(
    [PSCustomObject]@{
        'call_identifier' = "..."
        'speech_file_name' = "..."
        'unit_identifier' = "..."
        'call_date_and_time' = "..."
        # ...
    }
    # ...
)

# Encode the OAuth client credentials for HTTP Basic authentication
$oauthCredentialsAsBasicAuthValue = [Convert]::ToBase64String( `
    [System.Text.Encoding]::UTF8.GetBytes( `
        "$($clientId):$($clientSecret)" `
    ) `
)

$oauthRequestHeaders = @{
    'Content-Type' = 'application/x-www-form-urlencoded'
    'Authorization' = "Basic $oauthCredentialsAsBasicAuthValue"
}

$oauthRequestBody = @{
    'grant_type' = 'client_credentials'
}

# Request an access token
$oauthRawResponse = Invoke-WebRequest `
    $tokenUrl `
    -Method Post `
    -Headers $oauthRequestHeaders `
    -Body $oauthRequestBody `
    -Verbose

$oauthResponse = $oauthRawResponse.Content | ConvertFrom-Json

$token = $oauthResponse.access_token

$apiHeaders = @{
    'Authorization' = "Bearer $token"
}

# Serialize the records explicitly; -InputObject keeps a one-element array as a JSON array
$speechMetadataJson = ConvertTo-Json -InputObject $speechMetadata

Invoke-WebRequest `
    -Method Post `
    -Uri "$apiGatewayUrl/speech/v0/bulk-ingest" `
    -Headers $apiHeaders `
    -ContentType "application/json" `
    -Body $speechMetadataJson

# curl
curl \
    "${API_GATEWAY}/speech/v0/bulk-ingest" \
    -X POST \
    -H "Authorization: Bearer ${TOKEN}" \
    -H "Content-Type: application/json" \
    --data-binary '[
        {
            "call_identifier": "1182293942",
            "speech_file_name": "audio_recording.mp4",
            "unit_identifier": "wm_advisor_1",
            "call_date_and_time": "2021-03-24T13:02:00Z",
            "call_recording_url": "",
            "vertical_model": "Call Center",
            "locale": "en-US",
            "apply_diarization": "No",
            "agent_channel": "0",
            "substitutions": "{\"appeal box\":\"a PO box\",\"triple A batteries\":\"AAA batteries\"}",
            "apply_redaction": "no",
            "first_name": "Jane",
            "last_name": "Perry",
            "email": "jane.perry@example.com",
            "phone_number": "555-555-5555"
        }
    ]'
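
The token and publish requests shown above can also be built with the JDK's own java.net.http client, without a web framework. The following is a sketch, not the reference implementation: the class name SpeechApiRequests and its methods are hypothetical, while the /speech/v0/bulk-ingest path, the client_credentials grant, and the header values are taken from the examples in this guide.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical builders for the OAuth token request and the bulk-ingest request.
final class SpeechApiRequests {

    // POST to the token URL with the client credentials as HTTP Basic auth.
    static HttpRequest token(String tokenUrl, String clientId, String clientSecret) {
        String basic = Base64.getEncoder().encodeToString(
            (clientId + ":" + clientSecret).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder()
            .uri(URI.create(tokenUrl))
            .header("Authorization", "Basic " + basic)
            .header("Content-Type", "application/x-www-form-urlencoded")
            .POST(HttpRequest.BodyPublishers.ofString("grant_type=client_credentials"))
            .build();
    }

    // POST the JSON array of metadata records with the bearer token.
    static HttpRequest bulkIngest(String apiGatewayUrl, String bearerToken, String jsonBody) {
        return HttpRequest.newBuilder()
            .uri(URI.create(apiGatewayUrl + "/speech/v0/bulk-ingest"))
            .header("Authorization", "Bearer " + bearerToken)
            .header("Content-Type", "application/json")
            .header("Accept", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }
}
```

Either request can be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). Retries and JSON parsing of the response are left to the caller.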

Conclusion

This guide explained how to import data to Medallia Speech through the Medallia Media File Transfer service and the Speech API.

Medallia has published a Java-based reference implementation at medallia/speech-api-reference-implementation. The reference implementation provides a command-line-based mechanism for accomplishing all the steps outlined in this document. Using it, you can transfer recording files to MMFT and publish metadata to Medallia Speech. Full source code and usage details are available.