Vocal Tract Visualization and Imaging

Vocal tract visualization and imaging is a collection of procedures for detailed visual examination of the vocal tract, including laryngeal and velopharyngeal structures and their gross function as well as vocal fold vibration. These procedures enable a speech-language pathologist (SLP) to further assess and plan treatment strategies for

  • voice,
  • deglutition, and
  • resonance disorders.

These procedures use either a constant or a stroboscopic light source for indirect laryngoscopy, rigid fiberoptic oral endoscopy (RFOE), or flexible fiberoptic nasendoscopy (FFN). Images and videos can be recorded with any of these techniques and stored on digital media. Physicians are the only professionals qualified and licensed to render medical diagnoses related to the identification of laryngeal pathology as it affects voice. When imaging is used for medical diagnostic purposes, it should be viewed and interpreted by an otolaryngologist with training in this procedure. SLPs trained in stroboscopy view and interpret imaging for SLP diagnosis (e.g., dysphagia) and to establish or modify treatment plans. Videofluoroscopy, ultrasound, and video images can also be used to view all or part of the vocal tract and oral structures, but these modalities are not the focus of this page.

Please see ASHA’s resource on Flexible Endoscopic Evaluation of Swallowing (FEES) for further information on imaging for deglutition.

Instrumentation

Although there is typically some variation between procedures, an effort has been made to standardize protocols for instrumental assessment of voice, including recommendations for laryngeal endoscopic imaging (Patel et al., 2018).

Flexible Fiberoptic Nasendoscopy (FFN)

FFN is performed with a flexible nasendoscope inserted through the nasal passage. A fiberoptic bundle transmits high-intensity light to illuminate structures, which are then viewed and/or recorded. Distal-chip flexible endoscopes allow for assessment of vibratory motion similar to that of a rigid endoscope with stroboscopy (Patel, 2012). A nasendoscope with a smaller diameter may be used for pediatric populations.

Advantages

  • excellent image of the vocal folds and velopharyngeal structures during
    • voicing,
    • conversation,
    • singing,
    • eating/swallowing, and
    • rest breathing
  • potential for image recording and instant replay

Disadvantages

  • equipment expense
  • possible patient discomfort
  • possible stimulation of gag reflex

Please see ASHA’s resource on Flexible Endoscopic Evaluation of Swallowing (FEES) for related information.

Rigid Fiberoptic Oral Endoscopy (RFOE)

RFOE is performed with a rigid tube inserted into the oral or pharyngeal cavity. A prism optic system projects high-intensity light at a predetermined angle to illuminate the structures to be observed and recorded.

Advantages

  • high illumination
  • wide field of view
  • excellent image reproduction
  • availability of smaller-diameter rigid endoscopes for pediatric populations or patients with a smaller oral cavity

Disadvantages

  • interference with normal speech production
  • minor patient discomfort
  • equipment expense
  • possible difficulties with gag reflex

Videolaryngoendoscopy (either RFOE or FFN)

Videolaryngoendoscopy is used to assess the following (Patel et al., 2018):

  • vocal fold mobility
  • vocal fold maximum range
  • vibratory characteristics of the vocal folds
  • vocal fold appearance
    • malposition
    • excrescence (abnormal projection/outgrowth)
    • edema
    • erythema
  • vocal fold edge appearance
    • smooth
    • straight
    • bowed
    • convex
    • concave
    • irregular
    • rough
  • subglottal appearance
    • erythema
    • edema
  • supraglottal behavior
    • medial compression
    • anterior–posterior compression
    • mild/moderate/severe
  • arytenoid movement
    • normal or impaired mobility
      • bilateral
      • unilateral
  • velopharynx
    • contact between the soft palate and the posterior pharyngeal wall as well as lateral pharyngeal wall movement with
      • sustained fricatives such as /s/,
      • syllable repetition,
      • multisyllabic words,
      • phrases with pressure-loaded consonants, and
      • sentence or spontaneous speech
  • secretions
    • amount
    • consistency

Videostroboscopy

Videostroboscopy is performed with either a flexible or a rigid endoscope combined with a strobe light that is synchronized to vocal fold vibration via a laryngeal microphone. This combination permits vocal tract structures to be seen in an apparent “slow motion” format.
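The apparent slow motion can be illustrated with a short calculation. The sketch below assumes the strobe flashes at a rate slightly offset from the vocal fold vibration frequency, so that each flash lights a slightly later point in successive vibratory cycles. The function names and the 200 Hz and 198.5 Hz values are hypothetical and chosen only for illustration; they do not describe any particular stroboscopy system.

```python
def apparent_frequency_hz(f0_hz: float, strobe_hz: float) -> float:
    """Rate of the apparent slow-motion cycle (assumed model: |f0 - strobe rate|)."""
    return abs(f0_hz - strobe_hz)


def flash_phases(f0_hz: float, strobe_hz: float, n_flashes: int) -> list[float]:
    """Fraction of the vibratory cycle (0 to 1) illuminated by each flash.

    Flash k occurs at time k / strobe_hz, so the cycle phase it lights is
    (k * f0_hz / strobe_hz) modulo 1.
    """
    return [(k * f0_hz / strobe_hz) % 1.0 for k in range(n_flashes)]


if __name__ == "__main__":
    f0 = 200.0       # hypothetical vibration frequency in Hz
    strobe = 198.5   # hypothetical flash rate in Hz, slightly below f0
    print(apparent_frequency_hz(f0, strobe))  # 1.5 apparent cycles per second
    print(flash_phases(f0, strobe, 5))        # phase advances a little with each flash
    print(flash_phases(f0, f0, 3))            # equal rates: same phase every flash
```

When the flash rate equals the vibration frequency, every flash lights the same phase and the image appears frozen; a small offset produces the slowly advancing composite image.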

Advantages

  • extensive body of information relative to the effect of pathology on the process of voicing
  • potential for providing information about the neuromuscular and physiological integrity of the vocal folds and supraglottic structures

Disadvantages

  • patient discomfort related to the use of FFN or RFOE
  • image restricted to isolated vowel production when the strobe light is used
  • highly subjective (Roy et al., 2013)

Videostroboscopy is used to assess the following (Patel et al., 2018):

  • amplitude of excursion (lateral movement of the vocal fold medial plane)
    • symmetrical
    • normal/reduced/absent
    • each fold can be rated separately as a percentage
  • vertical level—level difference in the vertical plane between vocal folds during the maximum closed phase of the glottic cycle
    • on-plane
    • off-plane
  • periodicity of vocal fold movement
    • always/usually/sometimes/never periodic
    • segments of the vocal fold that are aperiodic
  • vocal fold mucosal wave (independent lateral movement of mucosa over the vocal fold)
    • normal/diminished/great/symmetrical/absent
  • glottal closure pattern—glottal configuration at maximum closure
    • complete
    • incomplete
      • posterior glottal gap
      • anterior glottal gap
      • hourglass
      • incomplete
      • irregular
      • spindle-shaped/bowing
  • phase closure—relative proportion of the glottal cycle in which the glottis is closed versus open
    • open phase
    • closed phase
  • vocal fold appearance
    • malposition
    • excrescence (abnormal projection/outgrowth)
    • edema
    • erythema
  • vocal fold edge appearance
    • smooth
    • straight
    • bowed
    • convex
    • concave
    • irregular
    • rough
  • subglottal appearance
    • erythema
    • edema
  • supraglottal behavior
    • medial compression
    • anterior–posterior compression
    • mild/moderate/severe
  • arytenoid movement
    • normal or impaired mobility
      • bilateral
      • unilateral
  • velopharynx
    • contact between the soft palate and the posterior pharyngeal wall as well as lateral pharyngeal wall movement with
      • sustained fricatives such as /s/,
      • syllable repetition,
      • multisyllabic words,
      • phrases with pressure-loaded consonants, and
      • sentence or spontaneous speech
  • secretions
    • amount
    • consistency
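The phase closure item in the list above describes a proportion of the glottal cycle, which can be written as a simple ratio. The sketch below assumes that open and closed phase durations have already been estimated, for example by counting frames in a recording. The function name and the sample durations are hypothetical, and this calculation is offered only as an illustration, not as part of the cited protocol.

```python
def phase_quotients(open_ms: float, closed_ms: float) -> tuple[float, float]:
    """Return (open proportion, closed proportion) of one glottal cycle.

    Both inputs are durations within a single cycle, in milliseconds.
    """
    period_ms = open_ms + closed_ms
    if open_ms < 0 or closed_ms < 0 or period_ms <= 0:
        raise ValueError("durations must be non-negative and sum to more than zero")
    return open_ms / period_ms, closed_ms / period_ms


if __name__ == "__main__":
    # Hypothetical cycle: 3 ms open, 2 ms closed
    open_q, closed_q = phase_quotients(3.0, 2.0)
    print(f"open: {open_q:.0%}, closed: {closed_q:.0%}")
```

The two proportions always sum to 1, so a rating of which phase predominates follows from comparing either value with 0.5.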

Interpretation

  • amplitude asymmetry—mass, compliance, neurogenic difference, scarring, granuloma
  • function of the velopharynx—degree of closure, context relevant behaviors
  • inadequate closure—intervening mass, neurogenic disorder (paralysis), hypofunctional disorder
  • mucosal wave adynamic segment—cover scarring, intracordal cyst, fibrosis, neurogenic disorder, edema
  • phase asymmetry—mass, compliance, neurogenic difference
  • supraglottic compression—hyperfunction, compensatory hyperfunction
  • voice quality abnormal, larynx normal—behavioral disorder

Roles and Responsibilities

Many clinicians will need to seek training in visualization and imaging, through intensive continuing education, pre-service, or in-service training programs, after completing the requirements for the ASHA Certificate of Clinical Competence. Education and training may vary for each of these procedures. The training and mentorship should take place in a clinical setting, allowing the professional to work with more experienced professionals and with a number and variety of patients. Practitioners must determine whether they have obtained sufficient education and training to be competent to perform vocal tract visualization and imaging. The safety of the patient is paramount when considering any procedure. Please see ASHA’s Vocal Tract Visualization and Imaging: Position Statement and ASHA’s States with Specific Instrumental Assessment Requirements for further information.

Precautions and Risks 

Before undertaking these procedures, practitioners consider the following precautions: 

  1. Check with state licensure board(s), where appropriate, to determine whether there are limitations on the scope of SLP practice that restrict the performance of these procedures.
  2. Follow universal precautions, including the use of personal protective equipment (PPE) as appropriate, to reduce the risk of disease transmission from bloodborne and airborne pathogens.
  3. Have immediate emergency medical assistance available when using topical anesthesia or FFN.
  4. Hold a current Basic Life Support Certificate if performing FFN or using topical anesthesia.
  5. Recommend that the patient remain NPO (nothing by mouth) until the anesthetic wears off.

Practitioners also educate patients on risks associated with imaging, obtain the patient's informed consent, and maintain documentation when performing FFN or when using topical anesthesia. Risks may include the following:

  1. vasovagal response
  2. adverse/allergic reaction to topical anesthesia
  3. nasal irritation 

Anatomical Structures, Adult

Laryngeal structures—closed vocal folds

Aryepiglottic fold—composed of the mucous membrane, not typically used in voice production (Figure 2-4)

Corniculate cartilage—paired cartilaginous structures that sit atop the arytenoid cartilage, not directly implicated in voice production (Figure 2-4)

Cuneiform cartilage—cartilage embedded in the aryepiglottic muscle/fold that serves as a supportive framework for the larynx (Figure 2-3)

Epiglottis—cartilage covered with a mucous membrane, does not serve a function in voice production (Figures 2-3 and 2-4)

Esophageal sphincter—a muscular ring that opens into the esophagus, does not serve a function in typical voice production (Figures 2-3 and 2-4)

Posterior pharyngeal wall—the muscular wall of the posterior pharynx used in swallowing, not used in voice production (Figure 2-4)

Tracheal rings—cartilaginous rings of the trachea, do not serve a function in voice production (Figure 2-3)

True vocal folds—muscularized mucous membranes used for sound production (Figures 2-3 and 2-4)

Ventricular folds—ligaments covered by a mucous membrane that lie superior to the true vocal folds, also called “false vocal folds” (Figure 2-4)


References

Patel, R. R. (2012). Updates on endoscopic laryngeal imaging. Perspectives on Voice and Voice Disorders, 22(2), 64–71. https://doi.org/10.1044/vvd22.2.64

Patel, R. R., Awan, S. N., Barkmeier-Kraemer, J., Courey, M., Deliyski, D., Eadie, T., Paul, D., Švec, J. G., & Hillman, R. (2018). Recommended protocols for instrumental assessment of voice: American Speech-Language-Hearing Association Expert Panel to Develop a Protocol for Instrumental Assessment of Vocal Function. American Journal of Speech-Language Pathology, 27(3), 887–905. https://doi.org/10.1044/2018_AJSLP-17-0009

Roy, N., Barkmeier-Kraemer, J., Eadie, T., Sivasankar, M. P., Mehta, D., Paul, D., & Hillman, R. (2013). Evidence-based clinical voice assessment: A systematic review. American Journal of Speech-Language Pathology, 22(2), 212–226. https://doi.org/10.1044/1058-0360(2012/12-0014)
