Yesterday, 21:06
Ok, so there are a couple of minor challenges here.
Automation
First of all, if you have images and audio files in the same dir with matching file names, you could easily create a caption template. For example, for an image named ABC123.jpg there should be an audio file audio_ABC123.mp3 (or a similar pattern that lets us match image file names to audio file names).
With the above, we can go to page settings > gallery > captions > caption defaults > default image description and create a template-based description, linking to each image's audio `{path}/audio_{file_name}.mp3`.
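To make the naming rule concrete, here is a rough sketch of the same mapping in plain JavaScript. The function name `audioUrlFor` is made up for illustration, and it assumes the audio file sits next to the image with an `audio_` prefix and an `.mp3` extension, as in the template above. X3 does the substitution itself in the caption template, so this is only useful for checking that your files follow the pattern.

```javascript
// Hypothetical helper: maps an image path to its commentary audio path,
// assuming the "audio_<image name>.mp3" convention in the same directory.
function audioUrlFor(imagePath) {
  const slash = imagePath.lastIndexOf("/");
  // Directory part including the trailing slash, or "" if there is none.
  const dir = slash === -1 ? "" : imagePath.slice(0, slash + 1);
  const file = imagePath.slice(slash + 1);
  // Strip the extension (everything from the last dot onward).
  const dot = file.lastIndexOf(".");
  const baseName = dot > 0 ? file.slice(0, dot) : file;
  return `${dir}audio_${baseName}.mp3`;
}
```

For example, `audioUrlFor("gallery/ABC123.jpg")` gives `gallery/audio_ABC123.mp3`.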
How to play the audio
Have you given any thought to how you want the audio to play? Just a "play" button, or a text link like "Play commentary"? The easy solution would be to simply open the audio in a new window or popup. However, I am guessing you want a more streamlined solution, where the audio plays in the background on the X3 website. This would require some JavaScript: 1. Create a global "hidden" audio player on the page. 2. Add a link for each image that triggers and plays its specific audio file in that global hidden audio player.
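As a starting point, the two steps could look roughly like the sketch below. It assumes the caption template outputs links with a class of `commentary` whose `href` points at the mp3; that class name and the `createCommentaryPlayer` helper are my own placeholders, not something X3 provides.

```javascript
// Hypothetical helper: wraps one shared audio element for the whole page.
// `audio` is anything with a `src` property and a `play()` method.
function createCommentaryPlayer(audio) {
  return {
    play(url) {
      audio.src = url;      // point the shared player at this image's file
      return audio.play();  // in browsers this returns a promise
    },
  };
}

// Browser wiring (skipped when there is no DOM).
if (typeof document !== "undefined") {
  // An Audio element that is never added to the page is effectively "hidden".
  const player = createCommentaryPlayer(new Audio());

  // One delegated listener handles every commentary link, including
  // links that are added to the page after this script has run.
  document.addEventListener("click", (event) => {
    const link = event.target.closest("a.commentary");
    if (!link) return;
    event.preventDefault(); // do not navigate to the mp3 file
    player.play(link.href);
  });
}
```

Because playback starts from a click, browser autoplay restrictions should normally not get in the way.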
Then of course it could get more complicated if you want each button/link to be able to start/stop the audio, and/or if you want the button to display the play state of its commentary (e.g. it changes text or color while playing, and reverts when complete).
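If you do want the start/stop behavior with a visible play state, one possible shape is sketched below. Again, the `commentary` and `playing` class names and the `createTogglePlayer` helper are assumptions for illustration; you would style `a.commentary.playing` in CSS however you like.

```javascript
// Hypothetical helper: one shared audio element, toggled per link.
// onStateChange(url, isPlaying) is called whenever a clip starts or stops.
function createTogglePlayer(audio, onStateChange) {
  let currentUrl = null; // URL of the clip that is playing, if any

  function stop() {
    if (currentUrl === null) return;
    const stopped = currentUrl;
    currentUrl = null;
    audio.pause();
    onStateChange(stopped, false);
  }

  // In a browser the audio element fires "ended" when a clip finishes,
  // which reverts the button just like a manual stop.
  if (typeof audio.addEventListener === "function") {
    audio.addEventListener("ended", stop);
  }

  return {
    toggle(url) {
      const wasPlaying = currentUrl === url;
      stop();                 // stop whatever is playing right now
      if (wasPlaying) return; // second click on the same link = stop only
      currentUrl = url;
      audio.src = url;
      audio.play();
      onStateChange(url, true);
    },
  };
}

// Browser wiring (skipped when there is no DOM).
if (typeof document !== "undefined") {
  const player = createTogglePlayer(new Audio(), (url, playing) => {
    document.querySelectorAll("a.commentary").forEach((link) => {
      if (link.href === url) link.classList.toggle("playing", playing);
    });
  });
  document.addEventListener("click", (event) => {
    const link = event.target.closest("a.commentary");
    if (!link) return;
    event.preventDefault();
    player.toggle(link.href);
  });
}
```

Clicking a second link while one is playing stops the first and starts the second, so only one commentary plays at a time.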