After having hooked IAudioClient::GetCurrentPadding, IAudioRenderClient::GetBuffer and IAudioRenderClient::ReleaseBuffer I implemented some logic for each hook.
Here I just copy the pointer to IAudioClient to be able to use it later.
In this hook, I copy the IAudioRenderClient pointer and the data pointer so I know where it’s stored.
This is where my main logic takes place. At this point our buffer is filled and we get the number of frames written. We know the buffer location from the previous call to IAudioRenderClient::GetBuffer and thus can copy its content.
Copying the buffer
My first attempt was very basic, yet it did work. I simply wrote the buffer to a file with a fixed size. I used Audacity’s ability to import raw audio and ended up with some trial and error approach until I got the correct audio format. Because using a hardcoded size and guessing the format was not an option, I had two new
problems challenges to solve.
Since we know the number of frames written, I figured the actual buffer size had something to do with the number of channels and the bits per sample. So the format thing again. Sounded like killing two birds with one stone. I was sure I was very close to getting it working properly.
Figuring out the audio format
A quick look at the audio API documentation releaved there’s a function called AudioClient::GetMixFormat. Now call me crazy but I somehow thought this would return the audio format used.
So I checked the structure it returned and added a function to output the format. Since it matched with the format I used in Audacity (IEEE_FLOAT, 2 channels, 44k) I changed the code responsible for writing the buffer to calculate the size based on the format. A quick test run and Audacity check made me smile: It worked.
I didn’t figure out the audio format
When I tried it on a different process though (Chrome), audio was distorted. One could clearly hear the original audio, but I wasn’t able to get it playing properly, no matter what parameters I used in Audacity. To narrow down the problem I used audio that would only use one speaker and tried to make sense of the output in a hex editor. I saw weird sequences of zeros I didn’t understand and went to SO to ask for help (note: Don’t read my own solution posted there yet).
Unfortunately it appeared no one was able to help me, so I gave it another shot. By further researching AudioClient::GetMixFormat it appeared it could be wrong, i.e. not reporting the format passed to IAudioClient::Initialize. I don’t know for sure why there is no way to retrieve the actual audio format, but I figured I had to read it from some internal structure. Hooking IAudioClient::Initialize was not an option, because I wanted to be able to attach to processes playing audio already.
Figuring out the audio format. Again
To be able to read the audio format from a IAudioClient instance, I fired up IDA and looked through the varios interface functions. However, I couldn’t find anything useful. Debugging the various audio calls and looking for known values (I used Spotify again since I knew its format) finally led to a result in IAudioRenderClient::GetBuffer though.
.text:10019145 154 mov eax, [ebx-50h] .text:10019148 154 movzx ecx, word ptr [eax+0ACh] .text:1001914F 154 mov eax, [eax+0A8h]
The values in ecx and eax looked familiar (block align and bit rate). So modying my code to the following utilized the correct audio format now:
mov eax, pAudioClient mov eax, [eax + 0x7C] mov eax, [eax + 0x3C] mov edi, eax mov eax, [edi - 50h] add eax, 0xA0 mov waveFormatEx, eax
At this point I was able to attach to an arbitrary process and record its audio without any other audio interfering. It still crashed the process at sometimes and as noted in the SO post only worked on Win 8, but I was happy with the outcome.
I hope you learned a few things about the audio API and how long it can take to properly research such undocumented stuff. The whole process described above was done on a couple of days over a timespan of 3 months, because I often got frustrated and decided to give it a shot another day.
Later on I also added some nice features like MP3 support to reduce the file size and made the library support IPC (using pipes) so I could record audio from sandboxed processes as well. I also went to add Win 7 support and the ability to mute audio of a process (you just zero the buffer) but that’s not really related to the core research.
Thanks for reading.