Recording audio of a single process: Part III

After having hooked IAudioClient::GetCurrentPadding, IAudioRenderClient::GetBuffer and IAudioRenderClient::ReleaseBuffer I implemented some logic for each hook.

IAudioClient::GetCurrentPadding

Here I just copy the pointer to IAudioClient to be able to use it later.

IAudioRenderClient::GetBuffer

In this hook, I copy the IAudioRenderClient pointer and the data pointer so I know where it’s stored.

IAudioRenderClient::ReleaseBuffer

This is where my main logic takes place. At this point our buffer is filled and we get the number of frames written. We know the buffer location from the previous call to IAudioRenderClient::GetBuffer and thus can copy its content.
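The interplay between the two render client hooks can be sketched roughly as follows. This is not my actual hook code: `CaptureState`, `onGetBuffer` and `onReleaseBuffer` are hypothetical names, the COM plumbing is left out, and `bytesPerFrame` is assumed to be known already (how to get it is the topic of the following sections).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical state shared between the GetBuffer and ReleaseBuffer hooks.
struct CaptureState {
    const std::uint8_t* pendingData = nullptr;  // pointer handed out by GetBuffer
    std::vector<std::uint8_t> recorded;         // all bytes captured so far
};

// Called from the GetBuffer hook once the original call returned:
// remember where the application is going to write its audio.
void onGetBuffer(CaptureState& state, const std::uint8_t* data) {
    state.pendingData = data;
}

// Called from the ReleaseBuffer hook before forwarding to the original:
// the buffer is filled now, so copy the frames that were written.
void onReleaseBuffer(CaptureState& state, std::uint32_t framesWritten,
                     std::uint32_t bytesPerFrame) {
    assert(bytesPerFrame > 0);
    if (state.pendingData == nullptr) {
        return;  // ReleaseBuffer without a GetBuffer we saw
    }
    const std::size_t byteCount =
        static_cast<std::size_t>(framesWritten) * bytesPerFrame;
    state.recorded.insert(state.recorded.end(), state.pendingData,
                          state.pendingData + byteCount);
    state.pendingData = nullptr;
}
```

Copying before the original ReleaseBuffer runs matters, because the buffer belongs to the audio engine again afterwards.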

Copying the buffer

My first attempt was very basic, yet it did work. I simply wrote a fixed number of bytes from the buffer to a file. I then used Audacity’s ability to import raw audio and, through trial and error, arrived at the correct audio format. Because using a hardcoded size and guessing the format was not an option, I had two new challenges to solve.

Since we know the number of frames written, I figured the actual buffer size had to depend on the number of channels and the bits per sample. So it came down to the format again, which sounded like killing two birds with one stone. I was sure I was very close to getting it working properly.
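To make the size relationship concrete, here is a small sketch. `AudioFormat` is a hypothetical stand-in that only carries the two fields needed here; it is not the real Windows structure.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical subset of the fields an audio format description carries.
struct AudioFormat {
    std::uint16_t channels;       // e.g. 2 for stereo
    std::uint16_t bitsPerSample;  // e.g. 16 for PCM or 32 for float
};

// One frame holds one sample for every channel.
constexpr std::size_t bytesPerFrame(const AudioFormat& fmt) {
    return static_cast<std::size_t>(fmt.channels) * (fmt.bitsPerSample / 8u);
}

// Number of bytes to copy for a given frame count.
constexpr std::size_t bufferSizeInBytes(const AudioFormat& fmt,
                                        std::uint32_t framesWritten) {
    return static_cast<std::size_t>(framesWritten) * bytesPerFrame(fmt);
}
```

For stereo 32-bit samples a frame is 8 bytes, so 480 frames come out at 3840 bytes.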

Figuring out the audio format

A quick look at the audio API documentation revealed there’s a function called IAudioClient::GetMixFormat. Now call me crazy but I somehow thought this would return the audio format used.

So I checked the structure it returned and added a function to output the format. Since it matched with the format I used in Audacity (IEEE_FLOAT, 2 channels, 44k) I changed the code responsible for writing the buffer to calculate the size based on the format. A quick test run and Audacity check made me smile: It worked.
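A function to output the format could look roughly like this. It is a sketch and not my original code: `AudioFormat`, `formatTagName` and `printFormat` are hypothetical names, and the struct only mirrors the fields that get printed. The tag values are the ones from the Windows SDK headers. Note that a mix format is often reported as EXTENSIBLE, in which case the actual sample type sits in a sub-format GUID that this sketch does not decode.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical mirror of the fields that get printed.
struct AudioFormat {
    std::uint16_t formatTag;
    std::uint16_t channels;
    std::uint32_t samplesPerSec;
    std::uint16_t bitsPerSample;
};

// Format tag values as defined in the Windows SDK headers.
constexpr std::uint16_t kFormatPcm = 0x0001;         // WAVE_FORMAT_PCM
constexpr std::uint16_t kFormatIeeeFloat = 0x0003;   // WAVE_FORMAT_IEEE_FLOAT
constexpr std::uint16_t kFormatExtensible = 0xFFFE;  // WAVE_FORMAT_EXTENSIBLE

// Maps a format tag to a readable name.
constexpr const char* formatTagName(std::uint16_t tag) {
    return tag == kFormatPcm          ? "PCM"
           : tag == kFormatIeeeFloat  ? "IEEE_FLOAT"
           : tag == kFormatExtensible ? "EXTENSIBLE"
                                      : "UNKNOWN";
}

// Prints the format on a single line.
void printFormat(const AudioFormat& fmt) {
    std::printf("%s, %u channels, %u Hz, %u bits per sample\n",
                formatTagName(fmt.formatTag),
                static_cast<unsigned>(fmt.channels),
                static_cast<unsigned>(fmt.samplesPerSec),
                static_cast<unsigned>(fmt.bitsPerSample));
}
```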

I didn’t figure out the audio format

When I tried it on a different process though (Chrome), audio was distorted. One could clearly hear the original audio, but I wasn’t able to get it playing properly, no matter what parameters I used in Audacity. To narrow down the problem I used audio that would only use one speaker and tried to make sense of the output in a hex editor. I saw weird sequences of zeros I didn’t understand and went to SO to ask for help (note: Don’t read my own solution posted there yet).

Unfortunately it appeared no one was able to help me, so I gave it another shot. Further research on IAudioClient::GetMixFormat suggested it could be wrong, i.e. it does not necessarily report the format passed to IAudioClient::Initialize. I don’t know for sure why there is no way to retrieve the actual audio format, but I figured I had to read it from some internal structure. Hooking IAudioClient::Initialize was not an option, because I wanted to be able to attach to processes that were already playing audio.

Figuring out the audio format. Again

To be able to read the audio format from an IAudioClient instance, I fired up IDA and looked through the various interface functions. However, I couldn’t find anything useful. Debugging the various audio calls and looking for known values (I used Spotify again since I knew its format) finally led to a result in IAudioRenderClient::GetBuffer though.


.text:10019145 154                 mov     eax, [ebx-50h]
.text:10019148 154                 movzx   ecx, word ptr [eax+0ACh]
.text:1001914F 154                 mov     eax, [eax+0A8h]

The values in ecx and eax looked familiar: block align and average bytes per second (nBlockAlign and nAvgBytesPerSec). So I modified my code to the following, which now picked up the correct audio format:

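mov     eax, pAudioClient       ; IAudioClient pointer saved in the GetCurrentPadding hook
mov     eax, [eax + 0x7C]       ; follow an internal, undocumented pointer
mov     eax, [eax + 0x3C]       ; and a second one
mov     edi, eax                ; edi presumably plays the role of ebx in the disassembly
mov     eax, [edi - 50h]        ; same load as "mov eax, [ebx-50h]" above
add     eax, 0xA0               ; offsets 0A8h and 0ACh lie inside the structure starting here
mov     waveFormatEx, eax       ; keep the address as the format pointer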
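The offsets can be cross-checked against the documented WAVEFORMATEX layout: if the structure starts at 0xA0, the dword read at 0A8h and the word read at 0ACh in the disassembly line up with nAvgBytesPerSec and nBlockAlign. The sketch below uses `WaveFormatExMirror`, a hypothetical copy of the field layout, so that it compiles without the Windows headers. The internal offsets themselves (0x7C, 0x3C, -50h, 0xA0) are specific to the DLL version I looked at and are not verified by this.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of WAVEFORMATEX, which is 18 bytes in the Windows SDK.
#pragma pack(push, 1)
struct WaveFormatExMirror {
    std::uint16_t wFormatTag;       // offset 0x00
    std::uint16_t nChannels;        // offset 0x02
    std::uint32_t nSamplesPerSec;   // offset 0x04
    std::uint32_t nAvgBytesPerSec;  // offset 0x08
    std::uint16_t nBlockAlign;      // offset 0x0C
    std::uint16_t wBitsPerSample;   // offset 0x0E
    std::uint16_t cbSize;           // offset 0x10
};
#pragma pack(pop)

// Offset added to the internal pointer in the assembly snippet.
constexpr std::size_t kFormatBase = 0xA0;

static_assert(kFormatBase + offsetof(WaveFormatExMirror, nAvgBytesPerSec) == 0xA8,
              "dword read in the disassembly");
static_assert(kFormatBase + offsetof(WaveFormatExMirror, nBlockAlign) == 0xAC,
              "word read in the disassembly");
static_assert(sizeof(WaveFormatExMirror) == 18, "packed size");
```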

At this point I was able to attach to an arbitrary process and record its audio without any other audio interfering. It still crashed the process sometimes and, as noted in the SO post, only worked on Win 8, but I was happy with the outcome.

Conclusion

I hope you learned a few things about the audio API and how long it can take to properly research such undocumented stuff. The whole process described above took a couple of days spread over a timespan of 3 months, because I often got frustrated and decided to give it a shot another day.

Later on I also added some nice features like MP3 support to reduce the file size and made the library support IPC (using pipes) so I could record audio from sandboxed processes as well. I also went on to add Win 7 support and the ability to mute the audio of a process (you just zero the buffer) but that’s not really related to the core research.
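Muting really is as small as it sounds. A minimal sketch, assuming it runs inside the ReleaseBuffer hook after the data has been copied and before the call is forwarded (`muteBuffer` is a hypothetical helper name):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Overwrites the written frames with zeros. All-zero bytes are silence for
// float and signed PCM samples; unsigned 8-bit PCM would need 128 instead.
void muteBuffer(std::uint8_t* data, std::uint32_t framesWritten,
                std::uint32_t bytesPerFrame) {
    assert(data != nullptr);
    std::memset(data, 0,
                static_cast<std::size_t>(framesWritten) * bytesPerFrame);
}
```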

Thanks for reading.

16 thoughts on “Recording audio of a single process: Part III”

  1. John

    I want to capture the audio of a single app, but I cannot get the right audio frame size. I see you ran into the same problem and fixed it on Windows 8.

    Unfortunately I am using Windows 7. Have you fixed it on Windows 7 yet? Thanks a lot for your response.

      1. John

        Hi Lms,

        I have tried it on Windows 7, and it seems it cannot get the real client audio format used for the data in ReleaseBuffer.
        I just used it to get the audio frame size of Firefox. It reports wBitsPerSample as 32, but in fact wBitsPerSample should be 16.
        Any idea? Thanks a lot.

        1. LMS Post author

          This is the way I did it and it works fine for all processes I tried it on, so I don’t really know. Are you sure it’s 16 though? IIRC Firefox was 32-bit float.

          1. John

            When I copy the audio buffer using a frame size of 8 (2 channels, 32/8 = 4 bytes per sample, so frame size = 2 * 4 = 8), Firefox crashes. When I change the frame size to 4, the audio buffer is captured fine and it sounds fine too.

            So I guess wBitsPerSample should be 16.

            Thanks for your response.

          2. John

            I made a mistake. I read the format from my own IAudioRenderClient instead of the IAudioRenderClient passed to ReleaseBuffer. It works well now.
            Actually wBitsPerSample in Firefox is 16 bits; maybe I used a 32-bit Firefox process, which differs from yours.

            You are great. I tried to get the format the same way, but I have little experience with reverse engineering…

            Do you have a simple description of how to get the format on Windows 8 and Windows 10 from ReleaseBuffer? Your description of how to get the format on Windows 8 starts from pAudioClient. Maybe it should be obtained from the IAudioRenderClient passed to ReleaseBuffer too.

            Thanks a lot for your help.

          3. John

            Hi LMS,

            I tried the same way of getting the format on Windows 8 as on Windows 7. It works when capturing wmplayer audio!
            But injecting into some processes, such as Firefox, crashes them.

          4. LMS Post author

            I never tested it on Windows 10, but chances are they changed the internal layout again. You’ll have to reverse engineer the audio dll to find out what changed.

          5. MrGilbert

            LMS, thanks for the great writeup. I really enjoyed reading your findings.
            Too bad John didn’t share his solution for getting the pointer to Windows 10’s WaveFormatExtensible instance in AudioRenderClient. :\

            I’d like to figure it out by myself, but I’m a bloody IDA newbie. I know about COM-Interfaces, hooking and stuff, and I even have a working implementation. But I’m also running into the issue where GetMixFormat returns the “wrong” information. I assume I would have to do the reverse engineering for every version of Windows I’d like to support, wouldn’t I? In theory, of course.

            In the end, it might be easier to simply ask the user to let my program restart the target application… Or, my program could restart it on its own. Hm…

          6. LMS Post author

            Yeah, I’d like to know that as well, as I never researched it on Win 10. But you are correct, due to it being retrieved from an internal/non-exposed structure offset, you would have to locate it for every Windows version, in some cases possibly even for different DLL versions on the same OS. Restarting the target application doesn’t sound too bad in my opinion 🙂

  2. braindamage

    Can u explain the buffer size? I’m currently developing basically the same application. I just have trouble with copying the buffer before ReleaseBuffer gets called (the hooks are working).

    Is it sizeof(unsigned char) * nBlockAlign * NumFramesWritten? Or did u say nChannels * wBitsPerSample / 4 (I thought that’s the same as nBlockAlign)?

    Did u find out what happens if GetBuffer is called twice before ReleaseBuffer gets called? What order does Windows use? Also, why don’t u just call GetMixFormat once u have the IAudioClient? It’s easier than ur asm and version independent.

    Sorry for my bad English. It’s late right now.

    1. LMS Post author

      The size should be NumFramesWritten * nChannels * (wBitsPerSample / 8);

      I don’t think I encountered GetBuffer being called twice before ReleaseBuffer was called, but you may want to check in GetBuffer whether the previous buffer was already filled; if so, copy it right there and then append the new buffer in ReleaseBuffer. As for GetMixFormat, as I wrote in my post, it tends to be wrong. That’s why I had to rely on some asm logic.

