After having hooked IAudioClient::GetCurrentPadding, IAudioRenderClient::GetBuffer and IAudioRenderClient::ReleaseBuffer I implemented some logic for each hook.
In the IAudioClient::GetCurrentPadding hook, I just copy the pointer to IAudioClient to be able to use it later.
In the IAudioRenderClient::GetBuffer hook, I copy the IAudioRenderClient pointer and the data pointer so I know where the audio data is stored.
The IAudioRenderClient::ReleaseBuffer hook is where my main logic takes place. At this point the buffer is filled and we get the number of frames written. We know the buffer location from the previous call to IAudioRenderClient::GetBuffer and can thus copy its content.
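The interplay of the GetBuffer and ReleaseBuffer hooks can be sketched without any Windows API. This is only an illustration and not the original code: CaptureState, onGetBuffer and onReleaseBuffer are hypothetical names, and bytesPerFrame is assumed to be known already (finding it is what the rest of this post is about).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical state shared between the GetBuffer and ReleaseBuffer hooks.
struct CaptureState {
    const std::uint8_t* pendingData = nullptr; // pointer handed out by GetBuffer
    std::vector<std::uint8_t> recorded;        // everything captured so far
};

// Called from the GetBuffer hook after the original function returned:
// remember where the application is about to write its samples.
inline void onGetBuffer(CaptureState& state, const std::uint8_t* data) {
    state.pendingData = data;
}

// Called from the ReleaseBuffer hook before forwarding to the original:
// the buffer is filled now, so copy framesWritten * bytesPerFrame bytes.
inline void onReleaseBuffer(CaptureState& state, std::uint32_t framesWritten,
                            std::size_t bytesPerFrame) {
    assert(bytesPerFrame > 0);
    if (state.pendingData == nullptr || framesWritten == 0) {
        return; // nothing was requested or nothing was written
    }
    const std::size_t byteCount = framesWritten * bytesPerFrame;
    state.recorded.insert(state.recorded.end(), state.pendingData,
                          state.pendingData + byteCount);
    state.pendingData = nullptr; // the pointer is stale once the buffer is released
}
```

In a real hook, onGetBuffer would receive the data pointer returned through GetBuffer's output parameter, and onReleaseBuffer would receive the NumFramesWritten argument of ReleaseBuffer.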
Copying the buffer
My first attempt was very basic, yet it did work. I simply wrote the buffer to a file with a fixed size. I used Audacity’s ability to import raw audio and ended up with some trial and error approach until I got the correct audio format. Because using a hardcoded size and guessing the format was not an option, I had two new challenges to solve.
Since we know the number of frames written, I figured the actual buffer size had something to do with the number of channels and the bits per sample. So the format thing again. Sounded like killing two birds with one stone. I was sure I was very close to getting it working properly.
Figuring out the audio format
A quick look at the audio API documentation revealed there’s a function called IAudioClient::GetMixFormat. Now call me crazy but I somehow thought this would return the audio format used.
So I checked the structure it returned and added a function to output the format. Since it matched with the format I used in Audacity (IEEE_FLOAT, 2 channels, 44k) I changed the code responsible for writing the buffer to calculate the size based on the format. A quick test run and Audacity check made me smile: It worked.
I didn’t figure out the audio format
When I tried it on a different process though (Chrome), audio was distorted. One could clearly hear the original audio, but I wasn’t able to get it playing properly, no matter what parameters I used in Audacity. To narrow down the problem I used audio that would only use one speaker and tried to make sense of the output in a hex editor. I saw weird sequences of zeros I didn’t understand and went to SO to ask for help (note: Don’t read my own solution posted there yet).
Unfortunately it appeared no one was able to help me, so I gave it another shot. Further research into IAudioClient::GetMixFormat showed that it could be wrong, i.e. it does not necessarily report the format passed to IAudioClient::Initialize. I don’t know for sure why there is no way to retrieve the actual audio format, but I figured I had to read it from some internal structure. Hooking IAudioClient::Initialize was not an option, because I wanted to be able to attach to processes that are already playing audio.
Figuring out the audio format. Again
To be able to read the audio format from an IAudioClient instance, I fired up IDA and looked through the various interface functions. However, I couldn’t find anything useful. Debugging the various audio calls and looking for known values (I used Spotify again since I knew its format) finally led to a result in IAudioRenderClient::GetBuffer though.
.text:10019145 154 mov eax, [ebx-50h]
.text:10019148 154 movzx ecx, word ptr [eax+0ACh]
.text:1001914F 154 mov eax, [eax+0A8h]
The values in ecx and eax looked familiar (block align and average bytes per second, i.e. nBlockAlign and nAvgBytesPerSec). So modifying my code to the following made it use the correct audio format:
mov eax, pAudioClient
mov eax, [eax + 0x7C]
mov eax, [eax + 0x3C]
mov edi, eax
mov eax, [edi - 50h]
add eax, 0xA0
mov waveFormatEx, eax
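To see why the two reads at +0A8h and +0ACh point to a format structure starting at +0A0h, it helps to write down the field layout. The following is a sketch only: WaveFormat is a stand-in mirroring the byte-packed WAVEFORMATEX structure, readWaveFormat is a hypothetical helper, and the 0xA0 offset is the one from the Windows 8 x86 build examined above, so it should not be expected to hold anywhere else.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for WAVEFORMATEX (declared with 1-byte packing in the Windows SDK).
#pragma pack(push, 1)
struct WaveFormat {
    std::uint16_t wFormatTag;
    std::uint16_t nChannels;
    std::uint32_t nSamplesPerSec;
    std::uint32_t nAvgBytesPerSec;
    std::uint16_t nBlockAlign;
    std::uint16_t wBitsPerSample;
    std::uint16_t cbSize;
};
#pragma pack(pop)

// Offset of the embedded format inside the internal object (Windows 8 x86 only).
constexpr std::size_t kFormatOffset = 0xA0;

// The two reads in the disassembly line up with these two fields.
static_assert(kFormatOffset + offsetof(WaveFormat, nAvgBytesPerSec) == 0xA8, "dword at +0A8h");
static_assert(kFormatOffset + offsetof(WaveFormat, nBlockAlign) == 0xAC, "word at +0ACh");
static_assert(sizeof(WaveFormat) == 18, "packed size");

// Copy the format out of a raw object; memcpy avoids alignment assumptions.
inline WaveFormat readWaveFormat(const unsigned char* internalObject) {
    assert(internalObject != nullptr);
    WaveFormat format;
    std::memcpy(&format, internalObject + kFormatOffset, sizeof(format));
    return format;
}
```

The static_asserts are the actual point here: with the structure at +0A0h, the dword at +0A8h is nAvgBytesPerSec and the word at +0ACh is nBlockAlign.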
At this point I was able to attach to an arbitrary process and record its audio without any other audio interfering. It still crashed the process sometimes and, as noted in the SO post, only worked on Win 8, but I was happy with the outcome.
I hope you learned a few things about the audio API and how long it can take to properly research such undocumented stuff. The whole process described above took a couple of days spread over a timespan of 3 months, because I often got frustrated and decided to give it another shot some other day.
Later on I also added some nice features like MP3 support to reduce the file size and made the library support IPC (using pipes) so I could record audio from sandboxed processes as well. I also went to add Win 7 support and the ability to mute audio of a process (you just zero the buffer) but that’s not really related to the core research.
Thanks for reading.
I want to capture the audio of a single app, but I cannot get the right audio frame size. I see you ran into the same problem and fixed it on Windows 8.
Unfortunately, I use Windows 7. Did you fix it on Windows 7 as well? Thanks a lot for your response.
Hi John, I did indeed solve it for Windows 7 (x86) as well. Try this: http://pastebin.com/TpV1xagu
I have tried it on Windows 7, and it seems it cannot get the real audio format the client uses for the data in ReleaseBuffer.
I used it to get the frame size of the Firefox audio player. It reports wBitsPerSample as 32, but in fact wBitsPerSample should be 16.
Any idea? Thanks a lot.
This is the way I did it and it works fine for all processes I tried it on, so I don’t really know. Are you sure it’s 16 though? IIRC Firefox was 32-bit float.
When I copy the audio buffer using a frame size of 8 (2 channels, 32/8 = 4 bytes per sample, so frame size = 2 * 4 = 8), Firefox crashes. When I change the frame size to 4, the audio buffer is copied correctly and it sounds fine too.
So I guess wBitsPerSample should be 16.
Thanks for your response.
I made a mistake: I used my own IAudioRenderClient to get the format instead of the IAudioRenderClient from ReleaseBuffer. It works well now.
wBitsPerSample in Firefox actually is 16 bits; maybe I used a 32-bit Firefox process, which is different from yours.
You are great. I tried to get the format the same way, but I have little experience with reverse engineering…
Do you have a simple description of how to get the format on Windows 8 and Windows 10 from ReleaseBuffer? Your description of how to get the format on Windows 8 goes through pAudioClient. Maybe it should be obtained from the ReleaseBuffer IAudioRenderClient too.
Thanks a lot for your help.
I tried to get the format on Windows 8 the same way as on Windows 7. It works when capturing wmplayer audio!
But injecting into some processes, such as Firefox, crashes them.
It does not work on Windows 10.
I never tested it on Windows 10, but chances are they changed the internal layout again. You’ll have to reverse engineer the audio dll to find out what changed.
I have finished the reverse engineering work on Windows 10. It looks good now.
Cool, would you mind sharing it?
LMS, thanks for the great writeup. I really enjoyed reading your findings.
Too bad John didn’t share his solution for getting the pointer to Windows 10’s WaveFormatExtensible instance in AudioRenderClient. :\
I’d like to figure it out by myself, but I’m a bloody IDA newbie. I know about COM-Interfaces, hooking and stuff, and I even have a working implementation. But I’m also running into the issue where GetMixFormat returns the “wrong” information. I assume I would have to do the reverse engineering for every version of Windows I’d like to support, wouldn’t I? In theory, of course.
In the end, it might be easier to simply ask the user to let my program restart the target application… Or, my program could restart it on its own. Hm…
Yeah, I’d like to know that as well, as I never researched it on Win 10. But you are correct, due to it being retrieved from an internal/non-exposed structure offset, you would have to locate it for every Windows version, in some cases possibly even for different DLL versions on the same OS. Restarting the target application doesn’t sound too bad in my opinion 🙂
Can u explain the buffer size? I’m currently developing basically the same application. I just have trouble with copying the buffer before ReleaseBuffer gets called (hooks are working).
Is it sizeof(unsigned char) * nBlockAlign * NumFramesWritten? Or did u say nChannels * wBitsPerSample / 4 (I thought it’s the same as nBlockAlign)?
Did u find out what happens if GetBuffer occurs 2 times and then ReleaseBuffer gets called? What order does Windows use? Also, why don’t u just call GetMixFormat when u have got the IAudioClient? It’s easier than ur asm and version independent.
Sorry for my bad English. It’s late right now.
The size should be NumFramesWritten * nChannels * (wBitsPerSample / 8);
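As a small illustration of that formula (bufferSizeInBytes is a hypothetical helper, and it assumes samples occupy a whole number of bytes):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Size in bytes of the data written between GetBuffer and ReleaseBuffer.
inline std::size_t bufferSizeInBytes(std::uint32_t numFramesWritten,
                                     std::uint16_t nChannels,
                                     std::uint16_t wBitsPerSample) {
    assert(wBitsPerSample % 8 == 0); // assumption: whole-byte samples
    const std::size_t bytesPerFrame =
        static_cast<std::size_t>(nChannels) * (wBitsPerSample / 8u);
    return numFramesWritten * bytesPerFrame;
}
```

For plain PCM and IEEE float formats, nChannels * (wBitsPerSample / 8) is the same value as nBlockAlign, so nBlockAlign * NumFramesWritten gives the same size. For example, 448 frames of 2-channel 32-bit float audio take 448 * 8 = 3584 bytes, while the same number of 2-channel 16-bit frames takes 448 * 4 = 1792 bytes.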
I don’t think I encountered GetBuffer called two times before ReleaseBuffer was called, but you may wanna check in GetBuffer if the previous buffer was filled already and then copy it here already and append the new buffer in ReleaseBuffer then. As for GetMixFormat, as I wrote in my post, it tends to be wrong. That’s why I had to rely on some asm logic.
Hi! What about the waveFormatEx fix for windows 10?
I know it’s already over 5 years since you wrote your comment, but I want to answer you anyway, because this problem has kept me awake for days.
I can proudly present you a working solution! (It works on my machine xD)
With the following steps you can get the offset of the pointer to the internal WaveFormatEx struct in the IAudioClient class:
1. Create a “dummy” IAudioClient and Initialize it with any working format (e.g. retrieved via ::GetMixFormat before calling ::Initialize)
2. Now search the process memory for every occurrence of the set format (for my tests I used CheatEngine; once I knew it worked, I wrote my own implementation) and store their addresses as uintptr_t values.
3. Interpret the pointer to your “dummy” IAudioClient as a uintptr_t* and check whether the value it points to occurs in your stored addresses. If there’s no match, increment the pointer by one (byte). Do that until you find a match.
This works for both x86 and x64.
Because you create a “dummy” IAudioClient, you don’t need to hardcode your offsets. And there should be no problem with new versions of the Windows SDK.
It should work on Win7 and Win8 too. (I tested it on Win10 only)
At the time of writing, the offset of the pointer to the internal WaveFormatEx struct is 904 for x64 and 660 for x86.
Here is some example code:
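uint64_t findWaveFormatExPtrOffset(IAudioClient* pAudioClient)
{
    std::set<uintptr_t> toCheck; // container type assumed; only find()/end() are visible in the snippet
    // Add all occurrences of *pwfx to toCheck
    char* pRawAudioClient = (char*)pAudioClient;
    uint64_t offset = 0;
    while (offset < 1024) // The highest offset I got was 904, I never got any segfault errors. This value should maybe be tweaked
    {
        uintptr_t* curr = (uintptr_t*)(pRawAudioClient + offset);
        if (toCheck.find(*curr) != toCheck.end())
            return offset; // assumed: the matching offset is the result
        ++offset; // advance by one byte, as described in step 3
    }
    return UINT64_MAX; // assumed sentinel for "no match found"
}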
I use this code in a personal project. You can view it on GitHub if you want to: https://github.com/TecStylos/Carl.git
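The byte-wise search from step 3 can be tried out without any audio API. The sketch below is only a simulation under stated assumptions: FakeClient stands in for the opaque IAudioClient object, findPointerOffset is a hypothetical name, and the set of candidate addresses (which step 2 collects by scanning process memory) is simply passed in by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <set>

// Scan an opaque object byte by byte for a pointer-sized value that is one of
// the known addresses. Returns the byte offset, or -1 if nothing matched.
inline std::ptrdiff_t findPointerOffset(const void* object, std::size_t objectSize,
                                        const std::set<std::uintptr_t>& candidates) {
    assert(object != nullptr);
    const unsigned char* raw = static_cast<const unsigned char*>(object);
    for (std::size_t offset = 0; offset + sizeof(std::uintptr_t) <= objectSize; ++offset) {
        std::uintptr_t value;
        std::memcpy(&value, raw + offset, sizeof(value)); // safe unaligned read
        if (candidates.count(value) != 0) {
            return static_cast<std::ptrdiff_t>(offset);
        }
    }
    return -1;
}

// Stand-in for the real object: some unrelated fields, then the format pointer.
struct FakeClient {
    unsigned char unrelated[40] = {};
    const int* format = nullptr;
};
```

With a real IAudioClient the object size is unknown, which is why the code above uses a fixed upper bound of 1024 bytes instead of an objectSize parameter.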
Thank you for this post!
Here are the current patterns and offsets for Win10:
.text:10033BE1 mov eax, [edi+54h]
.text:10033BE4 lea edx, [edi+5Ch]
.text:10033BE7 movzx esi, word ptr [eax+0Ch]
.text:10033BEB imul esi, ebx
private const string X86WaveFormatPattern = "8B 47 ?? 8D 57 ?? 0F B7 70 ??";
.text:0000000180020177 mov rax, [rbx+90h]
.text:000000018002017E movzx edi, word ptr [rax+0Ch]
.text:0000000180020182 mov rax, [rbx+0A0h]
.text:0000000180020189 imul edi, esi
private const string X64WaveFormatPattern = "48 8B 83 ?? ?? ?? ?? 0F B7 78 ??";
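A signature like the ones above is matched byte by byte, with ?? acting as a wildcard for bytes that change between builds. As a rough sketch of how such a scan can work (written in C++ rather than the C# the constants come from; parsePattern and findPattern are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// One pattern element: a concrete byte or a wildcard ("??").
struct PatternByte {
    std::uint8_t value;
    bool wildcard;
};

// Turn "8B 47 ?? 8D" into a list of bytes and wildcards.
inline std::vector<PatternByte> parsePattern(const std::string& text) {
    std::vector<PatternByte> pattern;
    std::istringstream stream(text);
    std::string token;
    while (stream >> token) {
        if (token == "??") {
            pattern.push_back({0, true});
        } else {
            pattern.push_back({static_cast<std::uint8_t>(std::stoul(token, nullptr, 16)), false});
        }
    }
    return pattern;
}

// Return the index of the first match in data, or -1 if there is none.
inline std::ptrdiff_t findPattern(const std::vector<std::uint8_t>& data,
                                  const std::vector<PatternByte>& pattern) {
    assert(!pattern.empty());
    for (std::size_t start = 0; start + pattern.size() <= data.size(); ++start) {
        bool match = true;
        for (std::size_t i = 0; i < pattern.size() && match; ++i) {
            match = pattern[i].wildcard || data[start + i] == pattern[i].value;
        }
        if (match) return static_cast<std::ptrdiff_t>(start);
    }
    return -1;
}
```

Once a match is found, the bytes under the wildcards carry the interesting values. In the x86 listing, the first wildcard is the displacement 54h of the pointer read from edi, and the last one is 0Ch, which is the offset of nBlockAlign inside WAVEFORMATEX.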