Apple Neural Engine Internal: From ML Algorithm to HW Registers (blackhat.com)
126 points by blopeur on Dec 9, 2021 | 63 comments



Does anything actually use the ANE? I've had asitop running on a vanilla M1 and have literally never seen anything use it.

This may be a dumb question, but would the space not be better used for more GPU cores? Then if an app actually does need to do neural-net stuff, it could just use the GPU anyway. This way you wouldn't have the die space the ANE occupies sitting idle 99.9% of the time.


> asitop

Neat little tool

> Does anything actually use the ANE?

I poked around for a few minutes to see if I could get anything. I saw a blip of usage in Photos when adding a photo, probably for the new text recognition feature. I was able to see a lot of usage in Photo Booth, I guess for face detection for effects. There are definitely third-party apps making use of Core ML, but it's not something I know much about. On iPhones, it's also used for Face ID (as the article mentions), which is probably much more important to Apple.
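
If you want to poke at this yourself: apps don't drive the ANE directly; Core ML decides where a model actually runs, and an app can only state a preference. A minimal Swift sketch (the "MyClassifier" model name is a hypothetical placeholder, and Core ML may still fall back to CPU/GPU for layers the ANE doesn't support):

    import CoreML

    // Allow Core ML to schedule this model on the CPU, GPU, or Neural Engine.
    let config = MLModelConfiguration()
    config.computeUnits = .all

    // Load a compiled model bundled with the app ("MyClassifier" is a placeholder).
    let url = Bundle.main.url(forResource: "MyClassifier", withExtension: "mlmodelc")!
    do {
        let model = try MLModel(contentsOf: url, configuration: config)
        // ... run predictions here and watch asitop to see whether the ANE lights up ...
        _ = model
    } catch {
        print("Failed to load model: \(error)")
    }

Re-running the same predictions with computeUnits set to .cpuAndGPU is a quick way to confirm that a given model was the source of an ANE blip.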

> This may be a dumb question, but would the space not be better used for more GPU cores? Then if an app actually does need to do neural net stuff, just use the GPU anyway?

The ANE is likely going to be more power-efficient than GPU cores loaded up for similar processing. Could they use the space for more GPU cores? Maybe, but they would have to work it into their memory architecture, support the additional power requirements, and presumably handle other engineering challenges like thermal design. Including the ANE also gives them something else to put in their marketing materials lol.


Dedicated neural net silicon is faster than GPU, at the cost of being less general purpose.

NVIDIA GPUs, for example, dedicate a modest fraction of their die to Tensor cores, at the cost of fewer general-purpose CUDA cores.


Thanks. I didn't realise how much faster. You can probably do 10x the FLOPs in similar die space on dedicated silicon vs. a GPU. I thought it would be faster, but not an order of magnitude more.


ANE is used for the portrait video effect (blurred background) and other camera features.


Oof. Clicking this link got me blacklisted by my router. Had to spoof my MAC address to reconnect.


Sounds like you need DoH in your browser too?


What?


Overzealous "protection" filtering presumably (guessing from the Blackhat name). Happens on corporate VPNs too. We can't have nice things.


The router I'm connecting through is this Meraki thing I don't manage. It has some security features. Upon opening a link to blackhat.com, it put my MAC address on a blacklist.


That is so completely bass-ackwards it’s not even comprehensible. If the router knows you’re trying to connect to a malicious site, it should block your access to that site. Taking action against your device in this scenario doesn’t serve any purpose at all. I’m going to assume that this router is incredibly misconfigured, and that’s not a default behavior.


Routers are stupid (and the engineering around them is often incompetent). Here's a random incidental example of how they used to make life hard for Chrome 10 years ago: https://bugs.chromium.org/p/chromium/issues/detail?id=12066#...


I think the logic is that if you're browsing blackhat.com, then you are a malicious user going there to download hacking tools or attack the network.


Sounds like you should put the manufacturer on a blacklist.


> I just did byte flipping on the ANEProgram, and an iOS kernel OOB read was issued.

This does not give us a lot of confidence.
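
For context, byte flipping is about the crudest mutation-fuzzing technique there is: take a valid serialized program, clobber a few bytes, and see whether the parser rejects it cleanly. A sketch of just the mutation step (the function name is mine; actually submitting the blob to the ANE stack requires the private interfaces the talk reverse engineers):

    import Foundation

    // Crudely mutate an otherwise valid ANEProgram blob by overwriting a
    // handful of random bytes. A robust parser should reject the result
    // cleanly; a kernel OOB read means it trusted the input too much.
    func byteFlipped(_ blob: Data, flips: Int = 8) -> Data {
        guard !blob.isEmpty else { return blob }
        var mutated = blob
        for _ in 0..<flips {
            let offset = Int.random(in: 0..<mutated.count)
            mutated[offset] = UInt8.random(in: 0...255)  // clobber one byte
        }
        return mutated
    }

That even this naive corruption reached an out-of-bounds read in the kernel is the worrying part.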


geohot reverse engineered the ANE last year for his tinygrad project: https://github.com/geohot/tinygrad He streamed the many hours of effort on YouTube, in case that's your thing: https://m.youtube.com/watch?v=mwmke957ki4


Fascinating in a sense. All of this knowledge exists somewhere, but people are building their careers trying to understand what Apple built so that they can better understand what Apple forgot.


[flagged]


The M1 Max also runs at a fraction of the 3080's TDP.

Performance in gaming is a non-issue. Apple didn't advertise it as a gaming machine, and quite honestly, nobody buys a $3000+ MacBook to play games.


I bought my M1 Pro not for the sole purpose of being a gaming rig, but I certainly did buy it with the hope that it could play games at least better than my old Intel integrated-graphics laptop did.

And it does - I can run the PCSX2 PS2 emulator with a controller and my fans don't even spin up. And with Rosetta 2 there are plenty of Mac games I can play without any issues.


Yep. I'm running the Dolphin emulator, which runs perfectly, even rendering at a much higher resolution. Except Apple broke Wiimote connections over native Bluetooth in Monterey. :(


What do you intend to do when they remove Rosetta?


Pray that there are more natively ported games by then, and just not update, holding onto Rosetta for much longer than I should. Win11 ARM in Parallels doing x86 emulation works pretty well for older games; it's how I play old shooters and EverQuest at the moment.


Use a Windows VM? x86(_64)->ARM64 is not going away on Windows anytime soon.


Performance is worse than a 3060 with basically the same TDP.

Let's not pretend it's about games. Most workloads see 3060-level performance; basically everything except editing video in ProRes and ML workloads that require a lot of memory.


How so? The M1 Max TDP tops out at 90W, and it runs at around 40W during benchmarking. The minimum configuration for the RTX 3060 Mobile runs at 60W, and the maximum at 115W, both for the GPU alone. You would need to pair it with a CPU with a max TDP of 30W just to match the M1 Max's whole-package ceiling (60W + 30W = 90W).


The M1 Max uses 120W when stressing GPU and CPU. The GPU by itself uses around 60W.

These are sustained, not peak, figures. The M1 Max GPU does use 60W and so do many 3060 Mobiles with worse overall performance, at almost three times the price.

https://www.anandtech.com/show/17024/apple-m1-max-performanc...


> The M1 Max uses 120W when stressing GPU and CPU. The GPU by itself uses around 60W.

The article clearly states that it is 120W at the wall. The SoC goes up to 92W. Nowhere does it say that the GPU itself uses 60W.

You have also glossed over the fact that the article asserts that the MacBook runs at half the wattage of an Intel system with an RTX 3080: 120W vs. 256W.

> The M1 Max GPU does use 60W and so do many 3060 Mobiles with worse overall performance

A 3060 runs on 60W at the very low end. Usual configurations are 80W and 115W, and the differences in performance between those two are quite noticeable [1].

And again, you seem to focus on gaming, which is fine, but not the target market of a MacBook Pro. Notice that all the games tested are not native, but running on top of Rosetta 2. The M1 Max does great when it comes to video [2], and I would say the same for other workloads [3] [4] that are, at least historically, where Apple has mostly focused.

> at almost three times the price

Citation needed. Three times the price of what? The Intel system used in the Anandtech article is $3,149.00 [5]; is the MacBook Pro $9,000?

By the way, Nvidia also has pro versions of their GPUs; they don't perform as well in gaming as they do in other tasks, and they really are way more expensive than their consumer counterparts. Are we now going to dismiss a Quadro RTX because it does not do too well running Tomb Raider?

[1] https://www.tomshardware.com/news/rtx3060mobile-multiple-tdp...

[2] https://www.anandtech.com/show/17024/apple-m1-max-performanc...

[3] https://blog.yiningkarlli.com/2021/10/takua-on-m1-max.html

[4] https://austinmann.com/trek/macbook-pro-m1-max-for-photograp...

[5] https://www.adorama.com/msige7611054.html


No, it doesn't use 120W at the wall. 120W is the difference between wall consumption under load and at idle, which is around 1.1x the actual consumption because of power-supply losses. Apple under-reports power usage for some reason.

Sure, Apple runs at half the wattage of the 3080, with less than half of the performance.

All of the video workloads are specifically using ProRes, the codec Apple hardware-accelerates. If you do not use ProRes, you will get a fraction of the performance. The MacBook only gets that performance in a fraction of video-editing workloads.

Yes, all games are running on Rosetta 2, except for Dolphin, which shows a similar performance ratio. This doesn't matter because GPUs don't have to deal with Rosetta, and the tests are done in GPU-limited circumstances. The performance impact is going to be negligible in a GPU-limited scenario where the CPU is not even running at 100%, so it cannot be what slows the GPU down.

The M1 Max is around the speed of the slowest 3060 in all compute workloads that aren't bottlenecked by CPU-GPU communication.

The fact remains, Apple straight up lied by saying that their GPU is comparable to a 3080. It is comparable to a 3060 in TDP and slower in everything except specific, hardware-optimized workloads. There are other workloads that are accelerated on the 3060 where the inverse is true.

Also, a Quadro RTX does very well at Tomb Raider, which isn't surprising because it has a lot of compute power, unlike the M1 Max's GPU, and games nowadays care a lot about compute power.

The benchmark between a 3060 and the M1 Max was not by Anandtech. It was by LTT, and was with a laptop whose processor and GPU are in the $1,000 class. And unlike Anandtech, they did all of their GPU benchmarks in GPU-limited conditions, whereas the Anandtech benchmarks are CPU-limited too, artificially boosting the performance of the M1 Max by measuring its CPU as much as its GPU.

I don't understand why Apple gets to call their GPU as fast as a 3080 when it's slower than a 3060 unless you are running very specific hardware-optimized software that requires a hardware-specific workflow.

All of the benchmarks you linked are CPU benchmarks, and the Takua render benchmark is ridiculous because, in the real world, artists use GPU rendering on their laptops, where the M1 Max gets absolutely destroyed by an RTX 3060 because of hardware-accelerated ray tracing.


> No, it doesn't use 120W at the wall.

It literally says so in the Anandtech article.

> Apple under-reports power usage for some reason.

Again, citation needed. Mostly because these numbers aren't coming from Apple, but from Anandtech.

> Sure, Apple runs at half the wattage of the 3080, with less than half of the performance.

But it is not half the performance; that is a lie, and the numbers are out there. Again, in the same article, the only downside is gaming, which runs under Rosetta 2.

> It is comparable to a 3060 in TDP and slower in all except specific, hardware-optimized workloads. There are other workloads that are accelerated in the 3060 where the inverse is true.

This is straight up false. Again, read the article you yourself linked.

> All of the benchmarks you linked are CPU bechmarks

Wow. So, rendering and manipulating video and photos are purely CPU benchmarks. Ridiculous.

I'm not going to continue with this. I find it insane that you would willingly ignore simple arithmetic just to, I guess, bash Apple. I'm not the greatest Apple fan myself, but I am certainly willing to acknowledge that they did extremely well with the M1 family, and that they have created a very good laptop for professionals.


The Anandtech article shows the difference between idle wall power and loaded wall power, not absolute wall power. Read the article, please.

>Again, citation needed. Mostly because these numbers aren't coming from Apple, but Anandtech.

The grey numbers are reported by the CPU using Apple code, but they do not reflect actual power consumption.

>This is straight up false. Again, read the article you yourself linked.

Incorrect. The workflows are the specific Apple-approved ProRes workflows, as well as games in CPU-limited fashion, by the author's own admission. I linked the article for its power consumption figures; you can watch the video linked above for proper GPU performance testing, where Rosetta 2 has a negligible impact because of low CPU usage.

>Wow. So, rendering and manipulating video and photo are purely CPU benchmarks. Ridiculous.

Yes. CPU rendering is a CPU benchmark. Lightroom is a CPU benchmark for all but a few very specific tasks (AI upscaling, for example), and cannot properly utilize a high-performance GPU for anything else.

They created a laptop whose GPU is good for editing ProRes footage and working with very large datasets, and nothing else. If you need to do rendering, or game design, or have to work with complex CAD files, or have to do 90% of GPU compute workloads, or have to do 3D modelling/sculpting, or literally anything else a professional (or not) would want to do, it's embarrassingly deficient.


Very few people buy gaming laptops to edit video on, but that's not a reason for them to perform poorly at the task. The most frustrating part about the M1 series is that its gaming performance is limited by arbitrary software decisions.


How is this relevant to the Apple Neural Engine and this BlackHat talk?


If you're shopping for a MacBook Pro, gaming is an extremely narrow use case, no?

Edit: various benchmarks seem to show the M1 Max games at about the FPS of an RTX 3060 Laptop GPU (though with lower power draw). It's around the same price as a similarly spec'd 3060 gaming laptop.


It's going to get a lot less hot and noisy than a similarly specced 3060 laptop while also having vastly better battery life, which is worth something (varies depending on one's personal priorities). I would imagine a lot of people looking for portable power might value those qualities enough to trade away some raw graphical muscle.


Well, it's narrow historically, but Apple did seem to be implying it could game competitively with the latest from Nvidia. Which it apparently can (although there are practically no games); just bump the model down a step from what Apple claims.

Also, the M1 Max seems to be on the order of £1,000 more than a 3080-equipped laptop; where are you getting that figure from?


The problem with “latest from X” is that there’s such a wide perf gap between the latest budget device and the latest high perf one.

I also got irked by their comparison to an old baseline GPU's performance. Everyone knows Intel's integrated GPU is terrible, so comparing to that is pointless.


It's significantly more expensive than a 3060-equipped Zephyrus, and that doesn't change Apple's marketing pretending it was a 3080.


Having owned one of those Zephyrus laptops and now owning an M1 Max, there is no comparison in terms of quality or usability. The Zephyrus is hot and loud, with a poor screen in comparison and poor battery life.


You're right about the battery life, but in exchange an AMD Zephyrus is significantly more performant at a lower price. That's a fair trade-off.

It's also not a ticking time bomb, since the SSD can be changed without sending the device in to Apple to replace the entire CPU/SSD/GPU combo. Every one of these Apple devices basically has planned obsolescence built in.


The person you're responding to listed five things the M1 Max was better at. You responded by saying “but it's faster!!!” while addressing none of them.


Some of those things are just subjective ('usability'), and some are part of the trade-off I was talking about. The Zephyrus isn't as hot or loud when it's limiting itself to the MacBook Pro's performance.


It’s only faster at GPU, no? CPU-wise it is really hard to beat even the regular M1.


A Ryzen 9 has slower single-core but faster multi-core performance than an M1, but of course its TDP is higher, so battery life will be significantly different.

If you care about having 20 hours of battery life instead of 8-10, the M1 is definitely going to win that. But from a pure performance perspective, there are already laptop chips from AMD on the market that are faster than the M1.

If you get a Zephyrus with an AMD CPU and a dedicated GPU, you will pay less than for the MacBook Pro and get more performance, in exchange for a hotter laptop with lower battery life. But that's just a trade-off, not necessarily a win one way or another. I've never needed 20 hours of battery life and I don't prioritize it; 8-10 is enough for me. I'd rather have the faster machine.

And I can change the SSD or RAM myself without expecting the machine to implode later in its lifespan because it's all in one package, as Apple silicon is.


The M1 is an impressive chip... for what it is. I bought an Air just to mess with it, while in the past I would not have touched Apple with a 10-foot pole. But the whole framing around the M1 being better than _everything_ in the x86 world was just weird and untrue. Yes, on perf/watt it was unbeatable, but the rest... if you just care about raw processing power, you'd still buy a desktop x86 chip. I think Apple framed it this way and most media ran with it, but I don't see this as a 'sue Apple' kind of deal.


The M1 tops SPEC2017 while giving you 15-20h of battery life. And I'm not even counting hardware-accelerated tasks like Final Cut; throw those in and it beats Xeons at a fraction of the price and power. It's just the best portable chip for regular, boring, standard float and int computation. There's really no competition yet.

Oh, and it's got crazy fast IO. And it's last year's architecture and process node. An M2 on 4nm with an A15-derived architecture should be another 20-30% faster at the same wattage.

The real question is the Mac Pro. What can and will Apple do when performance is the main focus? Will they just stick a bunch of M1s together (which wouldn't be bad), or are they able to crank single-threaded performance a lot higher with more fans and watts to spare?


If raw performance is the only thing you care about, then you would get a desktop. If you care about mobility as well, the calculus changes a lot.


It’s funny that you mention mobility. I usually see people stuck an extra slab of USB C to HDMI/USB hub beside their thinnest and lightest Apple laptops so that they can use peripherals.


Usually, except for the overwhelming majority of people who don't use a hub.

(e.g. my Air has two cables running into it: power and display)


Not sure I understand what point you're trying to make, but when I "go mobile" I am typically leaving behind my monitor, my wired keyboard and mouse/trackpad, my backup drive, and most of the other shit I might have daisychained off a USB hub. So I actually see it as an incredible convenience that I can run all of that, plus charging, plus wired networking, off a single USB-C port.

It feels far more "mobile" than a setup where I need to plug/unplug all that stuff into the body of the laptop whenever I want to use my laptop away from my desk.


And those USB-C hubs are like $20, so I can just get one for each spot I move between. Remember the old days of $200 hubs?


I've actually found it slightly better in single-threaded performance. It's only slower under Rosetta.


1. Cite exactly what they said in the presentation. In particular, did they compare actual games or synthetic benchmarks?

2. It is insanely competitive with the NVIDIA card, especially given power consumption, in both synthetic benchmarks and video editing. Not so much with games… but that's complicated by most games still running in Rosetta. https://www.anandtech.com/show/17024/apple-m1-max-performanc...

3. Video is not an edge case. And how can I claim that? Because you are here citing…a video.

4. What does this have to do with the linked article?


For 3: you cannot compare consuming media with creating media.


Generating media in real-time on the user's computer rather blurs the line between who is creating vs consuming media.


Well played! I had to reread what you wrote, and it was worth it!


Surely the bigger issue is that most games have given up supporting macOS?


All game engines that matter in the market support Metal.


Do Frostbite and the Call of Duty engine(s) support it?

Also, engine support means nothing if you can't actually buy any games for a few years.


Apparently not.

On the other hand, I guess it might be possible to buy some games written in Unreal, Unity, Ogre3D, Defold, Xenko/Stride, ...


Picking up an M1 Pro tomorrow, I hope. Assuming it can play Factorio?


It doesn't support ARM64 and, according to the developer, probably never will, but it runs very well on Rosetta.


Factorio has a Mac client [0], so the M1 Pro will surely play it just fine.

0: https://store.steampowered.com/app/427520/Factorio/




