Screen readers don't read the hierarchy of elements back to you. They find the text that renders inside the elements and read it aloud. And for navigating between elements, the level of nesting matters much less than having the correct ARIA role (the `role` attribute) on each element that contains an interactive component.
To be honest, I think people tend to overestimate just how long it takes to make a nice-looking button or write some flexbox styling. Even standard form components are really not too time-consuming to style.
What pushes me towards using third-party code is stuff like autocomplete search with drop-down selects, mostly because I don't want to mess up on the accessibility front, whether that's keyboard navigation or screen readers, and there are at least a few that have that part figured out already.
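To give a sense of what the keyboard side involves, here's a rough sketch of just the "which option is highlighted" logic for an autocomplete list. The names (`nextActiveIndex`, `activeDescendantId`), the wrap-around behaviour, and the id scheme are all my own assumptions, not taken from any particular library. The real `aria-activedescendant` attribute on the input would point at the id this produces.

```typescript
// Hypothetical helpers for illustration only; not any library's API.
type NavKey = "ArrowDown" | "ArrowUp" | "Home" | "End" | "Escape";

// Returns the index of the option to highlight next.
// -1 means "nothing highlighted" (list closed or empty).
function nextActiveIndex(current: number, key: NavKey, optionCount: number): number {
  if (optionCount <= 0 || key === "Escape") return -1;
  if (key === "Home") return 0;
  if (key === "End") return optionCount - 1;
  if (key === "ArrowDown") {
    // From "nothing highlighted" go to the first option; otherwise wrap around (assumed).
    return current < 0 ? 0 : (current + 1) % optionCount;
  }
  // ArrowUp: from the first option (or nothing highlighted) wrap to the last one.
  return current <= 0 ? optionCount - 1 : current - 1;
}

// Builds the value for aria-activedescendant on the input, assuming each
// option element gets an id of the form `${listboxId}-opt-${index}`.
function activeDescendantId(listboxId: string, index: number): string | undefined {
  return index < 0 ? undefined : `${listboxId}-opt-${index}`;
}
```

And that's only the index bookkeeping. Focus handling, announcing result counts, and touch input all come on top, which is where I'd rather lean on something already tested.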
The idea of components is fine. I should have been clearer there. The problem typically comes from the authoring tools. To get the hooks they need for the decoration we want, they typically add a ton of div elements to the HTML, when realistically you could almost certainly get what you want with very minimal markup.
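As a small illustration of what I mean by minimal markup, here's a sketch that renders a labelled search field with suggestions using only native elements (`label`, `input` with `list`, and `datalist`). The function name `renderSearchField` and the `-options` id suffix are made up for the example, and a native datalist is not a drop-in replacement for a fully styled custom autocomplete. The point is only that the markup needs no wrapper divs.

```typescript
// Hypothetical helper for illustration; names and id scheme are assumptions.

// Escapes text so it is safe inside HTML text and double-quoted attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Renders a label, a text input, and a datalist of suggestions.
// Three native elements, no wrapper divs.
function renderSearchField(id: string, label: string, options: string[]): string {
  const listId = `${id}-options`;
  const optionTags = options
    .map((value) => `<option value="${escapeHtml(value)}"></option>`)
    .join("");
  return [
    `<label for="${escapeHtml(id)}">${escapeHtml(label)}</label>`,
    `<input id="${escapeHtml(id)}" list="${escapeHtml(listId)}">`,
    `<datalist id="${escapeHtml(listId)}">${optionTags}</datalist>`,
  ].join("\n");
}
```

Any styling hooks can hang off the elements themselves, which is usually enough for flexbox or grid layout on the parent.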
And to be fair, I'm sure this has gotten a bit better in recent years. But the Rube Goldberg efforts people would put in to get the "flow" of the browser to automatically place things in locations that were easily calculated are frustrating.