The greedy approach has a (1 - 1/e) approximation guarantee. You basically prove that the problem is monotone submodular and then solve it greedily. It’s not much different from proving the problem is convex and then using gradient descent.
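As a minimal sketch of what that greedy procedure looks like: at each step, pick the item with the largest marginal gain. The objective below is maximum coverage (a standard monotone submodular function); the particular sets are a made-up toy instance for illustration.

```python
# Greedy (1 - 1/e)-approximation for monotone submodular maximization
# under a cardinality constraint. Example objective: maximum coverage.

def coverage(selected, sets):
    """Submodular objective: number of elements covered by the chosen sets."""
    covered = set()
    for i in selected:
        covered |= sets[i]
    return len(covered)

def greedy_max(sets, k):
    """Pick k sets, each time adding the one with the largest marginal gain."""
    selected = []
    for _ in range(k):
        best = max(
            (i for i in range(len(sets)) if i not in selected),
            key=lambda i: coverage(selected + [i], sets),
        )
        selected.append(best)
    return selected

# Toy instance (hypothetical data, just for illustration):
sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
picked = greedy_max(sets, k=2)  # greedy picks {1,2,3} then {4,5,6}
```

The guarantee only needs monotonicity and submodularity of the objective, which is why "prove membership in the class, then run greedy" is the whole recipe.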
There isn’t much theory on why it works in the nonconvex case. You can similarly make discrete heuristic algorithms for sorting, etc., with no guarantees.
Anyway, since nonconvex models are usually stacks of convex pieces, you can find work that incorporates these submodular functions as neural network layers.