
A couple questions:

1. How much of this outcome is due to the unusual (pseudo-)periodic activation function? Much of the DAC-like behavior appears to come from the periodicity of the first layer's output, which in turn seems to stem from the unique activation function.

2. Would the behavior of the network change if the binary strings were encoded differently? The author encodes them as 1D arrays with 1 corresponding to 1 and 0 corresponding to -1, which is an unusual way of doing things. What if the author encoded them as literal binary arrays (i.e. 1->1, 0->0)? What about one-hot arrays (i.e. 2D arrays with 1->[1, 0] and 0->[0, 1]), which is the most common way to encode categorical data?
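To make the alternatives concrete, here's a minimal sketch of the three encodings being discussed (the helper names are mine, not the author's):

```python
# Sketch of the three encodings, shown for the 4-bit input 0b1010.
# encode_pm1 matches the author's scheme (1 -> 1, 0 -> -1); the other
# two are the alternatives asked about.

def bits(n, width):
    """Most-significant-bit-first list of the bits of n."""
    return [(n >> (width - 1 - i)) & 1 for i in range(width)]

def encode_pm1(n, width):
    # 1 -> 1, 0 -> -1 (the author's encoding)
    return [1 if b else -1 for b in bits(n, width)]

def encode_binary(n, width):
    # 1 -> 1, 0 -> 0 (literal binary)
    return bits(n, width)

def encode_onehot(n, width):
    # 1 -> [1, 0], 0 -> [0, 1] (one-hot per bit)
    return [[1, 0] if b else [0, 1] for b in bits(n, width)]

print(encode_pm1(0b1010, 4))     # [1, -1, 1, -1]
print(encode_binary(0b1010, 4))  # [1, 0, 1, 0]
print(encode_onehot(0b1010, 4))  # [[1, 0], [0, 1], [1, 0], [0, 1]]
```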



For 1., the author used Ameo (the unusual activation function) in the first layer and tanh for the others, and later notes:

"While playing around with this setup, I tried re-training the network with the activation function for the first layer replaced with sin(x) and it ends up working pretty much the same way. Interestingly, the weights learned in that case are fractions of π rather than 1."

By the looks of it, any activation function whose output spans both positive and negative values should work. I haven't tested that myself. The 1-vs-π difference is likely due to where the functions peak: Ameo at 1 and sine at π/2.
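A quick numeric check of that claim (my illustration, assuming the standard sine): sin attains its extrema at ±π/2, so a first-layer weight of π/2 maps the ±1-encoded inputs exactly onto the peaks, whereas an activation peaking at ±1 would need a weight of 1:

```python
import math

# sin peaks at ±pi/2, so scaling the ±1-encoded inputs by pi/2 lands
# them on the extrema of sin; an activation peaking at ±1 (as Ameo is
# described to) would need a weight of 1 instead.
w = math.pi / 2
print(math.sin(w * 1))   # ~= 1.0
print(math.sin(w * -1))  # ~= -1.0
```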

Regardless, the effect isn't specific to Ameo.


I think the activation function definitely is important for this particular case, but it's not actually periodic; it saturates (with a configurable amount of leakiness) on both ends. The periodic behavior comes from patterns in the individual bits as you count up.
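That bit-pattern periodicity is easy to see directly (my own sketch, not from the article): as you count n = 0, 1, 2, ..., bit k of n is a square wave with period 2**(k+1), staying 0 for 2**k values and then 1 for 2**k values:

```python
# Bit k of n, viewed as a function of n, is a square wave with
# period 2**(k+1).
def bit(n, k):
    return (n >> k) & 1

for k in range(3):
    wave = [bit(n, k) for n in range(16)]
    print(f"bit {k}: {wave}")
```

So the periodicity the network exploits is already present in the input bits; the activation itself doesn't need to be periodic.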

As for the encoding, I think it's a pretty normal way to encode binary inputs like this. Having the values be -1 and 1 is common since it centers the data at 0 rather than 0.5, which can lead to better training results.
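A quick sanity check of the centering claim: over all 4-bit inputs, each bit is 0 exactly half the time, so the 0/1 encoding has mean 0.5 while the ±1 encoding is zero-mean:

```python
# Compare the mean input value under the 0/1 and ±1 encodings,
# averaged over every bit of every 4-bit number.
width = 4
vals01, valspm = [], []
for n in range(2 ** width):
    for k in range(width):
        b = (n >> k) & 1
        vals01.append(b)              # 0/1 encoding
        valspm.append(1 if b else -1) # ±1 encoding

print(sum(vals01) / len(vals01))  # 0.5
print(sum(valspm) / len(valspm))  # 0.0
```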


1. The author has prior blog posts talking about this activation function, and apparently it does help to learn binary logic tasks.

2. I doubt this matters much here. For some architectures, having inputs be 0 on average is useful, so the author probably just picked it as the default choice.



