
A couple questions:

1. How much of this outcome is due to the unusual (pseudo-)periodic activation function? Much of the DAC-like behavior appears to come from the periodicity of the first layer's output, which in turn seems to stem from the unique activation function.

2. Would the behavior of the network change if the binary strings were encoded differently? The author encodes them as 1D arrays with 1 corresponding to 1 and 0 corresponding to -1, which is an unusual way of doing things. What if the author encoded them as literal binary arrays (i.e. 1->1, 0->0)? What about one-hot arrays (i.e. 2D arrays with 1->[1, 0] and 0->[0, 1]), which is the most common way to encode categorical data?
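To make the alternatives concrete, here's a minimal sketch of the three encodings being discussed (the helper names are mine, not the author's):

```python
# Sketch of the three encodings, shown for the 4-bit input 0b1010.
# encode_pm1 matches the author's scheme (1 -> 1, 0 -> -1); the other
# two are the alternatives asked about.

def bits(n, width):
    """Most-significant-bit-first list of the bits of n."""
    return [(n >> (width - 1 - i)) & 1 for i in range(width)]

def encode_pm1(n, width):
    # 1 -> 1, 0 -> -1 (the author's encoding)
    return [1 if b else -1 for b in bits(n, width)]

def encode_binary(n, width):
    # 1 -> 1, 0 -> 0 (literal binary)
    return bits(n, width)

def encode_onehot(n, width):
    # 1 -> [1, 0], 0 -> [0, 1] (one-hot per bit)
    return [[1, 0] if b else [0, 1] for b in bits(n, width)]

print(encode_pm1(0b1010, 4))     # [1, -1, 1, -1]
print(encode_binary(0b1010, 4))  # [1, 0, 1, 0]
print(encode_onehot(0b1010, 4))  # [[1, 0], [0, 1], [1, 0], [0, 1]]
```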



For 1., the author used Ameo (the unusual activation function) in the first layer and tanh for the others, and later notes:

"While playing around with this setup, I tried re-training the network with the activation function for the first layer replaced with sin(x) and it ends up working pretty much the same way. Interestingly, the weights learned in that case are fractions of π rather than 1."

By the looks of it, any activation function whose output spans both positive and negative values should work. I haven't tested that myself. The 1-vs-π difference is likely due to where the functions peak: Ameo at 1 and sine at π/2.
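A quick numeric check of that claim (my illustration, assuming the standard sine): sin attains its extrema at ±π/2, so a first-layer weight of π/2 maps the ±1-encoded inputs exactly onto the peaks, whereas an activation peaking at ±1 would need a weight of 1:

```python
import math

# sin peaks at ±pi/2, so scaling the ±1-encoded inputs by pi/2 lands
# them on the extrema of sin; an activation peaking at ±1 (as Ameo is
# described to) would need a weight of 1 instead.
w = math.pi / 2
print(math.sin(w * 1))   # ~= 1.0
print(math.sin(w * -1))  # ~= -1.0
```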

Regardless, the effect isn't specific to Ameo.


I think the activation function definitely is important for this particular case, but it's not actually periodic; it saturates (with a configurable amount of leakiness) on both ends. The periodic behavior comes from patterns in the individual bits as you count up.
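That bit-pattern periodicity is easy to see directly (my own sketch, not from the article): as you count n = 0, 1, 2, ..., bit k of n is a square wave with period 2**(k+1), staying 0 for 2**k values and then 1 for 2**k values:

```python
# Bit k of n, viewed as a function of n, is a square wave with
# period 2**(k+1).
def bit(n, k):
    return (n >> k) & 1

for k in range(3):
    wave = [bit(n, k) for n in range(16)]
    print(f"bit {k}: {wave}")
```

So the periodicity the network exploits is already present in the input bits; the activation itself doesn't need to be periodic.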

As for the encoding, I think it's a pretty normal way to encode binary inputs like this. Having the values be -1 and 1 is common since it centers the data at 0 rather than 0.5, which can lead to better training results.
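A quick sanity check of the centering claim: over all 4-bit inputs, each bit is 0 exactly half the time, so the 0/1 encoding has mean 0.5 while the ±1 encoding is zero-mean:

```python
# Compare the mean input value under the 0/1 and ±1 encodings,
# averaged over every bit of every 4-bit number.
width = 4
vals01, valspm = [], []
for n in range(2 ** width):
    for k in range(width):
        b = (n >> k) & 1
        vals01.append(b)              # 0/1 encoding
        valspm.append(1 if b else -1) # ±1 encoding

print(sum(vals01) / len(vals01))  # 0.5
print(sum(valspm) / len(valspm))  # 0.0
```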


1. The author has prior blog posts talking about this activation function, and apparently it does help to learn binary logic tasks.

2. I doubt this matters much here. For some architectures, having inputs be 0 on average is useful, so the author probably just picked it as the default choice.



