Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I actually use TypeScript/JavaScript a lot for this reason, especially for biological algorithms that I want to run in the browser. The developer tooling is also as good as you can hope for, especially when using VS Code. I actually wrote a circular RNA sequence deduplication algorithm in it just recently [1].

With respect to the identifier resolution in Nim, it strikes me as more of a matter of preference. Especially given the universal function call syntax in Nim, at least it's consistent. For example, Nim treats "ATGCA".lowerCase() the same as lowercase("ATGCA"). I do appreciate the fact that you can use a chaining syntax instead of a nesting one when doing multiple function calls but this is also a matter of style more than substance.

[1] https://github.com/Benjamin-Lee/viroiddb/blob/main/scripts/c...



That's great. However the method that you use to find the canonical representative [1] is quadratic (when the string has length N, there are N rotations and for each rotation you need to check N characters to determine whether this is earlier than the best on that you have found so far). For large strings you would probably want to switch to one of the linear minimal string rotation algorithms [2], for example Booth's Algorithm.

[1] https://github.com/Benjamin-Lee/viroiddb/blob/main/scripts/c...

[2] https://en.wikipedia.org/wiki/Lexicographically_minimal_stri...


You hit the nail on the head. This is just the lexicographically minimal string rotation with a canonicalization step. I have actually had this on my to-do list for a while. The truth is that the data set I'm applying this to is small (though I'm doing my best to change that by discovering new viroid-like agents) so the optimization has yet to become pressing. At some point, I'd like to spin this script out into a standalone tool but I'm not sure the demand is there yet.


The main beef I have with the javascript ecosystem for data analysis is lack of multicore. Yes lots of things can be solved by converting multi-core to multi-process or using other workarounds but there are a whole class of problems where shared memory access makes a huge amount of sense.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: