More

arrowleaf · on Sept 10, 2024

I would absolutely love this!

arrowleaf · on Aug 14, 2024

It's not fair competition, healthcare is full of players competing with each other in a highly regulated marketplace. The system has been set up in a way (generally with good intentions, you want doctors to have expensive malpractice insurance for example) that no matter who is providing the services, the costs to the end user are going to be high.

practicemaths · on Aug 14, 2024

So, maybe something's can't be solved by free market capitalism?

arrowleaf · on Aug 13, 2024

You'd have to use the volume of earth, not the mass. Google tells me that lava is ~3x denser than water.

rachofsunshine · on Aug 13, 2024

Lava is not really representative of the Earth as a whole, as it turns out. The mantle (which is the vast majority of Earth's volume) isn't a liquid, it's a squishy deformable solid. Magma that comes from the mantle is only liquid because of the removal of pressure or the addition of water; it wasn't liquid down there. And a lot of lava comes from crustal melting, not mantle material.

Earth as a whole has a density about 5.5x that of water.

arrowleaf · on Aug 9, 2024

You can similarly use materialized views for this type of denormalization. When an update to the source tables occur you can trigger a view refresh.

recroad · on Aug 12, 2024

Yes, very possible. This is a multi-tenant application so I don't want to trigger a view refresh so frequently as most updates don't affect most people.

arrowleaf · on Aug 2, 2024

Truly, 100B nodes needs some sort of aggregation to have a chance at being useful. On a side project I've worked with normalizing >300GB semi-structured datasets that I could load up into dataframe libraries, I can't imagine working with a _graph_ of that size. I thought I was a genius when I figured out I could rent cloud computing resources with nearly a terabyte of RAM for less than federal minimum wage. At scale you quickly realize that your approach to data analysis is really bound by CPU, not RAM. This is where you'd need to brush off your data structures and algorithms books. OP better be good at graph algorithms.

godelski · on Aug 2, 2024

1) 100B? Try a thousand. Of course context matters, but I think it is common to overestimate the amount of information that can be visually conveyed at once. But it is also common to make errors in aggregation, or errors in how one interprets aggregation.

2) You may be interested in the large body of open source HPC visualization works. LLNL and ORNL are the two dominant labs in that space. Your issue might also be I/O since you can generate data faster than you can visualize it. One paradigm that HPC people utilize is "in situ" visualization. Where you visualize at runtime so that you do not hold back computation. At this scale, if you're not massively parallelizing your work, then it isn't the CPU that's the bottleneck, but the thing between the chair and keyboard. The downside of in situ is you have to hope you are visualizing the right data at the right time. But this paradigm includes pushing data to another machine that performs the processing/visualization or even storage (i.e. compute on the fast machine, push data to machine with lots of memory and that machine handles storage. Or more advanced, one stream to a visualization machine and another to storage). Checkout ADIOS2 for the I/O kind of stuff.

https://github.com/ornladios/ADIOS2

arrowleaf · on July 10, 2024

I'd be real curious to see their ROI calculations on this strategic move. Laying off those 1800 employees will cost them $260 million, before factoring in hiring costs for 1800 highly technical replacements.

Salaries for experienced SWEs in Boise are cheap. Looking at levels.fyi there's a recent salary from an Android engineer with 5 years of experience grossing 143k. If they're looking for FAANG talent they won't even be able to poach a college grad for that amount.

arrowleaf · on July 10, 2024

In the Boise market, Intuit is laying off 157 employees and announced the closure of their campus due to Boise not being a strategic location for AI R&D. This comes after Intuit bought the Boise-local time-tracking software company, TSheets, and expanded the TSheets campus with a second building to accommodate up to 900 employees.

https://boisedev.com/news/2024/07/10/intuit-will-shut-down-e...

arrowleaf · on June 3, 2024

If it looked for `://` then stuff like `mailto:me@fake.email` would break.

arrowleaf · on May 8, 2024

Using polars in Python I've gotten similar to work, using LazyFrame and collect in streaming mode:

``` df = pl.scan_parquet('tmp/'+DUMP_NAME+'_cleaned.parquet')

with open('tmp/'+DUMP_NAME+'_cleaned.jsonl', mode='w', newline='\n', encoding='utf8') as f: for row in df.collect(streaming=True).iter_rows(named=True): row = {k: v for k, v in row.items() if (v is not None and v != [] and v != '')} f.write(json.dumps(row, default=str) + '\n') ```

akdor1154 · on May 9, 2024

This collects all into memory, then iterates.

arrowleaf · on April 19, 2024

It would have been awesome if that ISBN issue were an actual ISBN re-use issue. I've run into issues coupling my system's design to the assumption that ISBNs uniquely identify a single edition of a single book. The intent of ISBNs are to be unique, but mistakes are made and resellers lose track or straight up abuse some ISBNs.

lstamour · on April 19, 2024

Not just that but when ISBNs were first introduced, it was common thought that they were used for cash register price scanning rather than computer-controlled inventory, so a number of early books had ISBNs re-used by publishers and it wasn't caught because they were meant to be the same price. These days you often see two barcodes on books, one is the ISBN and the other is the price.