More

ankeshanand · on June 27, 2024

If you're an individual developer and not an enterprise, just go straight to Google AIStudio or GeminiAPI instead: https://aistudio.google.com/app/apikey. It's dead simple getting an API key and calling with a rest client.

luke-stanley · on June 27, 2024

Interesting but when I tried it, I couldn't figure out the billing model because it's all connected to Google projects, and there can be different billing things for each of them.

Each thing seems to have a bunch of clicks to setup that startup LLM providers don't hassle people with. They're more likely to just let you sign in with some generic third party oAuth, slap on Stripe billing, let you generate keys, show you some usage stats, getting started docs, with example queries and a prompt playground etc.

What about the Vertex models though? Are they all actually available via Google AI Studio?

lhl · on June 27, 2024

Sadly, while gemma-2-27b-it is available (as a Preview model) on the AI Studio playground, it didn't show up via API on list_models() for me.

ankeshanand · on Feb 21, 2024

We've done extensive comparisons against GPT-4V for video inputs in our technical report: https://storage.googleapis.com/deepmind-media/gemini/gemini_....

Most notably, at 1FPS the GPT-4V API errors out around 3-4 mins, while 1.5 Pro supports upto an hour of video inputs.

jxy · on Feb 21, 2024

So that 3-4 mins at 1FPS means you are using about 500 to 700 tokens per image, which means you are using `detail: high` with something like 1080p to feed to gpt-4-vision-preview (unless you have another private endpoint).

The gemini 1.5 pro uses about 258 tokens per frame (2.8M tokens for 10856 frames).

Are those comparable?

moralestapia · on Feb 21, 2024

>while 1.5 Pro supports upto an hour of video inputs

At what price, tho?

verticalscaler · on Feb 21, 2024

The average shot length in modern movies is between 4 and 16 seconds and around 1 minute for a scene.

ankeshanand · on July 18, 2023

Has anyone in this subthread actually read the papers and compared the benchmarks? LLama2 is behind PALM-2 on all major benchmarks, I mean they spell this out in the paper explicitly.

ankeshanand · on April 25, 2022

https://aipaygrad.es/

ankeshanand · on Jan 24, 2022

You can also rent a cloud TPU-v4 pod (https://cloud.google.com/tpu) which 4096 TPUv-4 chips with fast interconnect, amounting to around 1.1 exaflops of compute. It won't be cheap though (excess of 20M$/year I believe).

ankeshanand · on Jan 12, 2022

It's important in the context that RL does not have performance ceilings.

ankeshanand · on Jan 12, 2022

Looks like any Github pages served with CloudFlare are getting blocked, I am trying out a fix.

ankeshanand · on Sept 21, 2021

Yep, Karpathy has mentioned this multiple times in their AI talks.

rrobukef · on Sept 22, 2021

What a weird case of 'nomen est omen'.

ankeshanand · on May 16, 2020

If you carefully curate who you follow, Twitter can be more like a bunch of subreddits, with the added signal of knowing who's posting. So it ends up a being great way to keep up with small communities.

ankeshanand · on Feb 2, 2020

Sorry if it wasn't clear, I do mention the linear classification protocol several times in the post. If you want to evaluate performance on a classification task, you have to show it labels during evaluation, otherwise it's an impossible task. Note that the encoder is freezed during evaluation, and only a linear classifier is trained on top. Now, even when evaluated on a limited set of labels (as low as 1%), contrastive pretraining outperforms purely supervised training by a large margin (check out Figure 1 in the Data-Efficient CPC paper: https://arxiv.org/abs/1905.09272.

I did not get the second part unfortunately, could you elaborate more and clarify if you are talking about a specific paper?

fxtentacle · on Feb 3, 2020

The problem that I see with supervised training of a linear classifier after unsupervised training is that if the unsupervised network is large enough, it allows the supervised trainer to choose the working components. As shown in [1] that can lead to randomly initialized networks working well, too, meaning that this does not necessarily show that the unsupervised training produced useful features.

I would instead suggest to train a categorization classifier unsupervised, too, for example using mutual information loss with the correct number of categories, as suggested in [2]. Afterwards, one can then deduct the mapping between the categories learnt unsupervised and the groundtruth categories to allow evaluation. That way, good results clearly prove a good unsupervised training method.

The problem that I mean in the second part was that most networks trained for object recognition work on low-level features such as colors and textures, as shown in [3]. The turtle clearly has a turtle shape and arrangement and looks overwhelmingly like a turtle to humans. But its high-frequency surface details are those that the neural network associates with a rifle, which is why those networks are fooled even on photos from varying perspectives.

Training a network with a loss to ensure that the local area of an image produces features that are highly correlated to the global features of the same image does not avoid this problem, because the high-frequency patters that the AI erroneously uses for detection are present both in the local as well as in the global scale. Sadly, I don't have any idea on how to improve that either.

[1] What's Hidden in a Randomly Weighted Neural Network? https://arxiv.org/abs/1911.13299

[2] Invariant Information Clustering for Unsupervised Image Classification and Segmentation https://arxiv.org/abs/1807.06653

[3] Synthesizing Robust Adversarial Examples https://arxiv.org/abs/1707.07397