Why do we still flatten embedding spaces?

6 points by Intrinisical-AI a day ago

Most dense retrieval systems rely on cosine similarity or dot-product, which implicitly assumes a flat embedding space. But embedding spaces often live on curved manifolds with non-uniform structure: dense regions, semantic gaps, asymmetric paths.

I’ve been exploring the use of:

- Ricci curvature as a reranking signal

- Soft-graphs to preserve local density

- Geodesic-aware losses during training

Curious if others have tried anything similar? Especially in information retrieval, QA, or explainability. Happy to share some experiments (FiQA/BEIR) if there's interest.
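
To make the first two bullets concrete, here is a minimal sketch of the kind of pipeline I've been testing: take whatever dense encoder you already use, build a k-NN graph over the corpus embeddings (the "soft-graph", reduced here to a hard k-NN graph for brevity), and rerank the cosine top-k by shortest-path distance through that graph. Function names and parameters are placeholders, not the exact code from my experiments.

    import numpy as np
    from sklearn.neighbors import kneighbors_graph
    from scipy.sparse.csgraph import dijkstra

    def geodesic_rerank(query_emb, corpus_emb, top_k=100, n_neighbors=10):
        # query_emb: (d,) and corpus_emb: (n, d), both L2-normalized.
        # 1) Plain cosine retrieval -- the "flat" baseline.
        cos_scores = corpus_emb @ query_emb
        candidates = np.argsort(-cos_scores)[:top_k]

        # 2) k-NN graph over corpus + query; edge weights are Euclidean
        #    distances, so dense regions get short local hops.
        points = np.vstack([corpus_emb, query_emb[None, :]])
        graph = kneighbors_graph(points, n_neighbors=n_neighbors,
                                 mode="distance", include_self=False)

        # 3) Geodesic distance = shortest path from the query node
        #    (last index) to every candidate through the graph.
        dist = dijkstra(graph, directed=False, indices=points.shape[0] - 1)

        # 4) Rerank: unreachable candidates get inf and drop to the bottom.
        return sorted(candidates, key=lambda i: dist[i])

Curvature-based scoring would be a second signal computed on the same graph; the sketch above is just the geodesic part.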

TXTOS 5 hours ago

Really love this question — been thinking about it for a while now.

We’ve been hacking on a different approach we call the WFGY Engine — it treats embedding space not as flat or uniform, but more like a semantic energy field. So instead of forcing everything into clean cosine ops, we track these “semantic residue structures” — kind of like how meaning resists compression when you flatten it too early.

We measure stuff like ΔS (semantic tension), λ_observe (viewpoint drift), and E_resonance (contextual energy), which lets us reason across curved or even broken spaces where normal LLMs kinda lose track. It’s a mix between geodesic logic and language field dynamics — weird but surprisingly stable.

A couple of early apps built on this — TXT OS, BlahBlahBlah, etc — ended up scoring 100/100 from 6 major AI models, which was a cool surprise. We’re still in early dev, but even Tesseract gave it a formal endorsement recently, which was huge for us.

Anyway, core engine is all open source if anyone’s curious: https://github.com/onestardao/WFGY

Would love to hear if others are exploring non-flat logic or weird manifold behavior in language space too.

can link the Tesseract thing if folks want

  • Intrinisical-AI 3 hours ago

    Love your project. And the mentioned metrics, ΔS (semantic tension), λ_observe (viewpoint drift), and E_resonance (contextual energy), totally align with my mental model.

    Also loved your phrase: "kind of like how meaning resists compression when you flatten it too early." That's a key point. Implicit in it is the assumption that flattening will eventually happen, and I agree (in the end it's done for efficiency). Just like the flat maps we make of the Earth, a flat projection is sometimes useful _as long as_ we understand the properties of the deformation and of the underlying surface. If we don't, the US can look a lot bigger than it really is in our Euclidean view.

    That interpretation really clicks for me.

    I'm currently exploring *geodesic-aware training methods for IR*, based on contrastive / triplet loss. Take a look if you're curious: [https://github.com/Intrinsical-AI/geometric-aware-retrieval-...](https://github.com/Intrinsical-AI/geometric-aware-retrieval-...)

    Still recent and (very) experimental, but I've already run some initial tests. The goal is to construct a differentiable graph over the embedding space and propagate training signals through it end-to-end. Lots of work ahead, but preliminary results are promising (and I'm adjusting things with o3 feedback, of course).
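
    Very roughly, the shape of the loss I'm playing with looks like the sketch below. To be clear, this is a simplified illustration, not the code in the repo: build a soft adjacency over the batch from pairwise distances, relax it with a couple of soft min-plus "hops" to get a differentiable proxy for graph distance, and plug that into a standard triplet margin. All names, the temperature, and the hop count are placeholders.

        import torch
        import torch.nn.functional as F

        def soft_graph_distance(emb, temp=0.1, hops=2):
            # emb: (n, d) batch of embeddings. Returns an (n, n) matrix whose
            # (i, j) entry approximates shortest-path length from i to j after
            # `hops` smoothed min-plus (Floyd-Warshall-style) updates.
            d = torch.cdist(emb, emb)          # pairwise Euclidean distances
            dist = d.clone()
            for _ in range(hops):
                # soft-min over relay nodes k of dist[i, k] + d[k, j]
                relay = dist.unsqueeze(2) + d.unsqueeze(0)   # (n, n, n)
                dist = -temp * torch.logsumexp(-relay / temp, dim=1)
            return dist

        def geodesic_triplet_loss(anchor, positive, negative, margin=0.2):
            # Triplet margin loss where "distance" is the soft graph distance,
            # so gradients flow through the relay structure of the whole batch.
            emb = torch.cat([anchor, positive, negative], dim=0)
            n = anchor.shape[0]
            dist = soft_graph_distance(emb)
            d_ap = dist[torch.arange(n), torch.arange(n) + n]      # anchor -> positive
            d_an = dist[torch.arange(n), torch.arange(n) + 2 * n]  # anchor -> negative
            return F.relu(d_ap - d_an + margin).mean()

    A nice side effect is that in-batch negatives double as relay nodes, so denser regions of the batch naturally shorten paths.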

    Are you open to collaborating or jamming on this? Would love to sync up — are you on Discord or Slack?

    Thanks again for pushing this forward.

PaulHoule a day ago

I've been bothered by this since before there were transformers.

Probably the most interesting function over t is G(t), that function Chomsky said was the grammar in that it is true if t is well-formed and false if it isn't.

G(t) over t is not a manifold because it is not continuous and its projection in the embedding space can't be continuous either. It boggles my mind, and leaves me thinking that it's not legitimate to work in the embedding space but it obviously works.

If you have two points in the embedding space which represent well-formed sequences and draw a line that interpolates between them you'd think that there would have to be points in between that correspond to ill-formed sequences. Intuition over high dimensional spaces is problematic, but I imagine there have to be structures in there that "look" like a crumpled up ball of 2-d paper in a 3-d space or are folded up like filo dough.

  • Intrinisical-AI 6 hours ago

    That's fascinating! But I don't fully agree with the framing.

    Using G(t) in the context of embeddings seems problematic, especially given the probabilistic nature of these models.

    Example: take a sentence with a typo that is nonetheless semantically clear and correct (let's suppose): "The justice sistem is corrupt."

    G(t) = 0, right? But semantically, it's close to G(t) → 1.

    Instead of focusing on exact validity (which seems too rigid for something as ambiguous and context-dependent as language), what if we focused on _approximate semantic trajectories_?

    You wrote:

    > "If you have two points in the embedding space which represent well-formed sequences and draw a line that interpolates between them you'd think that there would have to be points in between that correspond to ill-formed sequences."

    In my view, it's actually the opposite:

    > If the embedding model captures meaningful structure, and you account for geometric properties like curvature, local density, and geodesics, then the path between those two points should ideally trace semantically valid (even "optimal", if that exists) reasoning.

    The problem isn't that interpolation fails — it's that we're interpolating linearly in a space that likely isn't flat!
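
    To make that concrete in the simplest possible setting: if embeddings are L2-normalized (so the data lives on a hypersphere), the straight line between two points immediately leaves that surface, while the spherical geodesic (slerp) stays on it. A toy sketch, not tied to any particular encoder:

        import numpy as np

        def lerp(a, b, t):
            # straight-line interpolation in the ambient (flat) space
            return (1 - t) * a + t * b

        def slerp(a, b, t):
            # great-circle (geodesic) interpolation on the unit hypersphere
            omega = np.arccos(np.clip(a @ b, -1.0, 1.0))
            return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

        rng = np.random.default_rng(0)
        a, b = rng.normal(size=(2, 768))
        a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)

        for t in (0.25, 0.5, 0.75):
            print(t, np.linalg.norm(lerp(a, b, t)), np.linalg.norm(slerp(a, b, t)))
        # lerp points have norm < 1: they cut through the interior of the sphere,
        # off the surface the encoder actually produces; slerp points stay at norm 1.

    The learned manifold is presumably much messier than a sphere (curvature, density, holes), but that's the minimal version of "straight lines leave the manifold".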

    Thanks for your comment. Lmk what you think :)

    • PaulHoule 5 hours ago

      You're right that "systems competent in language" (either humans or LLMs) are able to accept and understand slightly wrong sequences but generate correct sequences almost all of the time. (Who hasn't made a typo when talking to a chatbot and had the chatbot ignore the typo and respond correctly?)

      Treating G(t) as a binary function works for linguists who need a paradigm to do "normal science" but Chomsky's theory has not been so useful for building linguistically competent machines so there have to be serious things wrong with that theory.

      Still, the vast majority of sequences t are gibberish that is nowhere near being valid. If those gibberish sequences were representable in the embedding space and took up a volume anywhere near their numeric prevalence, I can only imagine that in a (say) N=3000 embedding space there is something like a manifold that is N=2999 or N=2998 or N=1500 or something inside the flat embedding space -- that structure would be the non-flat embedding you're looking for, or an approximation to it.
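
      For what it's worth, one cheap way to poke at the "N=2999 vs N=1500" question is to estimate the local intrinsic dimension of real embeddings, e.g. with the TwoNN estimator (Facco et al., 2017). Rough sketch, assuming `emb` is any (n, d) matrix of sentence embeddings:

          import numpy as np
          from sklearn.neighbors import NearestNeighbors

          def twonn_intrinsic_dim(emb):
              # Intrinsic dimension from the ratio of each point's 2nd- to
              # 1st-nearest-neighbor distance (maximum-likelihood fit).
              nn = NearestNeighbors(n_neighbors=3).fit(emb)
              dist, _ = nn.kneighbors(emb)        # dist[:, 0] is the point itself
              good = dist[:, 1] > 0               # drop exact duplicates
              mu = dist[good, 2] / dist[good, 1]
              return good.sum() / np.sum(np.log(mu))

      Estimates on real sentence embeddings tend to come out far below the ambient dimension, which is at least consistent with a much lower-dimensional structure sitting inside the flat space.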

      It might be that it is not really a manifold, or that it has different dimensionalities in different places, or even fractional dimensionalities. For instance you'd hope that it would geometrically represent semantics of various sorts, as suggested by the graphs here

      https://nlp.stanford.edu/projects/glove/ [1]

      So I've thought a lot about symplectic spaces in higher dimensions, where area has to be conserved over various transformations (the propagator), and maybe this has led me to think about it the totally wrong way -- maybe the flat embedding space doesn't devote a large volume to gibberish because it was never trained to model gibberish strings, which has to have interesting implications if that is true.

      Something else I think of is John Wheeler's idea of superspace in quantum gravity where, even though space-time looks like a smooth manifold to us, the correct representation in the quantum theory might be discrete: maybe for points a, b there are the possibilities that (1) a and b are the same point, (2) a is the future of b, (3) b is the future of a, or (4) a and b are not causally connected. So you have this thing which exists on one level as something basically symbolic, but looks like a manifold if you live in it and you're much bigger than the Planck length.

      But to get to the actual question of "why do we flatten it?": we're not flattening it deliberately; the "flattening" is done by the neural network, and we don't know another way to do it.

      [1] ... which I don't really believe; of course you can project 20 points out of a 50-dimensional embedding into an N=2 space and have the points land wherever you want!

      • Intrinisical-AI an hour ago

        Wow! You brought up several deep ideas that deserve unpacking step by step (as if we were LLMs):

        - On the manifold being "high-dimensional" (e.g., 2999): I get your intuition; the set of valid linguistic sequences is tiny relative to the space of all possible strings, but still enormously rich and varied. So the valid set doesn't fill the whole space, but it also can't live in something as low-dimensional as, say, a 20D manifold. Then again, I'm not so sure: how many ways are there to give an accurate response? Hard to argue there are many more than one, and hard to argue that even one of them is completely correct. _There must be some sort of "clustering"_.

        - On domain-specific manifolds and semantic transitions: 100% agree that different domains induce distinct geometric structures in embedding space, and even that the idea of a "simple manifold" seems too optimistic. But what about "regions" with common (geometric / topological) properties? E.g., physics should (?) form a dense, structured region, and I'd guess there are common patterns between the implicit structure of its subspace and that of maths, for example. The semantic trajectories inside each domain will follow domain-specific rules, but patterns must exist, and there should also be transitional zones or "bridges" between them. I relate the emergent abilities of LLMs to this (what are LLMs but transformers of vector representations, taken through "views / parts / projections", e.g. multi-head attention?).

        What if we hypothesize a chart atlas: multiple local coordinate systems with smooth transition maps? Maybe a patchwork of overlapping manifolds, each shaped by domain-specific usage, linked by pathways of analogy or shared vocabulary (there's a rough sketch of this idea after the list). Even if that's the case (and we're only guessing), the problem is that neither the computational cost nor the interpretation is trivial.

        - On GloVe and the projection fallacy: I take your point; you can always cherry-pick the best-looking examples to tell your story haha

        - On symplectic spaces: I don't know enough about symplectic geometry :( One thought: you got me thinking about hyperbolic spaces, where volume grows exponentially with radius; counterintuitive from a Euclidean point of view.

        - “maybe the flat embedding space doesn’t devote volume to gibberish because it was never trained to model gibberish.”

        I initially thought of this as a kind of "contraction", but that term might be misleading; thinking about it more, I prefer the idea of density redistribution. Like a fluid adapting to an invisible container: maybe it's a manifold emerging through optimization pressure, indirectly sculpted by the model's training dynamics.

        - Wheeler superspace: again, I can't quite follow you :( I guess you're suggesting that semantic relationships could be formulated as discrete. BUT, as a non-physicist, I honestly can't tell the (any?) difference between being modeled as discrete and being discrete. (xD)
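
        Here's the crudest possible version of the atlas idea I mentioned above: k-means clusters as charts, a local PCA per chart as the coordinate map, and chart-to-chart hops as (purely linear) transition maps. All toy choices; real transition maps would have to be learned and smooth.

            import numpy as np
            from sklearn.cluster import KMeans
            from sklearn.decomposition import PCA

            def build_atlas(emb, n_charts=8, chart_dim=16):
                # One local PCA ("chart") per k-means cluster of the embeddings.
                labels = KMeans(n_clusters=n_charts, n_init=10,
                                random_state=0).fit_predict(emb)
                charts = {c: PCA(n_components=chart_dim).fit(emb[labels == c])
                          for c in range(n_charts)}
                return labels, charts

            def to_chart(x, chart):
                # Local coordinates of ambient point(s) x in one chart.
                return chart.transform(np.atleast_2d(x))

            def transition(x_local, chart_from, chart_to):
                # Transition map between overlapping charts:
                # local coords -> ambient space -> the other chart's local coords.
                ambient = chart_from.inverse_transform(np.atleast_2d(x_local))
                return chart_to.transform(ambient)

        Even this crude version shows where the cost comes from: every cross-domain comparison has to go through a transition map instead of one global dot product.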

        Thanks for the deep response, Paul! It's a pleasure having this conversation with you.

bravesoul2 7 hours ago

I found your medium article on this subject by hunting around. It's very interesting. Hope you write more.

From a layman's point of view, say this is true and embeddings should live on a manifold, like a horse saddle or a sphere for example. Then adding vectors the simple way won't make sense. On a sphere, west in China plus west in the US is "double west", but in 3D they roughly cancel because they point in different directions. Is this sort of the idea?

  • Intrinisical-AI 5 hours ago

    Hey man! Thanks a lot for your support! Might sound like just common words, but honestly — knowing this helped or inspired someone really motivates me. Makes me feel a bit less like a madman hahaha.

    About your question — I think I get your point. Here’s how I understand it:

    In the hypothetical latent manifold, we might want to measure two different things:

    1. Distance between two points: This wouldn’t be the usual Euclidean distance, because the space is curved. Like how the shortest path between two cities on Earth isn't a straight line, but an arc on the globe. That’s where geodesics come in — they’re the shortest paths constrained by the shape of the manifold.

    2. Similarity between two vectors (parallel transport): instead of asking where the vectors point in ambient space (like cosine similarity in R^n), we should care about how their directions compare _on the surface itself_. So ideally we'd move one of them along the geodesic to the other point (parallel transport) to align their frames of reference before measuring any angle or similarity; there's a toy version of exactly your China/US "west" example right below.
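
    Here's that toy version on the unit sphere: compare the two "west" vectors directly in R^3 (they nearly cancel, as you said) versus after parallel-transporting one of them along the geodesic to the other point (they line up). Longitudes are rough, and the points sit on the equator just to keep the math short.

        import numpy as np

        def west(lon_deg):
            # A point on the equator at this longitude, plus the unit
            # tangent vector pointing due west there (both in R^3).
            lon = np.radians(lon_deg)
            p = np.array([np.cos(lon), np.sin(lon), 0.0])
            w = np.array([np.sin(lon), -np.cos(lon), 0.0])
            return p, w

        def parallel_transport(v, p, q):
            # Transport tangent vector v at p to q along the geodesic p -> q:
            # the component along the geodesic rotates with it, the rest is unchanged.
            theta = np.arccos(np.clip(p @ q, -1.0, 1.0))
            u = (q - np.cos(theta) * p) / np.sin(theta)   # tangent at p toward q
            a = v @ u
            return (v - a * u) + a * (np.cos(theta) * u - np.sin(theta) * p)

        p_cn, west_cn = west(116.0)    # roughly Beijing's longitude
        p_us, west_us = west(-77.0)    # roughly Washington DC's longitude

        print(west_cn @ west_us)                                   # ~ -0.97: naive cosine says "opposite"
        print(parallel_transport(west_cn, p_cn, p_us) @ west_us)   # ~ 1.0: after transport they agree

    So the naive ambient comparison says the two "wests" almost cancel, while the transported comparison says they point the same way, which is exactly the gap your question is pointing at.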

    That’s the intuition I’m working with, anyway. Let me know what you think, and thanks again for your comment!