Ask HN: What are people's experiences with knowledge graphs?
I see lots of YouTube videos and content about knowledge graphs in the context of Gen AI. Are these at all useful for personal information retrieval and organization? If so, are there any frameworks or products that you'd recommend that help construct and use knowledge graphs?
As a non-expert who's interested in the field of AI/LLMs, it feels, intuitively, like the symbolic reasoning layer that a KG and ontologies bring is the only way to ground the hallucinations of LLMs in truth. If regular LLMs are just auto-complete, how do you give them the capability to reason about the world and the environment they operate in? How do you ascribe truth to certain words or entities? Most of this can be solved by giving the model the specific context, but in the specific area of autonomous superintelligence, one would expect the system to be able to gather, construct, and expand such a knowledge graph by itself.
A recent paper has AI continuously rebuilding the KGs, for example: https://arxiv.org/abs/2502.13025
Knowledge graphs are really useful for personal information retrieval and organization. They can integrate and correlate scattered information, making searches more efficient. For example, when you're looking for job-related materials, it can quickly link information like salary and job requirements. For construction, the open-source Dgraph is recommended. It has strong scalability and supports complex queries. What kind of information do you plan to manage with a knowledge graph?
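If you try it, a rough sketch with the pydgraph client looks like this; it assumes a local Dgraph Alpha on the default gRPC port, and the schema and data are made up:

    import pydgraph

    # Assumes Dgraph Alpha is running locally on its default gRPC port.
    stub = pydgraph.DgraphClientStub('localhost:9080')
    client = pydgraph.DgraphClient(stub)

    # Declare a predicate with a term index so it can be searched.
    client.alter(pydgraph.Operation(schema='name: string @index(term) .'))

    # Insert a node.
    txn = client.txn()
    try:
        txn.mutate(set_obj={'name': 'AWS CDK notes', 'topic': 'infrastructure'})
        txn.commit()
    finally:
        txn.discard()

    # Query it back.
    query = '{ notes(func: anyofterms(name, "CDK")) { uid name topic } }'
    print(client.txn(read_only=True).query(query).json)

    stub.close()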
I wanted to capture my notes from different resources I've set up/deployed (such as AWS CDK, configurations, and caveats of those resources). I'd also like to have annotated code in there. The goal is to ultimately be able to search the graph or augment an LLM with it.
I don't see how you're not better served by Obsidian or something along those lines.
I've never heard of it, but just downloaded it. Thanks.
I have worked on a large-scale, real-time, in-production knowledge graph at a FAANG company. It runs a major service everyone has used. My personal opinion is that they are outdated. They are brittle, hard to maintain, and constantly changing. Technical complexity like you wouldn't believe. A moving target in the real world.
I think as a paradigm, you should just pass everything that might be relevant into the context of an LLM, as opposed to traversing a knowledge graph.
Property graphs don't specify schema.
Is it Shape.color or Shape.coleur, feet or meters?
RDF has URIs for predicates (attributes). RDFS specifies :Class(es) with :Property's, which are identified by URIs.
E.g. Wikidata has schema: forms with validation. DBpedia is Wikipedia infoboxes regularly extracted to RDF.
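A small sketch of the RDFS version in Python with rdflib (the example.org vocabulary is made up):

    from rdflib import Graph, Namespace, Literal, RDF, RDFS

    EX = Namespace("http://example.org/schema#")   # illustrative namespace

    g = Graph()
    # The class and the property are both identified by URIs...
    g.add((EX.Shape, RDF.type, RDFS.Class))
    g.add((EX.color, RDF.type, RDF.Property))
    g.add((EX.color, RDFS.domain, EX.Shape))
    g.add((EX.color, RDFS.range, RDFS.Literal))

    # ...so it's EX.color, not Shape.coleur, for everyone who shares the vocabulary.
    g.add((EX.shape1, RDF.type, EX.Shape))
    g.add((EX.shape1, EX.color, Literal("red")))

    print(g.serialize(format="turtle"))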
Google acquired Metaweb (Freebase) years ago, launched a Knowledge Graph product, and these days supports Structured Data search cards in Microdata, RDFa, and JSON-LD.
[LLM] NN topology is sort of a schema.
Linked Data standards for data validation include RDFS and SHACL. JSON Schema is far more widely implemented.
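A quick sketch of SHACL validation with pySHACL, using a made-up shapes graph:

    from rdflib import Graph
    from pyshacl import validate

    data = Graph().parse(format="turtle", data="""
        @prefix ex: <http://example.org/> .
        ex:shape1 a ex:Shape ; ex:color 42 .
    """)
    shapes = Graph().parse(format="turtle", data="""
        @prefix ex: <http://example.org/> .
        @prefix sh: <http://www.w3.org/ns/shacl#> .
        @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
        ex:ShapeShape a sh:NodeShape ;
            sh:targetClass ex:Shape ;
            sh:property [ sh:path ex:color ; sh:datatype xsd:string ] .
    """)

    conforms, _, report = validate(data, shacl_graph=shapes)
    print(conforms)   # False: ex:color is an integer, the shape wants xsd:string
    print(report)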
RDFa is "RDF in HTML attributes".
How much more schema does the application need beyond [WikiWord] auto-linkified edges? What about typed edges with attributes other than href and anchor text?
AtomSpace is an in-memory hypergraph with schema to support graph rewriting specifically for reasoning and inference.
There are ORMs for graph databases. Just like with SQL, how much of the query and report can be done by the server without processing every SELECTed row?
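As a sketch of what such an ORM/OGM looks like, here's neomodel over Neo4j; the connection URL and node types are assumptions, and the filter is compiled to Cypher and evaluated server-side rather than scanning every node in the client:

    from neomodel import StructuredNode, StringProperty, RelationshipTo, config

    config.DATABASE_URL = 'bolt://neo4j:password@localhost:7687'   # assumed local Neo4j

    class Service(StructuredNode):
        name = StringProperty(unique_index=True)
        depends_on = RelationshipTo('Service', 'DEPENDS_ON')

    api = Service(name='api').save()
    db = Service(name='postgres').save()
    api.depends_on.connect(db)

    # Translated to Cypher and filtered on the server, like a SQL WHERE clause.
    for svc in Service.nodes.filter(name__startswith='a'):
        print(svc.name)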
Query languages for graphs: SQL, SPARQL, SPARQL-star, GraphQL, Cypher, Gremlin.
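E.g. a two-hop traversal in SPARQL via rdflib (toy data):

    from rdflib import Graph, Namespace

    EX = Namespace("http://example.org/")
    g = Graph()
    g.add((EX.alice, EX.knows, EX.bob))
    g.add((EX.bob, EX.knows, EX.carol))

    # The same traversal Cypher or Gremlin would express, written declaratively.
    results = g.query("""
        PREFIX ex: <http://example.org/>
        SELECT ?fof WHERE { ex:alice ex:knows/ex:knows ?fof . }
    """)
    for row in results:
        print(row.fof)   # http://example.org/carol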
Object-attribute level permissions are for the application to implement and enforce. Per-cell keys and visibility are native db features of e.g. Accumulo, but to implement the same with e.g. Postgres every application that is a database client is on scout's honor to also enforce object-attribute access control lists.
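A toy illustration of that "scout's honor" situation, with made-up visibility labels checked in application code rather than by the database:

    from dataclasses import dataclass, field

    @dataclass
    class Attribute:
        value: object
        visibility: set = field(default_factory=set)   # e.g. {"hr"}

    def read_attribute(node, attr, user_labels):
        # Every client of the shared database must remember to call this check.
        a = node[attr]
        if a.visibility and not (a.visibility & user_labels):
            raise PermissionError(f"{attr} is not visible to this user")
        return a.value

    employee = {
        "name": Attribute("Ada"),
        "salary": Attribute(120000, visibility={"hr"}),
    }
    print(read_attribute(employee, "name", {"eng"}))      # ok
    # read_attribute(employee, "salary", {"eng"})         # PermissionError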
And then identity: which user with which (sovereign or granted) cryptographic key can add dated named graphs that mutate which data in the database.
So, property graphs eventually need schema and data validation.
markmap.js.org is a simple app to visualize a markdown document with headings and/or list items as a mindmap; but unlike Freemind, there's no way to add edges that make the tree a cyclic graph.
Cyclic graphs require different traversal algorithms. For example, Python will raise RecursionError when a recursive traversal hits a graph cycle without a visited-node list, while a stack-based traversal of a cyclic graph simply will not halt without e.g. a visited-node set to detect cycles; note that a valid graph path may contain cycles (and there is feedback in so many general systems).
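A minimal sketch of the visited-set version:

    def reachable(graph, start):
        # Iterative DFS; the visited set is what lets it halt on cycles.
        visited, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in visited:
                continue            # cycle: node already expanded
            visited.add(node)
            stack.extend(graph.get(node, []))
        return visited

    cyclic = {"a": ["b"], "b": ["c"], "c": ["a"]}   # a -> b -> c -> a
    print(reachable(cyclic, "a"))                   # {'a', 'b', 'c'}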
YAML-LD is JSON-LD in YAML.
JSON-LD as a templated output is easier than writing a (relatively slow) native RDF application and re-solving for what SQL ORM web frameworks already do.
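E.g. emitting JSON-LD from an ordinary dict template (schema.org terms; the URL is illustrative):

    import json

    def note_to_jsonld(note_id, title, body):
        # The @context is the only RDF-specific piece; the rest is a plain dict.
        return json.dumps({
            "@context": "https://schema.org",
            "@type": "Article",
            "@id": f"https://example.org/notes/{note_id}",
            "headline": title,
            "articleBody": body,
        }, indent=2)

    print(note_to_jsonld("1", "AWS CDK caveats", "Remember to bootstrap the environment."))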
There are specs for cryptographically signing RDF such that the signature matches regardless of the graph representation.
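The idea, roughly: canonicalize first, then hash and sign, so the signature survives a change of serialization. A sketch of just the representation-independence part with rdflib; the real specs (RDF Dataset Canonicalization, Data Integrity) define the canonical N-Quads that actually get signed:

    from rdflib import Graph
    from rdflib.compare import isomorphic

    turtle_doc = """
        @prefix ex: <http://example.org/> .
        ex:note1 ex:title "AWS CDK caveats" .
    """
    jsonld_doc = """
        { "@context": {"ex": "http://example.org/"},
          "@id": "ex:note1", "ex:title": "AWS CDK caveats" }
    """
    g1 = Graph().parse(data=turtle_doc, format="turtle")
    g2 = Graph().parse(data=jsonld_doc, format="json-ld")   # rdflib 6+ bundles json-ld
    print(isomorphic(g1, g2))   # True: same graph, two representations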
There are processes and business processes around knowledge graphs like there are for any other dataset.
OTOH: ETL, data validation, publishing and hosting of the dataset and/or servicing arbitrary queries and/or cost-estimable parametric [windowed] reports, recall and retraction traceability.
DVC.org and the UC BIDS Computational Inference notebook book probably have a better enumeration of processes for data quality in data science.
...
With RDF (though it's a question of database approach and not data representation): should an application create a named graph per database transaction changeset, or should all of that data provenance metadata be relegated to a database journal that can't be read from or written to by the app?
How much transaction authentication metadata should an app be trusted to write?
A typical SQL webapp has one database user which can read or write to any column of any table.
Blockchains and e.g. Accumulo require each user to "connect to" the database with a unique key.
It is far harder for users to impersonate other users in database systems that require a cryptographic key per user than it is to just write in a different username and date using the one DB credential granted to all application instances.
W3C DIDs are cryptographic keys (as RDF with schema) that can be generated by users locally or generated centrally; similar to e.g. Bitcoin account address double hashes.
Users can cryptographically sign JSON-LD, YAML-LD, RDFa, and any other RDF format with W3C DIDs; in order to assure data integrity.
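A minimal sketch of the per-user-key part using an Ed25519 keypair (the kind of key a did:key DID wraps); real W3C Data Integrity proofs also canonicalize the document before hashing, which this skips:

    import json
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    user_key = Ed25519PrivateKey.generate()     # held by the user, not the app
    doc = {"@context": "https://schema.org", "@type": "Article", "headline": "AWS CDK caveats"}

    payload = json.dumps(doc, sort_keys=True).encode()
    signature = user_key.sign(payload)

    # Anyone with the public key (e.g. resolved from the DID) can verify integrity.
    user_key.public_key().verify(signature, payload)   # raises InvalidSignature if tampered
    print("signature verifies")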
How do data integrity and data provenance affect the costs, utility, and risks of knowledge graphs?
Compared to GPG-signing git commits to markdown+YAML-LD flat files in a git repo, and paying e.g. GitHub to enforce code-owner permissions on files and directories in the repo by preventing unsigned and unauthorized commits, what are the risks of trusting all of the data from all of the users that could ever write to a knowledge graph?
Which initial graph schemas support inference and reasoning (graph rewriting)?