PyData Global 2024

Paco Nathan

Paco Nathan leads DevRel for the Entity Resolved Knowledge Graph practice area at Senzing.com and is a computer scientist with 40+ years of tech industry experience and core expertise in data science, natural language, graph technologies, and cloud computing. He's the author of numerous books, videos, and tutorials about these topics.

Paco advises Kurve.ai, EmergentMethods.ai, KungFu.ai, DataSpartan, and Argilla.io (acquired by Hugging Face), and is lead committer for the pytextrank and kglab open source projects. Formerly: Director of Learning Group at O'Reilly Media and Director of Community Evangelism at Databricks.

Sessions

12-05
14:30
30min
Catching Bad Guys using open data and open models for graphs
Paco Nathan

GraphRAG is a popular way to use knowledge graphs (KGs) to ground AI apps. Most GraphRAG tutorials use LLMs to build graphs automatically from unstructured data. However, what if you're working on use cases such as investigative journalism and sanctions compliance -- "catching bad guys" -- where transparent decisions and evidence are required?

This talk explores how to leverage open data, open models, and open source to build investigative graphs which are accountable, exploring otherwise hidden relations in the data that indicate fraud or corruption. This illustrates techniques used in production use cases for anti-money laundering (AML), ultimate beneficial owner (UBO), rapid movement of funds (RMF), and other areas of sanctions compliance in general.

This approach uses Python open source libraries, e.g., the KùzuDB graph database and the LanceDB vector database. For each NLP task we use state-of-the-art open models (mostly not LLMs), emphasizing how to tune them for a domain context: named entity recognition, relation extraction, textgraph, and entity linking, as well as entity resolution to merge structured data and produce a semantic overlay that organizes the graph.
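
To make the entity resolution step more concrete, here is a minimal sketch in plain Python. It is not the method used in the talk and does not use Senzing, KùzuDB, or LanceDB: the records, the ownership relation, the suffix list, and the helper names (`match_key`, `resolve`, `overlay_edges`) are all hypothetical, and exact matching on a normalized name plus country stands in for a real entity resolution engine. It only illustrates the idea of merging records from several sources into resolved entities, keeping the evidence for each merge, and lifting record-level relations onto that overlay.

```python
import re
from collections import defaultdict

# Hypothetical records from two made-up sources (not real data).
RECORDS = [
    {"id": "sanctions:1", "name": "ACME Holdings Ltd.", "country": "CY"},
    {"id": "registry:7", "name": "Acme Holdings Limited", "country": "CY"},
    {"id": "registry:9", "name": "Borealis Trading LLC", "country": "LV"},
]

# Hypothetical record-level ownership relation: (owner, owned).
OWNS = [("registry:9", "registry:7")]

# Assumed list of legal-form suffixes to ignore when comparing names.
SUFFIXES = {"ltd", "limited", "llc", "inc"}


def match_key(record):
    """Normalize a record into a comparison key: core name tokens plus country."""
    tokens = re.findall(r"[a-z0-9]+", record["name"].lower())
    core = [t for t in tokens if t not in SUFFIXES]
    return (" ".join(core), record["country"])


def resolve(records):
    """Group records sharing a match key into entities, keeping the evidence."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])

    entity_of = {}  # record id -> resolved entity id
    evidence = {}   # entity id -> why these records were merged
    for n, (key, ids) in enumerate(sorted(groups.items())):
        entity_id = f"entity:{n}"
        evidence[entity_id] = {"match_key": key, "records": sorted(ids)}
        for record_id in ids:
            entity_of[record_id] = entity_id
    return entity_of, evidence


def overlay_edges(owns, entity_of):
    """Lift record-level relations onto the resolved entities."""
    return sorted({(entity_of[a], "OWNS", entity_of[b]) for a, b in owns})


entity_of, evidence = resolve(RECORDS)
edges = overlay_edges(OWNS, entity_of)

for entity_id, info in evidence.items():
    print(entity_id, info)
print(edges)
```

In this toy data the sanctions record and one registry record collapse into a single entity, so the ownership edge from the second company now points at a sanctioned entity, and the `evidence` dictionary records which key justified the merge. A production system would replace the exact-match key with a proper entity resolution engine and load the resulting nodes and edges into a graph database.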

Data/ Data Science Track