Machine Learning for Automated Topic Discovery

Quickly discover topics, themes and linguistic patterns in any document collection.

  • Extract the major topics (themes) and their proportions in a set of research projects.
  • Find persons, places, organizations or locations in a document collection.
  • Extract and curate a vocabulary from a document collection, including relevant parts of speech.
  • Use your thesaurus or taxonomy for a particular domain of knowledge to find documents of interest.
  • Build a topic map for your website.
  • Find interesting patterns in a collection of user-defined categories or symbols (e.g., genetic sequences, instrument measurements, music notation).

About Topix

Knowledge Discovery

Topix is a modern web-based knowledge discovery application designed to provide the innovative data analyst, researcher or investigative journalist with the tools to share insights quickly and effectively.

Topic Modeling

Topix uses a process called topic modeling, a form of unsupervised machine learning, to automatically discover and extract implicit ("hidden") topics from a document collection.
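This page does not say which topic modeling algorithm Topix uses. As an illustration only, the minimal sketch below assumes Latent Dirichlet Allocation (LDA), a widely used topic model, fitted with collapsed Gibbs sampling. The function name `fit_lda` and the smoothing parameters `alpha` and `beta` are hypothetical; `num_topics` and `iterations` loosely correspond to the number of topics and number of iterations you choose when running a model. Each document is assumed to be already tokenized.

```python
import random
from collections import defaultdict

def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Hypothetical helper: fit LDA with collapsed Gibbs sampling.

    docs is a list of token lists. Returns (doc_topic, topic_word), where
    doc_topic[d][k] is the proportion of topic k in document d and
    topic_word[k][w] is the probability of word w under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]                # tokens in doc d assigned to topic k
    n_kw = [defaultdict(int) for _ in range(num_topics)]   # times word w is assigned to topic k
    n_k = [0] * num_topics                                 # total tokens assigned to topic k

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                # Resample the token's topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed estimates: each row and each topic distribution sums to 1.
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in range(num_topics)
    ]
    return doc_topic, topic_word

# Usage: four tiny documents, two topics.
docs = [
    ["gene", "dna", "sequence", "gene"],
    ["music", "note", "melody", "note"],
    ["dna", "gene", "genome"],
    ["melody", "music", "rhythm"],
]
doc_topic, topic_word = fit_lda(docs, num_topics=2, iterations=100)
```

The two outputs are the "proportions" mentioned above: `doc_topic` gives each document's mix of topics, and `topic_word` gives each topic's distribution over the vocabulary. With so few documents the result varies with the seed; real collections need far more text.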

Natural Language Processing (NLP)

Topix includes state-of-the-art NLP algorithms that provide you with the power to further refine your discovery process by focusing on specific parts of speech or special entities of interest such as persons, places, organizations, locations, values and dates.

Web Enabled

By providing this fast online web service we hope to significantly increase experimentation, enable more rapid learning, and eventually build a shared repository of best practices, document collections, and models that can be shared and "mined" in order to accelerate progress in the practical use of topic modeling.

Bridge to Academic Research

We're continually evaluating the latest research for breakthroughs in this field. We hope to provide a useful bridge between the recent and emerging innovations in academic research and the practical challenges that you face every day.

Custom Implementations and Training

We provide custom cloud or on-premise implementations and training for organizations with large document sets, database integration, website harvesting, and internal knowledge sharing. We are committed to re-investing a portion of our revenue from these custom engagements back into the public version of Topix.

Knowledge Matters!

We strongly believe that the discovery and sharing of real knowledge is fundamental to any free and thriving society. We hope Topix will help democratize and spread the understanding and use of topic discovery and exploration.

Topix Explore: Features Overview

Below we provide an overview of the many topic modeling options, reports, and output available using "Explore."

NEW in Version 2.0: Vocabulary by Topic Network

This network visualization shows the top vocabulary words for each topic. The thickness of each line represents the relative contribution a vocabulary word makes to that topic. Note that some vocabulary words might apply to more than one topic.

NEW in Version 2.0: Vocabulary to Topic Mappings

A topic is represented by a collection of vocabulary words along with the relative percentage that each contributes to the topic. The smaller the number of words that contribute to 100% of a topic, the more coherent and understandable a topic will be. In this visualization (called a "Sankey" diagram), the top 5 words for each topic (by percentage contribution of each) are selected and presented so that you can quickly see the relative contribution to each topic. You can select and move any word or topic node for easier viewing.
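The selection behind this view can be sketched as follows. This is a minimal illustration, not Topix's code: it assumes each topic is available as a mapping from word to weight (raw counts or probabilities), and the helper name `top_words_by_percentage` is hypothetical. Each weight is normalized to a percentage of the topic total before the top `n` words are kept.

```python
def top_words_by_percentage(topic_word, n=5):
    """Hypothetical helper: for each topic (a dict of word -> weight),
    return the n highest-weight words with their share of the topic in %."""
    result = []
    for weights in topic_word:
        total = sum(weights.values())
        # Rank words by weight, highest first, and keep the top n.
        ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)[:n]
        result.append([(word, 100.0 * w / total) for word, w in ranked])
    return result

topic_word = [
    {"gene": 6, "dna": 3, "cell": 1},
    {"music": 5, "note": 4, "gene": 1},
]
print(top_words_by_percentage(topic_word, n=2))
# [[('gene', 60.0), ('dna', 30.0)], [('music', 50.0), ('note', 40.0)]]
```

Each `(word, percentage)` pair, together with its topic index, is the kind of weighted link a Sankey diagram draws between a word node and a topic node.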

NEW in Version 2.0: Top Vocabulary Words Assigned to Topics

In this visualization, the top 5 words for each topic (by percentage contribution of each) are selected and presented so that you can see if there are major overlaps in vocabulary usage among the topics.
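One way to find such overlaps is to index which topics list each word among their top words. The minimal sketch below assumes the top-word lists have already been selected; the helper name `shared_top_words` and the sample words are hypothetical.

```python
from collections import defaultdict

def shared_top_words(top_words):
    """Hypothetical helper: map each word that appears in more than one
    topic's top-word list to the sorted indices of those topics."""
    topics_by_word = defaultdict(set)
    for topic_index, words in enumerate(top_words):
        for word in words:
            topics_by_word[word].add(topic_index)
    # Keep only words shared by two or more topics.
    return {w: sorted(t) for w, t in topics_by_word.items() if len(t) > 1}

top_words = [
    ["gene", "dna", "cell", "protein", "study"],
    ["music", "note", "melody", "rhythm", "study"],
    ["dna", "genome", "sequence", "gene", "data"],
]
print(shared_top_words(top_words))
# {'gene': [0, 2], 'dna': [0, 2], 'study': [0, 1]}
```

Here topics 0 and 2 share two of their top five words, which suggests they may be candidates for merging or for re-running with a different number of topics.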

Vocabulary Word Cloud

The top 200 vocabulary items in your document corpus are displayed in a word cloud, with each word sized in proportion to its frequency. When you limit parsing to a selected part of speech (noun, verb, adjective) or entity (value, date, year, person, place, organization), only the selected words are used in the calculation and display.
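The calculation can be sketched in a few lines. This minimal example assumes tokens arrive as `(word, tag)` pairs from some part-of-speech tagger or entity recognizer (the tag names here are illustrative), and the helper `word_cloud_sizes` and its `max_size` scale are hypothetical. Words are counted only if their tag is selected, and the top 200 are scaled linearly against the most frequent word.

```python
from collections import Counter

def word_cloud_sizes(tagged_tokens, keep_tags=None, top_n=200, max_size=100):
    """Hypothetical helper: count (word, tag) tokens, optionally keeping only
    the selected tags, and size the top_n words in proportion to frequency."""
    counts = Counter(
        word for word, tag in tagged_tokens
        if keep_tags is None or tag in keep_tags
    )
    top = counts.most_common(top_n)
    if not top:
        return {}
    highest = top[0][1]
    # Linear scaling: the most frequent word gets max_size.
    return {word: max_size * count / highest for word, count in top}

tokens = [("topic", "NOUN"), ("model", "NOUN"), ("discover", "VERB"),
          ("topic", "NOUN"), ("hidden", "ADJ"), ("topic", "NOUN"), ("model", "NOUN")]
sizes = word_cloud_sizes(tokens, keep_tags={"NOUN"})
print({w: round(s, 1) for w, s in sizes.items()})
# {'topic': 100.0, 'model': 66.7}
```

Because filtering happens before counting, the verb "discover" and the adjective "hidden" neither appear in the cloud nor affect the sizes of the nouns.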

Corpus Statistics: Quick Summary

This panel provides a bird's-eye view of the size and composition of your document collection (corpus).

Corpus Statistics: Charts

This is one of several chart options that complement the corpus statistics table.

Vocabulary Catalog

Use the Vocabulary Catalog on its own or use it to fine-tune your topic model. Find the prevalence of each vocabulary item in your document collection and jump to the documents containing each word. You can choose to exclude additional words, and then re-run your model with these words excluded. You can continue this fine-tuning as long as you wish.
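A vocabulary catalog of this kind can be sketched as a word count plus an index of the documents that contain each word. The minimal example below is an illustration under assumptions: documents are pre-tokenized, and the helper name `vocabulary_catalog` and the `exclude` set are hypothetical. Excluded words are dropped before counting, and the filtered documents are what you would feed back into the topic model on the next run.

```python
from collections import Counter, defaultdict

def vocabulary_catalog(docs, exclude=frozenset()):
    """Hypothetical helper: for each word not in `exclude`, return
    (total count, sorted indices of documents containing the word)."""
    counts = Counter()
    doc_index = defaultdict(set)
    for i, doc in enumerate(docs):
        for word in doc:
            if word in exclude:
                continue
            counts[word] += 1
            doc_index[word].add(i)
    return {w: (counts[w], sorted(doc_index[w])) for w in counts}

docs = [["the", "gene", "dna"], ["the", "music", "gene"]]
catalog = vocabulary_catalog(docs)          # "the" appears in every document
exclude = {"the"}                           # add it to the exclude list
catalog = vocabulary_catalog(docs, exclude=exclude)
# Filtered tokens to use when re-running the topic model.
filtered_docs = [[w for w in doc if w not in exclude] for doc in docs]
print(catalog)
# {'gene': (2, [0, 1]), 'dna': (1, [0]), 'music': (1, [1])}
```

Repeating the exclude-and-rebuild step is the fine-tuning loop described above; the growing `exclude` set is the kind of list you might later export as stopwords.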

Topic Details

For each topic we display the top words correlated with that topic, and you can drill down from each topic to the documents associated with it. You can search and filter by word, topic, or percentage.

Advanced Analytics: Word Contributions to Each Topic

Topix includes several powerful visualizations that help you understand at a glance how well each topic is described by its top words. This one shows the contributions of the top 20 words generated for each topic.

Advanced Analytics: Vocabulary Counts by Topic Specificity

At a glance you can discover vocabulary words that have a low count and high specificity. Research has shown these should be considered for removal for better topic coherence.
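This page does not define "specificity," so the minimal sketch below assumes one common-sense definition: the share of a word's tokens assigned to its single most common topic (1.0 means the word occurs in only one topic). The helper name `low_count_high_specificity` and the thresholds `max_count=3` and `min_specificity=0.9` are hypothetical and chosen only for illustration.

```python
def low_count_high_specificity(word_topic_counts, max_count=3, min_specificity=0.9):
    """Hypothetical helper. word_topic_counts maps word -> list of token
    counts per topic. Specificity is ASSUMED to be the share of a word's
    tokens that fall in its most common topic."""
    flagged = []
    for word, per_topic in word_topic_counts.items():
        total = sum(per_topic)
        if total == 0:
            continue
        specificity = max(per_topic) / total
        if total <= max_count and specificity >= min_specificity:
            flagged.append((word, total, specificity))
    # Rarest words first, then most specific, then alphabetical.
    return sorted(flagged, key=lambda item: (item[1], -item[2], item[0]))

counts = {
    "gene": [40, 2, 1],
    "zygote": [2, 0, 0],
    "data": [10, 12, 9],
    "fugue": [0, 1, 0],
    "study": [1, 1, 1],
}
print(low_count_high_specificity(counts))
# [('fugue', 1, 1.0), ('zygote', 2, 1.0)]
```

Frequent, topic-specific words such as "gene" and rare but evenly spread words such as "study" are not flagged; only rare words concentrated in a single topic are surfaced as removal candidates.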

Standard Options

You can explore using our sample data sets or process your own document sets. Choose the number of topics to explore and experiment with the number of iterations to perform.

For English text corpora, we leverage state-of-the-art Natural Language Processing (NLP) options for parsing text. You can choose to include only nouns, verbs, adjectives, or persons, places, and organizations.

Advanced Options

There are powerful advanced options for fine-tuning your topic modeling exploration.

Topix Session Summary

This panel captures key information related to each Topix session.

Vocabulary by Topic

Filter or sort by any combination of topic, word or token count to develop a detailed understanding of the composition of your corpus.

Download Options

You have a wide variety of export options for further analysis, visualization, and exploration. For example, export your updated exclude list (stopwords) to use in future explorations. Other options include topic summaries, topic correlations, topic distribution by document, word distributions by topic, and word prevalence measures.

Get in touch

Agile Innovations, LLC

Owen Dall Sotomayor, CEO/CTO


Annapolis, MD, U.S.A.

BAON Enterprises LLC, Global Distributor

Scott Gornall, VP of Sales


Annapolis, MD, U.S.A.