Posts Tagged ‘text analytics’

ISC Consulting Powers Pytheas AI with BrainDocs

Friday, June 24th, 2016

We are pleased to publish ISC’s submission under the DIUx program. The new “Defense Innovation Unit Experimental (DIUx) serves as a bridge between those in the U.S. military executing on some of our nation’s toughest security challenges and companies operating at the cutting edge of technology.” Powered by ai-one’s NathanICE artificial intelligence core for language, ISC’s Pytheas AI will give ISC the technology to assist researchers and help our governments keep us safe. Some proprietary sections have been removed from the version below.

ISC White Paper for DIUx Technology Area of Interest: Knowledge Management

By Jeremy Toor, ISC Consulting Group

Executive Summary

ISC/ai-one is developing a prototype using Pytheas artificial intelligence (Pytheas AI) that will provide automated intelligent information management, or knowledge management (KM), across multiple data sources. Pytheas AI fingerprints the flow of data from almost any source, including chat, email, message traffic, and other data. Within a publish/subscribe architecture, this AI core then helps the user build knowledge from the fingerprinted data through queries and intuitive alerts that recognize differences in context and their importance.

The abundance of data available to users, analysts, and commanders today can make it challenging to build a concise and accurate picture from which to make dynamic assessments. Both Command and Control (C2) and intelligence systems are largely data-centric. Users who must make strategic and tactical decisions will benefit from a task-centric user experience that manages information as it is created and presented, and distills many sources of data into a manageable flow. This user experience, facilitated by Pytheas AI, will deliver a KM engine that can accelerate the decision-making process.

Through Pytheas AI the user will be presented with data that has gone through automated processes to be categorized, tagged, and ranked according to its value in the current context of operations.  Pytheas AI will give the user flexibility to tailor their focus area and pull information from a wide breadth of sources as they build situational awareness and confidence to take action.


ISC/ai-one proposes a three-phase project. Phase 1 will include one week for initial installation, configuration and user training. Phase 2 will include a five-month period to support data ingestion, intelligent agent training and dashboard customization. Phase 3 will include a three-week evaluation and close-out.


Pytheas AI is built with an artificial intelligence core to collect, organize and analyze language to uncover key links and patterns within large volumes of unstructured text.  The application empowers analysts to find the relationships necessary to discover, manage, process and exploit data.  Key features and attributes of Pytheas include:

  • Discovery of Concepts through the use of Intelligent Agents
  • Agent collections can be built from existing plans, roadmaps and strategy documents
  • DoD analysts can use common KM collections or build and share concept agents
  • Agents provide classification for querying and tagging documents
  • Application core is language independent
  • Fast and lightweight, running on PC-class machines or VMs

Pytheas AI is built on ai-one’s BrainDocs software application (with the NathanICE API core), a commercially ready and viable technology that has been applied to several use cases similar to the requirements of the knowledge management technology area of interest that DIUx is seeking. Our KM prototype is ready for demonstration using sample data.

Pytheas relies on ai-one’s proprietary NathanICE API to discern the patterns in words and associations that are central to the meaning of all or part of a text document, much as the brain does. NathanICE extracts these keywords and associations and filters out the noise, creating a proprietary fingerprint array of the concept that can be used in many ways.
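
The Python sketch below shows what a keyword-and-association fingerprint can look like. It is not NathanICE, whose method is proprietary. It makes two assumptions: a small hypothetical stopword list acts as the "noise" filter, and two keywords count as associated when they appear in the same sentence. Like the approach described above, it records which words appear together rather than counting words.

```python
import re
from itertools import combinations

# Hypothetical "noise" filter: a tiny stopword list standing in for the
# proprietary filtering described above.
STOPWORDS = {"a", "an", "and", "are", "as", "be", "by", "for", "in", "is",
             "it", "of", "on", "or", "that", "the", "this", "to", "with"}


def fingerprint(text: str) -> dict:
    """Return a toy fingerprint: the set of keywords plus the set of
    keyword pairs that co-occur in at least one sentence."""
    keywords = set()
    associations = set()
    for sentence in re.split(r"[.!?]+", text):
        words = {w for w in re.findall(r"[a-z0-9]+", sentence.lower())
                 if w not in STOPWORDS}
        keywords |= words
        # Sorting makes each pair order-independent: ("a", "b"), never ("b", "a").
        associations |= set(combinations(sorted(words), 2))
    return {"keywords": frozenset(keywords),
            "associations": frozenset(associations)}
```

For example, `fingerprint("Satellites relay message traffic. Analysts read the message traffic.")` yields six keywords and eleven associations. The pair ("message", "traffic") occurs in both sentences but is stored only once.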

Pytheas uses the fingerprint of a trained concept to find and rank similar concepts within a corpus of information (documents, websites, databases) and returns paragraph-level results sorted by similarity. These results support a variety of workflows in enterprise compliance, classification, search and knowledge management. Agent similarity scores can be exported to Excel or your database to support analytics and BI tools. An analyst can do this on their own for small ad hoc studies. Agents can also be used to code years of legacy data without additional training.
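
The ranking-and-export step can be illustrated with a short sketch. It uses a stand-in for Pytheas's proprietary similarity measure: each paragraph is reduced to a set of non-stopword terms, then scored against the concept with Jaccard similarity (shared terms divided by all distinct terms). Paragraph numbers are kept so that every result traces back to its source. The ranked rows are written as CSV, a format Excel opens directly. All function names here are hypothetical.

```python
import csv
import io
import re

# Hypothetical stopword list used as a simple noise filter.
STOPWORDS = {"a", "an", "and", "are", "as", "be", "by", "for", "in", "is",
             "it", "of", "on", "or", "that", "the", "this", "to", "with"}


def terms(text: str) -> set:
    """Lower-cased word set with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def jaccard(a: set, b: set) -> float:
    """Shared terms divided by all distinct terms (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_paragraphs(concept_text: str, document: str) -> list:
    """Score every blank-line-separated paragraph against the concept and
    return (score, paragraph_no, paragraph) tuples, best match first."""
    concept = terms(concept_text)
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    scored = [(jaccard(concept, terms(p)), no, p)
              for no, p in enumerate(paragraphs, start=1)]
    # sort() is stable, so ties keep their original document order.
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored


def to_csv(scored: list) -> str:
    """Render ranked rows as CSV text that Excel can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["score", "paragraph_no", "paragraph"])
    for score, no, text in scored:
        writer.writerow([f"{score:.3f}", no, text])
    return buffer.getvalue()
```

Take the concept "rover fuel cells" and a three-paragraph document. The paragraph "Fuel cells power the rover." shares three of its four terms with the concept, so it scores 0.75 and ranks first.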

Users employ agents in Pytheas AI to organize text by contextual ideas and metadata dimensions. This improves accuracy and consistency while saving substantial time in an otherwise tedious process.

The Basic Elements of Pytheas

Documents – Pytheas can analyze any form of unstructured text. In fact, our technology works best with semantically rich content written in your business vernacular, without external taxonomies or ontologies. Working at the paragraph level, it has been used on everything from text messages to database fields to long documents, always with full traceability to the source.

Conceptual Fingerprints – This is the “secret sauce” of our discovery capabilities. Pytheas uses the Nathan API keywords and associations to create semantic “fingerprints” of concepts. Because one concept can be written in multiple ways, our algorithm does not rely on word counts, natural language processing (NLP) or latent semantic analysis (LSA) when identifying and fingerprinting concepts.

Intelligent Agents – Pytheas agents examine and compare conceptual fingerprints to find traces of concepts buried within your data. Our premise is that the analyst is the expert and needs to be able to train their own army of software agents to “read” documents and deliver the relevant paragraphs. When agents are used as a collection, their combined scores set the context for a user’s query.
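
The following sketch shows one way a collection of agents could turn a paragraph into a set of scores that together describe its context. Everything in it is a hypothetical choice for illustration: the `Agent` class, its profile built from training examples, and the 0.3 tagging threshold. Real Pytheas agents compare conceptual fingerprints, not plain word sets.

```python
import re

# Hypothetical stopword list used as a simple noise filter.
STOPWORDS = {"a", "an", "and", "are", "as", "be", "by", "for", "in", "is",
             "it", "of", "on", "or", "that", "the", "this", "to", "with"}


def terms(text: str) -> set:
    """Lower-cased word set with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


class Agent:
    """Hypothetical concept agent trained from analyst-chosen examples."""

    def __init__(self, name: str, examples: list):
        self.name = name
        # The agent's profile is every term seen in its training examples.
        self.profile = set().union(*(terms(e) for e in examples))

    def score(self, paragraph: str) -> float:
        """Fraction of the paragraph's terms that the agent recognizes."""
        words = terms(paragraph)
        return len(words & self.profile) / len(words) if words else 0.0


def score_collection(agents: list, paragraph: str) -> dict:
    """One score per agent; together the scores describe the paragraph's context."""
    return {agent.name: round(agent.score(paragraph), 3) for agent in agents}


def best_tag(agents: list, paragraph: str, threshold: float = 0.3):
    """Name of the highest-scoring agent, or None if no agent reaches threshold."""
    scores = score_collection(agents, paragraph)
    name, top = max(scores.items(), key=lambda item: item[1])
    return name if top >= threshold else None
```

Suppose one agent is trained on "Network intrusion, malware and firewall logs." A paragraph reporting a malware firewall alert on the network then scores 0.6 with that agent and 0.0 with an unrelated logistics agent. It is therefore tagged with the first agent's name.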

Paragraph Level Concept Discovery – Pytheas can categorize and display concept results at the paragraph level. Users do not need to hunt through documents trying to find a concept that a search engine claims is present. Our system returns the paragraphs that closely match a concept, then sorts and groups them by their similarity to one another. Paragraphs can be evaluated and traced back to their source documents for reporting and distribution.

Figure 1. Topic Mapper Entity and Sentiment in SEC Filings

Ease of Integration – The Pytheas application can be used with conventional desktop tools for ad hoc projects. For workflow automation, a RESTful API gives developers an easy way to process documents and export results to SQL or other databases for reporting and visualization.
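
The example below sketches only the export side. It writes agent scores into a SQL table using Python's built-in `sqlite3` module, then queries the top-scoring paragraphs for one agent. The table layout (`agent_scores`, with document ID, paragraph number, agent name and score) is a hypothetical schema. The rows are assumed to come from an earlier scoring step, and no Pytheas API calls are shown.

```python
import sqlite3


def export_scores(conn: sqlite3.Connection, rows: list) -> None:
    """Store (doc_id, paragraph_no, agent, score) rows in a SQL table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS agent_scores (
               doc_id TEXT NOT NULL,
               paragraph_no INTEGER NOT NULL,
               agent TEXT NOT NULL,
               score REAL NOT NULL,
               PRIMARY KEY (doc_id, paragraph_no, agent))"""
    )
    # Re-running an export overwrites earlier scores for the same key.
    conn.executemany("INSERT OR REPLACE INTO agent_scores VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def top_paragraphs(conn: sqlite3.Connection, agent: str, limit: int = 5) -> list:
    """Highest-scoring paragraphs for one agent, for reporting tools."""
    return conn.execute(
        "SELECT doc_id, paragraph_no, score FROM agent_scores "
        "WHERE agent = ? ORDER BY score DESC LIMIT ?",
        (agent, limit),
    ).fetchall()
```

Because the data sits in an ordinary SQL table, BI and visualization tools can query it without knowing how the scores were produced.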

Optional Entity Extraction and Sentiment (Figure 1 above) – Complementing paragraph-level concept detection is the ability to extract entities and/or score sentiment, so that this information can be added to visualizations and follow-on workflows. Clients can use their own technology for this purpose. They can also add custom analytics to further refine the insight for social network analysis, for tagging existing file headers, or for streamlining the flow of information to the analyst.

Defense Utility

The immediate benefit to DoD is increased productivity, consistent analysis and more effective information management. The long-term benefit is the ability to make quicker, better-informed decisions.

Operational users of this prototype include anyone who has to search through data, including users of SharePoint and other common organizational databases. Analysts who must sift through massive amounts of data to discover relevant information will save countless hours by using our prototype. In a similar use case at NASA, our customer was able to complete a typical six-week project in one week!

Company and Relevant Use Case

Led by ISC, personnel from ISC Consulting and ai-one inc. will execute the project.

ISC Consulting Group is a Service Disabled Veteran-Owned Small Business (SDVOSB). We are headquartered in Sierra Vista, Arizona, with operational offices at Ft. Huachuca, AZ; Orlando, FL; Ft. Gordon, GA; and Northern Virginia. ISC provides a full spectrum of services, products & solutions supporting the DoD Intelligence Community and key commercial clients with advanced capabilities in Instructional Solutions, Cyber Security, Command and Control planning and operations, Intelligence operations, Information Technology, and Data Analytics through Artificial Intelligence products and services.

ai-one inc. is the developer of a proprietary core technology that emulates the complex pattern-recognition functions of the human brain to detect the key features and contextual meaning of text, time-series and visual data. This technology will enable DIUx to score and analyze any piece of textual content and discover information by concept, bringing the dimension of AI understanding to knowledge management. It automatically generates a lightweight ontology that easily detects all relationships among data elements, solving the immediate problems facing the DIUx knowledge management process and schedule.

Existing Customers

ISC has served several clients with Pytheas technology, including NASA Marshall Space Flight Center (MSFC).   Currently, Pytheas is being used by MSFC’s Advanced Concepts Office (ACO) under a Cooperative Agreement to assist in technology roadmap development and separately by the Office of Strategic Analysis and Communication (OSAC) to manage and report on their portfolio of project investments (similar to SBIR grants).   For example, the roadmap project is described below:

Overview of the NASA Advanced Concepts TAPP Pilot Project

The Advanced Concepts Office (ACO) at NASA MSFC is developing and refining methods and processes for performing Information Based Decisions for Strategic Technology Investments. This system is currently referred to as TAPP, the Technology Alignment & Prioritization Process. TAPP supports the evaluation of technologies for investment by NASA and MSFC to ensure alignment with NASA mission plans, technology area priorities and strategic knowledge gaps.

TAPP creates an interactive system for exploring the almost mind-boggling complexity of planning for multiple missions. It involves over 400 technologies (many still in basic research) and hundreds of interrelated elements and sub-elements over 30-year planning horizons.

Pytheas provides NASA the capability to have data mining agents parse and score unstructured content against the nearly 400 technologies identified in the 15 Technology Roadmaps.  This ability to score proposals with agents allows ACO to perform statistical analysis within the Information Based Decision framework for Strategic Investments.
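
The sketch below shows what statistical analysis over agent scores could look like. It summarizes a proposal-by-technology score matrix, reporting two things for each technology: the mean score, and how many proposals score at or above a threshold. The dictionary layout, the technology IDs and the 0.5 threshold are hypothetical. The actual TAPP framework is not described at this level of detail.

```python
from statistics import mean


def summarize(scores: dict, threshold: float = 0.5) -> dict:
    """scores maps proposal ID -> {technology ID: agent score}.
    Returns, per technology, the mean score across all proposals
    (missing scores count as 0.0) and how many proposals reach the threshold."""
    technologies = sorted({tech for row in scores.values() for tech in row})
    summary = {}
    for tech in technologies:
        values = [row.get(tech, 0.0) for row in scores.values()]
        summary[tech] = {
            "mean": round(mean(values), 3),
            "aligned": sum(v >= threshold for v in values),
        }
    return summary
```

Counting a missing score as 0.0 keeps every technology's mean over the same set of proposals, so the means stay comparable across technologies.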

The immediate benefit to ACO is increased productivity and consistent analysis. The long-term benefit is an ability to perform quicker, more informed technology assessments, feasibility analysis, and concept studies that align with NASA’s evolving strategic goals and multiple mission objectives.


Given a six-month prototype build period, ISC/ai-one will demonstrate to DIUx that the Pytheas AI application enables the organization to save critical time and human capital in implementing and operating knowledge management systems. Pytheas will empower the IC to sort rapidly and effectively through vast volumes of text data. The IC can then gain knowledge and give decision makers the right information to achieve stated organizational analytical research outcomes.

Big Data Just Got Smaller: New Approach to Find Information

Tuesday, November 15th, 2011

Press Release

For Immediate Release


ai-Fingerprint shows a graphical representation of the knowledge within a news article

San Diego, CA – Artificial intelligence vendor ai-one will unveil a new approach to graphically representing knowledge at the SuperData conference in San Diego on Wednesday, November 16, 2011. The discovery, named ai-Fingerprint, is a significant breakthrough because it allows computers to understand the meaning of language much like a person does. Unlike other technologies, ai-Fingerprints compress knowledge in a way that works on any kind of device and in any language, and they show how clusters of information relate to each other. This enables almost any developer to use off-the-shelf and open-source tools to build systems like Apple’s Siri and IBM’s Watson.

Ondrej Florian, ai-one’s VP of Core Technology, invented ai-Fingerprints as a way to find information by comparing the differences, similarities and intersections of information on multiple websites. The approach is dynamic, so the ai-Fingerprint transforms as the source information changes. For example, the shape of a Twitter feed’s ai-Fingerprint adapts with the conversation. This enables someone to see new information evolve and immediately understand its significance.

“The big idea is that we use artificial intelligence to identify clusters and show how each cluster relates to another,” said Florian. “Our approach enables computers to compare ai-Fingerprints across many documents to find hidden patterns and interesting relationships.”

The ai-Fingerprint is the collection of all the keywords and their associations identified by ai-one’s Topic-Mapper tool. Each keyword and its associations is a coordinate – much like what you would find on a map. The combination of these keywords and associations forms a graph that encapsulates the entire meaning of the document.
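
Treating the fingerprint as a graph turns the comparisons Florian describes (differences, similarities and intersections) into simple set operations. In this hypothetical sketch, an association is an unordered keyword pair stored as a `frozenset`, so ("drug", "trial") and ("trial", "drug") count as the same edge. It is an illustration only, not Topic-Mapper's internal representation.

```python
from collections import defaultdict


def as_graph(associations) -> dict:
    """Adjacency map: keyword -> set of keywords it is associated with."""
    graph = defaultdict(set)
    for a, b in associations:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def compare(assoc_a, assoc_b) -> dict:
    """Intersection and differences of two fingerprints' associations.
    Edges are frozensets, so ("x", "y") and ("y", "x") are the same edge."""
    a = {frozenset(pair) for pair in assoc_a}
    b = {frozenset(pair) for pair in assoc_b}
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}
```

Comparing two drug-trial articles this way leaves the shared drug–trial link in `shared`. Links that appear in only one article fall into `only_a` or `only_b`, which is where anomalous or new information would show up.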

The real-world applications are impressive. “It solves a lot of so-called Big Data problems because the system learns by itself,” said Olin Hyde, who worked with Florian on the project. “ai-Fingerprints work with existing computer languages and standards. So it only took us about a week to create a generic tool, called BrainBrowser, to find relationships in complex texts – such as summarizing news articles, searching for a job, or identifying new uses for a drug.”

To build BrainBrowser, the team fed ai-Fingerprint results from Topic-Mapper into a natural language processing tool, OpenNLP, so that the computer could understand the rules of grammar. It could then tag parts of speech, chunk phrases and classify words into categories (also called named-entity recognition). Topic-Mapper continuously updates the ai-Fingerprint so that the computer can understand how information changes over time, as it does in a human conversation.

Next, the team built a little tool in Java that converted the output into a continuous data feed using an open-standard format called XGMML. This format shares the knowledge of a document as a network of words, sentences and relationships.
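
The team's converter was written in Java. The following Python sketch shows the same kind of step under simple assumptions. It turns a list of keyword associations into an XGMML graph using the standard library's `xml.etree.ElementTree`. It emits only the basic `graph`, `node` and `edge` elements, with `label`, `id`, `source` and `target` attributes. The node numbering and edge labels are hypothetical choices, and a real Cytoscape import may expect additional attributes.

```python
import xml.etree.ElementTree as ET

XGMML_NS = "http://www.cs.rpi.edu/XGMML"
# Serialize XGMML elements with a default (unprefixed) namespace.
ET.register_namespace("", XGMML_NS)


def to_xgmml(label: str, associations: list) -> str:
    """Turn (keyword, keyword) associations into an undirected XGMML graph."""
    graph = ET.Element(f"{{{XGMML_NS}}}graph", {"label": label, "directed": "0"})
    ids = {}  # keyword -> node id, assigned in order of first appearance
    for pair in associations:
        for word in pair:
            if word not in ids:
                ids[word] = str(len(ids) + 1)
                ET.SubElement(graph, f"{{{XGMML_NS}}}node",
                              {"id": ids[word], "label": word})
    for a, b in associations:
        ET.SubElement(graph, f"{{{XGMML_NS}}}edge",
                      {"source": ids[a], "target": ids[b], "label": f"{a} - {b}"})
    return ET.tostring(graph, encoding="unicode")
```

The code writes all nodes before any edges, so every edge refers to a node id that has already been declared.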

Finally, they visualized the result with an open-source bioinformatics tool called Cytoscape. It shows the differences and similarities among documents and identifies anomalous information. The result is a graphic representation of knowledge that can show clusters, extract summaries and compare many documents at the same time.

The approach is easy for others to replicate with other technologies. “We used Topic-Mapper with Java, OpenNLP and Cytoscape,” said Florian, “But you could easily do this with Python, MATLAB and NLTK. Heck, you could throw a voice recognition tool on it, like Dragon or Nuance, and you can build an intelligent agent just like SIRI.”

ai-Fingerprint works in any language because Topic-Mapper looks only at byte patterns. “The approach can give false positives if you don’t teach it the rules of language,” warned Florian, “but it is very accurate once it learns the grammar from an outside source of information – such as a natural language processing system or an external database.”
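
A small sketch shows why byte patterns are language independent. Text is encoded as UTF-8 and compared by overlapping byte trigrams, so the same code handles German, Chinese or English without any tokenizer. This is a generic byte n-gram technique chosen for illustration, not Topic-Mapper's algorithm. It also shows the weakness Florian mentions: trigrams cross word and character boundaries and ignore grammar, so unrelated texts can share fragments and produce false positives.

```python
def byte_ngrams(text: str, n: int = 3) -> set:
    """Set of overlapping n-byte slices of the UTF-8 encoding of text."""
    data = text.encode("utf-8")
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of the two texts' byte n-gram sets."""
    grams_a, grams_b = byte_ngrams(a, n), byte_ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0
```

Identical passages score 1.0 in any script. Related phrases such as "text analytics" and "text analysis" fall somewhere between 0 and 1, because they share many byte fragments without being identical.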

ai-one’s engineering team sees ai-Fingerprints as a way to make it easier, faster and less expensive for their partners to develop intelligent systems. The team is now testing it for applications in advertising, financial analysis, medical research and search engine optimization (SEO).

“Our mission is to make powerful AI available to all developers. This is a big step in that direction,” said ai-one’s chief operating officer Tom Marsh. “We are eager to find academic and consulting partners who can build upon what we started.”

“BrainBrowser is just a minimally viable product (MVP) to prove the concept,” added Hyde. “The sky is the limit for those that want to build commercial applications. Just take the MVP code and customize to your needs.”

A demo of the system can be seen on and the semsys YouTube channel.  ai-one intends to provide the source code for ai-Fingerprint as part of its Topic-Mapper software development kit.