
Software Support

Our GitHub:

TopicMiner


Developers: S. Koltsov and V. Filippov.
Proprietary GUI software for text preprocessing, topic modeling, and visual analysis of the results.
Programming languages: C++, Delphi, and CUDA.
Development time: September 2012 to present.

Technical features

TopicMiner is built on multiple libraries and uses a proprietary package (AlphaControls/AlphaSkins for Delphi) for its graphical user interface (GUI). One of its key features is the ability to handle large volumes of textual data (tens to hundreds of thousands of text documents, gigabytes of data). The text preprocessing module uses crc32lib to transform words into CRC32 codes. The topic modeling module is based on the bigartm, gibbsLDA, and hdplib libraries. Currently, the source code of TopicMiner is closed and cannot be opened because of the limitations of its proprietary dependencies.
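The word-to-CRC32 transformation mentioned above can be illustrated with a minimal sketch. It uses Python's standard `zlib.crc32` as a stand-in for crc32lib, and it assumes UTF-8 encoding and no extra normalization; how TopicMiner actually encodes or normalizes words is not stated on this page. The helper names `word_to_code` and `build_codebook` are hypothetical.

```python
import zlib


def word_to_code(word: str) -> int:
    """Map a word to an unsigned 32-bit CRC32 code (UTF-8 bytes assumed)."""
    return zlib.crc32(word.encode("utf-8"))


def build_codebook(tokens):
    """Build a code -> word table and collect any CRC32 collisions.

    A 32-bit code cannot be unique for every possible word, so a sketch
    like this should at least report words that share a code.
    """
    codebook = {}
    collisions = []
    for token in tokens:
        code = word_to_code(token)
        if code in codebook and codebook[code] != token:
            collisions.append((codebook[code], token))
        else:
            codebook[code] = token
    return codebook, collisions


if __name__ == "__main__":
    tokens = ["topic", "model", "topic", "text"]
    codes = [word_to_code(t) for t in tokens]
    codebook, collisions = build_codebook(tokens)
    print(codes)
    print(len(codebook), "distinct codes,", len(collisions), "collisions")
```

Replacing strings with fixed-size integer codes keeps memory use predictable on large collections, which is presumably why such a transformation is useful here. The codebook is what makes it possible to map codes back to words when exporting results.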

Modules features

  1. Textual data preprocessing
    • Data importing from CSV and text files;
    • Data parsing and cleaning;
    • Lemmatization of English and Russian texts (via Yandex Mystem);
    • Word frequency calculation;
    • Stop words list generation;
    • Filtering stop words according to word lists and frequencies.
  2. Topic modeling
    • Supported algorithms: LDA with Gibbs sampling, ISLDA, GranulatedLDA, BigARTM, PLSA (with regularization);
    • Setting parameters for topic modeling: number of topics, coefficients α and β, number of iterations, number of threads, regularization;
    • Exporting results of topic modeling (distribution of words and documents across topics);
    • Exporting results of topic modeling to the geographic information system QuantumGIS;
    • Calculating the distance matrix between words in documents within a given topic.
  3. Topic model quality visualization and diagnostics
    • Quality calculation and visualization (proportion of words and documents for which the probability is above average);
    • Topics comparison based on Kullback-Leibler divergence and Jaccard index.
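The two topic comparison measures named in the list above can be sketched as follows. This is an illustration under stated assumptions, not TopicMiner's implementation: each topic is assumed to be a probability distribution over the same vocabulary, the Jaccard index is assumed to be computed on the sets of top-N words, and the smoothing constant `eps` and all function names are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) of two discrete distributions.

    eps is an assumed smoothing constant that avoids division by zero.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same vocabulary")
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def symmetric_kl(p, q):
    """Symmetrized variant, since KL(p || q) != KL(q || p) in general."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def top_words(dist, vocab, n):
    """Return the set of the n most probable words of a topic."""
    order = sorted(range(len(dist)), key=lambda i: dist[i], reverse=True)
    return {vocab[i] for i in order[:n]}


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    vocab = ["vote", "party", "goal", "match"]
    topic_a = [0.5, 0.4, 0.05, 0.05]
    topic_b = [0.05, 0.05, 0.5, 0.4]
    print(symmetric_kl(topic_a, topic_b))
    print(jaccard(top_words(topic_a, vocab, 2), top_words(topic_b, vocab, 2)))
```

The KL divergence compares full word distributions, while the Jaccard index looks only at the overlap of the most probable words, so the two measures can disagree for topics that share top words but differ in their tails.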

Download: TopicMiner (ZIP, 37.41 MB)

Quick Guide: TopicMiner Quick Guide (PDF, 2.49 MB)

Manual: TopicMiner Manual (PDF, 4.18 MB)

Publications based on TopicMiner: TopicMiner Publications List (PDF, 65 KB)


TopicMiner is distributed as an installer for 64-bit Microsoft Windows 8 and later. Access to the program is granted upon request: linis-spb@hse.ru

FakeNews App

Developer: Maksim Terpilowski
Programming languages: Python and JavaScript.
Development time: 2019-2022.

This web application was developed to collect data for our FakeNews project. It has a client-server architecture and is available both as a standalone website and as a VK community app. Its interface is optimized for all platforms and includes gamification elements. The app shows a user a number of news items and asks them to guess which items are true and which are false. To complete the survey and receive feedback, the user must answer several questions on various research-related issues. The app also asks the user to grant access to their private data (VK version only, with informed consent).
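The guessing task can be illustrated with a small scoring sketch. The page does not describe how the app computes its feedback, so everything here is an assumption: the function name `score_guesses`, the item identifiers, and the idea that feedback is based on the share of correct guesses.

```python
def score_guesses(truth, guesses):
    """Compare a user's true/false guesses with the known labels.

    truth and guesses map a news item id to a bool (True = the item is true).
    Items the user did not answer, or that have no known label, are ignored.
    """
    answered = [item for item in guesses if item in truth]
    correct = sum(1 for item in answered if guesses[item] == truth[item])
    total = len(answered)
    return {
        "answered": total,
        "correct": correct,
        "accuracy": correct / total if total else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical labels and answers for three news items.
    truth = {"n1": True, "n2": False, "n3": False}
    guesses = {"n1": True, "n2": True, "n3": False}
    print(score_guesses(truth, guesses))
```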

Based on this application, a special version adapted for our EyePoint project was created. This version includes calibration screens and QR markers for the eye tracker.


FakeNews Application


VKMiner (Social Network)

Access to the program is granted upon request: linis-spb@hse.ru

Developers: S. Koltsov and V. Filippov.
The information system is designed to work with the social network "VKontakte" (VK).
Programming languages: Delphi XE2, SQL
Development time: February 2013 to present.

Capabilities:

    • Loading personal user data from the list of IDs.
    • Loading the friends list of a specific user.
    • Loading the groups list of a specific user.
    • Loading the user list of a specific group.
    • Calculating the ego network (Network of friends).
    • Loading raw data for the friends' network.
    • Building the friends' network.
    • Loading data from a user's or group's wall.
    • Loading a list of discussions and the discussions themselves from the wall.
    • Loading 'Discussion'.
    • Loading 'Group Distribution'.
    • Loading 'Random User sampling'.
    • Loading 'Network of friends + wall'.
    • Exporting the loading results in CSV format.
    • Loading 'User parameter profile'.
    • Monitoring the loading process.
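The ego network calculation listed above can be sketched as follows. This is a minimal illustration, not VKMiner's code: it makes no calls to the VK API and assumes the friend lists have already been loaded into a dictionary, for example from an exported CSV file. The name `build_ego_network` and the input layout are hypothetical.

```python
def build_ego_network(ego, friends_of):
    """Build the ego network of one user from already loaded friend lists.

    friends_of maps a user id to an iterable of that user's friend ids.
    The network contains the ego, the ego's friends (alters), the ego-alter
    ties, and the ties between alters. Edges are undirected, so each one is
    stored as a frozenset of two ids.
    """
    alters = set(friends_of.get(ego, ())) - {ego}
    nodes = {ego} | alters
    edges = set()
    for alter in alters:
        edges.add(frozenset((ego, alter)))
        for other in friends_of.get(alter, ()):
            if other in alters and other != alter:
                edges.add(frozenset((alter, other)))
    return nodes, edges


if __name__ == "__main__":
    # Hypothetical friend lists keyed by user id.
    friends_of = {
        1: [2, 3, 4],
        2: [1, 3, 5],
        3: [1, 2],
        4: [1],
    }
    nodes, edges = build_ego_network(1, friends_of)
    print(sorted(nodes))
    print(sorted(tuple(sorted(edge)) for edge in edges))
```

Users who are friends of an alter but not of the ego (user 5 in the example) are left out, which is what distinguishes an ego network from a wider friends-of-friends crawl.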


DigiFriends App

App developer: Maxim Koltsov
Programming languages: Python and JavaScript.

The application was created for data collection as part of the research project Digital Friends (DigiFriends) and integrated into the VKontakte social network. Therefore, the obtained data pertain to users of the VKontakte social network. The application is based on a questionnaire that includes questions about a variety of user characteristics (see below).

 Psychological and socio-demographic features:

-    The propensity to make social connections [1];

-    Rosenberg self-esteem scale – two items [2];

-    Subjective well-being;

-    Socio-demographic characteristics (gender, age, level of education).

 
Features of online behavior:

-    Privacy attitudes scale [3];

-    Behaviors related to privacy features of VK;

-    Frequency and duration of VK sessions;

-    Goals for using VK.

 Social capital:

-    Online social capital scale [4].

 

The questionnaire consists of a total of 42 questions.

The application incorporates a function to gather data from the user's personal profile on VKontakte: questionnaire data, user friends' IDs, characteristics of user activity on their "wall." All data is collected with the user's consent. Before starting the questionnaire and data collection, users are informed about the specific data being collected within the research project. Data collection begins after the user clicks the "Start" button.


After completing the questionnaire, users receive playful feedback.



References:

[1] Totterdell, P., Holman, D., & Hukin, A. (2008). Social networkers: Measuring and examining individual differences in propensity to connect with others. Social Networks, 30, 283-296.
[2] Rosenberg, M. (1965). Society and the adolescent self-image. Princeton, NJ: Princeton University Press.
[3] Stutzman, F., Capra, R., & Thompson, J. (2011). Factors mediating disclosure in social network sites. Computers in Human Behavior, 27(1), 590-598.
[4] Williams, D. (2006). On and off the 'Net: Scales for social capital in an online era. Journal of Computer-Mediated Communication, 11(2), 593-628.

LINIS-CROWD

LINIS CROWD is a web-based application for crowdsourced markup of document collections. The system has been used to develop a Russian-language dictionary of sentiment-bearing words from the socio-political domain.

 

 

You can learn more about LINIS CROWD in this publication [in Russian]:

Alexeeva, S. V., Koltsov, S. N., & Koltsova, O. Yu. (2015). Linis-crowd.org: A lexical resource for sentiment analysis of socio-political texts in Russian. In XVIII Joint Scientific Conference "Internet and Modern Society" (IMS-2015) (pp. 25-34). Saint Petersburg. http://openbooks.ifmo.ru/ru/file/2203/2203.pdf

The web platform and sentiment dictionary are available at the link. For more information about using databases and software developed by LINIS, please contact linis-spb@hse.ru.


 
