Open Source Compass uses data from the GH Torrent project, a research initiative that monitors the GitHub public event timeline and retrieves and stores the contents and dependencies of each event.
A full export of the database that powers this site can be downloaded here (3GB zip file), and is made available under a CC BY-NC-SA 2.0 license. All requests for commercial use should be submitted to [email protected].
Open Source Compass analyzes only projects with more than 10 watchers, which make up close to 1 percent of the more than 100 million GitHub projects (as of April 2019). Project metadata (e.g., names, descriptions, stars, forks, commits) is the primary data used in the analysis and development of Open Source Compass.
Almost 50 percent of the projects in Open Source Compass have location information. Available location data was homogenized to the city or regional level; where that was not available, we used subnational data to assign a project to a geographic region. Region and country data was standardized using a geocoding API that follows the ISO 3166 standard for defining countries and their subdivisions.
Programming languages are based on the repository language classification used by GH Torrent and GitHub, which includes frameworks and tools that are not formal programming languages (e.g., Jupyter Notebook, Smarty).
Projects were grouped into fifteen technology domains using a combination of keywords and GitHub project tags. Analysis was also performed on the initial 1,000 projects identified for each domain. Additional domains may be added in the future.
List of domains
- Artificial Intelligence
- Artificial intelligence spans the machine intelligence spectrum, from robotic process automation, which automates low-complexity, rules-based tasks, to artificial general intelligence, which fully replicates human intelligence, including independent learning and decision-making.
- Augmented & Virtual Reality
- AR overlays digitally created content into the user’s real-world environment. Features include transparent optics and a viewable environment in which users are aware of their surroundings and themselves. VR immerses users in manufactured surroundings that depict actual places or imaginary worlds.
- Blockchain
- Blockchain allows for the secure management of a shared ledger, where transactions are verified and stored on a network without a governing central authority. Blockchains can come in different configurations, ranging from public, open-source networks to private blockchains that require explicit permission to read or write.
- Cloud Computing
- Cloud computing gives companies the ability to tap into a nearly unlimited scale of computing power, storage, platforms, and software, often more cheaply and quickly than with on-premises solutions. It is a foundational element for companies of all sizes and maturity levels, enabling efficiency and innovation.
- Cyber Security
- Cyber security is the protection of systems, networks, and data, including sensitive and personal information, against security breaches.
- Data Visualization
- Data visualization refers to the innovative use of images and interactive technology to explore large, high-density datasets. It helps users to see, explore and share relationships and insights in new ways.
- Internet of Things
- The Internet of Things (IoT) is a suite of technologies and applications that equip devices to generate all kinds of information—and to connect those devices for instant data analysis and, ideally, “smart” action.
- Machine Learning
- Machine learning refers to the ability of computer systems to improve their performance by exposure to data without the need to follow explicitly programmed instructions. At its core, machine learning is the process of automatically discovering patterns in data that can be used to make predictions.
- Microservices
- A microservice architecture comprises small, independent processes that communicate using language-agnostic application programming interfaces (APIs). Both an architectural pattern and a mechanism, microservices are created by decomposing an application into its functional primitives and reconstructing it from them.
- Mobile Development
- The act or process of creating applications specifically for mobile use on devices such as phones and tablets.
- Physical Robotics
- Integrating cognitive technologies such as computer vision with tiny, high-performance sensors, actuators, and cleverly designed hardware to create a new generation of robots that can work alongside people and flexibly perform many different tasks in unpredictable environments.
- Quantum Computing
- Quantum computing harnesses the unique properties of quantum mechanics to process information. These properties help solve problems that are either too complex or too time-intensive to solve with classical computers and algorithms.
- Serverless Computing
- Serverless computing is a cloud-computing execution model in which a cloud provider dynamically manages the allocation of machine resources, and pricing is based on the resources an application actually consumes rather than on pre-purchased units of capacity.
- Text Analytics
- Text analytics is the practice of using technology to gather, store and mine textual information for hidden signals that can be used to inform business decisions.
- Web Development
- The act or process of creating a website for the internet or a private network.
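The keyword-and-tag grouping described above can be sketched as follows. This is a minimal illustration, not the Open Source Compass classifier: the keyword sets in `DOMAIN_KEYWORDS` and the function name `classify` are hypothetical, and allowing one project to match several domains is an assumption.

```python
import re

# Hypothetical keyword/tag lists; the real lists used by Open Source Compass
# are not given in this section.
DOMAIN_KEYWORDS: dict[str, set[str]] = {
    "Blockchain": {"blockchain", "ethereum", "bitcoin", "smart-contracts"},
    "Machine Learning": {"machine-learning", "deep-learning", "tensorflow", "pytorch"},
    "Quantum Computing": {"quantum", "quantum-computing", "qiskit"},
}


def classify(description: str, topics: list[str]) -> set[str]:
    """Return every domain whose keywords appear in the description or GitHub tags."""
    # Tokenize the description into lowercase words (hyphens kept, punctuation dropped).
    words = set(re.findall(r"[a-z0-9][a-z0-9\-]*", description.lower()))
    tags = {topic.lower() for topic in topics}
    terms = words | tags
    # A project may fall into several domains (an assumption of this sketch).
    return {domain for domain, keywords in DOMAIN_KEYWORDS.items() if terms & keywords}
```

For example, a project described as "Deep learning for quantum chemistry" with the tag `deep-learning` would be placed in both Machine Learning and Quantum Computing.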
To classify projects into regions and countries, we standardized the location information provided by project contributors using a geocoding API that matches latitude and longitude to regional political units. The API follows the ISO 3166 standard for defining countries and their subdivisions.
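Once locations are resolved to ISO 3166 codes, rolling them up to the country level is straightforward. The sketch below assumes, hypothetically, that the geocoder returns ISO 3166-2 subdivision codes such as `US-CA`; the geocoding call itself is omitted because the API is not named here, and `split_subdivision` and `projects_per_country` are illustrative names.

```python
import re
from collections import Counter

# ISO 3166-2 subdivision codes: an ISO 3166-1 alpha-2 country code, a hyphen,
# and one to three alphanumeric characters (e.g. "US-CA", "DE-BY").
ISO_3166_2 = re.compile(r"^([A-Z]{2})-([A-Z0-9]{1,3})$")


def split_subdivision(code: str) -> tuple[str, str]:
    """Split an ISO 3166-2 code into (country code, subdivision code)."""
    match = ISO_3166_2.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not an ISO 3166-2 code: {code!r}")
    return match.group(1), match.group(2)


def projects_per_country(project_regions: dict[str, str]) -> Counter:
    """Count projects per country from a mapping of project -> subdivision code."""
    return Counter(split_subdivision(code)[0] for code in project_regions.values())
```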
We identified related projects using a collaboration network that counts pull requests between project owners. For example, if the owner of Project A contributed to Project B, then those projects are connected (a bipartite network). We selected a sample of a project’s connections (an ego network) and chose related projects based on the weighted centrality by number of commits and the total number of connections (degree) of those projects.
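A minimal sketch of the related-projects step, assuming contributions arrive as hypothetical `(owner_project, target_project, commits)` tuples. Two choices here are assumptions rather than the documented method: neighbors are ranked by commit-weighted degree with plain degree as a tie-breaker, and the "sample" of connections is a reproducible random subset capped at `sample_size`.

```python
import random
from collections import defaultdict


def build_network(contributions):
    """Build an undirected project graph whose edge weights are commit counts.

    contributions: iterable of (owner_project, target_project, commits), meaning
    the owner of owner_project contributed `commits` commits to target_project.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for source, target, commits in contributions:
        if source == target:
            continue  # ignore owners contributing to their own project
        graph[source][target] += commits
        graph[target][source] += commits
    return graph


def related_projects(graph, project, top_n=3, sample_size=50, seed=0):
    """Rank the direct neighbors of `project` (its ego network) as related projects."""
    neighbors = list(graph.get(project, {}))
    if len(neighbors) > sample_size:
        # Hypothetical sampling step: a reproducible random subset of connections.
        neighbors = random.Random(seed).sample(neighbors, sample_size)

    def score(p):
        strength = sum(graph[p].values())  # commit-weighted degree centrality
        degree = len(graph[p])             # total number of connections
        return (strength, degree)

    return sorted(neighbors, key=score, reverse=True)[:top_n]
```

Ranking on the neighbor's own centrality, rather than on the single edge weight to the ego project, favors well-connected projects; swapping the key is a one-line change if the edge weight should dominate.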
We analyzed pull request commits between projects to find patterns of collaboration across technology domains. We filtered out contributions within the same domain and only considered statistically significant connections based on the phi coefficient.
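For a 2x2 contingency table with cells a, b, c, d, the phi coefficient is φ = (ad - bc) / √((a+b)(c+d)(a+c)(b+d)), and for such a table the chi-square statistic equals n·φ², where n is the total count. The sketch below applies that filter to hypothetical per-domain-pair tables. The 5 percent significance level (critical value 3.841, one degree of freedom) and keeping only positive associations are assumptions, since the section does not state them.

```python
import math

CHI2_CRITICAL_05 = 3.841  # chi-square critical value for df = 1, alpha = 0.05


def phi_coefficient(a, b, c, d):
    """Phi coefficient for the 2x2 table [[a, b], [c, d]]."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        return 0.0  # a row or column is empty; no association can be measured
    return (a * d - b * c) / denom


def significant_links(tables):
    """Keep cross-domain pairs with a positive, statistically significant phi.

    tables: dict mapping (source_domain, target_domain) -> (a, b, c, d).
    """
    links = {}
    for (src, dst), (a, b, c, d) in tables.items():
        if src == dst:
            continue  # contributions within the same domain are filtered out
        phi = phi_coefficient(a, b, c, d)
        n = a + b + c + d
        # For a 2x2 table, chi-square = n * phi^2.
        if phi > 0 and n * phi * phi > CHI2_CRITICAL_05:
            links[(src, dst)] = phi
    return links
```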
To measure the spread of open source across countries, we compared the projects created in a given country with the location of the projects' watchers (users who request to receive notifications of project activity). We only considered statistically significant connections based on the phi coefficient.
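Before a phi coefficient can be computed for a country pair, the watch data has to be reduced to a 2x2 table. The sketch assumes, hypothetically, that each watch event is recorded as a `(project_country, watcher_country)` tuple, one per project and watcher; `contingency_table` is an illustrative name, and the cells follow the usual a, b, c, d layout used for the phi coefficient.

```python
from collections.abc import Iterable


def contingency_table(
    watch_events: Iterable[tuple[str, str]], origin: str, watcher_country: str
) -> tuple[int, int, int, int]:
    """Return (a, b, c, d) for one (project country, watcher country) pair.

    a: project created in `origin`, watcher in `watcher_country`
    b: project created in `origin`, watcher elsewhere
    c: project created elsewhere, watcher in `watcher_country`
    d: neither
    """
    a = b = c = d = 0
    for project_country, w_country in watch_events:
        in_origin = project_country == origin
        in_watcher = w_country == watcher_country
        if in_origin and in_watcher:
            a += 1
        elif in_origin:
            b += 1
        elif in_watcher:
            c += 1
        else:
            d += 1
    return a, b, c, d
```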
The Open Source Compass API allows users to explore the entire database using query strings and returns results as JSON. Each site visualization has a "view data" button at the top right that displays the API call used to generate that visualization.
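A minimal client sketch using only the Python standard library. Only the host comes from the API documentation address; the `https` scheme, the `/projects` path, and the `domain` and `country` parameters are hypothetical placeholders, so check the real call via a visualization's "view data" button before relying on any of them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Host taken from the API documentation address; scheme, paths, and
# parameter names used with it below are hypothetical.
BASE_URL = "https://api.opensourcecompass.io"


def build_query_url(path: str, **params: str) -> str:
    """Join a (hypothetical) endpoint path and query-string parameters."""
    query = urlencode(params)
    url = f"{BASE_URL}/{path.lstrip('/')}"
    return f"{url}?{query}" if query else url


def fetch_json(url: str):
    """Request a URL and decode the JSON response body."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)
```

For example, `fetch_json(build_query_url("/projects", domain="Blockchain"))` would request that hypothetical endpoint and decode its JSON response.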
Full documentation of the API endpoints is available at api.opensourcecompass.io/apidocs.