Apache Superset is free

Introduction to Apache Superset: A Chart

Transcript

1 OpenSchoolMaps.ch Free learning materials for free geodata and maps Introduction to Apache Superset: A chart A worksheet for interested parties and teachers Overview of learning objectives This introduction has the following learning objectives: You know the most important concepts and basic functions of Apache Superset You can select data sources You can use Visualize charts You can arrange charts in a dashboard. You can share dashboards with others (via the web). Editing this worksheet takes about half an hour, depending on your previous knowledge. No programming knowledge is required for this introduction; Basic knowledge of spreadsheets is sufficient for now. For the exercises in this worksheet, you need access to an Apache Superset service (see below), a standard web browser (Superset works best on Chrome) and an Internet connection. These instructions refer to Release 0.34 of Apache Superset, which was released in September 2019. Introduction Good visualization is important for the success of a decision-making process. That is why the findings of a concern, project or survey must be visualized. If findings are presented clearly and comprehensibly, this increases their comprehensibility and acceptance. Visualization can also be used not only for illustration, but also for data analysis. Often you can only recognize connections in the data through a skilful presentation. We humans are bad at comparing numbers; On the other hand, we recognize graphic patterns well. Visualization therefore not only shows the data graphically, but can also be used as a separate technique for data analysis. The Internet also enables the 1st

2 Publication of the visualization and thus easy communication with customers and work colleagues. Apache Superset is such a data visualization and publication tool. Some also call this application "Business Intelligence Tool". Superset is also a tool for sharing data sources, i.e. from table data to geospatial data. It can be connected to a wide variety of databases. There is official, English-language documentation for Apache Superset. Unfortunately, this seems to be very thin and not tracked: It uses outdated terms ("slices" instead of "charts") and the tutorial it contains requires weather data that is not preinstalled. The most useful part of the documentation is the small FAQ Frequently Asked Questions: Create / log in user account Apache Superset is a web application and an account is required for access, depending on the server. If you already have access to an Apache Superset instance, log in there. Apache Superset knows different user roles that have certain rights to change data or call functions. This worksheet assumes that you have been given user rights that correspond to the "Gamma" role, i.e. you can create charts and dashboards. Concepts and terms After logging into an Apache Superset application (see previous chapter) you will see the menu above and, among other things, the menu items Sources, Charts and Dashboards. Here are some explanations of the concepts behind Apache Superset: Datasource (data source): A "source" can be generated, for example, by uploading a CSV file or by creating views using SQL queries. Database: A connection to a database system that contains tables and views. Chart, formerly called a "slice": A "chart" is a list, diagram or web map. Superset currently knows 48 different charts, seven of which are interactive web maps ("Map Charts"). Dashboards: An interactive website that presents interactive charts. Metrics: "Metrics", sometimes also called "Measurements", are numerical measures. They are mainly mentioned and required in the charts. Records: Data sources and charts are all programming elements, sometimes referred to as "records" in the user interface. SQL query: statement, database query in the database language SQL. SQL queries can be 2

3 Save and make available to others as a data source. In the "Sources" menu of Apache Superset and in the original documentation of Apache Superset, Druid is mentioned as the database system. You can ignore all information related to Druid in the menu and in the documentation. Data and questions The data to be visualized with Apache Superset (and business intelligence tools in general) must be available in a structured and clean form. If necessary, the data must be prepared with database systems (SQL), spreadsheet programs (e.g. MS Excel, LibreOffice) or GIS (e.g. QGIS) (see e.g. OpenSchoolMaps> "Introduction to QGIS 3 and Geographic Information Systems (GIS)"). Helpful to clean up data ("data wrangling"), can also e.g. Be OpenRefine. Only one table is used in this worksheet, the wb_health_population table from the World Bank (source, license CC BY-4.0, status approx. 2017, roughly translated as "World Bank-Health-Population"). The table wb_health_population has approx. 328 columns (attributes), i.e. very many. We use the following columns: Name of the country: country_name World region in which the country is located: region Year of data collection (): year Total number of people: SP_POP_TOTL Figure 1 shows the data that we will use. Please take a moment and take a close look at these data. Understanding what data is in which column is a necessity in order to be able to create meaningful and correct charts. 3

4 Figure 1. Data from Switzerland from the table wb_health_population. Charts Apache Superset currently offers 40 charts. In the appendix you will find further example diagrams for the table 'wb_health_population', which are presented in additional worksheets (see chapter "Conclusion"). By the end of this guide you will have created a chart and filter. (Figure 2) 4

5 There is a filter on the left. This allows you to limit the data sets to certain regions and countries. On the right there is a pie chart, which shows the ten most populous countries and their population figures (metric). Figure 2. The dashboard with the pie chart and filter. If the charts are not displayed correctly during creation, this can be partially resolved by reloading the website (F5). Task 1: My first chart First you have to choose a data source. The various tables can be viewed under. Here you can click on the magnifying glass to get more detailed information about the table, e.g. which columns the table has and which data types can be saved there. In addition, under Columns you can set whether the column can be timed or filtered and whether it can be grouped according to it. To query a table, you have to select the menu item "Tables" under "Sources" and then click on the name of the table. In this example it is wb_health_population. You can only select one table as a source. If you want to link several tables to a data source, you need SQL knowledge. Additional worksheets on OpenSchoolMaps show how to do this. Selecting a table opens a new window or a new browser tab in which you can now create a chart. The selected table wb_health_population is now located there at Datasource. Directly below, under Visualization Type (chart type), "Pie Chart" can be selected instead of Table. First you have to adjust the time span under Time, as the data in the wb_health_population table start in 1960 and end in 2014. Since we only want to consider the year 2014 in our example, we use a custom filter from

6 In order to receive all data - no matter what year it belongs to - or if there is no data that is time-dependent, you can select No filter. To e.g. to select the year 2014, you can also type in 2013 to 2014. Under GROUP BY you can now define your first query by selecting a column such as SP_POP_TOTL (Population Total) under Metric and taking the total. You can do this by clicking on the text field under Metric and selecting an aggregate (these are the entries in front of which AGG is) followed by the corresponding column. The Metrics column must always be filled out. It is also used in all representations to determine how much weight this line has. In our pie chart, it is the size of the section. Under GROUP BY Group by you can subdivide the query even further. Group by is used to divide a total according to a certain attribute. If e.g. If the sum of the world population is grouped by country name, you get the population per country. Therefore select the attribute country_name here. The preceding "ABC" indicates the data type of this attribute, which is not important at this point. Under Row Limit you can limit the result to a certain number of entries. For example, if you If you set a row limit of 10, only the 10 most populous countries are displayed. If you press Run Query now, the query will be executed. It shows the ten most populous countries in a circle diagram. Now save this chart as your first result in Apache Superset, e.g. named "Population Pie Chart". You will need this chart again later. 6th

7 Solution The query should look like this: Task 2: Filter box for the dashboard Now that we have a chart, it would be useful to have a filter that can be used in the dashboard afterwards. All you have to do is select the filter box as the visualization type and select the appropriate columns under Filters, e.g. region and country_name. In addition, the Date Filter option can be deselected for the filter. Of course, you have to save this filter so that you can use it later. 7th

8 Solution The query should look like this: Arrange charts in a dashboard Under you can now create your own dashboard by pressing the + in the upper right corner. Your chart and your filter can now be presented in this dashboard. When creating a dashboard, you can set a lot, but it is often enough to just fill in the Title field. 8th

9 To start editing, click on Edit dashboard in the upper right corner. Your dashboard is currently empty, but you can simply fill it with drag & drop. You have to pull the first component up to the edge. When a component can be placed, this is indicated by a blue line, which also shows how / where the component will be placed. First you click on the Edit dashboard button which shows all the components that you can add. Tabs are to be understood as with the browser itself and can contain anything. Rows and columns can be used to connect other individual components. When components are connected, the space between them can also be colored white. By hovering over the connected components, a button appears showing this option. Divider can be used to divide the connected components into groups again. These represent a line. Headers can be used to give the individual components a heading. Markdown can be used to write a descriptive text for a chart. All the charts that have been created so far can be found under Your charts & filters. After adding a component, its size can be adjusted by clicking on the lower right corner. Task 3: Create a dashboard Now create a dashboard that corresponds to Figure 3. Figure 3. Here again the dashboard as the result of this worksheet with the pie chart and the filter Dashboards share with others (via the web) As soon as you are satisfied with your dashboard, you can publish and / or share it. To share a dashboard, you have to click on the dropdown menu behind the Edit dashboard button and then on Share dashboard. You can now send the URL to another person, but this person (i.e. their role) must also have access to the data, otherwise 9

10 only an error is displayed. To publish a dashboard, all you have to do is press "Draft" to the right of the dashboard's name and it will display "Published". A published dashboard is visible to other users under (provided that these users have access to the corresponding datasources). A dashboard that is marked as "Draft" is not visible to other users (exception: high-ranking users such as admins), but it can still be accessed directly via a URL. Passed final exams! You should now have a World Bank data dashboard that you can show to others. Tip for the filter: In a time filter under Custom it is possible to write years directly. The date is then automatically the first of January. If you want to learn more about Apache Superset, we recommend the more detailed information sheets "Introduction to Apache Superset (7 Charts)" and "Apache Superset for advanced users" on OpenSchoolMaps. Figure 4. Charts presented in the Apache Superset Introduction (7 Charts). There are also courses on Apache Superset, e.g. at the Geometa Lab HSR. We are happy to receive feedback, see OpenSchoolMaps> Other teaching ideas. Freely usable under CC0 1.0: 10