Digital History: The Story So Far

As the field of Digital History continues to grow, so too does the number of tools, software, and coding packages built to support and advance digital history in practice. The range is at times staggering: from applications suitable for the most novice of digital historians, to coding guides and tools for those working towards more nuanced and specific end-goals, researchers can engage with their materials in digital, quantitative ways at a level never seen before. Often we focus primarily on the new findings that come out of this new way of approaching research - but what about the ways we get to those findings?

Regardless of the type of digital analysis being performed or even the software being used, the process is normally the same: input some data, click some buttons or run some code (perhaps a couple of times over to edit the code and adjust the outcomes), and get your end result.

You've got an outcome - but do you know how you got from A to B? It's likely that variables have been written over several times along the way, that the data has changed from one type to another or been filtered or added to, and that decision after decision has been made without you necessarily being aware of it. Each little adjustment or re-run of the code has contributed to the research process and is critical to the end output or findings.

But how do we keep track?

Hello kiara.

Introducing kiara, a new data orchestration tool.

This new tool incorporates a number of different digital research approaches, but most importantly it documents the process and encourages users to reflect critically on their use of DH tools. In doing so, the software opens up the black box of digital research, moving away from button-clicking software and making digital research more transparent and open to commentary, replication, and criticism. It not only makes the research process itself more open, allowing users to visualise and examine the individual steps from start to finish, but also allows them to track changes to the data itself, something that is either imperceptible or, perhaps more importantly, forgotten about in traditional digital history methods and tools. kiara therefore acts as a 'wrapper' around this digital research process, tracking and documenting the steps and changes to the data, and producing a veritable map of the journey that can be reflected upon and shared.
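To make the idea of a documented research process a little more concrete, here is a minimal sketch in plain Python. It is not how kiara works internally: the StepLog class and its run method are hypothetical names invented for this illustration, and the sample list of cities is made up. The sketch only shows the principle of recording each step as it is applied, rather than trying to remember it afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class StepLog:
    """Hypothetical helper: keeps a record of every step applied to the data."""

    steps: list = field(default_factory=list)

    def run(self, name, func, value):
        # Apply the step, then note what went in and what came out.
        result = func(value)
        self.steps.append(
            {"step": name, "items_in": len(value), "items_out": len(result)}
        )
        return result


log = StepLog()
cities = ["Berlin", "Vienna", "berlin ", "Turin"]  # made-up sample data

cleaned = log.run("normalise", lambda xs: [x.strip().title() for x in xs], cities)
berlin = log.run("filter_berlin", lambda xs: [x for x in xs if x == "Berlin"], cleaned)

for step in log.steps:
    print(step)
```

Even in this toy version, the log tells us that the filter only found two entries because an earlier step had normalised the spelling. That is the kind of decision that is easily forgotten.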

This tutorial will walk you through installation of kiara in Jupyter Notebooks, and some basic but essential functions that can be built on in further notebooks. At the end, it will showcase the data lineage, having tracked the research process and changes to the data from start to finish.

This tutorial assumes a basic knowledge of Python and SQL.

Installation

Before running this notebook, you need to install kiara and its dependencies in a virtual environment (for example, one created with Conda) by running the following command in your terminal:

pip install git+https://github.com/DHARPA-Project/kiara_plugin.dh_tagung_2023
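If you are unsure whether the installation ended up in the environment your notebook is using, a quick check like the following can help. This is a small sketch using only the Python standard library; is_installed is a hypothetical helper name, and we are assuming the package can be imported under the top-level name kiara, as in the import used later in this tutorial.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found in this environment."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("kiara"):
    print("kiara is available in this environment.")
else:
    print("kiara was not found; check that the right environment is active.")
```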

Running kiara

In order to use kiara, we need to create a KiaraAPI instance. An API allows us to control and interact with kiara and its functions. In kiara this also allows us to get more information about what can be done (and what is happening) to our data as we go. For more on what can be done with the API, see the kiara API documentation here.

from kiara.api import KiaraAPI

kiara = KiaraAPI.instance()

Now we have an API in place, we can get more information about what we can do in kiara. Let's start by asking kiara to list all the operations that are included with the plugins we just installed.

kiara.list_operation_ids()

['assemble.network_data.from.files', 'assemble.network_data.from.tables', 'compute.modularity_group', 'create.betweenness_rank_list', 'create.closeness_rank_list', 'create.cut_point_list', 'create.database.from.file', 'create.database.from.file_bundle', 'create.database.from.table', 'create.degree_rank_list', 'create.eigenvector_rank_list', 'create.network_data.from.file', 'create.table.from.file', 'create.table.from.file_bundle', 'date.check_range', 'date.extract_from_string', 'download.file', 'download.file_bundle', 'export.file.as.file', 'export.network_data.as.csv_files', 'export.network_data.as.graphml_file', 'export.network_data.as.sql_dump', 'export.network_data.as.sqlite_db', 'export.table.as.csv_file', 'extract.date_array.from.table', 'file_bundle.pick.file', 'file_bundle.pick.sub_folder', 'import.database.from.local_file_path', 'import.file', 'import.file_bundle', 'import.local.file', 'import.local.file_bundle', 'import.network_data.from.local_file_paths', 'import.table.from.local_file_path', 'import.table.from.local_folder_path', 'list.contains', 'logic.and', 'logic.nand', 'logic.nor', 'logic.not', 'logic.or', 'logic.xor', 'network_data.extract_largest_component', 'onboard.gml_file', 'onboard.zenodo_record', 'parse.date_array', 'query.database', 'query.table', 'string_filter.tokens', 'table.pick.column', 'table_filter.drop_columns', 'table_filter.select_columns', 'table_filter.select_rows']
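That is a long list to read in one go. Since the operation names follow a dotted pattern, one way to get an overview is to group them by their first part. The sketch below uses plain Python on a hand-copied subset of the list above, so it runs without kiara; group_by_prefix is a hypothetical helper, not part of the kiara API.

```python
from collections import defaultdict

# A hand-copied subset of the operation IDs printed above.
operation_ids = [
    "create.database.from.file",
    "create.table.from.file",
    "download.file",
    "download.file_bundle",
    "export.table.as.csv_file",
    "query.database",
    "query.table",
    "table_filter.select_rows",
]


def group_by_prefix(ids):
    """Group dotted operation IDs by the part before the first dot."""
    groups = defaultdict(list)
    for op_id in ids:
        prefix = op_id.split(".", 1)[0]
        groups[prefix].append(op_id)
    return dict(groups)


grouped = group_by_prefix(operation_ids)
for prefix, ids in sorted(grouped.items()):
    print(f"{prefix}: {len(ids)} operation(s)")
```

In a notebook, the same function could be given the full list returned by kiara.list_operation_ids() instead of the shortened sample.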

Downloading Files

Great, now we know the different kinds of operations we can use with kiara. Let's start by introducing some files to our notebook, using the download.file operation.
First we want to find out what this operation does, and just as importantly, what inputs it needs to work.

kiara.retrieve_operation_info('download.file')

Author(s)
  Markus Binsteiner   markus@frkl.io

Context
  Tags         onboarding
  Labels       package: kiara_plugin.onboarding
  References   source_repo: https://github.com/DHARPA-Project/kiara_plugin.onboarding
               documentation: https://DHARPA-Project.github.io/kiara_plugin.onboarding/

Operation details
  Documentation
    Download a single file from a remote location.
    The result of this operation is a single value of type 'file' (basically an array of raw bytes), which can then be used in other modules to create more meaningful data structures.

  Inputs
    field name   type     description                                     Required   Default
    ──────────────────────────────────────────────────────────────────────────────────────────────────
    url          string   The url of the file to download.                yes        -- no default --
    file_name    string   The file name to use for the downloaded file.   no         -- no default --

  Outputs
    field name          type   description
    ──────────────────────────────────────────────────────────────────────────────────────────────────
    file                file   The downloaded file.
    download_metadata   dict   Metadata about the download.

So from this, we know that download.file will download a single file from a remote location for us to use in kiara.
We need to give the function a url and, if we want, a file name. These are the inputs.
In return, we will get the file and metadata about the file as our outputs.
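Reading the operation info this way amounts to checking our inputs against a small specification. As an illustration only, here is how that check might look in plain Python. The DOWNLOAD_FILE_INPUTS dictionary is transcribed by hand from the information above, and check_inputs is a hypothetical helper written for this example; kiara does its own validation, which will not look like this.

```python
# Field descriptions transcribed by hand from the operation info above.
DOWNLOAD_FILE_INPUTS = {
    "url": {"type": str, "required": True},
    "file_name": {"type": str, "required": False},
}


def check_inputs(spec, inputs):
    """Hypothetical helper: return a list of problems found in `inputs`."""
    problems = []
    for name, rules in spec.items():
        if name not in inputs:
            if rules["required"]:
                problems.append(f"missing required input: {name}")
            continue
        if not isinstance(inputs[name], rules["type"]):
            problems.append(f"{name} should be of type {rules['type'].__name__}")
    for name in inputs:
        if name not in spec:
            problems.append(f"unknown input: {name}")
    return problems


# Forgetting the url is reported; a complete set of inputs gives an empty list.
print(check_inputs(DOWNLOAD_FILE_INPUTS, {"file_name": "JournalNodes1902.csv"}))
```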

Let's give this a go using some kiara sample data.

First we define our inputs, then use kiara.run_job with our chosen operation, download.file, and save this as our outputs.

inputs = {
    "url": "https://raw.githubusercontent.com/DHARPA-Project/kiara.examples/main/examples/data/network_analysis/journals/JournalNodes1902.csv",
    "file_name": "JournalNodes1902.csv"
}

outputs = kiara.run_job('download.file', inputs=inputs)

Let's print out our outputs and see what that looks like.

outputs

field               value
──────────────────────────────────────────────────────────────────────────────────────────────────
download_metadata
  dict data
  {
    "response_headers": [
      {
        "connection": "keep-alive",
        "content-length": "7436",
        "cache-control": "max-age=300",
        "content-security-policy": "default-src 'none'; style-src 'unsafe-inline'; sandbox",
        "content-type": "text/plain; charset=utf-8",
        "etag": "W/"641ae85d69e5836d27ea8906aba0a33b48b0f3ed0ed4c40d21a07fccebdd238d"",
        "strict-transport-security": "max-age=31536000",
        "x-content-type-options": "nosniff",
        "x-frame-options": "deny",
        "x-xss-protection": "1; mode=block",
        "x-github-request-id": "5428:E056:DE6D8:E58DB:6527E802",
        "content-encoding": "gzip",
        "accept-ranges": "bytes",
        "date": "Thu, 12 Oct 2023 12:35:15 GMT",
        "via": "1.1 varnish",
        "x-served-by": "cache-fra-eddf8230131-FRA",
        "x-cache": "MISS",
        "x-cache-hits": "0",
        "x-timer": "S1697114115.825867,VS0,VE179",
        "vary": "Authorization,Accept-Encoding,Origin",
        "access-control-allow-origin": "*",
        "cross-origin-resource-policy": "cross-origin",
        "x-fastly-request-id": "aa72f9a416f0a1fb259a8f6a986059d83a8119ff",
        "expires": "Thu, 12 Oct 2023 12:40:15 GMT",
        "source-age": "0"
      }
    ],
    "request_time": "2023-10-12T12:35:14.923561+00:00"
  }
  dict schema
  {
    "title": "dict",
    "type": "object"
  }

file
  Id,Label,JournalType,City,CountryNetworkTime,PresentDayCountry,Latitude,Longitude,Language
  75,Psychiatrische en neurologische bladen,specialized: psychiatry and neurology,Amsterdam,Netherlands,Netherlands,52.366667,4.9,Dutch
  36,The American Journal of Insanity,specialized: psychiatry and neurology,Baltimore,United States,United States,39.289444,-76.615278,English
  208,The American Journal of Psychology,specialized: psychology,Baltimore,United States,United States,39.289444,-76.615278,English
  295,Die Krankenpflege,specialized: therapy,Berlin,German Empire,Germany,52.52,13.405,German
  296,Die deutsche Klinik am Eingange des zwanzigsten Jahrhunderts,general medicine,Berlin,German Empire,Germany,52.52,13.405,German
  300,Therapeutische Monatshefte,specialized: therapy,Berlin,German Empire,Germany,52.52,13.405,German
  1,Allgemeine Zeitschrift für Psychiatrie,specialized: psychiatry and neurology,Berlin,German Empire,Germany,52.52,13.405,German
  7,Archiv für Psychiatrie und Nervenkrankheiten,specialized: psychiatry and neurology,Berlin,German Empire,Germany,52.52,13.405,German
  10,Berliner klinische Wochenschrift,general medicine,Berlin,German Empire,Germany,52.52,13.405,German
  13,Charité Annalen,general medicine,Berlin,German Empire,Germany,52.52,13.405,German
  21,Monatsschrift für Psychiatrie und Neurologie,specialized: psychiatry and neurology,Berlin,German Empire,Germany,52.52,13.405,German
  29,Virchows Archiv,"specialized: anatomy, physiology and pathology",Berlin,German Empire,Germany,52.52,13.405,German
  31,Zeitschrift für pädagogische Psychologie und Pathologie,specialized: psychology and pedagogy,Berlin,German Empire,Germany,52.52,13.405,German
  42,Vierteljahrsschrift für gerichtliche Medizin und öffentliches Sanitätswesen,"specialized: anthropology, criminology and forensics",Berlin,German Empire,Germany,52.52,13.405,German
  47,Centralblatt für Nervenheilkunde und Psychiatrie,specialized: psychiatry and neurology,Berlin,German Empire,Germany,52.52,13.405,German
  50,Russische medicinische Rundschau,general medicine,Berlin,German Empire,Germany,52.52,13.405,German
  76,Deutsche Aerzte-Zeitung,general medicine,Berlin,German Empire,Germany,52.52,13.405,German
  87,Monatsschrift für Geburtshülfe und Gynäkologie,specialized: gynecology,Berlin,German Empire,Germany,52.52,13.405,German
  108,Archiv für klinische Chirurgie,specialized: surgery,Berlin,German Empire,Germany,52.52,13.405,German
  113,Zeitschrift für klinische Medicin,general medicine,Berlin,German Empire,Germany,52.52,13.405,German
  159,Deutsche militärärztliche Zeitschrift,specialized: military medicine,Berlin,German Empire,Germany,52.52,13.405,German
  162,Jahresbericht über die Leistungen und Fortschritte auf dem Gebiete der Neurologie und Psychiatrie,specialized: psychiatry and neurology,Berlin,German Empire,Germany,52.52,13.405,German
  192,Ärztliche Sachverständigen-Zeitung,general medicine,Berlin,German Empire,Germany,52.52,13.405,German
  198,Zeitschrift für die Behandlung Schwachsinniger und Epileptischer,specialized: psychiatry and neurology,Berlin,German Empire,Germany,52.52,13.405,German
  258,Der Pfarrbote,news media,Berlin,German Empire,Germany,52.52,13.405,German
  71,Correspondenz-Blatt für Schweizer Aerzte,general medicine,Bern,Switzerland,Switzerland,46.948056,7.4475,German
  6,Archiv für mikroskopische Anatomie,"specialized: anatomy, physiology and pathology",Bonn,German Empire,Germany,50.733333,7.1,German
  203,The Journal of Abnormal Psychology,specialized: psychology,Boston,United States,United States,42.358056,-71.063611,English
  273,"Correspondenz-Blatt der Deutschen Gesellschaft für Anthropologie, Ethnologie und Urgeschichte","specialized: anthropology, criminology and forensics",Braunschweig,German Empire,Germany,52.266667,10.516667,German
  303,Policlinique de Bruxelles,general medicine,Brussels,Belgium,Belgium,50.85,4.35,French
  306,Annales de la Société Belge de Neurologie,specialized: psychiatry and neurology,Brussels,Belgium,Belgium,50.85,4.35,French
  19,Journal de neurologie,specialized: psychiatry and neurology,Brussels,Belgium,Belgium,50.85,4.35,French
  25,"Revue internationale d'électrothérapie, de physiologie, de médecine, de chirurgie, d'obstétrique, de thérapeutique, de chimie et de pharmacie",general medicine,Brussels,Belgium,Belgium,50.85,4.35,French
  35,Bulletin de la Société de Médecine Mentale de Belgique,specialized: psychiatry and neurology,Brussels,Belgium,Belgium,50.85,4.35,French
  ...
  ...

Great! We've successfully downloaded the file, and we can see there's lots of information here.

At the moment, we're most interested in the file output. This contains the actual contents of the file that we have just downloaded.

Let's separate this out and store it in a separate variable for us to use.

downloaded_file = outputs['file']
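At this point the file contents are still just text. To see why a more structured format is useful, here is a small sketch using Python's built-in csv module on a few lines copied from the output above. It does not use kiara at all; it only illustrates what reading the first row as a header means, and why a proper CSV parser matters when a field (such as the JournalType of Virchows Archiv) contains commas inside quotes.

```python
import csv
import io

# A few lines copied from the file contents printed above.
raw_text = """Id,Label,JournalType,City,CountryNetworkTime,PresentDayCountry,Latitude,Longitude,Language
295,Die Krankenpflege,specialized: therapy,Berlin,German Empire,Germany,52.52,13.405,German
29,Virchows Archiv,"specialized: anatomy, physiology and pathology",Berlin,German Empire,Germany,52.52,13.405,German
71,Correspondenz-Blatt für Schweizer Aerzte,general medicine,Bern,Switzerland,Switzerland,46.948056,7.4475,German
"""

# DictReader treats the first row as the header and uses it for column names.
rows = list(csv.DictReader(io.StringIO(raw_text)))

for row in rows:
    print(row["Id"], row["Label"], "|", row["JournalType"])
```

Note that every value is still a string here, including the latitude and longitude. Turning the file into a typed table is one of the things a conversion step takes care of.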

New Formats: Creating and Converting

What next? We could transform the downloaded file contents into a different format.
Let's go back to the operation list from earlier, and look for something that allows us to create something out of our new file.

kiara.list_operation_ids('create')

['create.betweenness_rank_list', 'create.closeness_rank_list', 'create.cut_point_list', 'create.database.from.file', 'create.database.from.file_bundle', 'create.database.from.table', 'create.degree_rank_list', 'create.eigenvector_rank_list', 'create.network_data.from.file', 'create.table.from.file', 'create.table.from.file_bundle']

Our file was originally in CSV format, so let's make a table using create.table.from.file.

Just like when we used download.file, we can double check what this does, and what inputs and outputs this involves.

This time, we're also going to use a variable to store the operation in - this is especially handy if the operation has a long name, or if you want to use the same operation more than once without retyping it.

op_id = 'create.table.from.file'

kiara.retrieve_operation_info(op_id)

Author(s)
  Markus Binsteiner   markus@frkl.io

Context
  Tags         tabular
  Labels       package: kiara_plugin.tabular
  References   source_repo: https://github.com/DHARPA-Project/kiara_plugin.tabular
               documentation: https://DHARPA-Project.github.io/kiara_plugin.tabular/

Operation details
  Documentation
    Create a table from a file, trying to auto-determine the format of said file.

  Inputs
    field name            type      description                                                                                           Required   Default
    ──────────────────────────────────────────────────────────────────────────────────────────────────
    file                  file      The source value (of type 'file').                                                                    yes        -- no default --
    first_row_is_header   boolean   Whether the first row of the file is a header row. If not provided, kiara will try to auto-determine.   no         -- no default --

  Outputs
    field name   type    description
    ──────────────────────────────────────────────────────────────────────────────────────────────────
    table        table   The result value (of type 'table').

Great, we have all the information we need now.

Let's go again.

First we define our inputs: the downloaded file we saved earlier, plus a flag telling kiara that the first row should be read as a header.

Then we use kiara.run_job with our chosen operation, this time stored as op_id.

Once this is saved as our outputs, we can print it out.

inputs = {
    "file": downloaded_file,
    "first_row_is_header": True
}

outputs = kiara.run_job(op_id, inputs=inputs)

outputs

field   value
──────────────────────────────────────────────────────────────────────────────────────────────────
table
  Id    Label            JournalType      City        CountryNetworkT   PresentDayCoun   Latitude    Longitude    Language
  ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  75    Psychiatrische   specialized: p   Amsterdam   Netherlands       Netherlands      52.366667   4.9          Dutch
  36    The American J   specialized: p   Baltimore   United States     United States    39.289444   -76.615278   English
  208   The American J   specialized: p   Baltimore   United States     United States    39.289444   -76.615278   English
  295   Die Krankenpfl   specialized: t   Berlin      German Empire     Germany          52.52       13.405       German
  296   Die deutsche K   general medici   Berlin      German Empire     Germany          52.52       13.405       German
  300   Therapeutische   specialized: t   Berlin      German Empire     Germany          52.52       13.405       German
  1     Allgemeine Zei   specialized: p   Berlin      German Empire     Germany          52.52       13.405       German
  7     Archiv für Psy   specialized: p   Berlin      German Empire     Germany          52.52       13.405       German
  10    Berliner klini   general medici   Berlin      German Empire     Germany          52.52       13.405       German
  13    Charité Annale   general medici   Berlin      German Empire     Germany          52.52       13.405       German
  21    Monatsschrift    specialized: p   Berlin      German Empire     Germany          52.52       13.405       German
  29    Virchows Archi   specialized: a   Berlin      German Empire     Germany          52.52       13.405       German
  31    Zeitschrift fü   specialized: p   Berlin      German Empire     Germany          52.52       13.405       German
  42    Vierteljahrssc   specialized: a   Berlin      German Empire     Germany          52.52       13.405       German
  47    Centralblatt f   specialized: p   Berlin      German Empire     Germany          52.52       13.405       German
  50    Russische medi   general medici   Berlin      German Empire     Germany          52.52       13.405       German
  ...   ...              ...              ...         ...               ...              ...         ...          ...
  ...   ...              ...              ...         ...               ...              ...         ...          ...
  277   L'arte medica    general medici   Turin       Italy             Italy            45.079167   7.676111     Italian
  288   Allgemeine öst   specialized: a   Vienna      Austro-Hungaria   Austria          48.2        16.366667    German
  18    Jahrbücher für   specialized: p   Vienna      Austro-Hungaria   Austria          48.2        16.366667    German
  30    Wiener klinisc   general medici   Vienna      Austro-Hungaria   Austria          48.2        16.366667    German
  44    Wiener klinisc   general medici   Vienna      Austro-Hungaria   Austria          48.2        16.366667    German
  45    Wiener medizin   general medici   Vienna      Austro-Hungaria   Austria          48.2        16.366667    German
  72    Wiener medizin   general medici   Vienna      Austro-Hungaria   Austria          48.2        16.366667    German
  81    Monatsschrift    general medici   Vienna      Austro-Hungaria   Austria          48.2        16.366667    German
  93    Klinisch-thera   general medici   Vienna      Austro-Hungaria   Austria          48.2        16.366667    German
  151   Medicinisch-ch   specialized: s   Vienna      Austro-Hungaria   Austria          48.2        16.366667    German
  199   Der Militärazt   specialized: m   Vienna      Austro-Hungaria   Austria          48.2        16.366667    German
  261   Медицинская бе   general medici   Voronezh    Russian Empire    Russia           51.671667   39.210556    Russian
  77    Medycyna         general medici   Warsaw      Russian Empire    Poland           52.233333   21.016667    Polish
  150   Kronika Lekars   general medici   Warsaw      Russian Empire    Poland           52.233333   21.016667    Polish
  86    Grenzfragen de   specialized: p   Wiesbaden   German Empire     Germany          50.0825     8.24         German
  206   Ergebnisse der   specialized: a   Wiesbaden   German Empire     Germany          50.0825     8.24         German

This has done exactly what we wanted, and shown the contents from the downloaded file as a table. But we are also interested in some general (mostly internal) information and metadata, this time for the new table we have just created, rather than the original file itself.

Let's have a look.

outputs_table = outputs['table']

outputs_table

value_id            36df833f-0dbe-4683-b912-42c73df877ac
kiara_id            441206f8-e5b4-43d1-b198-d4741dc64e04
──────────────────────────────────────────────────────────────────────────────────────────────────
data_type_info
  data_type_name      table
  data_type_config    {}
  characteristics     {
                        "is_scalar": false,
                        "is_json_serializable": false
                      }
  data_type_class
    python_class_name    TableType
    python_module_name   kiara_plugin.tabular.data_types.table
    full_name            kiara_plugin.tabular.data_types.table.TableType
destiny_backlinks   {}
enviroments         None
property_links      {
                      "metadata.python_class": "2e52a3a1-de3d-4202-aea1-90a0cec145e0",
                      "metadata.table": "29bc8c46-be1b-4a3a-8f52-a58c1f194cc2"
                    }
value_hash          zdpuAn89Et1ENzfoASJRYcWEceyfRiPg664mN4nnHLFnjRLyg
value_schema
  type          table
  type_config   {}
  default       not_set
  optional      False
  is_constant   False
  doc           The result value (of type 'table').
value_size          42.79 KB
value_status        -- set --
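One entry worth pausing on is value_hash. We are assuming here that it is a fingerprint derived from the content of the value, which is what makes it possible to tell whether two pieces of data are really the same. The sketch below illustrates that general idea with Python's standard hashlib module. It is not kiara's hashing scheme and will not reproduce the hash shown above; fingerprint is a hypothetical helper, and the tiny CSV strings are made up.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


original = "Id,City\n295,Berlin\n".encode("utf-8")
same_again = "Id,City\n295,Berlin\n".encode("utf-8")
edited = "Id,City\n295,Bern\n".encode("utf-8")

# Identical content gives an identical fingerprint ...
print(fingerprint(original) == fingerprint(same_again))
# ... while even a small edit gives a different one.
print(fingerprint(original) == fingerprint(edited))
```

A fingerprint like this says nothing about what changed, only that something did, which is why it is most useful alongside a record of the steps that were run.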

Querying our Data

So now we have downloaded our file and converted it into a table, we want to actually explore it.

To do this, we can query the table using SQL and some functions already included in kiara.

Let's take another look at that operation list, this time looking for functions that let us 'query'.

kiara.list_operation_ids('query')

['query.database', 'query.table']

Well, we already know our file has been converted into a table, so let's have a look at query.table.

kiara.retrieve_operation_info('query.table')

Author(s)
  Markus Binsteiner   markus@frkl.io

Context
  Tags         tabular
  Labels       package: kiara_plugin.tabular
  References   source_repo: https://github.com/DHARPA-Project/kiara_plugin.tabular
               documentation: https://DHARPA-Project.github.io/kiara_plugin.tabular/

Operation details
  Documentation
    Execute a sql query against an (Arrow) table.
    The default relation name for the sql query is 'data', but can be modified by the 'relation_name' config option/input.
    If the 'query' module config option is not set, users can provide their own query, otherwise the pre-set one will be used.

  Inputs
    field name      type     description                                                                                 Required   Default
    ──────────────────────────────────────────────────────────────────────────────────────────────────
    table           table    The table to query                                                                          yes        -- no default --
    query           string   The query, use the value of the 'relation_name' input as table, e.g. 'select * from data'.   yes        -- no default --
    relation_name   string   The name the table is referred to in the sql query.                                         no         data

  Outputs
    field name     type    description
    ──────────────────────────────────────────────────────────────────────────────────────────────────
    query_result   table   The query result.

From this information, we only need to provide two inputs: the table itself and our query. The relation_name input is optional and defaults to 'data'.

Let's work out how many of these journals were published in Berlin.

inputs = {
    "table": outputs_table,
    "query": "SELECT * from data where City like 'Berlin'"
}

outputs = kiara.run_job('query.table', inputs=inputs)

outputs

field          value
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
query_result
  Id    Label            JournalType      City     CountryNetwor   PresentDayCoun   Latitude   Longitude   Language
  ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  295   Die Krankenpfl   specialized: t   Berlin   German Empire   Germany          52.52      13.405      German
  296   Die deutsche K   general medici   Berlin   German Empire   Germany          52.52      13.405      German
  300   Therapeutische   specialized: t   Berlin   German Empire   Germany          52.52      13.405      German
  1     Allgemeine Zei   specialized: p   Berlin   German Empire   Germany          52.52      13.405      German
  7     Archiv für Psy   specialized: p   Berlin   German Empire   Germany          52.52      13.405      German
  10    Berliner klini   general medici   Berlin   German Empire   Germany          52.52      13.405      German
  13    Charité Annale   general medici   Berlin   German Empire   Germany          52.52      13.405      German
  21    Monatsschrift    specialized: p   Berlin   German Empire   Germany          52.52      13.405      German
  29    Virchows Archi   specialized: a   Berlin   German Empire   Germany          52.52      13.405      German
  31    Zeitschrift fü   specialized: p   Berlin   German Empire   Germany          52.52      13.405      German
  42    Vierteljahrssc   specialized: a   Berlin   German Empire   Germany          52.52      13.405      German
  47    Centralblatt f   specialized: p   Berlin   German Empire   Germany          52.52      13.405      German
  50    Russische medi   general medici   Berlin   German Empire   Germany          52.52      13.405      German
  76    Deutsche Aerzt   general medici   Berlin   German Empire   Germany          52.52      13.405      German
  87    Monatsschrift    specialized: g   Berlin   German Empire   Germany          52.52      13.405      German
  108   Archiv für kli   specialized: s   Berlin   German Empire   Germany          52.52      13.405      German
  113   Zeitschrift fü   general medici   Berlin   German Empire   Germany          52.52      13.405      German
  159   Deutsche milit   specialized: m   Berlin   German Empire   Germany          52.52      13.405      German
  162   Jahresbericht    specialized: p   Berlin   German Empire   Germany          52.52      13.405      German
  192   Ärztliche Sach   general medici   Berlin   German Empire   Germany          52.52      13.405      German
  198   Zeitschrift fü   specialized: p   Berlin   German Empire   Germany          52.52      13.405      German
  258   Der Pfarrbote    news media       Berlin   German Empire   Germany          52.52      13.405      German

The function has returned the table with just the results we were looking for from the SQL query.
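The query above lists the matching rows; to answer "how many" directly, SQL can also count them. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for query.table, so it runs without kiara. The three rows are invented for illustration, and the choice of SQLite is an assumption; the table is named data only to mirror the default relation name described above.

```python
import sqlite3

# Toy stand-in for the journals table; these rows are invented for illustration.
rows = [
    (1, "Journal A", "Berlin"),
    (2, "Journal B", "Berlin"),
    (3, "Journal C", "Vienna"),
]

conn = sqlite3.connect(":memory:")
# 'data' mirrors the default relation name used by query.table.
conn.execute("CREATE TABLE data (Id INTEGER, Label TEXT, City TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)

# Count the matching rows instead of listing them.
berlin_count = conn.execute(
    "SELECT COUNT(*) FROM data WHERE City = 'Berlin'"
).fetchone()[0]

print(berlin_count)
conn.close()
```

A COUNT(*) query of this shape could in principle be passed as the query input in the notebook, but that has not been verified here against kiara's SQL engine.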

Let's narrow this further, and find all the journals that are just about general medicine and published in Berlin.

We can re-use the query.table function and the table we've just made, which is stored in outputs['query_result'].

inputs = {
    "table" : outputs['query_result'],
    "query" : "SELECT * from data where JournalType like 'general medicine'"
}

outputs = kiara.run_job('query.table', inputs=inputs)

outputs

field          value
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
query_result
  Id    Label            JournalType      City     CountryNetwork   PresentDayCou   Latitude   Longitude   Language
  ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  296   Die deutsche K   general medici   Berlin   German Empire    Germany         52.52      13.405      German
  10    Berliner klini   general medici   Berlin   German Empire    Germany         52.52      13.405      German
  13    Charité Annale   general medici   Berlin   German Empire    Germany         52.52      13.405      German
  50    Russische medi   general medici   Berlin   German Empire    Germany         52.52      13.405      German
  76    Deutsche Aerzt   general medici   Berlin   German Empire    Germany         52.52      13.405      German
  113   Zeitschrift fü   general medici   Berlin   German Empire    Germany         52.52      13.405      German
  192   Ärztliche Sach   general medici   Berlin   German Empire    Germany         52.52      13.405      German
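Filtering in two steps, as we have just done, selects the same rows as a single query that combines both conditions with AND. The sketch below checks this on a toy table, again using the standard-library sqlite3 module as a stand-in rather than kiara itself. The rows, the intermediate table name berlin, and the use of SQLite are all assumptions made for illustration.

```python
import sqlite3

# Toy rows (invented): (Id, JournalType, City)
rows = [
    (1, "general medicine", "Berlin"),
    (2, "specialized: psychiatry", "Berlin"),
    (3, "general medicine", "Vienna"),
    (4, "general medicine", "Berlin"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (Id INTEGER, JournalType TEXT, City TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)

# Two-step approach: filter by city, store the result, then filter by type.
conn.execute("CREATE TABLE berlin AS SELECT * FROM data WHERE City = 'Berlin'")
two_step = conn.execute(
    "SELECT Id FROM berlin WHERE JournalType = 'general medicine' ORDER BY Id"
).fetchall()

# One-step approach: both conditions in a single WHERE clause.
one_step = conn.execute(
    "SELECT Id FROM data "
    "WHERE City = 'Berlin' AND JournalType = 'general medicine' ORDER BY Id"
).fetchall()

print(two_step == one_step)
conn.close()
```

The two-step version is the more useful one for this notebook: each query.table job becomes its own recorded step, so the intermediate Berlin table remains available and traceable.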

Recording and Tracing our Data

We've made quite a few changes to this table, so let's double-check the information about the new table we've created with our queries.

query_output = outputs['query_result']

query_output

value_id            008e4800-677d-4f10-a94f-a47a5822b1a0
kiara_id            441206f8-e5b4-43d1-b198-d4741dc64e04
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
data_type_info
  data_type_name     table
  data_type_config   {}
  characteristics    {
                       "is_scalar": false,
                       "is_json_serializable": false
                     }
  data_type_class
    python_class_name    TableType
    python_module_name   kiara_plugin.tabular.data_types.table
    full_name            kiara_plugin.tabular.data_types.table.TableType
destiny_backlinks   {}
enviroments         None
property_links      {
                      "metadata.python_class": "69a56eb4-98f7-4d38-bad1-51b7dc6bc300",
                      "metadata.table": "2a73f673-0cff-45f2-bd6e-393343d6edc0"
                    }
value_hash          zdpuB2LfZYHdiuR1sxy2ZkjPZ7JDnhysN48Y4RN9WNT4AvNN6
value_schema
  type          table
  type_config   {}
  default       not_set
  optional      False
  is_constant   False
  doc           The query result.
value_size          5.22 KB
value_status        -- set --

Looks good!

We might have changed things around, but we can still get lots of information about all our data.

More importantly, kiara is able to trace all of these changes, tracking the inputs and outputs and giving each one its own identifier, so you know exactly what has happened to your data.

First, let's have a look at our basic lineage function. This gets us the 'backstage' of what has been going on, showing the inputs for each of the functions that we have run and where they feed into one another. In each case, kiara has assigned the inputs a unique identifier. Check it out!

query_output.lineage

query.table
├── input: query (string) = 0a66077d-b9c7-4a0a-ba81-f60a52055d50
├── input: relation_name (string) = 593fc9c4-3dfe-4e5b-a017-daf01c05b9ba
└── input: table (table) = e324894f-c4ec-4caa-9e27-6f1463437ed3
    └── query.table
        ├── input: query (string) = 851bd6c3-c3dd-4506-b316-81797078a515
        ├── input: relation_name (string) = 32ff3626-2b42-4cfb-be67-3e8ec0f25446
        └── input: table (table) = 36df833f-0dbe-4683-b912-42c73df877ac
            └── create.table
                ├── input: file (file) = 737547e8-7c61-43e7-a6ee-e037c5304f96
                │   └── download.file
                │       ├── input: file_name (string) = 2751a86e-460e-4df8-92c3-a20f65576e3b
                │       └── input: url (string) = 371573ac-17a3-43ff-9073-19f249e7739e
                └── input: first_row_is_header (boolean) = 53185a6d-3b24-4744-b501-1f94a5639ab6

We can also visualise this, which lets us view the different functions and their inputs and outputs as a series of steps: the 'workflow' we've been talking about.

from kiara_plugin.dh_tagung_2023.utils import augment_lineage_data

augmented_nodes = augment_lineage_data(query_output,kiara)

from observable_jupyter import embed

embed('@dharpa-project/kiara-data-lineage', cells=['displayViz', 'style'], inputs={'dataset':augmented_nodes})

Even though we are only actually asking for the data lineage using the last SQL query and the table it made, kiara shows us everything that has happened since we first downloaded the file. This helps us keep an eye on the research process and the changes we are making to the data at the same time!
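To get a feel for why the lineage of the last table reaches all the way back to the download, here is a toy sketch of the general idea: every result keeps a reference to the operation and inputs that produced it, so walking backwards from any value recovers the whole chain. This is not kiara's implementation; the Value class and the run and lineage helpers are hypothetical names invented for this example, and the "operations" are plain Python lambdas.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Value:
    """A piece of data plus a record of the step that produced it."""
    data: object
    operation: Optional[str] = None  # None means a raw user input
    inputs: Dict[str, "Value"] = field(default_factory=dict)
    value_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def run(operation: str, func, **inputs: Value) -> Value:
    """Apply func to the inputs' data and remember where the result came from."""
    result = func(**{name: v.data for name, v in inputs.items()})
    return Value(data=result, operation=operation, inputs=inputs)


def lineage(value: Value, depth: int = 0) -> List[str]:
    """Walk backwards from a value, listing every operation and its inputs."""
    if value.operation is None:
        return []
    lines = ["  " * depth + value.operation]
    for name, source in value.inputs.items():
        lines.append("  " * (depth + 1) + f"input: {name} = {source.value_id}")
        lines.extend(lineage(source, depth + 2))
    return lines


# Hypothetical two-step workflow on a list of city names.
raw = Value(["Berlin", "Vienna", "Berlin"])
table = run("create.table", lambda file: list(file), file=raw)
berlin = run(
    "query.table",
    lambda table: [city for city in table if city == "Berlin"],
    table=table,
)

steps = lineage(berlin)
print("\n".join(steps))
```

Asking only for the lineage of the final value still prints the earlier create.table step, because the history travels with the data rather than being stored separately.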

What next...?

That's great: you've completed the first notebook. You have successfully installed kiara, downloaded files, tested out some functions, and seen what these do to your data.

Now you can check out the other plugin packages to explore how this helps you manage and trace your data while using digital analysis tools!
