From: Michael Radich <Michael.Radich@vuw.ac.nz>
Subject: "tacl": a computer tool to aid the analysis of Chinese Buddhist texts
Dear colleagues,
I write to draw your attention to a piece of software developed by
myself and Jamie Norrish (who has done the programming) for the
analysis of Chinese Buddhist texts.
The tool, which we call "tacl", is free software (also
known as "open source", though we prefer the former term); anyone may
therefore download it or modify it as they wish. See:
The basic functionality of the tool, as described below, has a
range of potential applications, in the study of such questions as:
-- sources of a given text/corpus;
-- later impact of a given text/corpus (citations, borrowings);
-- stylistic features distinctive to a given author, text, corpus,
milieu;
-- implications for dating texts;
-- the investigation and identification of texts of possible
Chinese composition (including "apocrypha");
etc.
At base, the tool is very simple in its conception. It operates on
the xml files produced by CBETA. It analyses texts into "n-grams",
i.e. strings of contiguous characters (here, individual Unicode Chinese
characters) of user-defined length. It then allows the user to compare two or
more texts or groups of texts ("A", "B", "C"...)
to find either:
1) all (verbatim) strings SHARED by BOTH A and B (and C etc.);
or
2) all (verbatim) strings UNIQUE to A against B (and C etc.), or
vice versa.
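At base, these two comparisons amount to set intersection and set difference over the texts' n-gram sets. The following is a minimal illustrative sketch of that logic in Python; it is not tacl's actual code, and the sample strings are invented:

```python
# Illustrative sketch of comparisons (1) and (2) above;
# NOT tacl's actual implementation, just the underlying set logic.

def ngrams(text, n):
    """Return the set of all contiguous character n-grams of length n in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

# Two invented sample strings standing in for texts A and B.
a = ngrams("如是我聞一時佛在", 2)
b = ngrams("如是我聞一時世尊", 2)

shared = a & b    # (1) all strings shared by both A and B
unique_a = a - b  # (2) all strings unique to A against B
```

With more than two groups, the same operations simply chain: shared strings are the intersection of all the groups' n-gram sets, and unique strings are one group's set minus the union of the others.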
For the purposes of such analysis, the user can define groups of
texts of various sizes, ranging from a single text to the entire canon.
It is also possible for users to edit the root library of texts manually, e.g.
to split a single Taisho text into multiple parts.
Results are generated in the form of text (comma-separated
values), which can then be further analysed, sorted or manipulated using such
tools as Microsoft Excel. The tool incorporates a number of further functions
which also allow the user to do such things as:
-- generate counts of particular n-grams in each text;
-- highlight matches in one or more texts inside a base text;
-- filter original raw results in various ways;
-- generate some summary statistics about a set of results;
-- search a corpus for a list of multiple n-grams at one time.
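The comma-separated output can be pictured roughly as follows. This is a mock-up with an invented column layout, not tacl's exact output schema:

```python
# Mock-up of CSV output of per-text n-gram counts;
# the column names and text labels here are invented, not tacl's schema.
import csv
import io
from collections import Counter

def ngram_counts(text, n):
    """Count every contiguous character n-gram of length n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

# Two tiny invented "texts" labelled with Taisho-style numbers.
texts = {"T0001": "如是我聞如是", "T0002": "如是我聞一時"}

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["ngram", "text", "count"])  # hypothetical header row
for name, text in texts.items():
    for gram, count in sorted(ngram_counts(text, 2).items()):
        writer.writerow([gram, name, count])

print(buf.getvalue())
```

Rows of this shape sort and filter readily in Excel or any spreadsheet tool, which is presumably why a plain CSV format was chosen.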
It is also possible to use the results of one round of tests as
input into a subsequent round, thereby concatenating multiple tests. This makes
it possible to examine more complex questions, such as "What terms are
found in Group X, and also in Group Y, but never in Group Z, and
appear in Text A?"
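In set terms, such a chained query composes the basic operations. A sketch, with invented n-gram sets standing in for the groups and text (again, not tacl's code):

```python
# Hypothetical n-gram sets standing in for Groups X, Y, Z and Text A;
# chaining rounds of tests corresponds to composing set operations.
group_x = {"如來", "菩薩", "涅槃"}
group_y = {"如來", "菩薩", "般若"}
group_z = {"涅槃"}
text_a = {"如來", "般若"}

# "Terms found in Group X, and also in Group Y, but never in
#  Group Z, and appearing in Text A":
result = ((group_x & group_y) - group_z) & text_a
```

In practice each intermediate set would itself be the CSV output of one round of tests, fed back in as the input to the next.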
It must be borne in mind that in its present form, the tool only
generates raw material for further human analysis (which, in my experience so
far, can still be laborious and exacting); it is no magic bullet or crystal
ball. Its results must be used with care and critical awareness, including thorough
consideration of one's own operating hypotheses and underlying assumptions. The
tool is also subject to various concrete limitations, such as the fact that it
only finds exact verbatim matches, and the (related) fact that it cannot (at
the current stage of development) handle variant readings as indicated in the
Taisho critical apparatus. These limitations, too, must be carefully considered
in analysing the results of its operation. Nonetheless, despite these
limitations and caveats, I believe that it can already be a powerful aid to the
study of a range of worthwhile problems.
Potential users should be aware that the tool currently operates
from the command line, i.e. it has no GUI ("graphical user interface";
point-and-click).
For just one example of work completed with the help of the tool,
please see the following recent publication:
Radich, Michael. "On the Sources, Style and Authorship of
Chapters of the Synoptic Suvarṇaprabhāsottama-sūtra T664 Ascribed to Paramārtha
(Part 1)." Annual Report of The International Research Institute for
Advanced Buddhology 17 (2014): 207-244.
In this article, I argue that four chapters of Baogui's 寶貴 synoptic Suvarṇaprabhāsottama-sūtra 合部金光明經 T664 ascribed to Paramārtha 真諦
(499-569) have a range of previously unobserved sources in earlier Chinese
translation texts, and were probably composed in large part in China. I further
argue for the likelihood that portions of these chapters were composed or
revised in a context closer to the early Sui dynasty 隋 (581-618). In preparing this study, I used tacl to help uncover
extended parallels between Paramārtha's chapters and Chinese source texts, and
to gather stylistic evidence (terminology) more characteristic of Sui authors
than of Paramārtha.
A follow-up to the above publication should appear next year, but
that part of the study deals with Tibetan evidence, in a way that has little to
do with the operation of "tacl". Another tacl-based publication, on a
problem of a different type, should be forthcoming later this year.
We are still actively developing the tool in various
directions. However, recent discussions with colleagues have convinced us that
the time is probably right to attract the attention of fellow researchers. We
hope that in so doing, we can make the power of the tool available to others to
aid in new discoveries about our texts; gather suggestions from a wave of early
adopters about possible further improvements; and, ideally, persuade others to
join us in helping develop the tool further, including the underlying code.
Both Jamie Norrish and I will very happily entertain email
correspondence, either off-list or on-list (the latter in my case only,
as Jamie is not a member, and naturally, only if the editors deem the query of
general interest to list members), about all matters relating to both the
technical nature and operation of the tool, and its various applications to
Buddhological research problems.
Should there be scholars who are interested in the tool, but put
off by the technicalities of running it from the command line and other
features of its current format, I will also be happy to entertain requests to
run tests for the investigation of particular research problems, within the
constraints of my available work time. (The computer component of the analysis
usually happens quite fast. The biggest constraint on application of the tool
so far has proven to be the time available to the person working with the tool,
and the fact that to date, the only such person has been myself.)
I would like to request that if you do download the tool,
or try it out on research problems, you please let us know what you are
doing, and how you get on. It could be useful to us, in thinking and planning
about future development of the tool, to know what others are doing with it, or
even just how widely other scholars are interested.
Thank you,
Yours,
Michael Radich
Victoria University of Wellington, New Zealand