Whenever I am considering any theory, I am usually also thinking about how it would work in practice, and this has been the case for the last four weeks of DITA. I have had a lot of experience with the design, maintenance and sometimes migration of databases and CRMs in my professional life, so I have found it easy (and sometimes frustrating) to relate to the theory we’ve been discussing in class – realising that in practice, especially for third sector organisations, it is often very difficult to find the time and money to ensure intuitive design, easy maintenance and clean data on an ongoing basis.
When we discussed working backwards in database design, I thought back to the multiple roles I have held where, in the absence of a dedicated Knowledge Manager / Information Specialist / Database Manager (with very little funding and/or strict funding criteria, the third sector tends to find it difficult to justify such costs, viewing these roles as ‘non-essential’), I have been parachuted in when the need for a functional database has become urgent – something we have also recently touched on in another module, Information Management & Policy. Knowing what your questions and needs are before you design a database and start collecting data is absolutely the ideal, but in my experience it is not a priority for most small, cash-strapped organisations. So for the last four weeks I have been wondering how a convincing case could be made for low-cost, low-maintenance database solutions, and for proper investment in dedicated knowledge/database roles at new organisations. This would ensure that good practice around data and information doesn’t just begin at launch, but is maintained afterwards.
Several of the workplaces where I have managed databases have used flat files as their main repository: Microsoft Excel spreadsheets, sometimes complemented by basic Microsoft Access (Access is included in business MS packages, requiring no extra outlay of funds). This is fine for, say, a small charity, start-up or SME with 20 or fewer employees, as there are several nifty tricks you can use to link spreadsheets, produce reports from them, and manage/filter data within them, and it is easier to track usage with fewer users. However, the security and edit-history functions are not very good; duplication of tasks and sheets is not only possible but usually inevitable; and decent coding to avoid these pitfalls (and others) requires time that most organisations of this type simply do not have. I have never arrived in a role where a Database Usage Guidelines document had even been written and circulated, let alone adhered to by all the employees. I have written several at the end of these roles myself, but again, given the limited time and large workloads that organisations of this type tend to have, I would be surprised if many of them are still in use. Information management and data management do not tend to be priorities until the data needs to be presented, usually in the form of some sort of funding report, so that funding criteria are met and the funding continues.
As we learned in Week 5, Boolean search – built on the AND/OR/NOT operators – is the most common type I have worked with, though it can be tricky to get used to for staff members who have multiple time-consuming tasks to execute each day. It has been built into most Access databases I’ve worked with, as well as some proprietary software I’ve used – but it requires training, which, even if very basic, still takes time and money. Vector-based search can be more useful for busy employees, as it works on nearest match, and therefore also tolerates misspelled words within the data (although let’s try to avoid those, as there was nothing I found more frustrating and time-consuming than messy data, of which I dealt with a lot). Salesforce and Raiser’s Edge, both CRM (customer relationship management) packages, are examples of software using vector-style search. However, these are usually proprietary, so come at an extra cost. So it is six of one and half a dozen of the other: where you save money with one solution, you lose time (= money); where you save time with the other, you lose money.
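To make the contrast concrete, here is a minimal sketch in Python of the difference between Boolean AND/OR/NOT matching and a nearest-match search that tolerates misspellings. The records, function names and similarity cutoff are all invented for illustration; real CRM search is of course far more sophisticated.

```python
from difflib import get_close_matches

# Hypothetical contact records, as free-text strings.
records = ["Thomas Smith", "Jane Doe", "Tom Smithe", "Sarah Khan"]

def boolean_search(records, must=(), either=(), exclude=()):
    """Boolean search: every `must` term AND any `either` term, NOT any `exclude` term."""
    results = []
    for record in records:
        text = record.lower()
        if (all(term in text for term in must)
                and (not either or any(term in text for term in either))
                and not any(term in text for term in exclude)):
            results.append(record)
    return results

def nearest_match(records, query, cutoff=0.6):
    """Nearest-match search: rank records by string similarity instead of exact terms."""
    return get_close_matches(query, records, n=5, cutoff=cutoff)
```

The point is in the failure modes: `boolean_search(records, must=["smith"])` finds only records literally containing “smith”, whereas `nearest_match(records, "Thomas Smyth")` still surfaces “Thomas Smith” despite the misspelling – at the cost of occasionally surfacing near-misses you did not want.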
In these roles, I had to think about ways in which a database could be made more useful to the teams needing to extract data from it – sometimes shadowing staff to observe what their database needs were, or conducting interviews with individual teams/staff members (methods we have touched on in another module, Research Methods & Communication), then devising ways of enabling all the differing needs to be met simultaneously.
I have always inherited messy data. Tidying it up has often been a hybrid of Excel formulas and find-and-replace, manual data cleaning, and ensuring fields and tags are retained when migrating to new platforms – quite a task, usually with never enough time to perform it in. In Week 7 we learned about other methods of data wrangling (which I wish I’d known about, or had the time to research and learn, before). One is the open source OpenRefine, which would be costly only in terms of a data wrangler’s time and would save time in the long run; short of writing the code for algorithmic data cleaning myself (I have only very basic coding skills), this seems like a reliable (and geekily exciting) way of cleaning big data sets. I became very used to entering a role to be met with almost immediate clamours of “We need to generate reports for x/y/z, and we need them now – your job is to fix that database for us.” Yet it takes time and a lot of work to clean up a messy database and build the fields and relational schema needed to generate the reports required by all the different needs. If time and money were invested at the beginning of an organisation’s life, the likelihood of reaching a point further down the line where all the data it holds is useless in its current form could be reduced.
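As a tiny illustration of the kind of algorithmic cleaning I mean – a Python sketch with invented messy values, not OpenRefine itself – simply normalising whitespace and casing catches a surprising amount of the mess:

```python
import re

def clean_name(raw):
    # Collapse runs of whitespace (spaces, tabs) and trim the ends.
    name = re.sub(r"\s+", " ", raw.strip())
    # Normalise casing word by word.
    return " ".join(part.capitalize() for part in name.split(" "))

messy = ["  thomas   smith ", "JANE DOE", "sarah\tkhan"]
cleaned = [clean_name(value) for value in messy]
# cleaned is now ["Thomas Smith", "Jane Doe", "Sarah Khan"]
```

OpenRefine’s appeal is that it offers this sort of transformation (and much cleverer clustering) through a point-and-click interface, without the wrangler having to write the code at all.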
A database is only as useful as the quality of the data put into it. If you are working with Boolean search, and over the years people have entered Thomas Smith as Thom Smith, Tom Smith, Thomas Smithe and/or Thom Smyth, an organisation may not know that one person has five or more records that need to be merged and de-duplicated – and the separate records could each hold critical information needed for the ongoing relationship with Mr Smith. Multiply that by 12,000 records, to pick a random number out of the air, and this is a huge task which requires time and the solitary confinement of the data wrangler.
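The grouping of likely duplicates can at least be partially automated, leaving only the merge decision to a human. Here is a rough Python sketch using standard-library string similarity – the names, the greedy approach and the 0.75 threshold are all illustrative, not a production de-duplication method:

```python
from difflib import SequenceMatcher

def similar(a, b, threshold=0.75):
    """True if two names are close enough to be probable duplicates."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def cluster_duplicates(names, threshold=0.75):
    """Greedy grouping: each name joins the first cluster whose
    representative (first member) it resembles, else starts a new one."""
    clusters = []
    for name in names:
        for group in clusters:
            if similar(name, group[0], threshold):
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters

names = ["Thomas Smith", "Thom Smith", "Tom Smith",
         "Thomas Smithe", "Thom Smyth", "Jane Doe"]
clusters = cluster_duplicates(names)
```

Run on the variants above, this groups the five Smith spellings into one cluster and leaves Jane Doe on her own – the 12,000-record review is still a human job, but it becomes a review of clusters rather than a needle-in-a-haystack search.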
So, when thinking about the types of organisations I’ve been used to working for, how do we build databases with little time and little money, on software that is not proprietary, and in a way that allows for easy and clean input, management, processing, exporting and migration of data? Databases that employees know how to use, and where they feel confident there is actually ‘a point’ in entering their data accurately, because they know it will generate accurate results and reports when they need them? How do we make the case for investment in roles that look after information as part, or all, of their remit – and ensure those roles don’t get sidelined or cut in favour of other, seemingly more ‘money-making’ roles, leading to the aforementioned inevitable parachuting in of emergency database staff? I also think about this in relation to my interests around the archiving and sharing of grassroots activist groups’ activities, histories and campaign strategies – another type of set-up that almost never has the funds for dedicated database building and upkeep, and therefore for the sharing of knowledge. What both third sector organisations and grassroots activist groups have in common is that they want to avoid reinventing the wheel – “how have we done it before, and how can we build on that?” – rather than “we have no data and no history, so let’s just do what we can do.” This is a question I hope to answer, or at least begin to answer, by the end of this degree.
This was DITA blogging exercise no. 2, reflecting on sessions 3, 4, 5 & 6.