Sunday, April 27, 2014

Move Over #BigData, Make Room for #DataSecurity


According to the U.S. Bureau of Labor Statistics, computer and mathematical occupations are expected to grow 18% over the next 10 years. At the core of many businesses is data, which requires organization, security, mining and analysis. As such, computing-related disciplines (information technology, computer science and business) include at least one database management systems course in their core curriculum. Every spring semester, I teach a course on advanced database design techniques and physical issues relating to enterprise-wide data management, using “Modern Database Management” by Hoffer et al. as the required textbook. The course has three main foci:
  • Modeling, e.g., entity-relationship diagram, enhanced entity-relationship diagram
  • Implementation, e.g., logical and physical database design, database querying 
  • Operating logistics, e.g., data stewardship, data and database administration
With the onslaught of #BigData, #DataAnalytics, #DataMining and #DataScience mentioned in nearly every computing-related article and conversation, the students (and everyone else) want to know what these terms mean and how they impact businesses. The Computing Research Association (CRA) Big Data Whitepaper provides a great showcase of the #BigData challenges and opportunities. The figure below displays the data processing stages (top row) and the cross-cutting "wildcard" features that affect every stage (bottom row).
CRA's Big Data Pipeline (http://www.cra.org/ccc/files/docs/init/bigdatawhitepaper.pdf)
#BigData discussions concentrate on the 3Vs (Volume+Variety+Velocity, circa 2001, Gartner Inc.), the 4Vs (3Vs+Viability/Veracity, circa 2012) or, now, the 5Vs (4Vs+Value, circa late 2012/early 2013). So the heterogeneity, scale, timeliness and human collaboration features from the above figure are covered, but where does that leave privacy? The data conversation must better integrate data security/privacy needs and challenges. The emergence of data-centric specializations and degree programs in data analytics, data mining and data science is fueled by the increasing need to prepare undergraduate and graduate students to handle real business data needs. To expose undergraduates to data and database security, an advanced database design course should augment its operating logistics topic with an overview of database security, granular access control, securing database-to-database communications, and multi-level security in database systems.

Graduate student awareness and training in data and database security/privacy require education tightly coupled with applied research. Toward this effort, NSF has sponsored the Information Security Research and Education (INSuRE) Collaborative project. INSuRE aims to establish a long-lasting coalition of Centers of Academic Excellence in Information Assurance Research (CAE-R) and government partners in cybersecurity research. This initial partnership includes four successful and mature CAE-Rs and the National Security Agency (NSA), which together will design, develop and test the research network. INSuRE will be a self-organizing, cooperative, multi-disciplinary, multi-institutional, and multi-level research collaborative that can work on both unclassified and classified research problems in the information security domain.

Other domains have data and database security/privacy considerations. Here are just a few:
  • Data Security in Transportation: digital and smart cities (@pietromax, @UCLALewisCenter), ride sharing (@uber, @lyft)
  • Data Security in Aviation: Timely Data Acquisition for the Aviation Industry
  • Data Security in Health: #HCLDR twitter discussions Tues @ 8:30PM, #BlerdChat & #HCHLITSS twitter discussions Thurs. @ 7PM & 8PM, respectively

Combating the Academic NO

originally posted on April 10, 2014 on csdoctorsister.blog.com

The language of the academic ‘no’ has its own dialect. In short, there are MANY ways to receive a NO. Let’s see, there is “we regret to inform you…”, “the submission is not selected”, “your submission is not appropriate for…”, “your submission is not competitive…”. Do I need to go on? You get the point. Any externally reviewed document is fair game.

Scholarly/Research Work
  • 2-page extended abstract/poster submission
  • 4-6 page work-in-progress paper submission
  • 6-12 page conference paper submission
  • 20+ page journal article submission
Corporate Partnership/Sponsorship
  • 1-2 page course development/augmentation proposal
  • 2-3 page research project proposal
  • 2-3 page non-research project with deliverables and timeline
U.S. Federal Funding Streams
  • NSF solicited and unsolicited grant proposals
  • NIH grant proposals
  • DoD/DHS/DoJ/DoE Broad Agency Announcement grant proposals
You will undoubtedly receive rejection/declined notifications. So what’s next?
First, take a couple of days to digest the NO. You have to let the NO bounce off. However, if you are unaccustomed to receiving an unfavorable response, you may need more time and experience to learn how to cope. It’s easy to take the NO as a personal attack. Every faculty member in academia has been there. You have poured your heart and soul into that submitted manuscript. It took a lot of thinking, meeting with colleagues, writing and editing. You worked really hard on the content and presentation. You thought you did a great job, but you received that NO. The goal in digesting the NO is to disentangle yourself as a researcher from the manuscript.

Second, parse through the reviewers’ feedback (objectively). You could easily dismiss the NO and simply ignore the feedback, but then you miss your learning opportunity. Few people fully understand the details of your work. It is your charge to share your expertise, ask questions and present a reasonable solution where others have not. You are learning how to fill in the technical gaps of your intended audience by perfecting your presentation style. I consider reviewers’ feedback to fall into four main categories.
  • Editorial. If checking for proper grammar is not your forte and you struggle with grammatical issues, get a copy editor.
  • Pitch Problem. The “pitch” is two-fold: the problem and the solution. Convincing your technical community, i.e., the 3-4 reviewers, of the significance of the problem and/or proposed solution can be a daunting task. You must motivate the problem and justify your proposed solution. You may not have sufficiently conveyed the problem scope and thus the proposed solution’s contributions. Your task is to think through your ideas and arguments as though you are the reviewer. I suggest reading “Made To Stick” by Chip and Dan Heath (http://www.amazon.com/Made-Stick-Ideas-Survive-Others/dp/1400064287).
  • Technical Issues. Your subject-matter expertise may not be at the level required by the reviewers. (Yes, subject-matter expertise is a moving target for each reviewer.) For instance, the literature review may omit key references, or the proposed solution may have been previously published. You can avoid technical issues by remaining current in your field. To be a researcher is to always be learning. Remember: a good reputation within your research field is built mainly on high-quality research products.
  • Stylistic differences. Some reviewers will like the way you write. Others will not. I don’t tend to focus on stylistic differences. I consider them to be “low blows”, i.e., a reviewer just doesn’t like the manuscript and cannot justify the criticism under one of the other categories above. Chalk it up to human error. Keep in mind that your goal is to satisfy the majority of reviewers with your research product.
Third, MAKE and then EXECUTE the revision plan. Most people can seamlessly make the revision plan. The execution of said plan? Not so much. Timelines are helpful to a point. Accountability partners, such as project colleagues, mentors or advisors, are much more effective. You can deem excuses rational if you are only accountable to yourself. When you have to say those excuses out loud, you realize that they are part of your procrastination infrastructure. Once you can see your path to a YES notification, the NO notification becomes a distant memory.

So what happens if you don’t receive a notification at all? Well, that’s another post for another time…

Grant Submission: A Funny Story

originally posted on April 10, 2014 on csdoctorsister.blog.com

One of the continuous activities of a faculty member, aside from teaching, research and service, is writing grant proposals. The proposers hope the project will be funded and will lead to other funded grants. It’s a deep-thinking, intellectually stimulating, yet time-consuming process. I’ll save the grant writing process for another post. Anywho, here’s the story:

The set-up: It’s March 2013.
The Transportation Research Board (TRB) posts the grant solicitation: Freight Transportation Data Architecture: Data Element Dictionary.
  • Grant Max Funding Amount: $500,000
  • Grant Duration: 18 months
  • Estimated Grant Start Date: 7/31/2013
  • Proposal Due: April 19, 2013 at 4:30PM, 20 single-bound copies delivered to the TRB office
The grant project has a deliverable of a “searchable and sustainable web-based freight transportation analysis to be hosted at the U.S. Department of Transportation”. The TRB’s solicitation format includes the project tasks and phases.

The story details: Fast forward a few weeks. I’m asked to scope the data management and data search tasks with collaborators: Transportation SMEs (subject-matter experts) Bruce C Hartman and Christopher Clott and Supply Chain SMEs Edie Schmidt and Regena Scott. I wound up serving as the project’s Principal Investigator.

A few weeks and over 10K words later, we had a complete grant proposal about improving the data model for the U.S. government transportation industry.

It’s April 18, 2013 at 12:00PM.

The Purdue pre-award sponsored programs office created the 20 bound copies. A complete grant proposal has a number of elements, including the project plan, references, research biographies and letters of support, which makes it quite lengthy. So the 20 bound copies were divided between two boxes, which the Purdue pre-award sponsored programs office shipped to the TRB.

The punch-line: The boxes left the office together and arrived at the Washington D.C. shipping facility at the same time, yet were separated, only to be delivered to the same final destination on the same day. One box arrived at TRB on 4/19/2013 at 10:30AM. The other box arrived at TRB on 4/19/2013 at 4:43PM. The second box was 13 minutes late. The grant proposal was not even reviewed. The shipping and delivery company failed my colleagues and me. So hilarious that it is not. I mean, why did the TRB even want paper copies anyway? Digital versions would have been so much cheaper!

Here is an excerpt from our introduction. (Hopefully, at least someone will read our work.)

Effective decision making in the freight transportation industry is severely limited by the disparate definitions used by the wide range of data sources – federal, state, regional sources as well as private and public. The objective of this research is to remedy this problem by producing a searchable and sustainable web-based freight data element dictionary for transportation analysis. A standardized approach that is used throughout the nation will significantly improve the transferability of freight information at any scale – ranging from a single cargo box to an entire train bed.

Our multidisciplinary team of researchers from Purdue University and the University of St. Francis (USF) is proposing to develop a searchable, web-based data dictionary of elements suitable for the National Freight Database Architecture.  This feature will allow users to find data elements relevant to taxonomy elements and see details about them. Other search criteria will also be developed and use cases will also be presented. An additional outcome of our work will be a comprehensive report detailing the data sources studied and recommended, the taxonomy defined, the entities documented, and the hierarchical layout of the data elements, and functional relationships between data from different sources. Finally, our research team will produce a technical paper and conference presentation, with the goal of explaining the scope of the project and providing an overview of the data dictionary.

Tenured BWiC

originally posted on April 6, 2014 on csdoctorsister.blog.com

Motivational and inspiring statements by unknown
The tenure-track Assistant Professor position is a finite probationary term, which typically spans four to six years at most colleges. The Assistant Professor uses this time to learn academia, grow research (if the college is research-focused), teach courses and serve her/his technical community through professional societies. At the end of the probationary term, the Assistant Professor generates a document describing the contributions and impact she or he has made to the institution in the categories of scholarship, teaching and service. The document is evaluated and voted upon by her/his colleagues in multiple rounds; proceeding to a subsequent round is typically contingent on receiving a favorable vote in the previous round. This evaluation process takes about an academic year. A positive promotion and tenure outcome expands a faculty member’s opportunities and academic freedom. My tenure-track position began in August 2008. My evaluation process began in August 2013.

On 4/4/2014, the Purdue Board of Trustees approved my faculty promotion to Associate Professor with tenure in Computer and Information Technology, effective August 18, 2014 (http://www.purdue.edu/newsroom/releases/2014/Q2/purdue-trustees-approve-faculty-promotions.html). #ThisIsEpic #TenuredBWiC

With this announcement, history has been made. I became the first Black woman to earn tenure in Purdue’s College of Technology. This history-making act goes far beyond Purdue. According to the Institute for Women’s Policy Research report “Accelerating Change for Women Faculty of Color in STEM: Policy, Action, and Collaboration”, only 6% of STEM faculty are women of color. That’s 6,400 of 111,800, or about 5.7%. The report enumerates the challenges facing STEM women of color faculty, and I have experienced each one to a certain degree throughout my academic career. I wish there were less enumeration of the challenges and more advocacy for other STEM WOC in academia, industry and government. What actions are you taking to champion a STEM WOC?

So what’s next for this tenured BWiC?

First, celebrate. Honestly, this may last a while.

Second, decide which fork to take in this road: step up, lean in, lean back, or opt out.

Timely Data Acquisition for the Aviation Industry

originally posted on March 30, 2014 on csdoctorsister.blog.com

In the wake of #MH370 (classified as missing since March 8th), I posted to FB on March 13th: How do you LOSE A PLANE with all these technological mediums?!? Where the freak’n frack is Malaysian Airlines Flight 370?!? #hearthurts #disappointed

The fact that a plane carrying 227 passengers and 12 crew goes missing is unimaginable and unacceptable. As a #datahead, I immediately think about what data was available and not captured that could be useful in finding this plane. Unfortunately, #MH370 is a recent example of aviation’s need for better data understanding.

Here’s another recent example: in April 2011, an EF4 tornado ripped through St. Louis, right past Lambert-St. Louis International Airport, and at least one airplane was reportedly landing because Air Traffic Control (ATC) was unaware that the tornado was on the ground. In light of the April 2011 EF4 tornado, my colleague, Mary Johnson, and I are working on ways to assist the aviation industry in handling its data.

Our efforts, approach and prior works are given below.

Unknown to most commercial airline passengers, extensive information is reviewed by the pilot and the dispatcher, who must both sign off and agree that the flight is ready for take-off. Both the pilot and the dispatcher must be FAA certificate holders to perform their jobs. ATC aggregates information from sources such as the National Weather Service (NWS), flight tracking websites, airport changes, official notices, pilots, and the FAA or similar agencies to inform air transport personnel about flight scheduling viability. These data may not arrive in time to support flight dispatch decisions; dispatch plays a role at take-off, in flight, and at landing to ensure safety and operational control. Twitter may provide a mechanism to improve the timeliness of accessing accurate data, as well as data not normally available to ATC, dispatch services and/or pilots. Since Twitter is publicly accessible and contains user-generated content, the vulnerability, reliability and trustworthiness of its data must be assessed before this information stream is introduced into an industry that requires accurate and timely information. This project investigates the use of Twitter to improve data timeliness and possibly increase the data coverage available to ATC for air transport personnel, pilots, aircraft dispatchers, airline managers, and airport managers. As part of the investigations, this project seeks to assess Twitter’s vulnerability, reliability and trustworthiness for use in Federal Aviation Administration controlled airspace to augment the data currently available to and used by aircraft dispatchers.

We have begun our inquiry by implementing a Twitter-based prototype (Marshall, Johnson et al., 2012). During Hurricane Sandy in October 2012, it tracked 4 major commercial airline carriers and 30 US airports and captured up-to-the-minute aviation conditions via airport ICAO and IATA codes and keyword analysis (Marshall, Johnson et al., 2013). We implemented K-means clustering and computed two mutual information evaluations. Among the collected tweets containing airport codes during Hurricane Sandy, we observe that the largest cluster consists of airport codes not directly located in Hurricane Sandy’s path, indicating that intermediate airports were affected by a lack of aircraft and air transport system personnel. The second experiment dealt with keywords. Sample keywords and phrases include jet blue, delta, united, southwest, flight, cancel, flood, storm, weather, sandy, nor’easter, new york, new jersey, philadelphia, washington, west virginia, maryland, food shortage and electricity. Among the collected tweets containing these keywords, we noticed two relatively large groupings that included the airline keywords. As expected, the keywords related to Delta Airlines and aviation/flight appear in nearly all the clusters. Surprisingly, the keywords related to Hurricane Sandy appeared in only one cluster, implying that the tweets centered on the consequences of the storm, not the progression or status of the storm itself.
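To make the clustering step more concrete, here is a minimal Python sketch, assuming the tweets have already been collected as plain strings. It is not the prototype's code (which runs on Node.js): the keyword list is a hypothetical subset of the keywords above, each tweet becomes a binary bag-of-keywords vector, a plain K-means loop groups the vectors, and one clustering is compared against a hand labeling with mutual information (in nats). The names `vectorize`, `kmeans` and `mutual_information`, and the toy tweets, are illustrative only.

```python
import math
import random
from collections import Counter

# Hypothetical keyword vocabulary: a small subset of the keywords listed above.
KEYWORDS = ["delta", "united", "flight", "cancel", "storm", "sandy", "flood"]


def vectorize(tweet):
    """Binary bag-of-keywords vector: 1.0 if the keyword appears in the tweet."""
    text = tweet.lower()
    return [1.0 if kw in text else 0.0 for kw in KEYWORDS]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum((nij / n) * math.log(nij * n / (count_a[i] * count_b[j]))
               for (i, j), nij in joint.items())


# Hypothetical toy tweets: two about airlines, two about the storm.
tweets = [
    "Delta flight cancel at JFK",
    "United flight cancel tonight",
    "Sandy storm flood in New Jersey",
    "storm Sandy flood warnings",
]
labels = kmeans([vectorize(t) for t in tweets], k=2)
truth = ["airline", "airline", "storm", "storm"]  # hypothetical hand labels
print(labels, round(mutual_information(labels, truth), 4))
```

With these four toy tweets the airline tweets and the storm tweets land in separate clusters, so the mutual information with the hand labels equals ln 2, its maximum for two balanced groups. Binary keyword vectors keep the sketch dependency-free; a real pipeline would more likely use TF-IDF weights and a library implementation of K-means.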

The current prototype system is written on Node.js, a platform built on Chrome’s JavaScript runtime for easily building fast, scalable network applications. Between the end of October 2012 and the beginning of November 2012, about 13,000 tweets were collected using the Twitter REST API. Our database contains three types of clusters, namely Airport, Airlines and Path.Airport. Our selected airlines generally provide flight schedules in one-month increments as downloadable PDFs, which require conversion to JSON.
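The schedule conversion step could look roughly like the following Python sketch. It assumes the PDF text has already been extracted to plain lines (the post does not say which extraction tool is used) and that each flight row follows a hypothetical layout of carrier, flight number, origin, destination, departure time and arrival time. Real airline PDFs will differ, so the regular expression `ROW` and the function `schedule_to_json` are illustrative only.

```python
import json
import re

# Hypothetical row layout for text already extracted from a schedule PDF:
#   <carrier> <flight no> <origin> <destination> <dep HH:MM> <arr HH:MM>
ROW = re.compile(
    r"^(?P<carrier>[A-Z0-9]{2})\s+(?P<flight>\d{1,4})\s+"
    r"(?P<origin>[A-Z]{3})\s+(?P<destination>[A-Z]{3})\s+"
    r"(?P<departs>\d{2}:\d{2})\s+(?P<arrives>\d{2}:\d{2})$"
)


def schedule_to_json(lines):
    """Convert parseable schedule rows to a JSON array; skip headers and noise."""
    flights = []
    for line in lines:
        match = ROW.match(line.strip())
        if match:
            row = match.groupdict()
            row["flight"] = int(row["flight"])  # store flight numbers as integers
            flights.append(row)
    return json.dumps(flights, indent=2)


lines = [
    "Carrier Flight From To Dep Arr",  # header row: does not match, skipped
    "DL 1234 ATL JFK 08:15 10:30",
    "WN 22 MDW BWI 13:05 15:55",
]
print(schedule_to_json(lines))
```

Rows that do not match the pattern (headers, page footers) are silently skipped here; a production converter would likely log them so that layout changes in a new month's PDF are noticed.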

References
  1. Marshall, B., Johnson, M., Magikar, A., Ghanekar, A., Mathew, I., Delaveau, L., Budhiraju, R., & Chapparala, R. (2012). Flight Data Analyzer using Twitter. Journal of Emerging Trends in Computing and Information Sciences, 3(8), 1226-1234.
  2. Marshall, B., Johnson, M., & Chunduru, N. (in press). Towards general aviation using Twitter as a virtual aircraft dispatcher. Proceedings of the Conference on Telecommunications and Information Technology. Murray, Kentucky: Information and Telecommunications Education and Research Association.

5 Tips for the Black Woman in Computing (BWiC)

originally posted on March 23, 2014 on csdoctorsister.blog.com

Recently, I read @EvetteDionne: 5 Tips for Surviving Grad School As A WOC. I replied with #icouldwriteSTEMWOCtips. For a STEM WOC, whether an undergraduate student, graduate student or professional, the mode of operation is slightly different. So here goes:

Continually refresh your core competency skills. Most likely, your computing sub-discipline is rapidly evolving. Keeping current with the newest, latest-and-greatest systems, software and/or tools is a tall order. As you build your career, hone your technical skills regularly through reading, studying, and becoming an active contributor to your field. You want to have the pulse of your field, typically given by your field’s ‘thought’ leaders and influential members. I suggest identifying the influential members in your technical field and concentrating on keeping up-to-date on their research. Your intelligence, knowledge and experience within your discipline are completely under your control. Know your stuff, and whenever you are challenged (and you will be, because everyone is), you will simply further impress potential colleagues, employers, and clients. Plain talk: what’s in your technical toolbox?

Establish and grow your support networks. Yes, that is networks – plural. One network is not enough. I have found myself simultaneously building 4 support networks:
    • The racial/ethnic circle: under-represented minorities (URM), especially in STEM, conversations
    • The gender circle: female empowerment conversations
    • The technical circle: talking shop around that virtual water cooler
    • The philosophy of life circle: bettering your perspective of life and your role in it
I mean, peruse my Twitter account. My Twitter followers and those I follow are few by Twittersphere standards, but I find them to be a wonderfully eccentric band of insightful and inspiring commentators. Having multiple networks is not novel, but Black women tend to fall into the stereotypical ‘superwoman’ role, i.e., the one-woman show. No one else does it. Why should you?!? A blend of face-to-face and digital interactions is very beneficial. Plain talk: are the people you surround yourself with progressive or oppressive?

Execute your plan, not one others have for you. Others have suggested activities and events that may best serve their short-term or long-term goals, but not mine. I have participated in some of these engagements but not others. Think deeply and think through the consequences of your choices. Confer with others whose opinions you value to help you weigh your options before you make your final decision. Hence the motivation to establish and grow your support networks.

This one is the most difficult and time-consuming advice since it requires you to know your plan. Start by devising your career mission and vision statements. Undoubtedly, you need some first-hand successes and missteps to know what you do and don’t want to incorporate into your plan. Nevertheless, you should aim to be clear and comfortable with your plan and have the common sense to revise said plan when necessary. Plain talk: What’s your hustle?

Identify and use your beard. The likelihood of finding and then building a relationship with someone who looks like you at your place of employment is slim. Those who are considered successful could probably be counted on 2 hands. For many BWiCs, the feeling of isolation and being invisibly visible in your discipline and eventual place of employment is common. The computing community is not accustomed to interacting with Blacks, women and certainly not BWiCs. As a result, your voice is marginalized in most technical interactions. To combat this marginalization, you may need to speak through at least one beard.

Your beard is most likely a member of the computing majority: a man or woman of Caucasian, Indian, Asian or South Asian descent who advocates for your ideas (without stealing them) and actively supports your career growth. Be the best you. Revel in your uniqueness. Let them help you. Plain Talk: find the work-arounds.

Confidence is queen. My definition of queen is a technical woman who has healthy self-esteem, is comfortable with her career plans, is an active and productive member of her technical community and is working toward building her networks. Not arrogant. Not a know-it-all. Not her-way-or-the-highway. She is well-balanced since she is capable of reporting well up the chain of command (senior colleagues, management, etc.) as well as reporting well down the chain of command (junior colleagues, support staff, etc.). She is unwavering in her plan outcomes but flexible in the path she takes to arrive at her final goals. This is a life-long evolution of you. Lean in, lean back, stand up and/or walk out, when you deem appropriate. Admittedly, this tip is very Zen. Heck, I don’t feel like I’m a queen, yet. But if you have knowledge of this vision, your current circumstances can be placed in the proper perspective. Your value is not attached to your situation. Emote your emerging queen status. Plain talk: lessen the impostor syndrome and strive for queen status.

A Little Chit, A Little Chat

originally posted on March 11, 2014 on csdoctorsister.blog.com

“Setting an example is not the main means of influencing others; it is the only means.”
~ Albert Einstein