Data (Science) Engineering: Summary of Basic Commands

Key Points

Data Science Languages
September 6
  • Understand data science and its languages.

  • Understand the components of data science and the differences between languages.

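[Language] R
September 20
  • Understand the strengths and weaknesses of R as a data science language.

  • Understand why R's various syntaxes emerged and how to make use of them.

  • Understand the data structures that make up R (data frames, list columns) and functional programming.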

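[Language] Python
September 27
  • Understand the strengths and weaknesses of Python as a data science language.

  • Pick out the data-science-related features from Python's many capabilities and implement with them.

  • Implement the same problem in both R and Python, and choose the better fit for a given situation.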

[Language] API Programming
October 4
  • Distinguish among APIs, functions, methods, packages, and modules.

  • Learn the API fundamentals.

  • Use APIs to accomplish what you need.

Data Science Data Structures
October 11
  • Identify the core data science data structures.

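[Data Structure] Time Series Data
October 18
  • Become familiar with the situations in which time series data arise.

  • Fetch time series data yourself, preprocess it, and model it.

  • Automate the building of time series forecasts, and automate their deployment with DevOps as well.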

Midterm Exam
October 25
[Data Structure] Geospatial Data
November 1
  • Understand the basics of spatial data science.

  • Visualize data frames on a variety of maps yourself.

  • Cross-analyze spatial data through data frames.

[Data Structure] NLP - Text Mining (R)
November 8
  • Understand text in the context of data science.

  • Understand that text data, too, can be visualized and modeled.

  • Learn how the different approaches to text data differ.

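[Data Structure] NLP - Python
November 15
  • Build a Python pipeline for NLP processing.

  • Become familiar with natural language processing methods for text that can be automated.

  • Compare text mining in R with NLP processing in Python.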

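Big Data Visualization
November 22
  • Understand what makes a visualization truthful.

  • Survey the commonly used visualization techniques.

  • Learn visualization methods that highlight and emphasize.

  • Visualize uncertainty and relationships.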

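Machine Learning - Predictive Models
November 29
  • Understand how the tidymodels development environment has changed.

  • Understand the difference between developing a predictive model and putting it to use.

  • Describe the entire predictive model development process in engineering terms and apply recent best practices.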

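DevOps - Docker, HPC, Spark
December 6
  • Pick a specific cloud and carry out a data science project on it.

  • Move off the local computer and set up a data science working environment in the cloud.

  • Setting the technical details aside, explain why these technologies are needed and worth learning.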

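Productization
December 13
  • Develop data science products, split into internal and external products.

  • Serve data science products in line with the available cloud options.

  • Consider together what goes into planning a successful data science product.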

Final Exam
December 20

Summary of Basic Commands

Action         Files    Folders
Inspect        ls       ls
View content   cat      ls
Navigate to    -        cd
Move           mv       mv
Copy           cp       cp -r
Create         nano     mkdir
Delete         rm       rmdir, rm -r
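
The commands in the table can be strung together into one short session. The following is a minimal sketch, run in a throwaway directory created with mktemp so that nothing real is touched. The file and folder names (notes.txt, drafts, and so on) are made up for illustration, and printf with a redirect stands in for nano, which is an interactive editor.

```shell
# Work in a scratch directory (all names below are illustrative).
scratch=$(mktemp -d)
cd "$scratch"

mkdir drafts                        # create a folder
printf 'first line\n' > notes.txt   # create a file (stand-in for interactive nano)
ls                                  # inspect the current folder
cat notes.txt                       # view a file's content

cp notes.txt backup.txt             # copy a file
cp -r drafts drafts_copy            # copy a folder and its contents
mv backup.txt drafts/               # move a file into a folder
cd drafts                           # navigate to a folder
ls                                  # view the folder's content
cd ..

rm notes.txt                        # delete a file
rmdir drafts_copy                   # delete an empty folder
rm -r drafts                        # delete a folder and everything in it
remaining=$(ls)                     # whatever is left in the scratch directory
```

Note that rmdir only removes empty folders, which is why the folder that still holds backup.txt is deleted with rm -r instead.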

Filesystem hierarchy

The following is an overview of a standard Unix filesystem. The exact hierarchy depends on the platform, so you may not see exactly the same files/directories on your computer:

[Figure: Linux filesystem hierarchy]

Glossary

absolute path
A path that refers to a particular location in a file system. Absolute paths are usually written with respect to the file system’s root directory, and begin with either “/” (on Unix) or “\” (on Microsoft Windows). See also: relative path.
argument
A value given to a function or program when it runs. The term is often used interchangeably (and inconsistently) with parameter.
command shell
See: shell.
command-line interface
A user interface based on typing commands, usually at a REPL. See also: graphical user interface.
comment
A remark in a program that is intended to help human readers understand what is going on, but is ignored by the computer. Comments in Python, R, and the Unix shell start with a # character and run to the end of the line; comments in SQL start with --, and other languages have other conventions.
current working directory
The directory that relative paths are calculated from; equivalently, the place where files referenced by name only are searched for. Every process has a current working directory. The current working directory is usually referred to using the shorthand notation . (pronounced “dot”).
file system
A set of files, directories, and I/O devices (such as keyboards and screens). A file system may be spread across many physical devices, or many file systems may be stored on a single physical device; the operating system manages access.
filename extension
The portion of a file’s name that comes after the final “.” character. By convention this identifies the file’s type: .txt means “text file”, .png means “Portable Network Graphics file”, and so on. These conventions are not enforced by most operating systems: it is perfectly possible (but confusing!) to name an MP3 sound file homepage.html. Since many applications use filename extensions to identify the MIME type of the file, misnaming files may cause those applications to fail.
filter
A program that transforms a stream of data. Many Unix command-line tools are written as filters: they read data from standard input, process it, and write the result to standard output.
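
As a small illustration of the filter idea, the sketch below feeds the same two lines to tr and to sort. Both read standard input and write standard output; the sample words are arbitrary.

```shell
# tr is a filter: text in on standard input, upper-cased text out.
shout=$(printf 'pear\napple\n' | tr '[:lower:]' '[:upper:]')

# sort is also a filter: lines in, sorted lines out.
sorted=$(printf 'pear\napple\n' | sort)

printf '%s\n' "$shout" "$sorted"
```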
flag
A terse way to specify an option or setting to a command-line program. By convention Unix applications use a dash followed by a single letter, such as -v, or two dashes followed by a word, such as --verbose, while DOS applications use a slash, such as /V. Depending on the application, a flag may be followed by a single argument, as in -o /tmp/output.txt.
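
A brief sketch of both kinds of flag, using sort: -r takes no argument, while -o is followed by the name of the output file. The file names letters.txt and sorted.txt are hypothetical.

```shell
scratch=$(mktemp -d)
cd "$scratch"
printf 'b\na\nc\n' > letters.txt

reversed=$(sort -r letters.txt)   # -r: single-letter flag, reverse the sort order
sort -o sorted.txt letters.txt    # -o takes an argument: the file to write to
sorted=$(cat sorted.txt)
```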
for loop
A loop that is executed once for each value in some kind of set, list, or range. See also: while loop.
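
In the Unix shell, a for loop might look like the following sketch, which runs its body once for each value in a fixed list and keeps a running total. The variable names n and total are arbitrary.

```shell
total=0
for n in 1 2 3 4; do
    echo "processing $n"      # the loop body runs once per value
    total=$((total + n))
done
echo "total: $total"
```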
graphical user interface
A user interface based on selecting items and actions from a graphical display, usually controlled by using a mouse. See also: command-line interface.
home directory
The default directory associated with an account on a computer system. By convention, all of a user’s files are stored in or below their home directory.
loop
A set of instructions to be executed multiple times. Consists of a loop body and (usually) a condition for exiting the loop. See also: for loop, while loop.
loop body
The set of statements or commands that are repeated inside a for loop or while loop.
MIME type
MIME (Multipurpose Internet Mail Extensions) types describe different file types for exchange on the Internet, for example images, audio, and documents.
operating system
Software that manages interactions between users, hardware, and software processes. Common examples are Linux, macOS (formerly OS X), and Windows.
orthogonal
To have meanings or behaviors that are independent of each other. If a set of concepts or tools are orthogonal, they can be combined in any way.
parameter
A variable named in a function’s declaration that is used to hold a value passed into the call. The term is often used interchangeably (and inconsistently) with argument.
parent directory
The directory that “contains” the one in question. Every directory in a file system except the root directory has a parent. A directory’s parent is usually referred to using the shorthand notation .. (pronounced “dot dot”).
path
A description that specifies the location of a file or directory within a file system. See also: absolute path, relative path.
pipe
A connection from the output of one program to the input of another. When two or more programs are connected in this way, they are called a “pipeline”.
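
For instance, the sketch below connects three programs with the | character: printf produces some lines, sort orders them, and uniq drops adjacent duplicates. The colour names are made-up sample data.

```shell
# printf -> sort -> uniq: each program's output is the next one's input.
unique=$(printf 'red\nblue\nred\ngreen\nblue\n' | sort | uniq)

# A second pipeline counts the distinct values (tr strips padding that wc may add).
n_unique=$(printf '%s\n' "$unique" | wc -l | tr -d ' ')
echo "$n_unique distinct values"
```

Sorting first matters here because uniq only removes duplicates that sit next to each other.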
process
A running instance of a program, containing code, variable values, open files and network connections, and so on. Processes are the “actors” that the operating system manages; it typically runs each process for a few milliseconds at a time to give the impression that they are executing simultaneously.
prompt
A character or characters displayed by a REPL to show that it is waiting for its next command.
quoting
(in the shell): Using quotation marks of various kinds to prevent the shell from interpreting special characters. For example, to pass the string *.txt to a program, it is usually necessary to write it as '*.txt' (with single quotes) so that the shell will not try to expand the * wildcard.
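
The difference is easy to see with echo in a directory that holds two text files. This is a minimal sketch; a.txt and b.txt are hypothetical names created in a scratch directory.

```shell
scratch=$(mktemp -d)
cd "$scratch"
touch a.txt b.txt

expanded=$(echo *.txt)     # unquoted: the shell expands the wildcard before echo runs
literal=$(echo '*.txt')    # quoted: echo receives the string *.txt unchanged

echo "unquoted: $expanded"
echo "quoted:   $literal"
```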
read-evaluate-print loop
(REPL): A command-line interface that reads a command from the user, executes it, prints the result, and waits for another command.
redirect
To send a command’s output to a file rather than to the screen or another command, or equivalently to read a command’s input from a file.
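
A short sketch of redirection in both directions, with made-up file names: > sends output to a file, < reads input from a file, and >> (not mentioned in the definition) appends rather than overwrites.

```shell
scratch=$(mktemp -d)
cd "$scratch"

printf 'c\na\nb\n' > letters.txt   # > writes output to a file, replacing its content
sort < letters.txt > sorted.txt    # < reads input from a file
echo 'd' >> sorted.txt             # >> appends to the end of a file
result=$(cat sorted.txt)
```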
regular expression
A pattern that specifies a set of character strings. REs are most often used to find sequences of characters in strings.
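
One common place to meet regular expressions in the shell is grep. In the sketch below, grep -E selects lines that consist of "log", one or more digits, and ".txt"; the pattern and the sample file names are illustrative only.

```shell
# ^ and $ anchor the pattern to the whole line; [0-9]+ means one or more digits;
# \. matches a literal dot.
matches=$(printf 'log1.txt\nlog22.txt\nnotes.md\nlogA.txt\n' | grep -E '^log[0-9]+\.txt$')
echo "$matches"
```

The pattern is wrapped in single quotes so that the shell passes it to grep untouched.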
relative path
A path that specifies the location of a file or directory with respect to the current working directory. Any path that does not begin with a separator character (“/” or “\”) is a relative path. See also: absolute path.
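
The sketch below reads the same file three ways: by a relative path, by an absolute path, and by a relative path that goes up through the parent directory. It assumes mktemp -d returns an absolute path, which is the usual behaviour; the project/data layout is hypothetical.

```shell
scratch=$(mktemp -d)
cd "$scratch"
mkdir -p project/data
printf 'x\n' > project/data/values.txt
cd project

relative=$(cat data/values.txt)                     # resolved from the current working directory
absolute=$(cat "$scratch/project/data/values.txt")  # starts at the root, works from anywhere
via_parent=$(cat ../project/data/values.txt)        # .. steps up to the parent directory first
```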
root directory
The top-most directory in a file system. Its name is “/” on Unix (including Linux and macOS) and “\” on Microsoft Windows.
shell
A command-line interface such as Bash (the Bourne-Again Shell) or the Microsoft Windows DOS shell that allows a user to interact with the operating system.
shell script
A set of shell commands stored in a file for re-use. A shell script is a program executed by the shell; the name “script” is used for historical reasons.
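
As a minimal sketch, the following stores one command in a file and then runs that file with sh. The script name count_lines.sh and the sample file are made up; "$1" refers to the first argument given to the script.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Save a command in a file for re-use (count_lines.sh is a hypothetical name).
cat > count_lines.sh <<'EOF'
# Print the number of lines in the file given as the first argument.
wc -l < "$1"
EOF

printf 'x\ny\nz\n' > sample.txt
n_lines=$(sh count_lines.sh sample.txt | tr -d ' ')   # tr strips padding that wc may add
echo "$n_lines"
```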
standard input
A process’s default input stream. In interactive command-line applications, it is typically connected to the keyboard; in a pipe, it receives data from the standard output of the preceding process.
standard output
A process’s default output stream. In interactive command-line applications, data sent to standard output is displayed on the screen; in a pipe, it is passed to the standard input of the next process.
sub-directory
A directory contained within another directory.
tab completion
A feature provided by many interactive systems in which pressing the Tab key triggers automatic completion of the current word or command.
variable
A name in a program that is associated with a value or a collection of values.
while loop
A loop that keeps executing as long as some condition is true. See also: for loop.
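
A while loop in the Unix shell might be sketched as follows: the body repeats as long as the test succeeds, so the counter must change inside the body or the loop would never end. The variable names are arbitrary.

```shell
count=3
ticks=""
while [ "$count" -gt 0 ]; do
    ticks="$ticks$count "      # record the current value
    count=$((count - 1))       # move toward the exit condition
done
echo "$ticks"
```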
wildcard
A character used in pattern matching. In the Unix shell, the wildcard * matches zero or more characters, so that *.txt matches all files whose names end in .txt.
